Algorithms and Data Structures

With Applications to Graphics and Geometry

This book is licensed under a Creative Commons Attribution 3.0 License
Jurg Nievergelt
Klaus Hinrichs
Copyright 2011 Jurg Nievergelt
Editor-In-Chief: Jurg Nievergelt
Associate Editor: Marisa Drexel Ulrich
Editorial Assistants: Jon Durden, Tessa Greenleaf, Kristyna Mauch Selph, Ernesto Serrano
For any questions about this text, please email: drexel@uga.edu
The Global Text Project is funded by the Jacobs Foundation, Zurich, Switzerland.
Algorithms and Data Structures, A Global Text
Table of Contents
Part I: Programming environments for motion, graphics, and geometry
1. Reducing a task to given primitives: programming motion
A robot car, its capabilities, and the task to be performed
Wall-following algorithm described informally
Algorithm specified in a high-level language
Algorithm programmed in the robot's language
The robot's program optimized
2. Graphics primitives and environments
Turtle graphics: a basic environment
QuickDraw: a graphics toolbox
A graphics frame program
3. Algorithm animation
Computer-driven visualization: characteristics and techniques
A gallery of algorithm snapshots
Part II: Programming concepts: beyond notation
4. Algorithms and programs as literature: substance and form
Programming in the large versus programming in the small
Documentation versus literature: is it meant to be read?
Pascal and its dialects: lingua franca of computer science
5. Divide-and-conquer and recursion
An algorithmic principle
Divide-and-conquer expressed as a diagram: merge sort
Recursively defined trees
Recursive tree traversal
Recursion versus iteration: the Tower of Hanoi
The flag of Alfanumerica: an algorithmic novel on iteration and recursion
6. Syntax
Syntax and semantics
Grammars and their representation: syntax diagrams and EBNF
An overly simple syntax for simple expressions
Parenthesis-free notation for arithmetic expressions
7. Syntax analysis
The role of syntax analysis
Syntax analysis of parenthesis-free expressions by counting
Analysis by recursive descent
Turning syntax diagrams into a parser
Part III: Objects, algorithms, programs
8. Truth values, the data type 'set', and bit acrobatics
Bits and boolean functions
Swapping and crossovers: the versatile exclusive-or
The bit sum or "population count"
9. Ordered sets
Sequential search
Binary search
In-place permutation
10. Strings
Recognizing a pattern consisting of a single string
Recognizing a set of strings: a finite-state-machine interpreter
11. Matrices and graphs: transitive closure
Paths in a graph
Boolean matrix multiplication
Warshall's algorithm
Minimum spanning tree in a graph
12. Integers
Operations on integers
The Euclidean algorithm
The prime number sieve of Eratosthenes
Large integers
Modular number systems: the poor man's large integers
Random numbers
13. Reals
Floating-point numbers
Some dangers
Horner's method
Bisection
Newton's method for computing the square root
14. Straight lines and circles
Intersection
Clipping
Drawing digitized lines
The riddle of the braiding straight lines
Digitized circles
Part IV: Complexity of problems and algorithms
15. Computability and complexity
Models of computation: the ultimate RISC
Almost nothing is computable
The halting problem is undecidable
Computable, yet unknown
Multiplication of complex numbers
Complexity of matrix multiplication
16. The mathematics of algorithm analysis
Growth rates and orders of magnitude
Asymptotics
Summation formulas
Recurrence relations
Asymptotic performance of divide-and-conquer algorithms
Permutations
Trees
17. Sorting and its complexity
What is sorting? How difficult is it?
Types of sorting algorithms
Simple sorting algorithms that work in time Θ(n²)
A lower bound Ω(n · log n)
Quicksort
Analysis for three cases: best, "typical", and worst
Is it possible to sort in linear time?
Sorting networks
Part V: Data structures
18. What is a data structure?
Data structures old and new
The range of data structures studied
Performance criteria and measures
19. Abstract data types
Concepts: What and why?
Stack
First-in-first-out queue
Priority queue
Dictionary
20. Implicit data structures
What is an implicit data structure?
Array storage
Implementation of the fixed-length fifo queue as a circular buffer
Implementation of the fixed-length priority queue as a heap
Heapsort
21. List structures
Lists, memory management, pointer variables
The fifo queue implemented as a one-way list
Tree traversal
Binary search trees
Height-balanced trees
22. Address computation
Concepts and terminology
The special case of small key domains
The special case of perfect hashing: table contents known a priori
Conventional hash tables: collision resolution
Choice of hash function: randomization
Performance analysis
Extendible hashing
A virtual radix tree: order-preserving extendible hashing
23. Metric data structures
Organizing the embedding space versus organizing its contents
Radix trees, tries
Quadtrees and octtrees
Spatial data structures: objectives and constraints
The grid file
Simple geometric objects and their parameter spaces
Region queries of arbitrary shape
Evaluating region queries with a grid file
Interaction between query processing and data access
Part VI: Interaction between algorithms and data structures: case studies in geometric
computation
24. Sample problems and algorithms
Geometry and geometric computation
Convex hull: a multitude of algorithms
The uses of convexity: basic operations on polygons
Visibility in the plane: a simple algorithm whose analysis is not
25. Plane-sweep: a general-purpose algorithm for two-dimensional problems illustrated using
line segment intersection
The line segment intersection test
The skeleton: Turning a space dimension into a time dimension
Data structures
Updating the y-table and detecting an intersection
Sweeping across intersections
Degenerate configurations, numerical errors, robustness
26. The closest pair
The problem
Plane-sweep applied to the closest pair problem
Implementation
Analysis
Sweeping in three or more dimensions
Part I: Programming environments for motion, graphics, and geometry
Part I of this textbook will discuss:
simple programming environments
program design
informal versus formal notations
reducing a solution to primitive operations, and programming as an activity independent of language.
The purpose of an artificial programming environment
A program can be designed with the barest of tools, paper and pencil, or in the programmer's head. In the realm
of such informal environments, a program design may contain vague concepts expressed in an informal notation.
Before he or she can execute this program, the programmer needs a programming environment, typically a
complex system with many distinct components: a computer and its operating system, utilities, and program
libraries; text and program editors; various programming languages and their processors. Such real programming
environments force programmers to express themselves in formal notations.
Programming is the realization of a solution to a problem, expressed in terms of those operations provided by a
given programming environment. Most programmers work in environments that provide very powerful operations
and tools.
The more powerful a programming environment, the simpler the programming task, at least to the expert who
has achieved mastery of this environment. Even an experienced programmer may need several months to master a
new programming environment, and a novice may give up in frustration at the multitude of concepts and details he
or she must understand before writing the simplest program.
The simpler a programming environment, the easier it is to write and run small programs, and the more work it
is to write substantial, useful programs. In the early days of computing, before the proliferation of programming
languages during the 1960s, most programmers worked in environments that were exceedingly simple by modern
standards: Acquaintance with an assembler, a loader, and a small program library sufficed. The programs they
wrote were small compared to what a professional programmer writes today. The simpler a programming
environment is, the better suited it is for learning to program. Alas, today simple environments are hard to find!
Even a home computer is equipped with complex software that is not easily ignored or bypassed. For the sake of
education it is useful to invent artificial programming environments. Their only purpose is to illustrate some
important concepts in the simplest possible setting and to facilitate insight. Part I of this book introduces such a toy
programming environment suitable for programming graphics and motion, and illustrates how it can gradually be
enriched to approach a simple but useful graphics environment.
Textbooks on computer graphics. The computer-driven graphics screen is a powerful new medium for
communication. Visualization often makes it possible to present the results of a computation in intuitively
appealing ways that convey insights not easily gained in any other manner. To exploit this medium, every
programmer must master basic visualization techniques. We refer the reader interested in a systematic
introduction to computer graphics to such excellent textbooks as [BG 89], [FDFH 90], [NS 79], [Rog 85], [Wat 89],
and [Wol 89].
1. Reducing a task to given primitives: programming motion
Learning objectives:
primitives for specifying motion
expressing an algorithm in informal notations and in high- and low-level programming languages
program verification
program optimization
A robot car, its capabilities, and the task to be performed
Some aspects of programming can be learned without a computer, by inventing an artificial programming
environment as a purely mental exercise. The example of a vehicle that moves under program control in a fictitious
landscape is a microcosmos of programming lore. In this section we introduce important concepts that will
reappear later in more elaborate settings.
The environment. Consider a two-dimensional square grid, a portion of which is enclosed by a wall made up
of horizontal and vertical line segments that run halfway between the grid points (Exhibit 1.1). A robot car enclosed
within the wall moves along this grid under computer control, one step at a time, from grid point to adjacent grid
point. Before and after each step, the robot's state is described by a location (grid point) and a direction (north, east,
south, or west).
Exhibit 1.1: The robot's crosshairs show its current location on the grid.
The robot is controlled by a program that uses the following commands:
left              Turn 90 degrees counterclockwise.
right             Turn 90 degrees clockwise.
forward           Move one step, to the next grid point in front of you.
goto #            Send program control to the label #.
if touch goto #   If you are touching a wall to your front, send
                  program control to the label #.
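The three motion commands can be sketched in Python as pure functions on the robot's state. This is only an illustration under stated assumptions: the state is modeled as a tuple (x, y, heading), the y coordinate is assumed to grow toward north, and the names HEADINGS, STEP, left, right, and forward are chosen here, not taken from the text. Walls and the two goto commands are deliberately left out.

```python
# Headings listed in clockwise order, so turning is index arithmetic modulo 4.
HEADINGS = ["north", "east", "south", "west"]

# Unit step (dx, dy) per heading; y growing toward north is an assumption.
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def left(state):
    """Turn 90 degrees counterclockwise; the location does not change."""
    x, y, heading = state
    return (x, y, HEADINGS[(HEADINGS.index(heading) - 1) % 4])


def right(state):
    """Turn 90 degrees clockwise; the location does not change."""
    x, y, heading = state
    return (x, y, HEADINGS[(HEADINGS.index(heading) + 1) % 4])


def forward(state):
    """Move one step to the adjacent grid point in front; heading is kept."""
    x, y, heading = state
    dx, dy = STEP[heading]
    return (x + dx, y + dy, heading)


state = (2, 3, "north")
print(left(state))            # (2, 3, 'west')
print(forward(right(state)))  # (3, 3, 'east')
```

Modeling each command as a function from state to state keeps the location and direction explicit before and after every step, which is how the text describes the robot.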
A program for the robot is a sequence of commands with distinct labels. The labels serve merely to identify the
commands and need not be arranged either consecutively or in increasing order. Execution begins with the first
command and proceeds to successive commands in the order in which they appear, except when flow of control is
redirected by either of the goto commands.
Example
The following program moves the robot forward until it bumps into a wall:
1 if touch goto 4
2 forward
3 goto 1
4 { there is no command here; just a label }
In developing programs for the robot, we feel free to use any high-level language we prefer, and embed robot
commands in it. Thus we might have expressed our wall-finding program by the simpler statement
while not touch do forward;
and then translated it into the robot's language.
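To make the while-loop formulation concrete, here is a minimal Python sketch. It assumes a rectangular room of width by height grid points as a stand-in for the walled region. The class Robot, its methods, and the function find_wall are hypothetical names, and raising an error on driving into a wall is an assumption, since the text does not say what happens in that case.

```python
class Robot:
    """Toy robot in a rectangular room (an assumed, simplified wall shape)."""

    STEP = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west

    def __init__(self, width, height, x=0, y=0, heading=0):
        self.width, self.height = width, height
        self.x, self.y, self.heading = x, y, heading

    def touch(self):
        """True if a wall separates the robot from the grid point in front."""
        dx, dy = self.STEP[self.heading]
        nx, ny = self.x + dx, self.y + dy
        return not (0 <= nx < self.width and 0 <= ny < self.height)

    def forward(self):
        """Move one step ahead; refusing to cross a wall is our assumption."""
        if self.touch():
            raise RuntimeError("cannot drive through a wall")
        dx, dy = self.STEP[self.heading]
        self.x += dx
        self.y += dy


def find_wall(robot):
    # Python rendering of: while not touch do forward;
    while not robot.touch():
        robot.forward()


robot = Robot(width=5, height=4, x=1, y=1, heading=1)  # heading 1 = east
find_wall(robot)
print(robot.x, robot.y)  # 4 1
```

The loop terminates because the room is finite: every forward step brings the robot one grid point closer to the wall it is facing.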
A program for this robot car to patrol the walls of a city consists of two parts: First, find a wall, the problem we
just solved. Second, move along the wall forever while maintaining two conditions:
1. Never lose touch with the wall; at all times, keep within one step of it.
2. Visit every spot along the wall in a monotonic progression.
The mental image of walking around a room with eyes closed, left arm extended, and the left hand touching the
wall at all times will prove useful. To mirror this solution we start the robot so that it has a wall on its immediate left
rather than in front. As the robot has no sensor on its left side, we will let it turn left at every step to sense the wall
with its front bumper, then turn right to resume its position with the wall to its left.
Wall-following algorithm described informally
Idea of solution: Touch the wall with your left hand; move forward, turning left or right as required to keep
touching the wall.
Wall-following algorithm described in English: Clockwise, starting at left, look for the first direction not
blocked by a wall, and if found, take a step in that direction.
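The English description can be turned into a single-step function. The sketch below is one possible reading, not the book's program. It assumes the walled region is given as a Python set of grid points (a move is blocked when the neighbouring point is not in the set) and encodes headings as 0 to 3 for north, east, south, west. The names STEP, blocked, and follow_step are hypothetical.

```python
STEP = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west (clockwise)


def blocked(room, x, y, heading):
    """A wall blocks the move if the neighbouring point is outside the room."""
    dx, dy = STEP[heading]
    return (x + dx, y + dy) not in room


def follow_step(room, x, y, heading):
    """Return the state after one step, or None if all four directions are blocked."""
    candidate = (heading - 1) % 4          # start with the direction to the left
    for _ in range(4):                     # then scan clockwise
        if not blocked(room, x, y, candidate):
            dx, dy = STEP[candidate]
            return (x + dx, y + dy, candidate)
        candidate = (candidate + 1) % 4
    return None


print(follow_step({(0, 0)}, 0, 0, 0))   # None: a unit square offers no free direction
alley = {(0, 0), (0, 1), (0, 2)}        # corridor with a dead end at (0, 2)
print(follow_step(alley, 0, 1, 0))      # (0, 2, 0): walks into the alley
print(follow_step(alley, 0, 2, 0))      # (0, 1, 2): turns around and walks back out
```

Returning None for the boxed-in robot is a design choice made here so that the function terminates; the robot as described in the text would keep turning.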
Let us test this algorithm on some critical configurations. The robot inside a unit square turns forever, never
finding a direction to take a step (Exhibit 1.2). In Exhibit 1.3 the robot negotiates a left-hand spike. After each step,
there is a wall to its left-rear. In Exhibit 1.4 the robot enters a blind alley. At the end of the alley, it turns clockwise
twice, then exits by the route it entered.
Exhibit 1.2: Robot in a box spins on its heels.
Exhibit 1.3: The robot turns around a spike.
Exhibit 1.4: Backing up in a blind alley.
Algorithm specified in a high-level language
The ideas presented informally in the section above are made precise in the following elegant, concise program:
{ wall to left-rear }
loop
  left;
  { wall to left-front }
  while touch do
    { wall to right-front }
    right;
  endwhile;
  forward;
forever;
Program verification. The comments in braces are program invariants: assertions about the state of the robot that are true every time the flow of control reaches the place in the program where they are written. We need three types of invariants to verify the wall-following program: "wall to left-rear", "wall to left-front", and "wall to right-front". The relationships between the robot's position and the presence of a nearby wall that must hold for each assertion to be true are illustrated in Exhibit 1.5. Shaded circles indicate points through which a wall must pass. Each robot command transforms its precondition (i.e. the assertion true before the command is executed) into its postcondition (i.e. the assertion true after its execution). Thus each of the commands 'left', 'right', and 'forward' is a predicate transformer, as suggested in Exhibit 1.6.
Exhibit 1.5: Three types of invariants relate the positions of robot and wall.
Exhibit 1.6: Robot motions as predicate transformers.
Algorithm programmed in the robot's language
A straightforward translation from the high-level program into the robot's low-level language yields the following seven-line wall-following program:
loop
  left;              1  left
  while touch do     2  if touch goto 4
                     3  goto 6
    right;           4  right
  endwhile;          5  goto 2
  forward;           6  forward
forever;             7  goto 1
The robot's program optimized
In designing a program it is best to follow simple, general ideas, and to decide on details in the most straightforward manner, without regard for the many alternative ways that are always available for handling details. Once a program is proven correct, and runs, then we may try to improve its efficiency, measured by time and memory requirements. This process of program transformation can often be done syntactically, that is merely by considering the definition of individual statements, not the algorithm as a whole. As an example, we derive a five-line version of the wall-following program by transforming the seven-line program in two steps.
If we have the complementary primitive 'if not touch goto ...', we can simplify the flow of the program at the left as shown on the right side.
{ wall to left-rear }          { wall to left-rear }
1  left                        1  left
2  if touch goto 4             2  if not touch goto 6
3  goto 6
{ wall to right-front }        { wall to right-front }
4  right                       4  right
5  goto 2                      5  goto 2
6  forward                     6  forward
7  goto 1                      7  goto 1
An optimization technique called loop rotation allows us to shorten this program by yet another instruction. It changes the structure of the program significantly, as we see from the way the labels have been permuted. The assertion "wall to right-front" attached to line 4 serves as an invariant of the loop "keep turning right while you can't advance".
{ wall to right-front }
4  right
2  if touch goto 4
6  forward
1  left
7  goto 2
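One way to gain confidence that such a transformation preserves behavior is to run both versions side by side. The following Python sketch interprets labeled robot programs and records the motions they execute. It is an illustration only: the tuple encoding of instructions, the function `run`, the blocked-cell model of walls, and the choice to enter the five-line program at label 1 are all assumptions, not part of the text.

```python
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # east, north, west, south

# Each instruction is (label, operation, jump target or None).
SEVEN = [(1, "left", None), (2, "if touch goto", 4), (3, "goto", 6),
         (4, "right", None), (5, "goto", 2), (6, "forward", None),
         (7, "goto", 1)]

FIVE = [(4, "right", None), (2, "if touch goto", 4), (6, "forward", None),
        (1, "left", None), (7, "goto", 2)]


def run(program, entry, blocked, pos, heading, max_actions):
    """Interpret a labeled robot program; return the list of motions executed."""
    index = {label: k for k, (label, _, _) in enumerate(program)}
    pc = index[entry]
    trace = []
    while len(trace) < max_actions:
        _, op, target = program[pc]
        pc += 1
        if op == "left":
            heading = (heading + 1) % 4
            trace.append(op)
        elif op == "right":
            heading = (heading - 1) % 4
            trace.append(op)
        elif op == "forward":
            dx, dy = DIRS[heading]
            pos = (pos[0] + dx, pos[1] + dy)
            trace.append(op)
        elif op == "goto":
            pc = index[target]
        elif op == "if touch goto":
            dx, dy = DIRS[heading]
            if (pos[0] + dx, pos[1] + dy) in blocked:   # assumed wall model
                pc = index[target]
    return trace


pillar = {(0, 0)}                      # one wall cell, robot starts east of it
a = run(SEVEN, 1, pillar, (1, 0), 1, 20)
b = run(FIVE, 1, pillar, (1, 0), 1, 20)
print(a == b, a[:8])
```

Agreement on one test world is of course no proof; it only complements the syntactic argument given above.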
Programming projects
1. Design a data structure suitable for storing a wall made up of horizontal and vertical line segments in a square grid of bounded size. Write a "wall-editor", i.e. an interactive program that lets the user define and modify an instance of such a wall.
2. Program the wall-following algorithm and animate its execution when tracking a wall entered with the wall-editor. Specifically, show the robot's position and orientation after each change of state.
2. Graphics primitives and environments
turtle graphics
QuickDraw: A graphics toolbox
frame program
interactive graphics input/output
example: polyline input
Turtle graphics: a basic environment
Seymour Papert [Pap 80] introduced the term turtle graphics to denote a set of primitives for line drawing. Originally implemented in the programming language Logo, turtle graphics primitives are now available for several computer systems and languages. They come in different versions, but the essential point is the same as that introduced in the example of the robot car: The pen (or "turtle") is a device that has a state (position, direction) and is driven by incremental operations "move" and "turn" that transform the turtle to a new state depending on its current state:
move(s)       { take s unit steps in the direction you are facing }
turn(d)       { turn counterclockwise d degrees }
The turtle's initial state is set by the following operations:
moveto(x,y)   { move to the position (x,y) in absolute coordinates }
turnto(d)     { face d degrees from due east }
In addition, we can specify the color of the trail drawn by the moving pen:
pencolor(c)   { where c = white, black, none, etc. }
Example
The following program fragment approximates a circle tangential to the x-axis at the origin by drawing a 36-sided polygon:
moveto(0, 0);   { position pen at origin }
turnto(0);      { face east }
step := 7;      { arbitrarily chosen step length }
do 36 times     { 36 sides * 10° = 360° }
  { move(step); turn(10) }   { 10 degrees counterclockwise }
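The turtle's state and its four operations can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than a Logo implementation: the class `Turtle` and its `trail` attribute are hypothetical names, coordinates are real numbers with y increasing upward, and nothing is drawn; the vertices are merely recorded. Running the fragment above on it shows that the 36-sided polygon closes.

```python
import math


class Turtle:
    """Pen state: position (x, y) and direction in degrees from due east."""

    def __init__(self):
        self.x, self.y, self.direction = 0.0, 0.0, 0
        self.trail = [(0.0, 0.0)]          # vertices visited so far

    def moveto(self, x, y):
        self.x, self.y = float(x), float(y)
        self.trail.append((self.x, self.y))

    def turnto(self, d):
        self.direction = d % 360

    def move(self, s):
        rad = math.radians(self.direction)
        self.x += s * math.cos(rad)
        self.y += s * math.sin(rad)
        self.trail.append((self.x, self.y))

    def turn(self, d):
        self.direction = (self.direction + d) % 360   # counterclockwise


t = Turtle()
t.moveto(0, 0)          # position pen at origin
t.turnto(0)             # face east
step = 7                # arbitrarily chosen step length
for _ in range(36):     # 36 sides * 10 degrees = 360 degrees
    t.move(step)
    t.turn(10)
print(abs(t.x) < 1e-9, abs(t.y) < 1e-9, t.direction)
```

With real arithmetic the turtle returns to the origin up to floating-point error and faces east again.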
In graphics programming we are likely to use basic figures, such as circles, over and over again, each time with a different size and position. Thus we wish to turn a program fragment such as the circle approximation above into a reusable procedure.
Procedures as building blocks
A program is built from components at many different levels of complexity. At the lowest level we have the constructs provided by the language we use: constants, variables, operators, expressions, and simple (unstructured) statements. At the next higher level we have procedures: they let us refer to a program fragment of arbitrary size and complexity as a single entity, and build hierarchically nested structures. Modern programming languages provide yet another level of packaging: modules, or packages, useful for grouping related data and procedures. We limit our discussion to the use of procedures.
Programmers accumulate their own collection of useful program fragments. Programming languages provide the concept of a procedure as the major tool for turning fragments into reusable building blocks. A procedure consists of two parts with distinct purposes:
1. The heading specifies an important part of the procedure's external behavior through the list of formal parameters: namely, what type of data moves in and out of the procedure.
2. The body implements the action performed by the procedure, processing the input data and generating the output data.
A program fragment that embodies a single coherent concept is best written as a procedure. This is particularly true if we expect to use this fragment again in a different context. The question of how general we want a procedure to be deserves careful thought. If the procedure is too specific, it will rarely be useful. If it is too general, it may be unwieldy: too large, too slow, or just too difficult to understand. The generality of a procedure depends primarily on the choice of formal parameters.
Example: the long road toward a procedure "circle"
Let us illustrate these issues by discussing design considerations for a procedure that draws a circle on the screen. The program fragment above for drawing a regular polygon is easily turned into
procedure ngon(n, s: integer);   { n = number of sides, s = step size }
var i, j: integer;
begin
  j := 360 div n;
  for i := 1 to n do { move(s); turn(j) }
end;
But a useful procedure to draw a circle requires additional arguments. Let us start with the following:
procedure circle(x, y, r, n: integer);
{ centered at (x, y); r = radius; n = number of sides }
var a, s, i: integer;   { angle, step, counter }
begin
  moveto(x, y - r);   { bottom of circle }
  turnto(0);          { east }
  a := 360 div n;
  s := r * sin(a);    { between inscribed and circumscribed polygons }
  for i := 1 to n do { move(s); turn(a) }
end;
This procedure places the burden of choosing n on the programmer. A more sophisticated, "adaptive" version might choose the number of sides on its own as a function of the radius of the circle to be drawn. We assume that lengths are measured in terms of pixels (picture elements) on the screen. We observe that a circle of radius r is of length 2πr. We approximate it by drawing short line segments, about 3 pixels long, thus needing about 2r line segments.
procedure circle(x, y, r: integer);   { centered at (x, y); radius r }
var a, s, i: integer;   { angle, step, counter }
begin
  moveto(x, y - r);   { bottom of circle }
  turnto(0);          { east }
  a := 180 div r;     { 360 / (# of line segments) }
  s := r * sin(a);    { between inscribed and circumscribed polygons }
  for i := 1 to 2 * r do { move(s); turn(a) }
end;
This circle procedure still suffers from severe shortcomings:
1. If we discretize a circle by a set of pixels, it is an unnecessary detour to do this in two steps as done above: first, discretize the circle by a polygon; second, discretize the polygon by pixels. This two-step process is a source of unnecessary work and errors.
2. The approximation of the circle by a polygon computed from vertex to vertex leads to rounding errors that accumulate. Thus the polygon may fail to close, in particular when using integer computation with its inherent large rounding error.
3. The procedure attempts to draw its circle on an infinite screen. Computer screens are finite, and attempted drawing beyond the screen boundary may or may not cause an error. Thus the circle ought to be clipped at the boundaries of an arbitrarily specified rectangle.
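The second shortcoming is easy to reproduce. The sketch below isolates just one source of error, the integer turning angle '360 div n', and measures how far the pen ends up from its starting point; the function `closing_gap` and its parameters are hypothetical names chosen for this illustration, and positions are kept as real numbers so that no other rounding interferes.

```python
import math


def closing_gap(n, s, integer_angle):
    """Walk n sides of length s, turning after each; return distance from start.

    With integer_angle=True the turning angle is 360 div n, as in the
    procedures above; otherwise the exact angle 360 / n is used.
    """
    angle = 360 // n if integer_angle else 360 / n
    x = y = 0.0
    direction = 0.0
    for _ in range(n):
        x += s * math.cos(math.radians(direction))
        y += s * math.sin(math.radians(direction))
        direction += angle
    return math.hypot(x, y)


print(closing_gap(7, 100, integer_angle=True))    # 7 * 51 = 357 degrees turned
print(closing_gap(7, 100, integer_angle=False))   # exact angle: essentially zero
```

For a heptagon the truncated angle of 51 degrees leaves a visible gap of several pixels, whereas a 36-sided polygon, whose angle of 10 degrees is exact, closes.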
Writing a good circle procedure is a demanding task for professionals. We started this discussion of desiderata and difficulties of a simple library procedure so that the reader may appreciate the thought and effort that go into building a useful programming environment. In chapter 14 we return to this problem and present one possible goal of "the long road toward a procedure 'circle'". We now make a huge jump from the artificially small environments discussed so far to one of today's realistic programming environments for graphics.
QuickDraw: a graphics toolbox
For the sake of concreteness, the next few sections show programs written for a specific programming environment: MacPascal using the QuickDraw library of graphics routines [App 85]. It is not our purpose to duplicate a manual, but only to convey the flavor of a realistic graphics package and to explain enough about QuickDraw for the reader to understand the few programs that follow. So our treatment is highly selective and biased.
Concerning the circle that we attempted to program above, QuickDraw offers five procedures for drawing circles and related figures:
procedure FrameOval(r: Rect);
procedure PaintOval(r: Rect);
procedure EraseOval(r: Rect);
procedure InvertOval(r: Rect);
procedure FillOval(r: Rect; pat: Pattern);
Each one inscribes an oval in an aligned rectangle r (sides parallel to the axes) so as to touch the four sides of r. If r is a square, the oval becomes a circle. We quote from [App 85]:
FrameOval draws an outline just inside the oval that fits inside the specified rectangle, using the current grafPort's pen pattern, mode, and size. The outline is as wide as the pen width and as tall as the pen height. It's drawn with the pnPat, according to the pattern transfer mode specified by pnMode. The pen location is not changed by this procedure.
Right away we notice a trade-off when comparing QuickDraw to the simple turtle graphics environment we introduced earlier. At one stroke, 'FrameOval' appears to be able to produce many different pictures, but before we can exploit this power, we have to learn about grafPorts, pen width, pen height, pen patterns, and pattern transfer modes. 'FrameOval' draws the perimeter of an oval, 'PaintOval' paints the interior as well, 'EraseOval' paints an oval with the current grafPort's background pattern, 'InvertOval' complements the pixels: 'white' becomes 'black', and vice versa. 'FillOval' has an additional argument that specifies a pen pattern used for painting the interior.
We may not need to know all of this in order to use one of these procedures, but we do need to know how to specify a rectangle. QuickDraw has predefined a type 'Rect' that, somewhat ambiguously at the programmer's choice, has either of the following two interpretations:
type Rect = record top, left, bottom, right: integer end;
type Rect = record topLeft, botRight: Point end;
with one of the interpretations of type 'Point' being
type Point = record v, h: integer end;
Exhibit 2.1 illustrates and provides more information about these concepts. It shows a plane with first coordinate v that runs from top to bottom, and a second coordinate h that runs from left to right. (The reason for v running from top to bottom, rather than vice versa as used in math books, is compatibility with text coordinates, where lines are naturally numbered from top to bottom.) The domain of v and h are the integers from -2^15 = -32768 to 2^15 - 1 = 32767. The points thus addressed on the screen are shown as intersections of grid lines. These lines and grid points are infinitely thin; they have no extension. The pixels are the unit squares between them. Each pixel is paired with its top left grid point. This may be enough information to let us draw a slightly fat point of radius 3 pixels at the grid point with integer coordinates (v, h) by calling
PaintOval(v - 3, h - 3, v + 3, h + 3);
Exhibit 2.1: Screen coordinates define the location of pixels.
To understand the procedures of this section, the reader has to understand a few details about two key aspects of interactive graphics:
timing and synchronization of devices and program execution
how screen pictures are controlled at the pixel level
Synchronization
In interactive applications we often wish to specify a grid point by letting the user point the mouse-driven cursor to some spot on the screen. The procedure 'GetMouse(v, h)' returns the coordinates of the grid point where the cursor is located at the moment 'GetMouse' is executed. Thus we can track and paint the path of the mouse by a loop such as
repeat GetMouse(v, h); PaintOval(v - 3, h - 3, v + 3, h + 3)
until stop;
This does not give the user any timing control over when he or she wants the computer to read the coordinates of the mouse cursor. Clicking the mouse button is the usual way to tell the computer "Now!". A predefined boolean function 'Button' returns 'true' when the mouse button is depressed, 'false' when not. We often synchronize program execution with the user's clicks by programming busy waiting loops:
repeat until Button;   { waits for the button to be pressed }
while Button do;       { waits for the button to be released }
The following procedure waits for the next click:
procedure waitForClick;
begin repeat until Button; while Button do end;
Pixel acrobatics
The QuickDraw pen has four parameters that can be set to draw lines or paint textures of great visual variety: pen location 'pnLoc', pen size 'pnSize' (a rectangle of given height and width), a pen pattern 'pnPat', and a drawing mode 'pnMode'. The pixels affected by a motion of the pen are shown in Exhibit 2.2.
Exhibit 2.2: Footprint of the pen.
Predefined values of 'pnPat' include 'black', 'gray', and 'white'. 'pnPat' is set by calling the predefined 'procedure PenPat(pat: Pattern)' [e.g. 'PenPat(gray)']. As 'white' is the default background, drawing in 'white' usually serves for erasing.
The result of drawing also depends critically on the transfer mode 'pnMode', whose values include 'patCopy', 'patOr', and 'patXor'. A transfer mode is a boolean operation executed in parallel on each pair of pixels in corresponding positions, one on the screen and one in the pen pattern.
'patCopy' uses the pattern pixel to overwrite the screen pixel, ignoring the latter's previous value; it is the default and most frequently used transfer mode.
'patOr' paints a black pixel if either or both the screen pixel or the pattern pixel were black; it progressively blackens the screen.
'patXor' (exclusive-or, also known as "odd parity") sets the result to black iff exactly one of (screen pixel, pattern pixel) is black. A white pixel in the pen leaves the underlying screen pixel unchanged; a black pixel complements it. Thus a black pen inverts the screen.
'pnMode' is set by calling the predefined 'procedure PenMode(mode: integer)' [e.g. 'PenMode(patXor)'].
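The three transfer modes described above are ordinary boolean operations on single pixels, and their truth tables can be tabulated in a few lines of Python. This is a model of the description in the text, not of the QuickDraw API: the encoding 1 = black, 0 = white and the function names `pat_copy`, `pat_or`, and `pat_xor` are assumptions made for this sketch.

```python
BLACK, WHITE = 1, 0   # assumed encoding: 1 = black pixel, 0 = white pixel


def pat_copy(screen, pattern):
    return pattern                 # overwrite, ignoring the screen pixel


def pat_or(screen, pattern):
    return screen | pattern        # black if either pixel is black


def pat_xor(screen, pattern):
    return screen ^ pattern        # black iff exactly one pixel is black


# Tabulate each mode over (screen, pattern) = (W,W), (W,B), (B,W), (B,B).
for name, mode in [("patCopy", pat_copy), ("patOr", pat_or), ("patXor", pat_xor)]:
    table = [mode(s, p) for s in (WHITE, BLACK) for p in (WHITE, BLACK)]
    print(name, table)
```

The last table shows why exclusive-or is its own inverse: applying a black pen twice to any pixel returns that pixel to its previous value.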
The meaning of the remaining predefined procedures our examples use, such as 'MoveTo' and 'LineTo', is easily guessed. So we terminate our peep into some key details of a powerful graphics package, and turn to examples of its use.
A graphics frame program
Reusable software is a time saving concept that can be practiced profitably in the small. We keep a program that contains nothing but a few of the most useful input/output procedures, displays samples of their results, and conducts a minimal dialog so that the user can step through its execution. We call this a frame program because its real purpose is to facilitate development and testing of new procedures by embedding them in a ready-made, tested environment. A simple frame program like the one below makes it very easy for a novice to write his first interactive graphics program.
This particular frame program contains procedures 'GetPoint', 'DrawPoint', 'ClickPoint', 'DrawLine', 'DragLine', 'DrawCircle', and 'DragCircle' for input and display of points, lines, and circles on a screen idealized as a part of a Euclidean plane, disregarding the discretization due to the raster screen. Some of these procedures are so short that one asks why they are introduced at all. 'GetPoint', for example, only converts integer mouse coordinates v, h into a point p with real coordinates. It enables us to refer to a point p without mentioning its coordinates explicitly. Thus, by bringing us closer to standard geometric notation, 'GetPoint' makes programs more readable.
The procedure 'DragLine', on the other hand, is a very useful routine for interactive input of line segments. It uses the rubber-band technique, which is familiar to users of graphics editors. The user presses the mouse button to fix the first endpoint of a line segment, and keeps it depressed while moving the mouse to the desired second endpoint. At all times during this motion the program keeps displaying the line segment as it would look if the button were released at that moment. This rubber band keeps getting drawn and erased as it moves across other objects on the screen. The user should study a key detail in the procedure 'DragLine' that prevents other objects from being erased or modified as they collide with the ever-refreshed rubber band: We temporarily set 'PenMode(patXor)'. We encourage you to experiment by modifying this procedure in two ways:
1. Change the first call of the procedure 'DrawLine(L.p1, L.p2, black)' to 'DrawLine(L.p1, L.p2, white)'. You will have turned the procedure 'DragLine' into an artful, if somewhat random, painting brush.
2. Remove the call 'PenMode(patXor)' (thus reestablishing the default 'pnMode = patCopy'), but leave the first 'DrawLine(L.p1, L.p2, white)', followed by the second 'DrawLine(L.p1, L.p2, black)'. You now have a naive rubber-band routine: It alternates erasing (draw 'white') and drawing (draw 'black') the current rubber band, but in so doing it modifies other objects that share pixels with the rubber band. This is our first example of the use of the versatile exclusive-or; others will follow later in the book.
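The difference between the exclusive-or rubber band and the naive one in the second experiment can be seen on a single row of pixels. The following Python sketch is a toy model, not QuickDraw: the screen is a list of 0 (white) and 1 (black), and the function `draw` with its modes "copy" and "xor" is a hypothetical stand-in for drawing with 'patCopy' and 'patXor'.

```python
def draw(screen, pixels, color, mode):
    """Return a copy of screen with the given pixel indices drawn in color.

    screen is a list of 0 (white) / 1 (black); mode is "copy" or "xor".
    """
    result = list(screen)
    for i in pixels:
        result[i] = color if mode == "copy" else result[i] ^ color
    return result


BLACK, WHITE = 1, 0
screen = [0, 0, 1, 0, 0, 0]        # one black pixel belongs to another object
band = [1, 2, 3]                   # pixels covered by the rubber band

# Exclusive-or: drawing the band twice in black restores the screen.
shown = draw(screen, band, BLACK, "xor")
restored = draw(shown, band, BLACK, "xor")

# Naive version: draw in black, then erase in white with copy mode.
naive = draw(draw(screen, band, BLACK, "copy"), band, WHITE, "copy")

print(shown, restored, naive)
```

While the band is shown, the pixel it shares with the other object appears inverted; once the band is drawn a second time, that object is intact again. The naive version leaves a white hole in it.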
program Frame;
{ provides mouse input and drawing of points, line segments, circles }
type point = record x, y: real end;
     lineSegment = record p1, p2: point { endpoints } end;
var c, p: point;
    r: real;   { radius of a circle }
    L: lineSegment;

procedure WaitForClick;   { as defined in the section on synchronization }
begin repeat until Button; while Button do end;

procedure GetPoint(var p: point);
var v, h: integer;
begin
  GetMouse(v, h);
  p.x := v; p.y := h   { convert integer to real }
end;

procedure DrawPoint(p: point; pat: Pattern);
const t = 3;   { radius of a point }
begin
  PenPat(pat);
  PaintOval(round(p.y) - t, round(p.x) - t, round(p.y) + t, round(p.x) + t)
end;

procedure ClickPoint(var p: point);
begin WaitForClick; GetPoint(p); DrawPoint(p, Black) end;

function Dist(p, q: point): real;
begin Dist := sqrt(sqr(p.x - q.x) + sqr(p.y - q.y)) end;

procedure DrawLine(p1, p2: point; pat: Pattern);
begin
  PenPat(pat);
  MoveTo(round(p1.x), round(p1.y));
  LineTo(round(p2.x), round(p2.y))
end;

procedure DragLine(var L: lineSegment);
begin
  repeat until Button; GetPoint(L.p1); L.p2 := L.p1;
  PenMode(patXor);
  while Button do begin
    DrawLine(L.p1, L.p2, black);
    { replace 'black' by 'white' above to get an artistic drawing tool }
    GetPoint(L.p2);
    DrawLine(L.p1, L.p2, black)
  end;
  PenMode(patCopy)
end; { DragLine }

procedure DrawCircle(c: point; r: real; pat: Pattern);
begin
  PenPat(pat);
  FrameOval(round(c.y - r), round(c.x - r), round(c.y + r), round(c.x + r))
end;

procedure DragCircle(var c: point; var r: real);
var p: point;
begin
  repeat until Button; GetPoint(c); r := 0.0; PenMode(patXor);
  while Button do begin
    DrawCircle(c, r, black);
    GetPoint(p);
    r := Dist(c, p);
    DrawCircle(c, r, black);
  end;
  PenMode(patCopy)
end; { DragCircle }

procedure Title;
begin
  ShowText;      { make sure the text window and ... }
  ShowDrawing;   { ... the graphics window show on the screen }
  WriteLn('Frame program');
  WriteLn('with simple graphics and interaction routines.');
  WriteLn('Click to proceed.');
  WaitForClick
end; { Title }

procedure What;
begin
  WriteLn('Click a point in the drawing window.');
  ClickPoint(p);
  WriteLn('Drag mouse to enter a line segment.');
  DragLine(L);
  WriteLn('Click center of a circle and drag its radius');
  DragCircle(c, r)
end; { What }

procedure Epilog;
begin WriteLn('Bye.') end;

begin { Frame }
  Title; What; Epilog
end. { Frame }
Example of a graphics routine: polyline input
Let us illustrate the use of the frame program above in developing a new graphics procedure. We choose interactive polyline input as an example. A polyline is a chain of directed straight-line segments: the starting point of the next segment coincides with the endpoint of the previous one. 'Polyline' is the most useful tool for interactive input of most drawings made up of straight lines. The user clicks a starting point, and each subsequent click extends the polyline by another line segment. A double click terminates the polyline.
We developed 'PolyLine' starting from the frame program above, in particular the procedure 'DragLine', modifying and adding a few procedures. Once 'PolyLine' worked, we simplified the frame program a bit. For example, the original frame program uses reals to represent coordinates of points, because most geometric computation is done that way. A polyline on a graphics screen only needs integers, so we changed the type 'point' to integer coordinates. At the moment, the code for polyline input is partly in the procedure 'NextLineSegment' and partly in the procedure 'What'. In the next iteration, it would probably be combined into a single self-contained procedure, with all the subprocedures it needs, and the frame program would be tossed out: it has served its purpose as a development tool.
program PolyLine;
{ enter a chain of line segments and compute total length }
{ stop on double click }
type point = record x, y: integer; end;
var stop: boolean;
    length: real;
    p, q: point;

function EqPoints(p, q: point): boolean;
begin EqPoints := (p.x = q.x) and (p.y = q.y) end;

function Dist(p, q: point): real;
begin Dist := sqrt(sqr(p.x - q.x) + sqr(p.y - q.y)) end;

procedure DrawLine(p, q: point; c: Pattern);
begin PenPat(c); MoveTo(p.x, p.y); LineTo(q.x, q.y) end;

procedure WaitForClick;   { as in the frame program }
begin repeat until Button; while Button do end;

procedure NextLineSegment(var stp, endp: point);
begin
  endp := stp;
  repeat
    DrawLine(stp, endp, black);   { Try 'white' to generate artful pictures. }
    GetMouse(endp.x, endp.y);
    DrawLine(stp, endp, black)
  until Button;
  while Button do
end; { NextLineSegment }

procedure Title;
begin
  ShowText; ShowDrawing;
  WriteLn('Click to start a polyline.');
  WriteLn('Click to end each segment.');
  WriteLn('Double click to stop.')
end; { Title }

procedure What;
begin
  WaitForClick; GetMouse(p.x, p.y);
  stop := false; length := 0.0;
  PenMode(patXor);
  while not stop do begin
    NextLineSegment(p, q);
    stop := EqPoints(p, q); length := length + Dist(p, q); p := q
  end
end; { What }

procedure Epilog;
begin WriteLn('Length of polyline = ', length); WriteLn('Bye.') end;

begin { PolyLine }
  Title; What; Epilog
end. { PolyLine }
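Stripped of mouse handling and drawing, the bookkeeping in the procedure 'What' reduces to a short loop. The Python sketch below replays a recorded list of click positions instead of reading the mouse; the function name `polyline_length` and the idea of passing clicks as a list are assumptions made for this illustration.

```python
import math


def polyline_length(clicks):
    """Total length of the polyline through clicks, stopping at a double click.

    clicks is a sequence of (x, y) points; a point equal to its predecessor
    plays the role of the double click and ends the input.
    """
    p = clicks[0]
    length = 0.0
    for q in clicks[1:]:
        if q == p:                                     # EqPoints(p, q)
            break
        length += math.hypot(q[0] - p[0], q[1] - p[1])  # Dist(p, q)
        p = q
    return length


print(polyline_length([(0, 0), (3, 4), (3, 0), (3, 0), (9, 9)]))   # 9.0
```

The repeated point (3, 0) ends the polyline, so the last click is ignored and the result is 5 + 4 = 9.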
Programming projects
1. Implement a simple package of turtle graphics operations on top of the graphics environment available on your computer.
2. Use this package to implement and test a procedure 'circle' that meets the requirements listed at the end of the section "Turtle graphics: a basic environment".
3. Implement your personal graphics frame program as described in "A graphics frame program". Your effort will pay off in time saved later, as you will be using this program throughout the entire course.
3. Algorithm animation
I hear and I forget, I see and I remember, I do and I understand.
A picture is worth a thousand words: the art of presenting information in visual form.
adding animation code to a program
examples of algorithm snapshots
Computer-driven visualization: characteristics and techniques
The computer-driven graphics screen is a powerful new communications medium; indeed, it is the only two-way mass communications medium we know. Other mass communications media, such as print, recorded audio, and video, are one-way streets suitable for delivering a monolog. The unique strength of our new medium is interactive presentation of information. Ideally, the viewer drives the presentation, not just by pushing a start button and turning a channel selector, but by controlling the presentation at every step. He controls the flow not only with commands such as "faster", "slower", "repeat", "skip", "play this backwards", but more important, with a barrage of "what if?" questions. What if the area of this triangle becomes zero? What if we double the load on this beam? What if world population grows a bit faster? This powerful new medium challenges us to use it well.
When using any medium, we must ask: What can it do well, and what does it do poorly? The computer-driven screen is ideally suited for rapid and accurate display of information that can be deduced from large amounts of data by means of straightforward algorithms and lengthy computation. It can do so in response to a variety of user inputs as long as this variety is contained in an algorithmically tractable, narrow domain of discourse. It is not adept at tasks that require judgment, experience, or insight. By comparison, a speaker at the blackboard is slow and inaccurate and can only call upon small amounts of data and tiny computations; we hope she makes up for this technical shortcoming by good judgment, teaching experience, and insight into the subject. By way of another comparison, books and films may accurately and rapidly present results based on much data and computation, but they lack the ability to react to a user's input.
Algorithm animation, the technique of displaying the state of programs in execution, is ideally suited for presentation on a graphics screen. There is a need for animations of this type, and there are techniques for producing them. The reasons for animating programs in execution fall into two major categories, which we label checking and exploring.
Checking
To understand an algorithm well, it is useful to understand it from several distinct points of view. One of them is the static point of view on which correctness proofs are based: Formulate invariants on the data and show that these are preserved under the program's operations. This abstract approach appeals to our rational mind. A second, equally important point of view is dynamic: Watch the algorithm go through its paces on a variety of input data. This concrete approach appeals to our intuition. Whereas the static approach relies mainly on "thinking", the dynamic approach calls mostly for "doing" and "perceiving", and thus is a prime candidate for visual human-computer interaction. In this use of algorithm animation, the user may be checking his understanding of the algorithm, or may be checking the algorithm's correctness: in principle, he could reason this out, but in practice, it is faster and safer to have the computer animation as a double check.
Exploring
In a growing number of applications, computer visualization cannot be replaced by any other technique. This is the case, for example, in exploratory data analysis, where a scientist may not know a priori what she is looking for, and the only way to look at a mass of data is to generate pictures from it (see a special issue on scientific visualization [Nie 89]). At times static pictures will do, but in simulations (e.g. of the onset of turbulent flow) we prefer to see an animation over time.
Turning to the techniques of animation, computer technology is in the midst of extremely rapid evolution toward ever-higher-quality interactive image generation on powerful graphics workstations (see [RN 91] for a survey of the state of the art). Fortunately, animating algorithms such as those presented in this book can be done adequately with the graphics tools available on low-cost workstations. These algorithms operate on discrete data configurations (such as matrices, trees, graphs), and use standard data structures, such as arrays and lists. For such limited classes of algorithms, there are software packages that help produce animations based on specifications, with a minimum of extra programming required. An example of an algorithm animation environment is the BALSA system [Bro 88, BS 85]. A more recent example is the XYZ GeoBench, which animates geometric algorithms [NSDAB 91].
In our experience, the bottleneck of algorithm animation is not the extra code required, but graphic design. What do you want to show, and how do you display it, keeping in mind the limitations of the system you have to work with? The key point to consider is that data does not look like anything until we have defined a mapping from the data space into visual space. Defining such a mapping ranges from trivial to practically impossible.
1. For some kinds of data, such as geometric data in two- and three-dimensional space, or real-valued functions of one or two real variables, there are natural mappings that we learned in school. These help us greatly in getting a feel for the data.
2. Multidimensional data (dimension > 3) can be displayed on a two-dimensional screen using a number of straightforward techniques, such as projections into a subspace, or using color or gray level as a fourth dimension. But our power of perception diminishes rapidly with increasing dimensionality.
3. For discrete combinatorial data there is often no natural or accepted visual representation. As an example, we often draw a graph by mapping nodes into points and edges into lines. This representation is natural for graphs that are embedded in Euclidean space, such as a road network, and we can readily make sense of a map with thousands of cities and road links. When we extend it to arbitrary graphs by placing a node anywhere on the screen, on the other hand, we get a random crisscrossing of lines of little intuitive value.
In addition to such inherent problems of visual representation, practical difficulties of the most varied type abound. Examples:
Some screens are awfully small, and some data sets are awfully large for display even on the largest screens.
An animation has to run within a narrow speed range. If it is too fast, we fail to follow, or the screen may flicker disturbingly; if too slow, we may lack the time to observe it.
In conclusion, we hold that it is not too difficult to animate simple algorithms as discussed here by interspersing drawing statements into the normal code. Independent of the algorithm to be animated, you can call on your own collection of display and interaction procedures that you have built up in your frame program (in the section "A graphics frame program"). But designing an adequate graphic representation is hard and requires a creative effort for each algorithm: that is where animators/programmers will spend the bulk of their effort. More on this topic in [NVH 86].
Example: the convex hull of points in the plane
The following program is an illustrative example for algorithm animation. 'ConvexHull' animates an on-line algorithm that constructs half the convex hull (say, the upper half) of a set of points presented incrementally. It accepts one point at a time, which must lie to the right of all preceding ones, and immediately extends the convex hull. The algorithm is explained in detail in "Sample problems and algorithms".
program ConvexHull; { of n ≤ 20 points in two dimensions }
const nmax = 19; { max number of points }
      r = 3; { radius of point plot }
var x, y, dx, dy: array[0 .. nmax] of integer;
    b: array[0 .. nmax] of integer; { backpointer }
    n: integer; { number of points entered so far }
    px, py: integer; { new point }

procedure PointZero;
begin
  n := 0;
  x[0] := 5; y[0] := 20; { the first point at fixed location }
  dx[0] := 0; dy[0] := 1; { assume vertical tangent }
  b[0] := 0; { points back to itself }
  PaintOval(y[0] - r, x[0] - r, y[0] + r, x[0] + r)
end;

function NextRight: boolean;
begin
  if n ≥ nmax then
    NextRight := false
  else begin
    repeat until Button;
    while Button do GetMouse(px, py);
    if px ≤ x[n] then
      NextRight := false
    else begin
      PaintOval(py - r, px - r, py + r, px + r);
      n := n + 1; x[n] := px; y[n] := py;
      dx[n] := x[n] - x[n - 1]; { dx > 0 }
      dy[n] := y[n] - y[n - 1];
      b[n] := n - 1;
      MoveTo(px, py); Line(-dx[n], -dy[n]); NextRight := true
    end
  end
end;

procedure ComputeTangent;
var i: integer;
begin
  i := b[n];
  while dy[n] * dx[i] > dy[i] * dx[n] do begin { dy[n]/dx[n] > dy[i]/dx[i] }
    i := b[i];
    dx[n] := x[n] - x[i]; dy[n] := y[n] - y[i];
    MoveTo(px, py); Line(-dx[n], -dy[n]);
    b[n] := i
  end;
  MoveTo(px, py); PenSize(2, 2); Line(-dx[n], -dy[n]); PenNormal
end;

procedure Title;
begin
  ShowText; ShowDrawing; { make sure windows lie on top }
  WriteLn('The convex hull');
  WriteLn('of n points in the plane sorted by x-coordinate');
  WriteLn('is computed in linear time.');
  Write('Click next point to the right, or Click left to quit.')
end;

begin { ConvexHull }
  Title; PointZero;
  while NextRight do ComputeTangent;
  Write('That''s it!')
end.
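The bookkeeping of 'ConvexHull' can be studied apart from the drawing calls. The following is a minimal sketch in Python rather than in the book's Pascal notation; the function name upper_hull and the list names are illustrative assumptions, and the points are passed in as a list instead of being read from the mouse. It keeps the same arrays dx, dy and the same backpointers, including the vertical tangent at the first point that stops the backward scan.

```python
def upper_hull(points):
    """On-line upper hull of points given with strictly increasing x.

    Mirrors the arrays of the Pascal program: dx, dy hold the edge that
    enters each point, back holds the backpointer to its hull predecessor.
    """
    if not points:
        return []
    xs, ys = [points[0][0]], [points[0][1]]
    dx, dy = [0], [1]      # vertical tangent at the first point (sentinel)
    back = [0]             # the first point points back to itself
    for px, py in points[1:]:
        if px <= xs[-1]:
            raise ValueError("each point must lie to the right of all preceding ones")
        i = len(xs) - 1    # tentative predecessor: the previous point
        xs.append(px)
        ys.append(py)
        dxn, dyn = px - xs[i], py - ys[i]
        # While the new edge is steeper than the edge entering i,
        # point i lies below the hull: skip it via its backpointer.
        while dyn * dx[i] > dy[i] * dxn:
            i = back[i]
            dxn, dyn = px - xs[i], py - ys[i]
        dx.append(dxn)
        dy.append(dyn)
        back.append(i)
    # Follow the backpointers from the last point to the first.
    hull = []
    i = len(xs) - 1
    while True:
        hull.append((xs[i], ys[i]))
        if i == 0:
            break
        i = back[i]
    hull.reverse()
    return hull


print(upper_hull([(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]))
```

Here "upper" refers to larger y values; on a screen whose y axis points downward, as assumed by the Pascal program, the same arithmetic yields the chain that appears below the points.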
A gallery of algorithm snapshots
The screen dumps sho5n in ")hibit 3.1 5ere taken %rom demonstration programs that 5e use to illustrate topics
discussed in class. Although snapshots cannot convey the in%ormation and the impact o% animations+ they may give
the reader ideas to try out. :e select t5o standard algorithm animation topics Hsorting and random number
generationI+ and an e)ample sho5ing the e%%ect o% cumulative rounding errors.
")hibit 3.1& $nitial con%iguration o% data+ [
27
")hibit 3.1& [ and snapshots %rom t5o sorting algorithms.
Visual test for randomness
Our visual system is amazingly powerful at detecting patterns of certain kinds in the midst of noise. Random
number generators (RNGs) are intended to simulate "noise" by means of simple formulas. When patterns appear in
the visual representation of supposedly random numbers, chances are that this RNG will also fail more rigorous
statistical tests. The eyes' pattern detection ability serves well to disqualify a faulty RNG but cannot certify one as
adequate. Exhibit 3.2 shows a simulation of the Galton board. In theory, the resulting density diagram should
approximate a bell-shaped Gaussian distribution. Obviously, the RNG used falls short of expectations.
Exhibit 3.2: One look suffices to unmask a bad RNG.
Numerics of chaos, or chaos of numerical computation?
The following example shows the effect of rounding errors and precision in linear recurrence relations. The
d-step linear recurrence with constant coefficients in the domain of real or complex numbers,
    z_k = \sum_{i=1}^{d} c_i z_{k-i},
is one of the most frequent formulas evaluated in scientific and technical computation (e.g. for the solution of
differential equations). By proper choice of the constants c_i and of initial values z_0, z_1, …, z_{d-1} we can generate
sequences z_k that when plotted in the plane of complex numbers form many different figures. With d = 1 and |c_1| = 1,
for example, we generate circles. The pictures in Exhibit 3.3 were all generated with d = 3 and conditions that
determine a curve that is most easily described as a circle 3 running around the perimeter of another circle 2 that
runs around a stationary circle 1. We performed this computation with a floating-point package that lets us pick
precision P (i.e. the number of bits in the mantissa). The resulting pictures look a bit chaotic, with a behavior we
have come to associate with fractals, even if the mathematics of generating them is completely different, and linear
recurrences computed without error would look much more regular. Notice that the first two images are generated
by the same formula, with a single bit of difference in the precision used. The whim of this 1-bit difference in
precision changes the image entirely.
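The effect of a short mantissa can be tried out without a special floating-point package. The following Python sketch makes several assumptions: the helper names are hypothetical, a P-bit mantissa is imitated by rounding only the result of each recurrence step (not every intermediate operation), and it uses the simple case d = 1 with |c_1| = 1 rather than the d = 3 setup of the exhibit, whose parameters are not given in the text.

```python
import cmath
import math


def round_to_bits(x, bits):
    """Round a float to a mantissa of the given number of bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** bits), e - bits)


def round_complex(z, bits):
    return complex(round_to_bits(z.real, bits), round_to_bits(z.imag, bits))


def recurrence(coeffs, initial, steps, bits=None):
    """Iterate z_k = sum_{i=1..d} coeffs[i-1] * z_{k-i}, optionally rounded."""
    d = len(coeffs)
    z = list(initial)                     # must hold exactly d initial values
    for k in range(d, d + steps):
        nxt = sum(c * z[k - i] for i, c in enumerate(coeffs, start=1))
        if bits is not None:
            nxt = round_complex(nxt, bits)
        z.append(nxt)
    return z


def max_radius_error(zs):
    """Largest deviation of |z| from the unit circle."""
    return max(abs(abs(z) - 1.0) for z in zs)


c = cmath.exp(0.1j)                       # |c| = 1: the exact orbit is a circle
fine = recurrence([c], [1 + 0j], 1000)
coarse = recurrence([c], [1 + 0j], 1000, bits=8)
print(max_radius_error(fine), max_radius_error(coarse))
```

Plotting the two orbits, for instance with the reader's own frame program, shows how far the rounded sequence strays from the circle that the formula describes.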
Exhibit 3.3: The effect of rounding errors in linear recurrence relations.
1. Use your personal graphics frame program (the programming project of "Graphics primitives and
environments") to implement and animate the convex hull algorithm example.
2. Use your graphics frame program to implement and animate the behavior of recurrence relations as
discussed in the section "A gallery of algorithm snapshots".
3. Extend your graphics frame program with a set of dialog control operations sufficient to guide the user
through the various steps of the animation of recurrence relations: in particular, to give him the options, at
any time, to enter a new set of parameters, then execute the algorithm and animate it in either 'movie
mode' (it runs at a predetermined speed until stopped by the user), or 'step mode' [the display changes only
when the user enters a logical command 'next' (e.g. by clicking the mouse or hitting a specific key)].
Part II: Programming concepts: beyond notation
Thoughts on the role of programming notations
A programming language is the main interface between a programmer and the physical machine, and a novice
programmer will tend to identify "programming" with "programming in the particular language she has learned".
The realization that there is much to programming "beyond notation" (i.e. principles that transcend any one
language) is a big step forward in a programmer's development.
Part II aims to help the reader take this step forward. We present examples that are best understood by focusing
on abstract principles of algorithm design, and only later do we grope for suitable notations to turn this principle
into an algorithm expressed in sufficient detail to become executable. In keeping with our predilection for graphic
communication, the first informal expression of an algorithmic idea is often pictorial. We show by example how
such representations, although they may be incomplete, can be turned into programs in a formal notation.
The literature on programming and languages. There are many books that present principles of
programming and of programming languages from a higher level of abstraction. The principles highlighted differ
from author to author, ranging from intuitive understanding to complete formality. The following textbooks
provide an excellent sample from the broad spectrum of approaches: [ASS 84], [ASU 86], [Ben 82], [Ben 85], [Ben
88], [Dij 76], [DF 88], [Gri 81], and [Mey 90].
4. Algorithms and programs as literature: substance and form
programming in the large versus programming in the small
large flat programs versus small deep programs
programs as literature
fractal pictures: snowflakes and Hilbert's space-filling curve
recursive definition of fractals by production or rewrite rules
Pascal and programming notations
Programming in the large versus programming in the small
In studying and discussing the art of programming it is useful to distinguish between large programs and small
programs, since these two types impose fundamentally different demands on the programmer.
Programming in the large
Large programs (e.g. operating systems, database systems, compilers, application packages) tax our
organizational ability. The most important issues to be dealt with include requirements analysis, functional
specification, compatibility with other systems, how to break a large program into modules of manageable size,
documentation, adaptability to new systems and new requirements, how to organize the team of programmers, and
how to test the software. These issues are the staple of software engineering. When compared to the daunting
managerial and design challenges, the task of actual coding is relatively simple. Large programs are often flat: Most
of the listing consists of comments, interface specifications, definitions, declarations, initializations, and a lot of
code that is executed only rarely. Although the function of any single page of source code may be rather trivial when
considered by itself, it is difficult to understand the entire program, as you need a lot of information to understand
how this page relates to the whole. The classic book on programming in the large is [Bro 75].
Programming in the small
Small programs, of the kind discussed in this book, challenge our technical know-how and inventiveness.
Algorithmic issues dominate the programmer's thinking: Among several algorithms that all solve the same
problem, which is the most efficient under the given circumstances? How much time and space does it take? What
data structures do we use? In contrast to large programs, small programs are usually deep, consisting of short,
compact code many of whose statements are executed very often. Understanding a small program may also be
difficult, at least initially, since the chain of thought is often subtle. Once you understand it thoroughly, you can
reproduce it at any time with much less effort than was first required. Mastery of interesting small programs is the
best way to get started in computer science. We encourage the reader to work out all the details of the examples we
present.
This book is concerned only with programming in the small. This decision determines our choice of
topics to be presented, our style of presentation, and the notation we use to express programs, explanations, and
proofs, and heavily influences our comments on techniques of programming. Our style of presentation appeals to
the reader's intuition more than to formal rigor. We aim at highlighting the key idea of any argument that we make
rather than belaboring the details. We take the liberty of using a free notation that suits the purpose of any specific
argument we wish to make, trusting that the reader understands our small programs so well that he can translate
them into the programming language of his choice. In a nutshell, we emphasize substance over form.
The purpose of Part II is to help engender a fluency in using different notations. We provide yet other examples
of unconventional notations that match the nature of the problem they are intended to describe, and we show how
to translate them into Pascal-like programs. Since much of the difference between programming languages is
merely syntactic, we include two chapters that cover the basics of syntax and syntax analysis. These topics are
important in their own right; we present them early in the hope that they will help the student see through
differences of notation that are merely "syntactic sugar".
Documentation versus literature: is it meant to be read?
It is instructive to distinguish two types of written materials, and two corresponding types of writing tasks:
documents and literature. Documents are constrained by requirements of many kinds, are read when a specific
need arises (rarely for pleasure), and their quality is judged by criteria such as formality, conformity to a standard,
completeness, accuracy, and consistency. Literature is a form of art free from conventions, read for education or
entertainment, and its quality is judged by aesthetic criteria much harder to enumerate than the ones above. The
touchstone is the question: Is it meant to be read? If the answer is "only if necessary", then it's a document, not
literature.
As the name implies, the documentation of large programs is a typical document-writing chore. Much has been
written in software engineering about documentation, a topic whose importance grows with the size and complexity
of the system to be documented. We hold that small programs are not documented, they are explained. As such,
they are literature, or ought to be. The idea of programs as literature is widely held (see, e.g. [Knu 84]). The key idea
is that an algorithm or program is part of the text and melts into the text in the same way as a paragraph, a formula,
or a picture does. There are also formal notations and systems designed to support a style of programming that
integrates text and code to form a package that is both readable for humans and executable by machines [Knu 83].
Whatever notation is used for literate programming, it has to describe all phases of a program's evolution, from
idea to specification to algorithm to program. Details of a good program cannot be understood, or at least not
appreciated, without an awareness of the grand design that guided the programmer. Whereas details are usually
well expressed in some formal notation, grand designs are not. For this reason we renounce formality and attempt
to convey ideas in whatever notation suits our purpose of insightful explanation. Let us illustrate this philosophy
with some examples.
A snowflake
Fractal pictures are intuitively characterized by the requirement that any part of the picture, of any size, when
sufficiently magnified, looks like the whole picture. Two pieces of information are required to define a specific
fractal:
1. A picture primitive that serves as a building-block: Many copies of this primitive, scaled to many different
sizes, are composed to generate the picture.
2. A recursive rule that defines the relative position of the primitives of different size.
A picture primitive is surely best defined by a drawing, and the manner of composing primitives in space again
calls for a pictorial representation, perhaps augmented by a verbal explanation. In this style we define the fractal
'Snowflake' by the following production rule, which we read as follows: A line segment, as shown on the left-hand
side, must be replaced by a polyline, a chain of four shorter segments, as shown at the right-hand side (Exhibit 4.1).
We start with an initial configuration (the zero-generation) consisting of a single segment (Exhibit 4.2). If we apply
the production rule just once to every segment of the current generation, we obtain successively a first, second, and
third generation, as shown in Exhibit 4.3. Further generations quickly exhaust the resolution of a graphics screen or
the printed page, so we stop drawing them. The curve obtained as the limit when this process is continued
indefinitely is a fractal. Although we cannot draw it exactly, one can study it as a mathematical object and prove
theorems about it.
Exhibit 4.1: Production for replacing a straight-line segment by a polyline
Exhibit 4.2: The simplest initial configuration
Exhibit 4.3: The first three generations
The production rule drawn above is the essence of this fractal, and of the sequence of pictures that lead up to it.
The initial configuration, on the other hand, is quite arbitrary: If we had started with a regular hexagon, rather than
a single line segment, the pictures obtained would really have lived up to their name, snowflake. Any other initial
configuration still generates curves with the unmistakable pattern of snowflakes, as the reader is encouraged to
verify.
After having familiarized ourselves with the objects described, let us turn our attention to the method of
description and raise three questions about the formality and executability of such notations.
1. Is our notation sufficiently formal to serve as a program for a computer to draw the family of generations of
snowflakes? Certainly not, as we stated certain rules in colloquial language and left others completely
unsaid, implying them only by sample drawings. As an example of the latter, consider the question: If a
segment is to be replaced by a "plain with a mountain in the center", on which side of the segment should
the peak point? The drawings above suggest that all peaks stick out on the same side of the curve, the
outside.
2. Could our method of description be extended and formalized to serve as a programming language for
fractals? Of course. As an example, the production shown in Exhibit 4.4 specifies the side on which the
peak is to point. Every segment now has a + side and a – side. The production above states that the new
peak is to grow over the + side of the original segment and specifies the + sides and – sides of each of the
four new segments. For every other aspect that our description may have left unspecified, such as
placement on the screen, some notation could readily be designed to specify every detail with complete
rigor. In "Syntax" and "Syntax analysis" we introduce some of the basic techniques for designing and using
formal notations.
Exhibit 4.4: Refining the description to specify a "left-right" orientation.
3. Should we formalize this method of description and turn it into a machine-executable notation? It depends
on the purpose for which we plan to use it. Often in this book we present just one or a few examples that
share a common design. Our goal is for the reader to understand these few examples, not to practice the
design of artificial programming languages. To avoid being sidetracked by a pedantic insistence on rigorous
notation, with its inevitable overhead of introducing formalisms needed to define all details, we prefer to
stop when we have given enough information for an attentive reader to grasp the main idea of each
example.
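To see what an executable version of the snowflake production has to pin down, here is a minimal sketch in Python (not the book's notation). The names refine and snowflake are hypothetical. The sketch assumes that each segment is cut into thirds, that the two sides of the peak have the same length as those thirds, and that the peak always grows to the left of the direction of travel; the last choice stands in for the + side of the refined production.

```python
import cmath
import math


def refine(polyline):
    """Apply the production once: each segment becomes four shorter ones."""
    rot = cmath.exp(1j * math.pi / 3)     # turn by 60 degrees counterclockwise
    out = [polyline[0]]
    for p, q in zip(polyline, polyline[1:]):
        third = (q - p) / 3
        a = p + third                     # end of the first third
        peak = a + third * rot            # tip of the "mountain"
        b = p + 2 * third                 # start of the last third
        out.extend([a, peak, b, q])
    return out


def snowflake(generations, start=(0j, 1 + 0j)):
    """Points (as complex numbers) of the given generation of the curve."""
    curve = list(start)
    for _ in range(generations):
        curve = refine(curve)
    return curve


third_generation = snowflake(3)
length = sum(abs(q - p) for p, q in zip(third_generation, third_generation[1:]))
print(len(third_generation), length)
```

Each application multiplies the number of segments by 4 and the total length by 4/3. Passing the corners of a closed polygon as start (with the first corner repeated at the end) applies the same rule to another initial configuration.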
Hilbert's space-filling curve
Space-filling curves have been an object of mathematical curiosity since the nineteenth century, as they can be
used to prove that the cardinality of an interval, considered as a set of points, equals the cardinality of a square (or
any other finite two-dimensional region). The term space-filling describes the surprising fact that such a curve
visits every point within a square. In mathematics, space-filling curves are constructed as the limit to which an
infinite sequence of curves C_i converges. On a discretized plane, such as a raster-scanned screen, no limiting
process is needed, and typically one of the first dozen curves in the sequence already paints every pixel, so the term
space-filling is quickly seen to be appropriate.
Let us illustrate this phenomenon using Hilbert's space-filling curve (David Hilbert, 1862–1943), whose first six
approximations are shown in Exhibit 4.5. As the pictures suggest, Hilbert curves are best described recursively, but
the composition rule is more complicated than the one for snowflakes. We propose the two productions shown in
Exhibit 4.6 to capture the essence of Hilbert (and similar) curves. This pictorial program requires explanation, but
we hope the reader who has once understood it will find this notation useful for inventing fractals of her own. As
always, a production is read: "To obtain an instance of the left-hand side, get instances of all the things listed on the
right-hand side", or equivalently, "to do the task specified by the left-hand side, do all the tasks listed on the
right-hand side".
Exhibit 4.5: Six generations of the family of Hilbert curves
Exhibit 4.6: Productions for painting a square in terms of its quadrants
The left-hand side of the first production stands for the task: paint a square of given size, assuming that you
enter at the lower left corner facing in the direction indicated by the arrow and must leave in the upper left corner,
again facing in the direction indicated by that arrow. We assume turtle graphics primitives, where the state of the
brush is given by a position and a direction. The hatching indicates the area to be painted. It lies to the right of the
line that connects entry and exit corners, which we read as "paint with your right hand", and the hatching is in thick
strokes. The left-hand side of the second production is similar: Paint a square "with your left hand" (hatching is in
thin strokes), entering and exiting as indicated by the arrows.
The right-hand sides of the productions are now easily explained. They say that in order to paint a square you
must paint each of its quadrants, in the order indicated. They give explicit instructions on where to enter and exit,
what direction to face, and whether you are painting with your right or left hand. The last detail is to make sure that
when the brush exits from one quadrant it gets into the correct state for entering the next. This requires the brush
to turn by 90°, either left or right, as the curved arrows in the pictures indicate. In the continuous plane we imagine
the brush to "turn on its heels", whereas on a discrete grid it also moves to the first grid point of the adjacent
quadrant.
These productions omit any rule for termination, thus simulating the limiting process of true space-filling
curves. To draw anything on the screen we need to add some termination rules that specify two things: (1) when to
invoke the termination rule (e.g. at some fixed depth of recursion), and (2) how to paint the square that invokes the
termination rule (e.g. paint it all black). As was the case with snowflakes and with all fractals, the primitive pictures
are much less important than the composition rule, so we omit it.
The following program implements a specific version of the two pictorial productions shown above. The
procedure 'Walk' implements the curved arrows in the productions: the brush turns by 'halfTurn', takes a step of
length s, and turns again by 'halfTurn'. The parameter 'halfTurn' is introduced to show the effect of cumulative
small errors in recursive procedures. 'halfTurn = 45' causes the brush to make right-angle turns and yields Hilbert
curves. The reader is encouraged to experiment with 'halfTurn = 43, 44, 46, 47', and other values.
program PaintAndWalk;
const pi = 3.14159; s = 3; { step size of walk }
var turtleHeading: real; { counterclockwise, radians }
    halfTurn, depth: integer; { recursive depth of painting }

procedure TurtleTurn(angle: real);
{ turn the turtle angle degrees counterclockwise }
begin { angle is converted to radian before adding }
  turtleHeading := turtleHeading + angle * pi / 180.0
end; { TurtleTurn }

procedure TurtleLine(dist: real);
{ draws a straight line, dist units long }
begin
  Line(round(dist * cos(turtleHeading)), round(-dist * sin(turtleHeading)))
end; { TurtleLine }

procedure Walk(halfTurn: integer);
begin TurtleTurn(halfTurn); TurtleLine(s); TurtleTurn(halfTurn)
end;

procedure Qpaint(level: integer; halfTurn: integer);
begin
  if level = 0 then
    TurtleTurn(2 * halfTurn)
  else begin
    Qpaint(level - 1, -halfTurn);
    Walk(halfTurn);
    Qpaint(level - 1, halfTurn);
    Walk(-halfTurn);
    Qpaint(level - 1, halfTurn);
    Walk(halfTurn);
    Qpaint(level - 1, -halfTurn)
  end
end; { Qpaint }

begin { PaintAndWalk }
  ShowText; ShowDrawing;
  MoveTo(100, 100); turtleHeading := 0; { initialize turtle state }
  WriteLn('Enter halfTurn 0 .. 359 (45 for Hilbert curves): ');
  ReadLn(halfTurn);
  TurtleTurn(-halfTurn); { init turtle turning angle }
  Write('Enter depth 1 .. 6: '); ReadLn(depth);
  Qpaint(depth, halfTurn)
end. { PaintAndWalk }
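For readers without the drawing environment this program assumes, the same recursion can be run to produce a list of grid points instead of screen output. The following is a minimal Python sketch, not a replacement for the program above; hilbert_points and its parameter names are illustrative assumptions, the heading is kept in whole degrees rather than accumulated in radians, and the y axis points upward.

```python
import math


def hilbert_points(depth, half_turn=45, step=1):
    """Trace the Qpaint/Walk recursion and return the points visited."""
    heading = -half_turn          # degrees, counterclockwise, as in the main program
    x, y = 0, 0
    points = [(x, y)]

    def turn(angle):
        nonlocal heading
        heading += angle

    def line(dist):
        nonlocal x, y
        rad = math.radians(heading)
        x += round(dist * math.cos(rad))
        y += round(dist * math.sin(rad))
        points.append((x, y))

    def walk(h):
        # the curved arrow: half a turn, one step, half a turn
        turn(h)
        line(step)
        turn(h)

    def qpaint(level, h):
        if level == 0:
            turn(2 * h)           # termination rule: turn in place
        else:
            qpaint(level - 1, -h)
            walk(h)
            qpaint(level - 1, h)
            walk(-h)
            qpaint(level - 1, h)
            walk(h)
            qpaint(level - 1, -h)

    qpaint(depth, half_turn)
    return points


print(hilbert_points(1))          # [(0, 0), (0, -1), (1, -1), (1, 0)]
```

With half_turn = 45 and depth 3 the list holds 64 distinct points, one for each cell of an 8 by 8 grid. Other values of half_turn and a larger step reproduce the experiment suggested in the text, with rounding to the grid playing the role of the small cumulative errors.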
As a summary of this discourse on notation, we point to the fact that an executable program necessarily has to
specify many details that are irrelevant from the point of view of human understanding. This book assumes that the
reader has learned the basic steps of programming, of thinking up such details, and being able to express them
formally in a programming language. Compare the verbosity of the one-page program above with the clarity and
conciseness of the two pictorial productions above. The latter state the essentials of the recursive construction, and
no more, in a manner that a human can understand "at a glance". We aim our notation to appeal to a human mind,
not necessarily to a computer, and choose our notation accordingly.
Pascal and its dialects: lingua franca of computer science
Lingua franca (1619):
1. A common language that consists of Italian mixed with French, Spanish, Greek and Arabic and is spoken
in Mediterranean ports
2. Any of various languages used as common or commercial tongues among peoples of diverse speech
3. Something resembling a common language
(From Webster's Collegiate Dictionary)
Pascal as representative of today's programming languages
The definition above fits Pascal well: In the mainstream of the development of programming languages for a
couple of decades, Pascal embodies, in a simple design, some of the most important language features that became
commonly accepted in the 1970s. This simplicity, combined with Pascal's preference for language features that are
now well understood, makes Pascal a widely understood programming notation. A few highlights in the
development of programming languages may explain how Pascal got to be a lingua franca of computer science.
Fortran emerged in 1954 as the first high-level programming language to gain acceptance and became the
programming language of the 1950s and early 1960s. Its appearance generated great activity in language design,
and suddenly, around 1960, dozens of programming languages emerged. Three among these, Algol 60, COBOL, and
Lisp, became milestones in the development of programming languages, each in its own way. Whereas COBOL
became the most widely used language of the 1960s and 1970s, and Lisp perhaps the most innovative, Algol 60
became the most influential in several respects: it set new standards of rigor for the definition and description of a
language, it pioneered hierarchical block structure as the major technique for organizing large programs, and
through these major technical contributions became the first of a family of mainstream programming languages
that includes PL/1, Algol 68, Pascal, Modula-2, and Ada.
The decade of the 1960s remained one of great ferment and productivity in the field of programming languages.
PL/1 and Algol 68, two ambitious projects that attempted to integrate many recent advances in programming
language technology and theory, captured the lion's share of attention for several years. Pascal, a much smaller
project and language designed by Niklaus Wirth during the 1960s, ended up eclipsing both of these major efforts.
Pascal took the best of Algol 60, in streamlined form, and added just one major extension, the then novel type
definitions [Hoa 72]. This lightweight edifice made it possible to implement efficient Pascal compilers on the
microcomputers that mushroomed during the mid 1970s (e.g. UCSD Pascal), which opened the doors to
universities and high schools. Thus Pascal became the programming language most widely used in introductory
computer science education, and every computer science student must be fluent in it.
Because Pascal is so widely understood, we base our programming notation on it but do not adhere to it
slavishly. Pascal is more than 20 years old, and many of its key ideas are 30 years old. With today's insights into
programming languages, many details would probably be chosen differently. Indeed, there are many "dialects" of
Pascal, which typically extend the standard defined in 1969 [Wir 71] in different directions. One extension relevant
for a publication language is that with today's hardware that supports large character sets and many different fonts
and styles, a greater variety of symbols can be used to make the source more readable. The following examples
introduce some of the conventions that we use often.
"Syntactic sugar": the look of programming notations
Pascal statements lack an explicit terminator. This makes the frequent use of begin-end brackets necessary, as in
the following program fragment, which implements the insertion sort algorithm (see chapter 17 and the section
"Simple sorting algorithms that work in time"); –∞ denotes a constant ≤ any key value:
A[0] := -∞;
for i := 2 to n do begin
  j := i;
  while A[j] < A[j - 1] do
    begin t := A[j]; A[j] := A[j - 1]; A[j - 1] := t; j := j - 1 end;
end;
We aim at brevity and readability but wish to retain the flavor of Pascal to the extent that any new notation we
introduce can be translated routinely into standard Pascal. Thus we write the statements above as follows:
A[0] := -∞;
for i := 2 to n do begin
  j := i; { comments appear in italics }
  while A[j] < A[j - 1] do { A[j] :=: A[j - 1]; j := j - 1 }
  { braces serve as general-purpose brackets, including begin-end }
  { :=: denotes the exchange operator }
end;
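For readers who want to run the fragment, here is a minimal transcription into Python (an illustration, not the book's notation). The function name insertion_sort is an assumption; the sentinel –∞ is represented by -math.inf and stored in position 0, so that the keys occupy positions 1 .. n as in the fragment above.

```python
import math


def insertion_sort(keys):
    """Insertion sort with a sentinel in position 0, as in the fragment."""
    a = [-math.inf] + list(keys)      # a[0] is the sentinel, keys are a[1..n]
    n = len(keys)
    for i in range(2, n + 1):
        j = i
        # The sentinel guarantees that the loop stops at j = 1 at the latest,
        # so no explicit test on j is needed.
        while a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]   # the exchange operator :=:
            j -= 1
    return a[1:]


print(insertion_sort([5, 2, 4, 6, 1, 3]))     # [1, 2, 3, 4, 5, 6]
```

The tuple assignment plays the role of the exchange operator, which standard Pascal has to spell out with a temporary variable.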
Borrowing heavily from standard mathematical notation, we use conventional mathematical signs to denote
operators whose Pascal designation was constrained by the small character sets typical of the early days, such as:
    ≠    ≤    ≥    ≠    ¬     ∧     ∨    ∈    ∉        ∩   ∪   \   |x|       instead of
    <>   <=   >=   <>   not   and   or   in   not in   *   +   -   abs(x)    respectively
We also use signs that may have no direct counterpart in Pascal, such as:
    ⊂ ⊃ ⊆ ⊇    Set-theoretic relations
    ∞          Infinity, often used for a "sentinel" (i.e. a number larger than all numbers to be
               processed in a given application)
    ±          Plus-or-minus, used to define an interval [of uncertainty]
    Σ Π        Sum and product
    ⌈x⌉        Ceiling of a real number x (i.e. the smallest integer ≥ x)
    ⌊x⌋        Floor of a real number x (i.e. the largest integer ≤ x)
    √          Square root
    log        Logarithm to the base 2
    ln         Natural logarithm, to the base e
    iff        If and only if
Although we may take a cavalier attitude toward notational differences, and readily use concise notations such
as ∧, ∨ for the more verbose 'and', 'or', we will try to remind readers explicitly about our assumptions when there is a
question about semantics. As an example, we assume that the boolean operators ∧ and ∨ are conditional, also called
'cand' and 'cor': An expression containing these operators is evaluated from left to right, and the evaluation stops as
soon as the result is known. In the expression x ∧ y, for example, x is evaluated first. If x evaluates to 'false', the
entire expression is 'false' without y ever being evaluated. This convention makes it possible to leave y undefined
when x is 'false'. Only if x evaluates to 'true' do we proceed to evaluate y. An analogous convention applies to x ∨ y.
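Python's 'and' and 'or' happen to follow the same conditional convention, which makes it easy to see why it matters. The two functions below are hypothetical illustrations: in each, the right operand would be undefined (an index out of range, a division by zero) exactly when the left operand is false.

```python
def find(a, key):
    """Index of the first occurrence of key in a, or len(a) if absent."""
    i = 0
    # a[i] is evaluated only if i < len(a) holds, so it is never out of range.
    while i < len(a) and a[i] != key:
        i += 1
    return i


def ratio_exceeds(num, den, bound):
    """True if num / den > bound; False when den is zero."""
    # The division is attempted only after den != 0 has been established.
    return den != 0 and num / den > bound


print(find([3, 1, 4], 4), find([3, 1, 4], 9))            # 2 3
print(ratio_exceeds(10, 2, 4), ratio_exceeds(1, 0, 5))   # True False
```

Under unconditional evaluation of both operands, the second call of each pair would fail instead of returning a result.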
Program structure
Whereas the concise notations introduced above to denote operators can be translated almost one-to-one into a
single line of standard Pascal, we also introduce a few extensions that may affect the program structure. In our view
these changes make programs more elegant and easier to understand. Borrowing from many modern languages, we
introduce a 'return()' statement to exit from procedures and functions and to return the value computed by a
function.
Example
function gcd(u, v: integer): integer;
{ computes the greatest common divisor (gcd) of u and v }
begin if v = 0 then return(u) else return(gcd(v, u mod v))
end;
In this example, 'return()' merely replaces the Pascal assignments 'gcd := u' and 'gcd := gcd(v, u mod v)'. The
latter in particular illustrates how 'return()' avoids a notational blemish in Pascal: On the left of the second
assignment, 'gcd' denotes a variable, on the right a function. 'Return()' also has the more drastic consequence that it
causes control to exit from the surrounding procedure or function as soon as it is executed. Without entering into a
controversy over the general advantages and disadvantages of this "flow of control" mechanism, let us present one
example, typical of many search procedures, where 'return()' greatly simplifies coding. The point is that a search
routine terminates in one of (at least) two different ways: successfully, by having found the item in question, or
unsuccessfully, because of a number of reasons (the item is not present, and some index is about to fall outside the
range of a table; we cannot insert an item because the table is full, or we cannot pop a stack because it is empty,
etc.). For the sake of efficiency as well as readability we prefer to exit from the routine as soon as a case has been
identified and dealt with, as the following example from "Address computation" illustrates:
function insert_into_hash_table(x: key): addr;
var a: addr;
begin
  a := h(x); { locate the home address of the item x to be inserted }
  while T[a] ≠ empty do begin
    { skipping over cells that are already occupied }
    if T[a] = x then return(a); { x is already present; return its address }
    a := (a + 1) mod m { keep searching at the next address }
  end;
  { we've found an empty cell; see if there is room for x to be inserted }
  if n < m - 1 then { n := n + 1; T[a] := x } else err_msg('table is full');
  return(a) { return the address where x was inserted }
end;
This code can only be appreciated by comparing it with alternatives that avoid the use of 'return()'. We encourage readers to try their hands at this challenge. Notice the three different ways this procedure can terminate: (1) no need to insert x because x is already in the table, (2) impossible to insert x because the table is full, and (3) the normal case when x is inserted. Standard Pascal incorporates no facilities for "exception handling" (e.g. to cover the first two cases that should occur only rarely) and forces all three outcomes to exit the procedure at its textual end.
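For readers who want to experiment, here is a minimal Python sketch of the same insertion routine with its three exits. It is an illustration under stated assumptions, not the book's code: the class name, the EMPTY marker, and the hash function h are hypothetical, and case (2) raises an exception instead of calling err_msg and returning.

```python
EMPTY = None  # marker for an unoccupied cell (an assumption of this sketch)


class HashTable:
    def __init__(self, m):
        self.m = m                    # table size
        self.n = 0                    # number of items stored so far
        self.cells = [EMPTY] * m

    def h(self, x):
        # Hypothetical hash function: home address of x.
        return hash(x) % self.m

    def insert(self, x):
        a = self.h(x)
        while self.cells[a] is not EMPTY:
            # Skip over cells that are already occupied.
            if self.cells[a] == x:
                return a              # exit (1): x is already present
            a = (a + 1) % self.m      # keep searching at the next address
        if self.n >= self.m - 1:
            # exit (2): the table is considered full (one cell stays empty,
            # which guarantees that the search loop above terminates)
            raise OverflowError("table is full")
        self.n += 1
        self.cells[a] = x
        return a                      # exit (3): x was inserted at address a
```

Keeping one cell permanently empty mirrors the test 'n < m - 1' in the procedure above.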
Let us just mention a few other liberties that we may take. Whereas Pascal limits results of functions to certain simple types, we will let them be of any type: in particular, structured types, such as records and arrays. Rather than nesting if-then-else statements in order to discriminate among more than two mutually exclusive cases, we use the "flat" and more legible control structure:
if B1 then S1
elsif B2 then S2
elsif … else Sn;
Our sample programs do not return dynamically allocated storage explicitly. They rely on a memory management system that retrieves free storage through "garbage collection". Many implementations of Pascal avoid garbage collection and instead provide a procedure 'dispose(…)' for the programmer to explicitly return unneeded cells. If you work with such a version of Pascal and write list-processing programs that use significant amounts of memory, you must insert calls to 'dispose(…)' in appropriate places in your programs.
The list above is not intended to be exhaustive, and neither do we argue that the constructs we use are necessarily superior to others commonly available. Our reason for extending the notation of Pascal (or any other programming language we might have chosen as a starting point) is the following: in addressing human readers, we believe an open-ended, somewhat informal notation is preferable to the straitjacket of any one programming language. The latter becomes necessary if and when we execute a program, but during the incubation period when our understanding slowly grows toward a firm grasp of an idea, supporting intuition is much more important than formality. Thus we describe data structures and algorithms with the help of figures, words, and programs as we see fit in any particular instance.
Programming project
1. Use your graphics frame program of "Graphics primitives and environments" to implement an editor for simple graphics productions such as those used to define snowflakes (e.g. 'any line segment gets replaced by a specified sequence of line segments'), and an interpreter that draws successive generations of the fractals defined by these productions.
5. Divide-and-conquer and recursion
The algorithmic principle of divide-and-conquer leads directly to recursive procedures.
Examples: Merge sort, tree traversal. Recursion and iteration.
My friend liked to claim "I'm 2/3 Cherokee." Until someone would challenge him "Two-thirds? You mean 1/2, or 1/4, or maybe 3/8, how on earth can you be 2/3 of anything?" "It's easy," said Jim, "both my parents are 2/3."
An algorithmic principle
Let A(D) denote the application of an algorithm A to a set of data D, producing a result R. An important class of algorithms, of a type called divide-and-conquer, processes data in two distinct ways, according to whether the data is small or large:
If the set D is small, and/or of simple structure, we invoke a simple algorithm A0 whose application A0(D) yields R.
If the set D is large, and/or of complex structure, we partition it into smaller subsets D1, … , Dk. For each i, apply A(Di) to yield a result Ri. Combine the results R1, … , Rk to yield R.
This algorithmic principle of divide-and-conquer leads naturally to the notion of recursive procedures. The following example outlines the concept in a high-level notation, highlighting the role of parameters and local variables.
procedure A(D: data; var R: result);
var D1, … , Dk: data; R1, … , Rk: result;
begin
  if simple(D) then R := A0(D)
  else { D1, … , Dk := partition(D);
         R1 := A(D1); … ; Rk := A(Dk);
         R := combine(R1, … , Rk) }
end;
Notice how an initial data set D spawns sets D1, … , Dk which, in turn, spawn children of their own. Thus the collection of all data sets generated by the partitioning scheme is a tree with root D. In order for the recursive procedure A(D) to terminate in all cases, the partitioning function must meet the following condition: Each branch of the partitioning tree, starting from the root D, eventually terminates with a data set D0 that satisfies the predicate 'simple(D0)', to which we can apply the simple algorithm A0.
Divide-and-conquer reduces a problem on data set D to k instances of the same problem on new sets D1, … , Dk that are "simpler" than the original set D. Simpler often means "has fewer elements", but any measure of "simplicity" that monotonically heads for the predicate 'simple' will do, when algorithm A0 will finish the job. "D is simple" may mean "D has no elements", in which case A0 may have to do nothing at all; or it may mean "D has exactly one element", and A0 may just mark this element as having been visited.
The following sections show examples of divide-and-conquer algorithms. As we will see, the actual workload is sometimes distributed unequally among different parts of the algorithm. In the sorting example, the step 'R := combine(R1, … , Rk)' requires most of the work; in the "Tower of Hanoi" problem, the application of algorithm A0 takes the most effort.
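The schema of procedure A can be written as an ordinary higher-order function. The following Python sketch is one possible rendering under stated assumptions: the parameter names simple, solve_simple (standing for A0), partition, and combine are chosen here for illustration, and the result is returned rather than passed back through a var parameter.

```python
def divide_and_conquer(data, simple, solve_simple, partition, combine):
    # Small or simply structured data: apply the simple algorithm A0.
    if simple(data):
        return solve_simple(data)
    # Otherwise partition, solve each part recursively, and combine.
    parts = partition(data)
    results = [
        divide_and_conquer(p, simple, solve_simple, partition, combine)
        for p in parts
    ]
    return combine(results)


def maximum(values):
    # Example instance: the maximum of a non-empty list.
    return divide_and_conquer(
        values,
        simple=lambda d: len(d) == 1,
        solve_simple=lambda d: d[0],
        partition=lambda d: [d[: len(d) // 2], d[len(d) // 2:]],
        combine=max,
    )
```

The instance terminates because each partition step produces two non-empty lists that are strictly shorter than the original, so every branch of the partitioning tree reaches a one-element list.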
Divide-and-conquer expressed as a diagram: merge sort
Suppose that we wish to sort a sequence of names alphabetically, as shown in Exhibit 5.1. We make use of the divide-and-conquer strategy by partitioning a "large" sequence D into two subsequences D1 and D2, sorting each subsequence, and then merging them back together into sorted order. This is our algorithm A(D). If D contains at most one element, we do nothing at all. A0 is the identity algorithm, A0(D) = D.
Exhibit 5.1: Sorting the sequence {Z, A, S, D} by using a divide-and-conquer scheme
procedure sort(var D: sequence);
var D1, D2: sequence;
  function combine(D1, D2: sequence): sequence;
  begin { combine }
    merge the two sorted sequences D1 and D2 into a single sorted sequence D';
    return(D')
  end; { combine }
begin { sort }
  if |D| > 1 then { split D into two sequences D1 and D2 of equal size;
                    sort(D1); sort(D2); D := combine(D1, D2) }
  { if |D| ≤ 1, D is trivially sorted, do nothing }
end; { sort }
In the chapter on "sorting and its complexity", under the section "merging and merge sorts" we turn this divide-and-conquer scheme into a program.
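As a preview, the scheme can already be tried out on lists. This Python sketch is only an illustration and not the program developed in the later chapter; unlike the procedure above it returns a new sorted list instead of modifying D in place, and when the length is odd the two halves differ in size by one.

```python
def combine(d1, d2):
    # Merge two sorted sequences into a single sorted sequence.
    merged = []
    i = j = 0
    while i < len(d1) and j < len(d2):
        if d1[i] <= d2[j]:
            merged.append(d1[i])
            i += 1
        else:
            merged.append(d2[j])
            j += 1
    merged.extend(d1[i:])   # at most one of these two tails is non-empty
    merged.extend(d2[j:])
    return merged


def sort(d):
    if len(d) <= 1:
        return d            # A0 is the identity: nothing to do
    mid = len(d) // 2
    return combine(sort(d[:mid]), sort(d[mid:]))
```

Calling sort(["Z", "A", "S", "D"]) reproduces the example of Exhibit 5.1.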
Recursively defined trees
A tree, more precisely, a rooted, ordered tree, is a data type used primarily to model any type of hierarchical organization. Its primitive parts are nodes and leaves. It has a distinguished node called the root, which, in violation of nature, is typically drawn at the top of the page, with the tree growing downward. Each node has a certain number of children, either leaves or nodes; leaves have no children. The exact definition of such trees can differ slightly with respect to details and terminology. We may define a binary tree, for example, by the condition that each node has either exactly, or at most, two children.
The pictorial grammar shown in Exhibit 5.2 captures this recursive definition of 'binary tree' and fixes the details left unspecified by the verbal description above. It uses an alphabet of three symbols: the nonterminal 'tree symbol', which is also the start symbol; and two terminal symbols, for 'node' and for 'leaf'.
Exhibit 5.2: The three symbols of the alphabet of a tree grammar
There are two production or rewriting rules, p1 and p2 (Exhibit 5.3). The derivation shown in Exhibit 5.4 illustrates the application of the production rules to generate a tree from the nonterminal start symbol.
Exhibit 5.3: Rule p1 generates a leaf, rule p2 generates a node and two new trees
Exhibit 5.4: One way to derive the tree at right
We may make the production rules more detailed by explicitly naming the coordinates associated with each symbol. On a display device such as a computer screen, the x- and y-values of a point are typically Cartesian coordinates with the origin in the upper-left corner. The x-values increase toward the bottom and the y-values increase toward the right of the display. Let (x, y) denote the screen position associated with a particular symbol, and let d denote the depth of a node in the tree. The root has depth 0, and the children of a node with depth d have depth d+1. The different levels of the tree are separated by some constant distance s. The separation between siblings is determined by a (rapidly decreasing) function t(d) which takes as argument the depth of the siblings and depends on the drawing size of the symbols and the resolution of the screen. These more detailed productions are shown in Exhibit 5.5.
Exhibit 5.5: Adding coordinate information to productions in order to control graphic layout
The translation of these two rules into high-level code is now plain:
procedure p1(x, y: coordinate);
begin
  eraseTreeSymbol(x, y);
  drawLeafSymbol(x, y)
end;
procedure p2(x, y: coordinate; d: level);
begin
  eraseTreeSymbol(x, y);
  drawNodeSymbol(x, y);
  drawTreeSymbol(x + s, y - t(d + 1));
  drawTreeSymbol(x + s, y + t(d + 1))
end;
If we choose t(d) = c · 2^(-d), these two procedures produce the display shown in Exhibit 5.6 of the tree generated in Exhibit 5.4.
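To see which coordinates the two rules produce, the following Python sketch computes positions instead of drawing. It assumes t(d) = c · 2^(-d), so that the horizontal displacement halves at each level; the tree representation (None for a leaf, a pair of subtrees for a node) and the function name layout are assumptions of the sketch.

```python
def layout(tree, x, y, d, s, c, out):
    # tree is None for a leaf (rule p1) or a pair (left, right) for a
    # node with two subtrees (rule p2); d is the depth of this symbol.
    if tree is None:
        out.append(("leaf", x, y))
    else:
        out.append(("node", x, y))
        t = c * 2 ** -(d + 1)          # sibling displacement t(d + 1)
        left, right = tree
        layout(left, x + s, y - t, d + 1, s, c, out)
        layout(right, x + s, y + t, d + 1, s, c, out)
    return out
```

For a root with two leaves, level separation s = 1 and c = 8, the call layout((None, None), 0, 0, 0, 1, 8, []) places the leaves one level below the root, four units to either side.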
")hibit @.>& .ample layout obtained by halving hori6ontal displacement at each successive level
Technical remark about the details of defining binary trees: Our grammar forces every node to have exactly two children: A child may be a node or a leaf. This lets us subsume two frequently occurring classes of binary trees under one common definition.
1. 0-2 (binary) trees. We may identify leaves and nodes, making no distinction between them (replace the squares by circles in Exhibit 5.3 and Exhibit 5.4). Every node in the new tree now has either zero or two children, but not one. The smallest tree has a single node, the root.
2. (Arbitrary) binary trees. Ignore the leaves (drop the squares in Exhibit 5.3 and Exhibit 5.4 and the branches leading into a square). Every node in the new tree now has either zero, one, or two children. The smallest tree (which consisted of a single leaf) now has no node at all; it is empty.
For clarity's sake, the following examples use the terminology of nodes and leaves introduced in the defining grammar. In some instances we point out what happens under the interpretation that leaves are dropped.
Recursive tree traversal
Recursion is a powerful tool for programming divide-and-conquer algorithms in a straightforward manner. In particular, when the data to be processed is defined recursively, a recursive processing algorithm that mirrors the structure of the data is most natural. The recursive tree traversal procedure below illustrates this point.
Traversing a tree (in general: a graph, a data structure) means visiting every node and every leaf in an orderly sequence, beginning and ending at the root. What needs to be done at each node and each leaf is of no concern to the traversal algorithm, so we merely designate that by a call to a 'procedure visit( )'. You may think of inspecting the contents of all nodes and/or leaves, and writing them to a file.
Recursive tree traversals use divide-and-conquer to decompose a tree into its subtrees: At each node visited along the way, the two subtrees L and R to the left and right of this node must be traversed. There are three natural ways to sequence the node visit and the subtree traversals:
1. node; L; R { preorder, or prefix }
2. L; node; R { inorder or infix }
3. L; R; node { postorder or suffix }
The following example translates this traversal algorithm into a recursive procedure:
procedure traverse(T: tree);
{ preorder, inorder, or postorder traversal of tree T with leaves }
begin
  if leaf(T) then visitleaf(T)
  else { T is composite }
    { visit1(root(T));
      traverse(leftsubtree(T));
      visit2(root(T));
      traverse(rightsubtree(T));
      visit3(root(T)) }
end;
When leaves are ignored (i.e. a tree consisting of a single leaf is considered to be empty), the procedure body becomes slightly simpler:
if not empty(T) then { … }
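The traversal procedure with its three visits carries over directly to Python. The sketch below uses the simpler interpretation in which leaves are ignored; the Node class and the helper order are assumptions made for illustration. Exactly one of the three visits records the node name, depending on whether preorder (k = 1), inorder (k = 2), or postorder (k = 3) is wanted.

```python
class Node:
    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right


def traverse(t, visit1, visit2, visit3):
    if t is None:                      # empty tree: nothing to visit
        return
    visit1(t)                          # before both subtrees
    traverse(t.left, visit1, visit2, visit3)
    visit2(t)                          # between the subtrees
    traverse(t.right, visit1, visit2, visit3)
    visit3(t)                          # after both subtrees


def order(t, k):
    # k = 1: preorder, k = 2: inorder, k = 3: postorder.
    names = []
    visits = [lambda node: None] * 3   # by default a visit does nothing
    visits[k - 1] = lambda node: names.append(node.name)
    traverse(t, *visits)
    return names
```

Passing the same recording function for all three visits yields a sequence in which every node name appears three times.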
To accomplish the k-th traversal scheme (k = 1, 2, 3), 'visitk' performs the desired operation on the node, while the other two visits do nothing. If all three visits print out the name of the node, we obtain a sequence of node names called 'triple tree traversal', shown in Exhibit 5.7 along with the three traversal orders of which it is composed. During the traversal the nodes are visited in the following sequence:
Exhibit 5.7: Three standard orders merged into a triple tree traversal
Recursion versus iteration: the Tower of Hanoi
The "Tower of Hanoi" is a stack of n disks of different sizes, held in place by a tall peg (Exhibit 5.8). The task is to transfer the tower from source peg S to a target peg T via an intermediate peg I, one disk at a time, without ever placing a larger disk on a smaller one. In this case the data set D is a tower of n disks, and the divide-and-conquer algorithm A partitions D asymmetrically into a small "tower" consisting of a single disk (the largest, at the bottom of the pile) and another tower D' (usually larger, but conceivably empty) consisting of the n - 1 topmost disks. The puzzle is solved recursively in three steps:
1. Transfer D' to the intermediate peg I.
2. Move the largest disk to the target peg T.
3. Transfer D' on top of the largest disk at the target peg T.
Exhibit 5.8: Initial configuration of the Tower of Hanoi.
Step 1 deserves more explanation. How do we transfer the n - 1 topmost disks from one peg to another? Notice that they themselves constitute a tower, to which we may apply the same three-step algorithm. Thus we are presented with successively simpler problems to solve, namely, transferring the n - 1 topmost disks from one peg to another, for decreasing n, until finally, for n = 0, we do nothing.
procedure Hanoi(n: integer; x, y, z: peg);
{ transfer a tower with n disks from peg x, via y, to z }
begin
  if n > 0 then { Hanoi(n - 1, x, z, y); move(x, z); Hanoi(n - 1, y, x, z) }
end;
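The procedure Hanoi translates almost symbol for symbol into Python. In this sketch the primitive move(x, z) is replaced by recording the pair (x, z) in a list, which is an assumption made so that the sequence of moves can be inspected.

```python
def hanoi(n, x, y, z, moves):
    # Transfer a tower with n disks from peg x, via y, to z.
    if n > 0:
        hanoi(n - 1, x, z, y, moves)   # n - 1 topmost disks: x to y
        moves.append((x, z))           # largest disk: x to z
        hanoi(n - 1, y, x, z, moves)   # n - 1 disks: y to z
    return moves
```

For example, hanoi(2, "S", "I", "T", []) yields the three moves S to I, S to T, I to T.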
Recursion has the advantage of intuitive clarity. Elegant and efficient as this solution may be, there is some complexity hidden in the bookkeeping implied by recursion.
The following procedure is an equally elegant and more efficient iterative solution to this problem. It assumes that the pegs are cyclically ordered, and the target peg where the disks will first come to rest depends on this order and on the parity of n (Exhibit 5.9). For odd values of n, 'IterativeHanoi' moves the tower to peg I, for even values of n, to peg T.
Exhibit 5.9: Cyclic order of the pegs.
procedure IterativeHanoi(n: integer);
var odd: boolean; { odd represents the parity of the move }
begin
  odd := true;
  repeat
    case odd of
      true: transfer smallest disk cyclically to next peg;
      false: make the only legal move leaving the smallest in place
    end;
    odd := not odd
  until entire tower is on target peg
end;
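The two informal steps of IterativeHanoi can be made concrete as follows. This Python sketch assumes the cyclic order S, I, T, S (so that a single disk first comes to rest on I) and represents each peg as a list with the top disk last; it tests the loop condition before the first move, so that n = 0 produces no moves.

```python
def iterative_hanoi(n):
    pegs = [list(range(n, 0, -1)), [], []]   # index 0 = S, 1 = I, 2 = T
    names = "SIT"
    target = 1 if n % 2 == 1 else 2          # odd n ends on I, even n on T
    moves = []
    small = 0                                # peg holding the smallest disk
    odd = True
    while len(pegs[target]) < n:
        if odd:
            # Transfer the smallest disk cyclically to the next peg.
            nxt = (small + 1) % 3
            pegs[nxt].append(pegs[small].pop())
            moves.append((names[small], names[nxt]))
            small = nxt
        else:
            # The only legal move that leaves the smallest disk in place:
            # between the other two pegs, move the smaller top disk.
            a, b = [p for p in range(3) if p != small]
            if not pegs[a] or (pegs[b] and pegs[b][-1] < pegs[a][-1]):
                a, b = b, a
            pegs[b].append(pegs[a].pop())
            moves.append((names[a], names[b]))
        odd = not odd
    return moves
```

For small n the number of moves can be compared against the recursive solution.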
")ercise& recursive or iterative pictures?
Chapter 4 presented some beautiful examples of recursive pictures, which would be hard to program without recursion. But for simple recursive pictures iteration is just as natural. Specify a convenient set of graphics primitives and use them to write an iterative procedure to draw Exhibit 5.10 to a nesting depth given by a parameter d.
Exhibit 5.10: Interleaved circles and equilateral triangles cause the radius to be exactly halved at each step.
Solution
There are many choices of suitable primitives and many ways to program these pictures. Specifying an equilateral triangle by its center and the radius of its circumscribed circle simplifies the notation. Assume that we may use the procedures:
procedure circle(x, y, r: real); { coordinates of center and radius }
procedure equitr(x, y, r: real); { center and radius of circumscribed circle }
procedure citr(x, y, r: real; d: integer);
var vr: real; { variable radius }
    i: integer;
begin
  vr := r;
  for i := 1 to d do { equitr(x, y, vr); vr := vr/2; circle(x, y, vr) }
  { show that the radius of consecutively nested circles gets exactly halved at each step }
end;
The flag of Alfanumerica: an algorithmic novel on iteration and recursion
In the process of automating its flag industry, the United States of Alfanumerica announced a competition for the most elegant program to print its flag:
                ****************    k blanks followed by k stars
        ********        ********    twice (k/2 blanks followed by k/2 stars)
    ****    ****    ****    ****    continue doubling and halving
  **  **  **  **  **  **  **  **
 * * * * * * * * * * * * * * * *    down to runs of length 1.
All solutions submitted to the prize committee fell into one of two classes, the iterative and recursive programs. The proponents of these two algorithm design principles could not agree on a winner, and the selection process sparked a civil war that split the nation into two: the Iterative States of Alfanumerica (ISA) and the Recursive States of Alfanumerica (RSA). Both nations fly the same flag but use entirely different production algorithms.
1. Write a
procedure ISA(k: integer);
to print the ISA flag, using an iterative algorithm, of course. Assume that k is a power of 2 and k ≤ (half the line length of the printer).
2. Explain why the printer industry in RSA is much more innovative than the one in ISA. All modern RSA printers include operations for positioning the writing head anywhere within a line, and line feed works both forward and backward.
3. Specify the precise operations for some RSA printer of your design. Using these operations, write a recursive
procedure RSA(k: integer);
to print the RSA flag.
4. Explain an unforeseen consequence of this drive to automate the flag industry of Alfanumerica: In both ISA and RSA, a growing number of flags can be seen fluttering in the breeze turned around by 90°.
Exercises
1. Whereas divide-and-conquer algorithms usually attempt to divide the data in equal halves, the recursive Tower of Hanoi procedure presented in the section "Recursion versus iteration: the Tower of Hanoi" divides the data in a very asymmetric manner: a single disk versus n - 1 disks. Why?
2. Prove by induction on n that the iterative program 'IterativeHanoi' solves the problem in 2^n - 1 iterations.
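One possible iterative answer to the first question about the flag of Alfanumerica is sketched below in Python rather than in the book's notation. It returns the lines of the flag instead of driving a printer, and the function name isa_flag is an assumption; k is taken to be a power of 2.

```python
def isa_flag(k):
    # Line by line: runs of 'run' blanks followed by 'run' stars,
    # repeated k // run times, with run = k, k/2, ..., 1.
    lines = []
    run = k
    while run >= 1:
        lines.append((" " * run + "*" * run) * (k // run))
        run //= 2
    return lines
```

Every line has length 2k, which is why k may be at most half the line length of the printer. Printing "\n".join(isa_flag(16)) reproduces the flag shown in that section.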
6. Syntax
syntax and semantics
syntax diagrams and EBNF describe context-free grammars
terminal and nonterminal symbols
productions
definition of EBNF by itself
parse tree
grammars must avoid ambiguities
infix, prefix, and postfix notation for arithmetic expressions
prefix and postfix notation do not need parentheses
Syntax and semantics
Computer science has borrowed some important concepts from the study of natural languages (e.g. the notions of syntax and semantics). Syntax rules prescribe how the sentences of a language are formed, independently of their meaning. Semantics deals with their meaning. The two sentences "The child draws the horse" and "The horse draws the child" are both syntactically correct according to the accepted rules of grammar. The first sentence clearly makes sense, whereas the second sentence is baffling: perhaps senseless (if "draw" means "drawing a picture"), perhaps meaningful (if "draw" means "pull"). Semantic aspects (whether a sentence is meaningful or not, and if so, what it means) are much more difficult to formalize and decide than syntactic issues.
However, the analogy between natural languages and programming languages does not go very far. The choice of English words and phrases such as "begin", "end", "goto", "if-then-else" lends a programming language a superficial similarity to natural language, but no more. The possibility of verbal encoding of mathematical formulas into pseudo-English has deliberately been built into COBOL; for example, "compute velocity times time giving distance" is nothing but syntactic sugar for "distance := velocity · time". Much more important is the distinction that natural languages are not rigorously defined (neither the vocabulary, nor the syntax, and certainly not the semantics), whereas programming languages should be defined according to a rigorous formalism. Programming languages are much closer to the formal notations of mathematics than to natural languages, and programming notation would be a more accurate term.
The lexical part of a modern programming language [the alphabet, the set of reserved words, the construction rules for the identifiers (i.e. the equivalent to the vocabulary of a natural language)] and the syntax are usually defined formally. However, system-dependent differences are not always described precisely. The compiler often determines in detail the syntactic correctness of a program with respect to a certain system (computer and operating system). The semantics of a programming language could also be defined formally, but this is rarely done, because formal semantic definitions are extensive and difficult to read.
The syntax of a programming language is not as important as the semantics, but good understanding of the syntax often helps in understanding the language. With some practice one can often guess the semantics from the syntax, since the syntax of a well-designed programming language is the frame that supports the semantics.
Grammars and their representation: syntax diagrams and EBNF
The syntax of modern programming languages is defined by grammars. These are mostly of a type called context-free grammars, or close variants thereof, and can be given in different notations. Backus-Naur form (BNF), a milestone in the development of programming languages, was introduced in 1960 to define the syntax of Algol. It is the basis for other notations used today, such as EBNF (extended BNF) and graphical representations such as syntax diagrams. EBNF and syntax diagrams are syntactic notations that describe exactly the context-free grammars of formal language theory.
Recursion is a central theme of all these notations: the syntactic correctness and structure of a large program text are reduced to the syntactic correctness and structure of its textual components. Other common notions include: terminal symbol, nonterminal symbol, and productions or rewriting rules that describe how nonterminal symbols generate strings of symbols.
The set of terminal symbols forms the alphabet of a language, the symbols from which the sentences are built. In EBNF a terminal symbol is enclosed in single quotation marks; in syntax diagrams a terminal symbol is represented by writing it in an oval.
Nonterminal symbols represent syntactic entities: statements, declarations, or expressions. Each nonterminal symbol is given a name consisting of a sequence of letters and digits, where the first character must be a letter. In syntax diagrams a nonterminal symbol is represented by writing its name in a rectangular box.
If a construct consists of the catenation of constructs A and B, this is expressed in EBNF by  A B
If a construct consists of either A or B, this is denoted by  A | B
If a construct may be either construct A or nothing, this is expressed by  [ A ]
If a construct consists of the catenation of any number of A's (including none), this is denoted by  { A }
In EBNF parentheses may be used to group entities [e.g. ( A | B )].
For each nonterminal symbol there must be at least one production that describes how this syntactic entity is formed from other terminal or nonterminal symbols using the composition constructs above.
The following examples show productions and the constructs they generate. A, B, C, D may denote terminal or nonterminal symbols.
EBNF is a formal language over a finite alphabet of symbols introduced above, built according to the rules explained above. Thus it is no great surprise that EBNF can be used to define itself. We use the following names for syntactic entities:
stmt    A syntactic equation.
expr    A list of alternative terms.
term    A concatenation of factors.
factor  A single syntactic entity or parenthesized expression.
nts     Nonterminal symbol that denotes a syntactic entity. It consists of a sequence of letters and digits where the first character must be a letter.
ts      Terminal symbol that belongs to the defined language's vocabulary. Since the vocabulary depends on the language to be defined there is no production for ts.
EBNF is now defined by the following productions:
EBNF = { stmt } .
stmt = nts '=' expr '.' .
expr = term { '|' term } .
term = factor { factor } .
factor = nts | ts | '(' expr ')' | '[' expr ']' | '{' expr '}' .
nts = letter { letter | digit } .
Example: syntax of simple expressions
The following productions for the three nonterminals E(xpression), T(erm), and F(actor) can be traced back to Algol 60. They form the core of all grammars for arithmetic expressions. We have simplified this grammar to define a class of expressions that lacks, for example, a unary minus operator and many other convenient notations. These details are not important for our purpose: namely, understanding how this grammar assigns the correct structure to each expression. We have further simplified the grammar so that constants and variables are replaced by the single terminal symbol # (Exhibit 6.1):
E = T { ( '+' | '-' ) T } .
T = F { ( '·' | '/' ) F } .
F = '#' | '(' E ')' .
Exhibit 6.1: Syntax diagrams for simple arithmetic expressions.
From the nonterminal E we can derive different expressions. In the opposite direction we start with a sequence of terminal symbols and check by syntactic analysis, or parsing, whether a given sequence is a valid expression. If this is the case the grammar assigns to this expression a unique tree structure, the parse tree (Exhibit 6.2).
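Parsing with this grammar can be sketched as a small recursive-descent recognizer with one function per nonterminal E, T, and F. The Python code below is an illustration under stated assumptions, not the book's program: it writes '*' for the multiplication sign, ignores blanks, and returns a nested tuple (operator, left, right) that records the structure while omitting the parentheses and the chain of single-child nodes of a full parse tree.

```python
def parse(text):
    tokens = text.replace(" ", "")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def expression():            # E = T { ( '+' | '-' ) T } .
        tree = term()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            tree = (op, tree, term())
        return tree

    def term():                  # T = F { ( '*' | '/' ) F } .
        tree = factor()
        while peek() in ("*", "/"):
            op = peek()
            expect(op)
            tree = (op, tree, factor())
        return tree

    def factor():                # F = '#' | '(' E ')' .
        if peek() == "(":
            expect("(")
            tree = expression()
            expect(")")
            return tree
        expect("#")
        return "#"

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return tree
```

The loops in expression and term mirror the braces { } of the productions, and the nesting of the calls gives the multiplicative operators their higher priority.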
Exhibit 6.2: Parse tree for the expression # · ( # ) + # / # .
Exercise: syntax diagrams for palindromes
A palindrome is a string that reads the same when read forward or backward. Examples: 0110 and 01010. 01 is not a palindrome, as it differs from its reverse 10.
1. What is the shortest palindrome?
2. Specify the syntax of palindromes over the alphabet {0, 1} in EBNF-notation, and by drawing syntax diagrams.
Solution
1. The shortest palindrome is the null or empty string.
2. S = [ '0' | '1' ] | '0' S '0' | '1' S '1' (Exhibit 6.3).
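The recursive production for S can be checked mechanically. The following Python sketch is a recognizer that follows the three alternatives of the production literally; the function name is an assumption.

```python
def is_palindrome(s):
    # S = [ '0' | '1' ] | '0' S '0' | '1' S '1'
    if len(s) <= 1:
        # First alternative: the empty string, '0', or '1'.
        return s in ("", "0", "1")
    # Remaining alternatives: equal outer symbols around a shorter S.
    return s[0] in "01" and s[0] == s[-1] and is_palindrome(s[1:-1])
```

Each recursive call strips the two outer symbols, so the recursion ends after at most half the length of the string.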
")hibit >.3& .ynta) diagram %or palindromes
An overly simple syntax for simple expressions
Why does the grammar given in the previous section contain term and factor? An expression E that involves only binary operators (e.g. +, -, · and /) is either a primitive operand, abbreviated as #, or of the form 'E op E'. Consider a "simpler" grammar for simple, parenthesis-free expressions (Exhibit 6.4):
E = '#' | E ( '+' | '-' | '·' | '/' ) E .
Exhibit 6.4: A syntax that generates parse trees of ambiguous structure
Now the expression # · # + # can be derived from E in two different ways (Exhibit 6.5). Such an ambiguous grammar is useless since we want to derive the semantic interpretation from the syntactic structure, and the tree at the left contradicts the conventional operator precedence of · over +.
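The ambiguity can be made tangible by enumerating every structure that the one-nonterminal grammar E = '#' | E op E assigns to a given sequence of symbols. The Python sketch below is an illustration with assumed names; it writes '*' for the multiplication sign and represents a structure as a nested tuple (operator, left, right).

```python
OPERATORS = {"+", "-", "*", "/"}


def structures(tokens):
    # All trees derivable from E = '#' | E op E for this token list.
    if len(tokens) == 1:
        return ["#"] if tokens[0] == "#" else []
    trees = []
    for i in range(1, len(tokens) - 1):
        op = tokens[i]
        if op in OPERATORS:
            # Choose this operator as the root and combine every
            # structure of the left part with every one of the right.
            for left in structures(tokens[:i]):
                for right in structures(tokens[i + 1:]):
                    trees.append((op, left, right))
    return trees
```

For the symbols of # * # + # the function returns two trees, one with '*' at the root and one with '+' at the root; only the latter agrees with the usual operator precedence.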
")hibit >.@& T5o incompatible structures %or the e)pression P K P ] P .
"Everything should be explained as simply as possible, but not simpler."
(Albert Einstein)
We can salvage the idea of a grammar with a single nonterminal E by enclosing every expression of the form 'E op E' in parentheses, thus ensuring that every expression has a unique structure (Exhibit 6.6):
E = '#' | '(' E ( '+' | '-' | '·' | '/' ) E ')' .
Exhibit 6.6: Parentheses serve to restore unique structure.
In doing so we change the language. The more complex grammar with three nonterminals E(xpression), T(erm), and F(actor) lets us write expressions that are only partially parenthesized and assigns to them a unique structure compatible with our priority conventions: · and / have higher priority than + and -.
Exercise: the ambiguity of the dangling "else"
The problem of the dangling "else" is an example of a syntax chosen to be "too simple" for the task it is supposed to handle. The syntax of several programming languages (e.g., Pascal) assigns to nested 'if-then[-else]' statements an ambiguous structure. It is left to the semantics of the language to disambiguate.
Let E, E1, E2, … denote Boolean expressions, S, S1, S2, … statements. Pascal syntax allows two types of if statements:
if E then S
and
if E then S else S
1. Draw one syntax diagram that expresses both of these syntactic possibilities.
2. Show all the possible syntactic structures of the statement
if E1 then if E2 then S1 else S2
3. Propose a small modification to the Pascal language that avoids the syntactic ambiguity of the dangling else. Show that in your modified Pascal any arbitrarily nested structure of 'if-then' and 'if-then-else' statements must have a unique syntactic structure.
Parenthesis-free notation for arithmetic expressions
In the usual infix notation for arithmetic expressions a binary operator is written between its two operands. Even with operator precedence conventions, some parentheses are required to guarantee a unique syntactic structure. The selective use of parentheses complicates the syntax of infix expressions: Syntax analysis, interpretative evaluation, and code generation all become more complicated.
Parenthesis-free or Polish notation (named for the Polish logician Jan Lukasiewicz) is a simpler notation for arithmetic expressions. All operators are systematically written either before (prefix notation) or after (postfix or suffix notation) the operands to which they apply. We restrict our examples to the binary operators +, -, · and /. Operators with different arities (i.e. different numbers of arguments) are easily handled provided that the number of arguments used is uniquely determined by the operator symbol. To introduce the unary minus we simply need a different symbol than for the binary minus.
Infix     a+b    a+(b·c)    (a+b)·c
Prefix    +ab    +a·bc      ·+abc
Postfix   ab+    abc·+      ab+c·
Postfix notation mirrors the sequence of operations performed during the evaluation of an expression. 'ab+' is interpreted as: load a (find the first operand); load b (find the second operand); add both. The syntax of arithmetic expressions in postfix notation is determined by the following grammar:
S = '#' | S S ( '+' | '-' | '·' | '/' ) .
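The "load, load, operate" reading of postfix notation corresponds to evaluation with a stack. The following Python sketch assumes that the tokens are separated by blanks, that operands are numbers, and that '*' stands for the multiplication sign; the function name is chosen here for illustration.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()          # second operand was loaded last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))     # load an operand
    if len(stack) != 1:
        raise ValueError("not a well-formed postfix expression")
    return stack[0]
```

With a = 2, b = 3, c = 4, the postfix forms 'abc·+' and 'ab+c·' from the table become "2 3 4 * +" and "2 3 + 4 *"; no parentheses are needed to tell the two apart.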
Exhibit 6.7: Suffix expressions have a unique structure even without the use of parentheses.
Exercises
1. Consider the following syntax, given in EBNF:
S = A.
A = B | 'IF' A 'THEN' A 'ELSE' A.
B = C | B 'OR' C.
C = D | C 'AND' D.
D = 'x' | '(' A ')' | 'NOT' D.
(a) Determine the sets of terminal and nonterminal symbols.
(b) Give the syntax diagrams corresponding to the rules above.
(c) Which of the following expressions is correct corresponding to the given syntax? For the correct expressions show how they can be derived from the given rules:
x AND x
x NOT AND x
(x OR x) AND NOT x
IF x AND x THEN x OR x ELSE NOT x
x AND OR x
2. Extend the grammar of Section 6.3 to include the 'unary minus' (i.e. an arithmetic operator that turns any
   expression into its negative, as in -x). Do this under two different assumptions:
   (a) The unary minus is denoted by a different character than the binary minus, say ¬.
   (b) The character - is 'overloaded' (i.e. it is used to denote both unary and binary minus). For any specific
   occurrence of -, only the context determines which operator it designates.
3. Extended Backus-Naur form and syntax diagrams
   Define each of the four languages described below using both EBNF and syntax diagrams. Use the following
   conventions and notations: Uppercase letters denote nonterminal symbols. Lowercase letters and the three
   separators ',' '(' and ')' denote terminal symbols. "" stands for the empty or null string. Notice that the blank
   character does not occur in these languages, so we use it to separate distinct sentences.
   L ::= a | b | … | z                               Letter
   D ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9       Digit
   S ::= D { D }                                     Sequence of digits
   I ::= L { L | D }                                 Identifier
   (a) Real numbers (constants) in Pascal
   Examples: -3 + 3.14 10e-06 -10.0e6 but not 10e6
   (b) Nonnested lists of identifiers (including the empty list)
   Examples: () (a) (year, month, day) but not (a,(b)) and not ""
   (c) Nested lists of identifiers (including empty lists)
   Examples: in addition to the examples in part (b), we have lists such as
   ((),()) (a, ()) (name, (first, middle, last)) but not (a)(b) and not ""
   (d) Parentheses expressions
   Almost the same problem as part (c), except that we allow the null string, we omit identifiers and commas,
   and we allow multiple outermost pairs of parentheses.
   Examples: "" () ()() ()(()) ()(()())()()
4. Use both syntax diagrams and EBNF to define the repeated if-then-else statement:
   if B_1 then S_1 elsif B_2 then S_2 elsif … else S_n
7. Syntax analysis
syntax is the frame that carries the semantics of a language
syntax analysis
syntax tree
top-down parser
syntax analysis of parenthesis-free expressions by counting
syntax analysis by recursive descent
recursive coroutines
The role of syntax analysis
The syntax of a language is the skeleton that carries the semantics. Therefore, we will try to get as much work as
possible done as a side effect of syntax analysis; for example, compiling a program (i.e. translating it from one
language into another) is a mainly semantic task. However, a good language and compiler are designed in such a
way that syntax analysis determines where to start with the translation process. Many processes in computer
science are syntax-driven in this sense. Hence syntax analysis is important. In this section we derive algorithms for
syntax analysis directly from syntax diagrams. These algorithms reflect the recursive nature of the underlying
grammars. A program for syntax analysis is called a parser.
The composition of a sentence can be represented by a syntax tree or parse tree. The root of the tree is the start
symbol; the leaves represent the sentence to be recognized. The tree describes how a syntactically correct sentence
can be derived from the start symbol by applying the productions of the underlying grammar (Exhibit 7.1).
Exhibit 7.1: The unique parse tree for # · # + #
Top-down parsers begin with the start symbol as the goal of the analysis. In our example, "search for an E". The
production for E tells us that we obtain an E if we find a sequence of T's separated by + or -. Hence we look for T's.
The structure tree of an expression grows in this way as a sequence of goals from top (the root) to bottom (the
leaves). While satisfying the goals (nonterminal symbols) the parser reads suitable symbols (terminal symbols)
from left to right. In many practical cases a parser needs no backtracking. No backtracking is required if the current
input symbol and the nonterminal to be expanded determine uniquely the production to be applied. A recursive-
descent parser uses a set of recursive procedures to recognize its input with no backtracking.
Bottom-up methods build the structure tree from the leaves to the root. The text is reduced until the start
symbol is obtained.
Syntax analysis of parenthesis-free expressions by counting
Syntax analysis can be very simple. Arithmetic expressions in Polish notation are analyzed by counting. For the sake
of simplicity we assume that each operand in an arithmetic expression is denoted by the single character #. In order
to decide whether a given string c_1 c_2 … c_n is a correct expression in postfix notation, we form an integer sequence
t_0, t_1, … , t_n according to the following rule:
t_0 = 0.
t_{i+1} = t_i + 1, if i ≥ 0 and c_{i+1} is an operand.
t_{i+1} = t_i - 1, if i ≥ 0 and c_{i+1} is an operator.
Example of a correct expression:
i      0  1  2  3  4  5  6  7  8  9
c_i       #  #  #  #  -  -  +  #  ·
t_i    0  1  2  3  4  3  2  1  2  1
Example of an incorrect expression (one operator is missing):
i      0  1  2  3  4  5  6  7  8
c_i       #  #  #  +  ·  #  #  /
t_i    0  1  2  3  2  1  2  3  2
Theorem: The string c_1 c_2 … c_n over the alphabet A = { #, +, -, ·, / } is a syntactically correct expression in
postfix notation if and only if the associated integer sequence t_0, t_1, … , t_n satisfies the following conditions:
t_i > 0 for 1 ≤ i < n, t_n = 1.
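The counting rule and the theorem translate almost literally into code. The following is a sketch in Python rather than the book's extended Pascal; the names `count_sequence` and `is_postfix` are chosen for this example, and `*` stands in for the operator `·`.

```python
OPERATORS = set("+-*/")   # '*' stands in for the book's '·'


def count_sequence(s):
    """Return [t_0, t_1, ..., t_n] for the string s = c_1 ... c_n."""
    t = [0]
    for c in s:
        if c == "#":                 # operand: one more value pending
            t.append(t[-1] + 1)
        elif c in OPERATORS:         # binary operator: two values become one
            t.append(t[-1] - 1)
        else:
            raise ValueError(f"symbol {c!r} is not in the alphabet")
    return t


def is_postfix(s):
    """Apply the theorem: t_i > 0 for 1 <= i < n, and t_n = 1."""
    t = count_sequence(s)
    n = len(s)
    return n >= 1 and all(t[i] > 0 for i in range(1, n)) and t[n] == 1


print(count_sequence("####--+#*"), is_postfix("####--+#*"))
print(count_sequence("###+*##/"), is_postfix("###+*##/"))
```

The two calls reproduce the correct and the incorrect example above: the first sequence ends in 1, the second in 2.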
Proof ⇒: Let c_1 c_2 … c_n be a correct arithmetic expression in postfix notation. We prove by induction on the
length n of the string that the corresponding integer sequence satisfies the conditions.
Base of induction: For n = 1 the only correct postfix expression is c_1 = #, and the sequence t_0 = 0, t_1 = 1 has the
desired properties.
Induction hypothesis: The theorem is correct for all expressions of length ≤ m.
Induction step: Consider a correct postfix expression S of length m + 1 > 1 over the given alphabet A. Let
s = (s_i), 0 ≤ i ≤ m + 1, be the integer sequence associated with S. Then S is of the form S = T U Op, where 'Op' is an
operator and T and U are correct postfix expressions of length j ≤ m and length k ≤ m, j + k = m. Let t = (t_i),
0 ≤ i ≤ j, and u = (u_i), 0 ≤ i ≤ k, be the integer sequences associated with T and U. We apply the induction
hypothesis to T and U. The sequence s is composed from t and u as follows:
s = s_0, s_1, s_2, … , s_j, s_{j+1}, s_{j+2}, … , s_m, s_{m+1}
  = t_0, t_1, t_2, … , t_j, u_1 + 1, u_2 + 1, … , u_k + 1, 1
  = 0, … , 1, … , 2, 1
Since t ends with 1, we add 1 to each element in u, and the subsequence therefore ends with u_k + 1 = 2. Finally,
the operator 'Op' decreases this element by 1, and s therefore ends with s_{m+1} = 1. Since t_i > 0 for 1 ≤ i ≤ j and
u_i > 0 for 1 ≤ i ≤ k, we obtain that s_i > 0 for 1 ≤ i < m + 1. Hence s has the desired properties, and we have proved
one direction of the theorem.
Proof ⇐: We prove by induction on the length n that a string c_1 c_2 … c_n over A is a correct arithmetic expression
in postfix notation if the associated integer sequence satisfies the conditions stated in the theorem.
Base of induction: For n = 1 the only sequence is t_0 = 0, t_1 = 1. It follows from the definition of the sequence that
c_1 = #, which is a correct arithmetic expression in postfix notation.
Induction hypothesis: The theorem is correct for all expressions of length ≤ m.
Induction step: Let s = (s_i), 0 ≤ i ≤ m + 1, be the integer sequence associated with a string S = c_1 c_2 … c_{m+1} of
length m + 1 > 1 over the given alphabet A which satisfies the conditions stated in the theorem. Let j < m + 1 be the
largest index with s_j = 1. Since s_1 = 1 such an index j exists. Consider the substrings T = c_1 c_2 … c_j and
U = c_{j+1} c_{j+2} … c_m. The integer sequences (s_i), 0 ≤ i ≤ j, and (s_i - 1), j ≤ i ≤ m, associated with T and U both
satisfy the conditions stated in the theorem. Hence we can apply the induction hypothesis and obtain that both T
and U are correct postfix expressions. From the definition of the integer sequence we obtain that c_{m+1} is an
operator 'Op'. Since T and U are correct postfix expressions, S = T U Op is also a correct postfix expression, and the
theorem is proved.
A similar proof shows that the syntactic structure of a postfix expression is unique. The integer sequence
associated with a postfix expression is of practical importance: the sequence describes the depth of the stack during
evaluation of the expression, and the largest number in the sequence is therefore the maximum number of storage
cells needed.
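The connection between the integer sequence and the evaluation stack can be made concrete by "evaluating" an expression symbolically and recording the stack depth after each symbol. This is a sketch in Python rather than the book's extended Pascal; the function name `stack_depths` is an assumption of the example, `*` stands in for `·`, and the input is assumed to be a correct postfix expression.

```python
def stack_depths(s):
    """Symbolically evaluate the postfix string s.

    Returns (depths, stack): depths[i] is the stack depth after reading i symbols,
    and stack holds the fully parenthesized infix form once s has been consumed.
    """
    stack, depths = [], [0]
    for c in s:
        if c == "#":
            stack.append("#")                      # push an operand
        else:
            right = stack.pop()                    # pop two operands ...
            left = stack.pop()
            stack.append("(" + left + c + right + ")")   # ... push the result
        depths.append(len(stack))
    return depths, stack


depths, stack = stack_depths("####--+#*")
print(depths)        # the same numbers as the sequence t_0 ... t_9 above
print(max(depths))   # storage cells needed
print(stack[0])      # the single structure the expression can have
```

For the correct example expression the recorded depths coincide with the sequence t_0, … , t_9, and the maximum, 4, is the number of stack cells the evaluation needs.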
Analysis by recursive descent
We return to the syntax of the simple arithmetic expressions of chapter 6 in the section "Example: syntax of
simple expressions" (Exhibit 7.2). Using the expression # · (# - #) as an example, we show how these syntax
diagrams are used to analyze any expression by means of a technique called recursive-descent parsing. The
progress of the analysis depends on the current state and the next symbol to be read: a lookahead of exactly one
symbol suffices to avoid backtracking. In Exhibit 7.3 we move one step to the right after each symbol has been
recognized, and we move vertically to step up or down in the recursion.
Exhibit 7.2: Standard syntax for simple arithmetic expressions (graphic does not match)
Exhibit 7.3: Trace of syntax analysis algorithm parsing the expression # · ( # - # ).
Turning syntax diagrams into a parser
In a programming language that allows recursion the three syntax diagrams for simple arithmetic expressions
can be translated directly into procedures. A nonterminal symbol corresponds to a procedure call, a loop in the
diagram generates a while loop, and a selection is translated into an if statement. When a procedure wants to
delegate a goal it calls another, in cyclic order: E calls T calls F calls E, and so on. Procedures implementing such a
recursive control structure are often called recursive coroutines.
The procedures that follow must be embedded into a program that provides the variable 'ch' and the procedures
'read' and 'error'. We assume that the procedure 'error' prints an error message and terminates the program. In a
more sophisticated implementation, 'error' would return a message to the calling procedure (e.g. 'factor'). Then this
error message is returned up the ladder of all recursive procedure calls active at the moment.
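A minimal sketch of such a set of mutually recursive procedures is given here in Python rather than the book's extended Pascal. It assumes the grammar E = T { ("+" | "-") T }, T = F { ("*" | "/") F }, F = "#" | "(" E ")" (with `*` standing in for `·`), input without blanks, and a terminating period. The class name `Parser`, the exception `ParseError`, and the helper `is_expression` are inventions of this example; here 'error' raises an exception instead of terminating the program.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.ch = ""
        self.read()                  # a character must be in ch before the first call

    def read(self):
        """Advance to the next input character ('' at end of input)."""
        self.ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1

    def error(self):
        raise ParseError(f"unexpected {self.ch!r} at position {self.pos - 1}")

    def expression(self):            # E = T { ("+" | "-") T }
        self.term()
        while self.ch in ("+", "-"):     # loop in the diagram -> while loop
            self.read()
            self.term()

    def term(self):                  # T = F { ("*" | "/") F }
        self.factor()
        while self.ch in ("*", "/"):
            self.read()
            self.factor()

    def factor(self):                # F = "#" | "(" E ")"
        if self.ch == "#":               # selection in the diagram -> if statement
            self.read()
        elif self.ch == "(":
            self.read()
            self.expression()            # F calls E: the recursion closes here
            if self.ch != ")":
                self.error()
            self.read()
        else:
            self.error()


def is_expression(text):
    """True if text is a correct expression terminated by a period."""
    parser = Parser(text)
    try:
        parser.expression()
        if parser.ch != ".":
            parser.error()
    except ParseError:
        return False
    return True


print(is_expression("#*(#-#)."))
print(is_expression("#+."))
```

Each procedure inspects only the current character `ch`, which is the one-symbol lookahead that makes backtracking unnecessary.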
Before the first call of the procedure 'expression', a character has to be read into 'ch'. Furthermore, we assume
that a correct expression is terminated by a period:
…
read(ch); expression; if ch ≠ '.' then error;
…
Exercises
1. Design recursive algorithms to translate the simple arithmetic expressions of chapter 6 in the section
   "Example: syntax of simple expressions" into corresponding prefix and postfix expressions as defined in
   chapter 6 in the section "Parenthesis-free notation for arithmetic expressions". Same for the inverse
   translations.
2. Using syntax diagrams and EBNF define a language of 'correctly nested parentheses expressions'. You have
   a bit of freedom (how much?) in defining exactly what is correctly nested and what is not, but obviously
   your definition must include expressions such as (), ((())), (()(())), and must exclude strings such as (, )(, ())
   ().
3. Design two parsing algorithms for your class of correctly nested parentheses expressions: one that works by
   counting, the other through recursive descent.
Part III: Objects, algorithms, programs
Computing with numbers and other objects
Since the introduction of computers four or five decades ago the meaning of the word computation has kept
expanding. Whereas "computation" traditionally implied "numbers", today we routinely compute pictures, texts,
and many other types of objects. When classified according to the types of objects being processed, three types of
computer applications stand out prominently with respect to the influence they had on the development of
computer science.
The first generation involved numerical computing, applied mainly to scientific and technical problems. Data to
be processed consisted almost exclusively of numbers, or sets of numbers with a simple structure, such as vectors
and matrices. Programs were characterized by long execution times but small sets of input and output data.
Algorithms were more important than data structures, and many new numerical algorithms were invented. Lasting
achievements of this first phase of computer applications include systematic study of numerical algorithms, error
analysis, the concept of program libraries, and the first high-level programming languages, Fortran and Algol.
The second generation, hatched by the needs of commercial data processing, led to the development of many
new data structures. Business applications thrive on record keeping and updating, text and form processing, and
report generation: there is not much computation in the numeric sense of the word, but a lot of reading, storing,
moving, and printing of data. In other words, these applications are data intensive rather than computation
intensive. By focusing attention on the problem of efficient management of large, dynamically varying data
collections, this phase created one of the core disciplines of computer science: data structures, and corresponding
algorithms for managing data, such as searching and sorting.
We are now in a third generation of computer applications, dominated by computing with geometric and
pictorial objects. This change of emphasis was triggered by the advent of computers with bitmap graphics. In turn,
this has led to the widespread use of sophisticated user interfaces that depend on graphics, and to a rapid increase in
applications such as computer-aided design (CAD) and image processing and pattern recognition (in medicine,
cartography, robot control). The young discipline of computational geometry has emerged in response to the
growing importance of processing geometric and pictorial objects. It has created novel data structures and
algorithms, some of which are presented in Parts V and VI.
Our selection of algorithms in Part III reflects the breadth of applications whose history we have just sketched.
We choose the simplest types of objects from each of these different domains of computation and some of the most
concise and elegant algorithms designed to process them. The study of typical small programs is an essential part of
programming. A large part of computer science consists of the knowledge of how typical problems can be solved;
and the best way to gain such knowledge is to study the main ideas that make standard programs work.
Algorithms and programs
Theoretical computer science treats algorithm as a formal concept, rigorously defined in a number of ways, such
as Turing machines or lambda calculus. But in the context of programming, algorithm is typically used as an
intuitive concept designed to help people express solutions to their problems. The formal counterpart of an
algorithm is a procedure or program (fragment) that expresses the algorithm in a formally defined programming
language. The process of formalizing an algorithm as a program typically requires many decisions: some superficial
(e.g. what type of statement is chosen to set up a loop), some of great practical consequence (e.g. for a given range
of values of n, is the algorithm's asymptotic complexity analysis relevant or misleading?).
We present algorithms in whatever notation appears to convey the key ideas most clearly, and we have a clear
preference for pictures. We present programs in an extended version of Pascal; readers should have little difficulty
translating this into any programming language of their choice. Mastery of interesting small programs is the best
way to get started in computer science. We encourage the reader to work the examples in detail.
The literature on algorithms. The development of new algorithms has been proceeding at a very rapid pace
for several decades, and even a specialist can only stay abreast with the state of the art in some subfield, such as
graph algorithms, numerical algorithms, or geometric algorithms. This rapid development is sure to continue
unabated, particularly in the increasingly important field of parallel algorithms. The cutting edge of algorithm
research is published in several journals that specialize in this research topic, including the Journal of Algorithms
and Algorithmica. This literature is generally accessible only after a student has studied a few textbooks on
algorithms, such as [AHU 75], [Baa 88], [BB 88], [CLR 90], [GB 91], [HS 78], [Knu 73a], [Knu 81], [Knu 73b],
[Man 89], [Meh 84a], [Meh 84b], [Meh 84c], [RND 77], [Sed 88], [Wil 86], and [Wir 86].
8. Truth values, the data type 'set', and bit acrobatics
truth values, bits
boolean variables and functions
bit sum: four clever algorithms compared
trade-off between time and space
Bits and boolean functions
The English mathematician George Boole (1815-1864) became one of the founders of symbolic logic when he
endeavored to express logical arguments in mathematical form. The goal of his 1854 book The Laws of Thought
was "to investigate the laws of those operations of the mind by which reasoning is performed; to give expression to
them in the symbolic language of calculus. …"
Truth values or boolean values, named in Boole's honor, possess the smallest possible useful domain: the binary
domain, represented by yes/no, 1/0, true/false, T/F. In the late 1940s, as the use of binary arithmetic became
standard and as information theory came to regard a two-valued quantity as the natural unit of information, the
concise term bit was coined as an abbreviation of "binary digit". A bit, by any other name, is truly a primitive data
element: at a sufficient level of detail, (almost) everything that happens in today's computers is bit manipulation.
Just because bits are simple data quantities does not mean that processing them is necessarily simple, as we
illustrate in this section by presenting some clever and efficient bit manipulation algorithms.
Boolean variables range over boolean values, and boolean functions take boolean arguments and produce
boolean results. There are only four distinct boolean functions of a single boolean variable, among which 'not' is the
most useful: it yields the complement of its argument (i.e. turns 0 into 1, and vice versa). The other three are the
identity and the functions that yield the constants 0 and 1. There are 16 distinct boolean functions of two boolean
variables, of which several are frequently used, in particular: 'and', 'or'; their negations 'nand', 'nor'; the exclusive-or
'xor'; and the implication '⇒'. These functions are defined as follows:
a  b    a and b    a or b    a nand b    a nor b    a xor b    a ⇒ b
0  0       0         0          1           1          0         1
0  1       0         1          1           0          1         1
1  0       0         1          1           0          1         0
1  1       1         1          0           0          0         1
Bits are the atomic data elements of today's computers, and most programming languages provide a data type
'boolean' and built-in operators for 'and', 'or', 'not'. To avoid the necessity for boolean expressions to be fully
parenthesized, precedence relations are defined on these operators: 'not' takes precedence over 'and', which takes
precedence over 'or'. Thus
x and not y or not x and y ⇔ ((x and (not y)) or ((not x) and y)).
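Python happens to use the same precedence order for its boolean operators ('not' before 'and' before 'or'), so the equivalence can be checked exhaustively. The following sketch is an illustration, not part of the book's extended Pascal; the function name `xor` is chosen for the example.

```python
from itertools import product


def xor(x, y):
    # No parentheses: relies on 'not' binding tighter than 'and', and 'and' tighter than 'or'.
    return x and not y or not x and y


for x, y in product([False, True], repeat=2):
    explicit = (x and (not y)) or ((not x) and y)   # fully parenthesized form
    assert xor(x, y) == explicit == (x != y)
    print(int(x), int(y), int(xor(x, y)))
```

The printed rows agree with the 'a xor b' column of the table above.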
What can you compute with boolean variables? Theoretically everything, since large finite domains can always
be represented by a sufficient number of boolean variables: 16-bit integers, for example, use 16 boolean variables to
represent the integer domain -2^15 .. 2^15 - 1. Boolean variables are often used for program optimization in practical
problems where efficiency is important.
Swapping and crossovers: the versatile exclusive-or
Consider the swap statement x :=: y, which we use to abbreviate the cumbersome triple: t := x; x := y; y := t.
On computers that provide bitwise boolean operations on registers, the swap operator :=: can be implemented
efficiently without the use of a temporary variable.
The operator exclusive-or, often abbreviated as 'xor', is defined as
x xor y = x and not y or not x and y.
It yields true iff exactly one of its two arguments is true.
The bitwise boolean operation z := x op y on n-bit registers: x[1 .. n], y[1 .. n], z[1 .. n], is defined as
for i := 1 to n do z[i] := x[i] op y[i]
With a bitwise exclusive-or, the swap x :=: y can be programmed as
x := x xor y; y := x xor y; x := x xor y;
It still takes three statements, but no temporary variable. Given that registers are usually in short supply, and
that a logical operation on registers is typically just as fast as an assignment, the latter code is preferable. Exhibit
8.1 traces the execution of this code on two 4-bit registers and shows exhaustively that the swap is performed
correctly for all possible values of x and y.
Exhibit 8.1: Trace of registers x and y under repeated exclusive-or operations.
Exercise: planar circuits without crossover of wires
The code above has yet another interpretation: How should we design a logical circuit that effects a logical
crossover of two wires x and y while avoiding any physical crossover? If we had an 'xor' gate, the circuit diagram
shown in Exhibit 8.2 would solve the problem. 'xor' gates must typically be realized as circuits built from simpler
primitives, such as 'and', 'or', 'not'. Design a circuit consisting of 'and', 'or', 'not' gates only, which has the effect of
crossing wires x and y while avoiding physical crossover.
Exhibit 8.2: Three exclusive-or gates in series interchange values on two wires.
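The three-statement exclusive-or swap of this section can be checked exhaustively on 4-bit values in a few lines. The sketch below uses Python's bitwise `^` operator in place of the book's 'xor'; the function name `xor_swap` is an assumption of the example.

```python
def xor_swap(x, y):
    """Swap two integers with three exclusive-or operations and no temporary."""
    x = x ^ y      # x now holds x0 xor y0
    y = x ^ y      # (x0 xor y0) xor y0 = x0
    x = x ^ y      # (x0 xor y0) xor x0 = y0
    return x, y


# All 16 * 16 combinations of two 4-bit registers.
for x in range(16):
    for y in range(16):
        assert xor_swap(x, y) == (y, x)

print(xor_swap(0b0011, 0b0101))
```

Because exclusive-or acts on each bit position independently, checking the four combinations of a single bit would already suffice; the loop simply covers every 4-bit pair.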
The bit sum or "population count"
A computer word is a fixed-length sequence of bits, call it a bit vector. Typical word lengths are 16, 32, or 64, and
most instructions in most computers operate on all the bits in a word at the same time, in parallel. When efficiency
is of great importance, it is worth exploiting to the utmost the bit parallelism built into the hardware of most
computers. Today's programming languages often fail to refer explicitly to hardware features such as registers or
words in memory, but it is usually possible to access individual bits if one knows the representation of integers or
other data types. In this section we take the freedom to drop the constraint of strong typing built into Pascal and
other modern languages. We interpret the content of a register or a word in memory as it suits the need of the
moment: a bit string, an integer, or a set.
We are well aware of the dangers of such ambiguous interpretations: programs become system and compiler
dependent, and thus lose portability. If such ambiguity is localized in a single, small procedure, the danger may be
kept under control, and the gain in efficiency may outweigh these drawbacks. In Pascal, for example, the type 'set' is
especially well suited to operate at the bit level. 'type s = set of (a, b, c)' consists of the 2^3 sets that can be formed
from the three elements a, b, c. If the basic set M underlying the declaration of
type S = set of M
consists of n elements, then S has 2^n elements. Usually, a value of type S is internally represented by a vector of n
contiguously allocated bits, one bit for each element of the set M. When computing with values of type S we operate
on single bits using the boolean operators. The union of two sets of type S is obtained by applying bitwise 'or', the
intersection by applying bitwise 'and'. The complement of a set is obtained by applying bitwise 'not'.
Example
M = {0, 1, … , 7}
Set                                Bit vector
Elements                           7 6 5 4 3 2 1 0
s_1          {0, 3, 4, 6}          0 1 0 1 1 0 0 1
s_2          {0, 1, 4, 5}          0 0 1 1 0 0 1 1
s_1 ∪ s_2    {0, 1, 3, 4, 5, 6}    0 1 1 1 1 0 1 1
s_1 ∩ s_2    {0, 4}                0 0 0 1 0 0 0 1
¬ s_1        {1, 2, 5, 7}          1 0 1 0 0 1 1 0
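The example can be reproduced with integers used as bit vectors. This is a sketch in Python rather than the book's extended Pascal; the helper names `to_bits` and `to_set` and the constant `MASK` are assumptions of the example, and bit i of the integer is taken to represent element i of M.

```python
MASK = 0xFF                      # M = {0, 1, ..., 7}: eight bits


def to_bits(elements):
    """Encode a set of small integers as a bit vector (bit i set iff i is in the set)."""
    w = 0
    for e in elements:
        w |= 1 << e
    return w


def to_set(w, n=8):
    """Decode an n-bit vector back into a Python set."""
    return {i for i in range(n) if (w >> i) & 1}


s1 = to_bits({0, 3, 4, 6})
s2 = to_bits({0, 1, 4, 5})

print(format(s1, "08b"), format(s2, "08b"))
print(format(s1 | s2, "08b"), to_set(s1 | s2))            # union: bitwise 'or'
print(format(s1 & s2, "08b"), to_set(s1 & s2))            # intersection: bitwise 'and'
print(format(~s1 & MASK, "08b"), to_set(~s1 & MASK))      # complement: bitwise 'not'
```

The mask in the last line is needed only because Python integers are unbounded; on a fixed-width register the 'not' alone would do.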
Integers are represented on many small computers by 16 bits. We assume that a type 'w16', for "word of length
16", can be defined. In Pascal, this might be
type w16 = set of 0 .. 15;
A variable of type 'w16' is a set of at most 16 elements represented as a vector of 16 bits.
Asking for the number of elements in a set s is therefore the same as asking for the number of 1's in the bit
pattern that represents s. The operation that counts the number of elements in a set, or the number of 1's in a word,
is called the population count or bit sum. The bit sum is frequently used in inner loops of combinatorial
calculations, and many a programmer has tried to make it as fast as possible. Let us look at four of these tries,
beginning with the obvious.
Inspect every bit
function bitsum0(w: w16): integer;
var i, c: integer;
begin
  c := 0;
  for i := 0 to 15 do { inspect every bit }
    if i ∈ w { w[i] = 1 } then c := c + 1; { count the ones }
  return(c)
end;
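For readers who want to run it, 'bitsum0' might be transcribed as follows. This is a sketch in Python, where the word is an ordinary non-negative integer and the set membership test i ∈ w becomes a shift and mask; the parameter `n` for the word length is an addition of this example.

```python
def bitsum0(w, n=16):
    """Count the 1 bits of the n-bit word w by inspecting every bit position."""
    c = 0
    for i in range(n):            # inspect every bit
        if (w >> i) & 1:          # bit i is set, i.e. i is an element of w
            c += 1                # count the ones
    return c


print(bitsum0(0b1000100011001000))
```

The loop always runs n times, regardless of how many bits are set.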
Skip the zeros
Is there a faster way? The following algorithm looks mysterious and tricky. The expression w ∩ (w - 1) contains
both an intersection operation '∩', which assumes that its operands are sets, and a subtraction, which assumes that
w is an integer:
c := 0;
while w ≠ 0 do { c := c + 1; w := w ∩ (w - 1) } ;
Such mixing makes sense only if we can rely on an implicit assumption on how sets and integers are represented
as bit vectors. With the usual binary number representation, an example shows that when the body of the loop is
executed once, the rightmost 1 of w is replaced by 0:
w              1000100011001000
w - 1          1000100011000111
w ∩ (w - 1)    1000100011000000
This clever code seems to look at the 1's only and skip over all the 0's: its loop is executed only as many times as
there are 1's in the word. This saving is worthwhile for long, sparsely populated words (few 1's and many 0's).
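In a language whose integers double as bit vectors the trick can be written down directly. The sketch below is in Python, where `&` plays the role of the intersection '∩'; the function name `bitsum_skip_zeros` is chosen for this example and the argument is assumed to be a non-negative integer.

```python
def bitsum_skip_zeros(w):
    """Count the 1 bits of w; the loop body runs once per 1 bit."""
    c = 0
    while w != 0:
        c += 1
        w &= w - 1        # clears the rightmost 1 of w
    return c


w = 0b1000100011001000
print(format(w, "016b"))
print(format(w - 1, "016b"))
print(format(w & (w - 1), "016b"))
print(bitsum_skip_zeros(w))
```

The three printed bit patterns reproduce the example above: subtracting 1 turns the rightmost 1 into 0 and the zeros to its right into ones, and the intersection then erases exactly that 1.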
In the statement w := w ∩ (w - 1), w is used both as an integer (in w - 1) and as a set (as an operand in the
intersection operation '∩'). Strongly typed languages, such as Pascal, do not allow such mixing of types. In the
following function 'bitsum1', the conversion routines 'w16toi' and 'itow16' are introduced to avoid this double
interpretation of w. However, 'bitsum1' is of interest only if such a type conversion requires no extra time (i.e. if one
knows how sets and integers are represented internally).
function bitsum1(w: w16): integer;
var c, i: integer; w0, w1: w16;
begin
  w0 := w; c := 0;
  while w0 ≠ Ø { empty set } do begin
    i := w16toi(w0); { w16toi converts type w16 to integer }
    i := i - 1;
    w1 := itow16(i); { itow16 converts type integer to w16 }
    w0 := w0 ∩ w1; { intersection of two sets }
    c := c + 1
  end;
  return(c)
end;
Most languages provide some facility for permitting purely formal type conversions that result in no work:
'EQUIVALENCE' statements in Fortran, 'UNSPEC' in PL/1, variant records in Pascal. Such "conversions" are done
merely by interpreting the contents of a given storage location in different ways.
Logarithmic bit sum
For a computer of word length n, the following algorithm computes the bit sum of a word w running through its
loop only ⌈log2 n⌉ times, as opposed to n times for 'bitsum0' or up to n times for 'bitsum1'. The following
description holds for arbitrary n but is understood most easily if n = 2^h.
The logarithmic bit sum works on the familiar principle of divide-and-conquer. Let w denote a word consisting
of n = 2^h bits, and let S(w) be the bit sum of the bit string w. Split w into two halves and denote its left part by wL
and its right part by wR. The bit sum obviously satisfies the recursive equation S(w) = S(wL) + S(wR). Repeating
the same argument on the substrings wL and wR, and, in turn, on the substrings they create, we arrive at a process
to compute S(w). This process terminates when we hit substrings of length 1 [i.e. substrings consisting of a single
bit b; in this case we have S(b) = b]. Repeated halving leads to a recursive decomposition of w, and the bit sum is
computed by a tree of n - 1 additions as shown in Exhibit 8.3 for n = 4.
Exhibit 8.3: Logarithmic bit sum algorithm as a result of divide-and-conquer.
This approach of treating both parts of w symmetrically and repeated halving leads to a computation of depth h
= log2 n. To obtain a logarithmic bit sum, we apply the additional trick of performing many additions in parallel.
Notice that the total length of all operands on the same level is always n. Thus we can pack them into a single word
and, if we arrange things cleverly, perform all the additions at the same level in one machine operation, an addition
of two n-bit words.
Exhibit 8.4 shows how a number of the additions on short strings are carried out by a single addition on long
strings. S(w) now denotes not only the bit sum but also its binary representation, padded with zeros to the left so as
to have the appropriate length. Since the same algorithm is being applied to wL and wR, and since wL and wR are of
equal length, exactly the same operations are performed at each stage on wL and its parts as on wR and its
corresponding parts. Thus if the operations of addition and shifting operate on words of length n, a single one of
these operations can be interpreted as performing many of the same operations on the shorter parts into which w
has been split. This logarithmic speedup works up to the word length of the computer. For n = 64, for example,
recursive splitting generates six levels and translates into six iterations of the loop below.
Exhibit 8.4: All processes generated by divide-and-conquer are performed in parallel
on shared data registers.
The algorithm is best explained with an example; we use n = 8.
          w7 w6 w5 w4 w3 w2 w1 w0
w          1  1  0  1  0  0  0  1
First, extract the even-indexed bits w6 w4 w2 w0 and place a zero to the left of each bit to obtain w_even. The newly
inserted zeros occupy the odd-indexed positions.
             w6    w4    w2    w0
w_even     0  1  0  1  0  0  0  1
Next, extract the odd-indexed bits w7 w5 w3 w1, shift them right by one place into bit positions w6 w4 w2 w0, and
place a zero to the left of each bit to obtain w_odd.
             w7    w5    w3    w1
w_odd      0  1  0  0  0  0  0  0
Then, numerically add w_even and w_odd, considered as integers written in base 2, to obtain w'.
          w'7 w'6 w'5 w'4 w'3 w'2 w'1 w'0
w_even     0   1   0   1   0   0   0   1
w_odd      0   1   0   0   0   0   0   0
w'         1   0   0   1   0   0   0   1
Next, we index not bits, but pairs of bits, from right to left: (w'1 w'0) is the zeroth pair, (w'5 w'4) is the second pair.
Extract the even-indexed pairs w'5 w'4 and w'1 w'0, and place a pair of zeros to the left of each pair to obtain w'_even.
                w'5 w'4        w'1 w'0
w'_even    0 0    0 1    0 0    0 1
Next, extract the odd-indexed pairs w'7 w'6 and w'3 w'2, shift them right by two places into bit positions w'5 w'4
and w'1 w'0, respectively, and insert a pair of zeros to the left of each pair to obtain w'_odd.
                w'7 w'6        w'3 w'2
w'_odd     0 0    1 0    0 0    0 0
Numerically add w'_even and w'_odd to obtain w''.
          w''7 w''6 w''5 w''4 w''3 w''2 w''1 w''0
w''         0    0    1    1    0    0    0    1
Next, we index quadruples of bits, extract the quadruple w''3 w''2 w''1 w''0, and place four zeros to the left to obtain
w''_even.
                        w''3 w''2 w''1 w''0
w''_even   0 0 0 0      0 0 0 1
Extract the quadruple w''7 w''6 w''5 w''4, shift it right four places into bit positions w''3 w''2 w''1 w''0, and place four
zeros to the left to obtain w''_odd.
                        w''7 w''6 w''5 w''4
w''_odd    0 0 0 0      0 0 1 1
Finally, numerically add w''_even and w''_odd to obtain w''' = (00000100), which is the representation in base 2 of the
bit sum of w (4 in this example). The following function implements this algorithm.
Logarithmic bit sum implemented for a 16-bit computer:
In 'bitsum2' we apply addition and division operations directly to variables of type 'w16' without performing the type conversions that would be necessary in a strongly typed language such as Pascal.
function bitsum2(w: w16): integer;
const mask[0] = '0101010101010101';
      mask[1] = '0011001100110011';
      mask[2] = '0000111100001111';
      mask[3] = '0000000011111111';
var i, d: integer; w_even, w_odd: w16;
begin
  d := 2;
  for i := 0 to 3 do begin
    w_even := w ∩ mask[i];
    w := w / d; { shift w right 2^i bits }
    d := d * d;
    w_odd := w ∩ mask[i];
    w := w_even + w_odd
  end;
  return(w)
end;
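As a cross-check of the scheme above, here is a minimal Python sketch of the same logarithmic bit sum. It is an illustration, not the book's code: writing the masks in hexadecimal and replacing the division by d with an explicit right shift are assumptions of this sketch.

```python
# Masks 0101..., 0011..., 00001111..., 0000000011111111 for a 16-bit word.
MASKS = (0x5555, 0x3333, 0x0F0F, 0x00FF)

def bitsum2(w: int) -> int:
    """Return the number of 1-bits in the 16-bit word w."""
    w &= 0xFFFF            # keep only 16 bits
    shift = 1              # 1, 2, 4, 8: width of the fields being added
    for mask in MASKS:
        w_even = w & mask              # even-indexed fields
        w_odd = (w >> shift) & mask    # odd-indexed fields, shifted into place
        w = w_even + w_odd             # each field now holds a partial bit sum
        shift *= 2
    return w

print(bitsum2(0b11010001))  # the word of the worked example: prints 4
```

The loop body runs four times for 16 bits, that is, log2 of the word length, which is where the name of the algorithm comes from.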
Trade-off between time and space: the fastest algorithm
Are there still faster algorithms for computing the bit sum of a word? Is there an optimal algorithm? The question of optimality of algorithms is important, but it can be answered only in special cases. To show that an algorithm is optimal, one must specify precisely the class of algorithms allowed and the criterion of optimality. In the case of bit sum algorithms, such specifications would be complicated and largely arbitrary, involving specific details of how computers work.
However, we can make a plausible argument that the following bit sum algorithm is the fastest possible, since it uses a table lookup to obtain the result in essentially one operation. The penalty for this speed is an extravagant use of memory space (2^n locations), thereby making the algorithm impractical except for small values of n. The choice of an algorithm almost always involves trade-offs among various desirable properties, and the better an algorithm is from one aspect, the worse it may be from another.
The algorithm is based on the idea that we can precompute the solutions to all possible questions, store the results, and then simply look them up when needed. As an example, for n = 3, we would store the information

    Word     Bit sum
    0 0 0    0
    0 0 1    1
    0 1 0    1
    0 1 1    2
    1 0 0    1
    1 0 1    2
    1 1 0    2
    1 1 1    3

What is the fastest way of looking up a word w in this table? Under assumptions similar to those used in the preceding algorithms, we can interpret w as an address of a memory cell that contains the bit sum of w, thus giving us an algorithm that requires only one memory reference.
Table lookup implemented for a 16-bit computer:
function bitsum3(w: w16): integer;
const c: array[0 .. 65535] of integer = [0, 1, 1, 2, 1, 2, 2, 3, … , 15, 16];
begin return(c[w]) end;
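The 65536-entry constant is elided in the listing above. As a hedged illustration, the table can be generated rather than typed in. The Python sketch below assumes one particular way of filling it (each entry reuses the entry for the word with its lowest bit dropped); the book does not prescribe how the table is built.

```python
N_BITS = 16

# c[w] is the bit sum of w. Entry w reuses the entry for w with its lowest
# bit dropped, so the whole table is filled in a single pass.
c = [0] * (1 << N_BITS)
for w in range(1, 1 << N_BITS):
    c[w] = c[w >> 1] + (w & 1)

def bitsum3(w: int) -> int:
    """Look up the bit sum of a 16-bit word: one memory reference."""
    return c[w]

print(c[:8])    # [0, 1, 1, 2, 1, 2, 2, 3]
print(c[-2:])   # [15, 16]
```

The space cost is visible in the first line of the construction: the table has 2^16 entries, and every additional bit of word length doubles it.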
In concluding this example, we notice the variety of algorithms that exist for computing the bit sum, each one based on entirely different principles, giving us a different trade-off between space and time. 'bitsum0' and 'bitsum3' solve the problem by "brute force" and are simple to understand: 'bitsum0' looks at each bit and so requires much time; 'bitsum3' stores the solution for each separate case and thus requires much space. The logarithmic bit sum algorithm is an elegant compromise: efficient with respect to both space and time, it merely challenges the programmer's wits.
Exercises
1. Show that there are exactly 16 distinct boolean functions of two variables.
2. Show that each of the boolean functions 'nand' and 'nor' is universal in the following sense: Any boolean function f(x, y) can be written as a nested expression involving only 'nands', and it can also be written using only 'nors'. Show that no other boolean function of two variables is universal.
3. Consider the logarithmic bit sum algorithm, and show that any strategy for splitting w (not just the halving split) requires n - 1 additions.
9. Ordered sets
searching in ordered sets
sequential search. proof of program correctness
binary search
in-place permutation
nondeterministic algorithms
cycle rotation
cycle clipping
Sets of elements processed on a computer are always ordered according to some criterion. In the preceding example of the "population count" operation, a set is ordered arbitrarily and implicitly simply because it is mapped onto linear storage; a programmer using that set can ignore any order imposed by the implementation and access the set through functions that hide irrelevant details. In most cases, however, the order imposed on a set is not accidental, but is prescribed by the problem to be solved and/or the algorithm to be used. In such cases the programmer explicitly deals with issues of how to order a set and how to use any existing order to advantage.
Searching in ordered sets is one of the most frequent tasks performed by computers: whenever we operate on a data item, that item must be selected from a set of items. Searching is also an ideal ground for illustrating basic concepts and techniques of programming.
At times, ordered sets need to be rearranged (permuted). The chapter "Sorting and its complexity" is dedicated to the most frequent type of rearrangement: permuting a set of elements into ascending order. Here we discuss another type of rearrangement: reordering a set according to a given permutation.
Sequential search
Consider the simple case where a fixed set of n data elements is given in an array A:
const n = … ; { n > 0 }
type index = 0 .. n; elt = … ;
var A: array[1 .. n] of elt; or var A: array[0 .. n] of elt;
Sequential or linear search is the simplest technique for determining whether A contains a given element x. It is a trivial example of an incremental algorithm, which processes a set of data one element at a time. If the search for x is successful, we return an index i, 1 ≤ i ≤ n, to point to x. The convention that i = 0 signals unsuccessful search is convenient and efficient, as it encodes all possible outcomes in a single parameter.
function find(x: elt): index;
var i: index;
begin
  i := n;
  while (i > 0) { can access A } cand (A[i] ≠ x) { not yet found } do
(1)   { (1 ≤ i ≤ n) ∧ ∀ k, i ≤ k: A[k] ≠ x }
    i := i - 1;
(2) { (∀ k, i < k: A[k] ≠ x) ∧ ((i = 0) ∨ ((1 ≤ i ≤ n) ∧ (A[i] = x))) }
  return(i)
end;
The 'cand' operator used in the termination condition is the conditional 'and'. Evaluation proceeds from left to right and stops as soon as the value of the boolean expression is determined: If i > 0 yields 'false', we immediately terminate evaluation of the boolean expression without accessing A[i], thus avoiding an out-of-bounds error.
We have included two assertions, (1) and (2), that express the main points necessary for a formal proof of correctness: mainly, that each iteration of the loop extends by one element the subarray known not to contain the search argument x. Assertion (1) is trivially true after the initialization i := n, and remains true whenever the body of the while loop is about to be executed. Assertion (2) states that the loop terminates in one of two ways:
- i = 0 signals that the entire array has been scanned unsuccessfully.
- x has been found at index i.
A formal correctness proof would have to include an argument that the loop does indeed terminate: a simple argument here, since i is initialized to n, decreases by 1 in each iteration, and thus will become 0 after a finite number of steps.
The loop is terminated by a Boolean expression composed of two terms: reaching the end of the array, i = 0, and testing the current array element, A[i] = x. The second term is unavoidable, but the first one can be spared by making sure that x is always found before the index i drops off the end of the array. This is achieved by extending the array by one cell A[0] and placing the search argument x in it as a sentinel. If no true element x stops the scan of the array, the sentinel will. Upon exit from the loop, the value of i reveals the outcome of the search, with the convention that 0 signals an unsuccessful search:
function find(x: elt): index;
var i: index;
begin
  A[0] := x; i := n;
  while A[i] ≠ x do i := i - 1;
  return(i)
end;
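The sentinel version carries over almost literally to Python. The sketch below is illustrative only; passing the array as a parameter and reserving list cell 0 for the sentinel are assumptions made here to mimic the declaration array[0 .. n].

```python
def find_with_sentinel(a: list, x) -> int:
    """Search a[1..n] for x; a[0] is a spare cell reserved for the sentinel.

    Returns an index i with a[i] == x, or 0 if x is absent.
    """
    a[0] = x                 # the sentinel guarantees that the loop stops
    i = len(a) - 1           # i := n
    while a[i] != x:
        i -= 1
    return i

data = [None, "c", "a", "b"]            # data[0] is the sentinel cell
print(find_with_sentinel(data, "a"))    # 2
print(find_with_sentinel(data, "z"))    # 0
```

Note that the search overwrites cell 0 on every call, so that cell must never hold real data.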
How efficient is sequential search? An unsuccessful search always scans the entire array. If all n array elements have equal probability of being searched for, the average number of iterations of the while loop in a successful search is about n / 2, since x is reached after n - k iterations when it sits at index k.
This algorithm needs time proportional to n on average and in the worst case.
Binary search
If the data elements stored in the array A are ordered according to the order relation ≤ defined on their domain, that is
∀ k, 1 ≤ k < n: A[k] ≤ A[k + 1]
the search for an element x can be made much faster because a comparison of x with any array element A[m] provides more information than it does in the unordered case. The result x ≠ A[m] excludes not only A[m], but also all elements on one or the other side of A[m], depending on whether x is greater or smaller than A[m] (Exhibit 9.1).
Exhibit 9.1: Binary search identifies regions where the search argument is guaranteed to be absent.
The following function exploits this additional information:
const n = … ; { n > 0 }
type index = 1 .. n; elt = … ;
var A: array[1 .. n] of elt;
function find(x: elt; var m: index): boolean;
var u, v: index;
begin
  u := 1; v := n;
  while u ≤ v do begin
(1)   { (u ≤ v) ∧ (∀ k, 1 ≤ k < u: A[k] < x) ∧ (∀ k, v < k ≤ n: A[k] > x) }
    m := any value such that u ≤ m ≤ v ;
    if x < A[m] then v := m - 1
    elsif x > A[m] then u := m + 1
(2)   else { x = A[m] } return(true)
  end;
(3) { (u = v + 1) ∧ (∀ k, 1 ≤ k < u: A[k] < x) ∧ (∀ k, v < k ≤ n: A[k] > x) }
  return(false)
end;
u and v bound the interval of uncertainty that might contain x. Assertion (1) states that A[1], … , A[u - 1] are known to be smaller than x; A[v + 1], … , A[n] are known to be greater than x. Assertion (2), before exit from the function, states that x has been found at index m. In assertion (3), u = v + 1 signals that the interval of uncertainty has shrunk to become empty. If there exists more than one match, this algorithm will find one of them.
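The function can be tried out with a short Python sketch. It departs from the listing in two ways that are assumptions of this illustration: it returns the index found (0 for "not found") instead of a boolean plus a var parameter, and it fixes m at the midpoint of the interval, although any value between u and v would keep it correct.

```python
def binary_find(a: list, x) -> int:
    """Search the ascending array a[1..n] for x (a[0] is unused).

    Returns an index m with a[m] == x, or 0 if x is absent.
    """
    u, v = 1, len(a) - 1
    while u <= v:
        m = (u + v) // 2     # midpoint; any u <= m <= v would be correct
        if x < a[m]:
            v = m - 1        # x cannot lie in a[m..v]
        elif x > a[m]:
            u = m + 1        # x cannot lie in a[u..m]
        else:
            return m
    return 0                 # interval of uncertainty is empty

primes = [None, 2, 3, 5, 7, 11, 13]
print(binary_find(primes, 7))   # 4
print(binary_find(primes, 4))   # 0
```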
This algorithm is correct independently of the choice of m but is most efficient when m is the midpoint of the current search interval:
m := (u + v) div 2;
With this choice of m each comparison either finds x or eliminates half of the remaining elements. Thus at most ⌊log2 n⌋ + 1 iterations of the loop are performed in the worst case.
Exercise: binary search
The array
var A: array [1 .. n] of integer;
contains n integers in ascending order: A[1] ≤ A[2] ≤ … ≤ A[n].
(a) Write a recursive binary search
function rbs (x, u, v: integer): integer;
that returns 0 if x is not in A, and an index i such that A[i] = x if x is in A.
(b) What is the maximal depth of recursive calls of 'rbs' in terms of n?
(c) Describe the advantages and disadvantages of this recursive binary search as compared to the iterative binary search.
Exercise: searching in a partially ordered two-dimensional array
Consider the n by m array:
var A: array[1 .. n, 1 .. m] of integer;
and assume that the integers in each row and in each column are in ascending order; that is,
A[i, j] ≤ A[i, j + 1] for i = 1, … , n and j = 1, … , m - 1;
A[i, j] ≤ A[i + 1, j] for i = 1, … , n - 1 and j = 1, … , m.
(a) Design an algorithm that determines whether a given integer x is stored in the array A. Describe your algorithm in words and figures. Hint: Start by comparing x with A[1, m] (Exhibit 9.2).
Exhibit 9.2: Another example of the idea of excluded regions.
(b) Implement your algorithm by a
function IsInArray (x: integer): boolean;
(c) Show that your algorithm is correct and terminates, and determine its worst case time complexity.
Solution
(a) The algorithm compares x first with A[1, m]. If x is smaller than A[1, m], then x cannot be contained in the last column, and the search process is continued by comparing x with A[1, m - 1]. If x is greater than A[1, m], then x cannot be contained in the first row, and the search process is continued by comparing x with A[2, m]. Exhibit 9.3 shows part of a typical search process.
Exhibit 9.3: Excluded regions combine to leave only a staircase-shaped strip to examine.
(b) function IsInArray(x: integer): boolean;
var r, c: integer;
begin
  r := 1; c := m;
  while (r ≤ n) and (c ≥ 1) do
{1}   if x < A[r, c] then c := c - 1
    elsif x > A[r, c] then r := r + 1
    else { x = A[r, c] } {2} return(true);
{3} return(false)
end;
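For experimentation, the solution can be sketched in Python as follows. The sketch is not the book's code: it takes the array as a parameter (a list of rows) and uses 0-based indices, both assumptions of this illustration.

```python
def is_in_array(a: list, x: int) -> bool:
    """a is a list of rows; every row and every column is ascending."""
    n = len(a)
    m = len(a[0]) if n else 0
    r, c = 0, m - 1              # upper right corner (0-based indices)
    while r < n and c >= 0:
        if x < a[r][c]:
            c -= 1               # x cannot be in column c
        elif x > a[r][c]:
            r += 1               # x cannot be in row r
        else:
            return True
    return False

grid = [[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]]
print(is_in_array(grid, 6), is_in_array(grid, 10))  # True False
```

Each pass through the loop discards one row or one column, so the loop body runs at most n + m - 1 times.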
(c) At positions {1}, {2}, and {3}, the invariant
(∗) ∀ i, 1 ≤ i ≤ n, ∀ j, 1 ≤ j ≤ m: (j > c ⇒ x ≠ A[i, j]) ∧ (i < r ⇒ x ≠ A[i, j])
states that the hatched rows and columns of A do not contain x. At {2},
(1 ≤ r ≤ n) ∧ (1 ≤ c ≤ m) ∧ (x = A[r, c])
states that r and c are within index range and x has been found at (r, c). At {3},
(r = n + 1) ∨ (c = 0)
states that r or c is outside the index range. This coupled with (∗) implies that x is not in A:
(r = n + 1) ∨ (c = 0) ⇒ ∀ i, 1 ≤ i ≤ n, ∀ j, 1 ≤ j ≤ m: x ≠ A[i, j].
Each iteration through the loop either decreases c by one or increases r by one. If x is not contained in the array, either c becomes zero or r becomes greater than n after a finite number of steps, and the algorithm terminates. In each step, the algorithm eliminates either a row from the top or a column from the right. In the worst case it works its way from the upper right corner to the lower left corner in n + m - 1 steps, leading to a complexity of Θ(n + m).
In-place permutation
Representations of a permutation. Consider an array D[1 .. n] that holds n data elements of type 'elt'. These are ordered by their position in the array and must be rearranged according to a specific permutation given in another array. Exhibit 9.4 shows an example for n = 5. Assume that a, b, c, d, e, stored in this order, are to be rearranged in the order c, e, d, a, b. This permutation is represented naturally by either of the two permutation arrays t (to) or f (from) declared as
var t, f: array[1 .. n] of 1 .. n;
The exhibit also shows a third representation of the same permutation: the decomposition of this permutation into cycles. The element in D[1] moves into D[4], the one in D[4] into D[3], the one in D[3] into D[1], closing a cycle that we abbreviate as (1 4 3), or (4 3 1), or (3 1 4). There is another cycle (2 5), and the entire permutation is represented by (1 4 3) (2 5).
Exhibit 9.4: A permutation and its representations in terms of 'to', 'from', and cycles.
The cycle representation is intuitively most informative, as it directly reflects the decomposition of the problem into independent subproblems, and both the 'to' and 'from' information is easily extracted from it. But 'to' and 'from' dispense with parentheses and lead to more concise programs.
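The three representations can be converted into one another mechanically. The following Python sketch is a hedged illustration: the helper names to_from and cycles are hypothetical, and list cell 0 is left unused so that indices match the 1-based arrays of the text.

```python
def to_from(t: list) -> list:
    """Derive the 'from' array f from the 'to' array t (index 0 unused)."""
    f = [0] * len(t)
    for i in range(1, len(t)):
        f[t[i]] = i          # the element that moves to t[i] comes from i
    return f

def cycles(t: list) -> list:
    """Decompose t into cycles, each started at its smallest index."""
    seen = [False] * len(t)
    result = []
    for i in range(1, len(t)):
        if not seen[i]:
            cycle, j = [], i
            while not seen[j]:
                seen[j] = True
                cycle.append(j)
                j = t[j]
            result.append(tuple(cycle))
    return result

t = [0, 4, 5, 1, 3, 2]       # a, b, c, d, e  ->  c, e, d, a, b
print(to_from(t))            # [0, 3, 5, 4, 1, 2]
print(cycles(t))             # [(1, 4, 3), (2, 5)]
```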
Consider the problem of executing this permutation in place: Both the given data and the result are stored in the same array D, and only a (small) constant amount of auxiliary storage may be used, independently of n. Let us use the example of in-place permutation to introduce a notation that is frequently convenient, and to illustrate how the choice of primitive operations affects the solution.
A multiple assignment statement will do the job, using either 'to' or 'from':
// (1 ≤ i ≤ n) { D[t[i]] := D[i] }
or
// (1 ≤ i ≤ n) { D[i] := D[f[i]] }
The characteristic properties of a multiple assignment statement are:
- The left-hand side is a sequence of variables, the right-hand side is a sequence of expressions, and the two sequences are matched according to length and type. The value of the i-th expression on the right is assigned to the i-th variable on the left.
- All the expressions on the right-hand side are evaluated using the original values of all variables that occur in them, and the resulting values are assigned "simultaneously" to the variables on the left-hand side. We use the sign // to designate concurrent or parallel execution.
Few of today's programming languages offer multiple assignments, in particular those of variable length used above. Breaking a multiple assignment into single assignments usually forces the programmer to introduce temporary variables. As an example, notice that the direct sequentialization:
for i := 1 to n do D[t[i]] := D[i]
or
for i := 1 to n do D[i] := D[f[i]]
is faulty, as some of the elements in D will be overwritten before they can be moved. Overwriting can be avoided at the cost of nearly doubling memory requirements by allocating an array A[1 .. n] of data elements for temporary storage:
for i := 1 to n do A[t[i]] := D[i];
for i := 1 to n do D[i] := A[i];
This, however, is not an in-place computation, as the amount of auxiliary storage grows with n. It is unnecessarily inefficient: There are elegant in-place permutation algorithms based on the conventional primitive of the single assignment statement. They all assume that the permutation array may be destroyed as the permutation is being executed. If the representation of the permutation must be preserved, additional storage is required for bookkeeping, typically of a size proportional to n. Although this additional space may be as little as n bits, perhaps in order to distinguish the elements processed from those yet to be moved, such an algorithm is not technically in place.
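The difference between the faulty sequentialization and the version with a temporary array is easy to observe. The Python sketch below is illustrative; the function names are hypothetical, both functions return a new list instead of modifying D, and cell 0 is unused so that indices match the text.

```python
def permute_naive(d: list, t: list) -> list:
    """Direct sequentialization of D[t[i]] := D[i]; faulty by design."""
    d = d[:]                 # work on a copy so the caller's data survives
    for i in range(1, len(d)):
        d[t[i]] = d[i]       # may overwrite an element not yet moved
    return d

def permute_with_copy(d: list, t: list) -> list:
    """Correct, but needs a second array of n elements."""
    a = [None] * len(d)
    for i in range(1, len(d)):
        a[t[i]] = d[i]
    return a

d = [None, "a", "b", "c", "d", "e"]
t = [0, 4, 5, 1, 3, 2]
print(permute_naive(d, t)[1:])      # some elements are lost
print(permute_with_copy(d, t)[1:])  # ['c', 'e', 'd', 'a', 'b']
```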
Nondeterministic algorithms. Problems of rearrangement always appear to admit many different solutions, a phenomenon that is most apparent when one considers the multitude of sorting algorithms in the literature. The reason is clear: When n elements must be moved, it may not matter much which elements are moved first and which ones later. Thus it is useful to look for nondeterministic algorithms that refrain from specifying the precise sequence of all actions taken, and instead merely iterate condition ⇒ action statements, with the meaning "wherever condition applies perform the corresponding action". These algorithms are nondeterministic because each of several distinct conditions may apply in many different places, and we may "fire" any action that is currently enabled. Adding sequential control to a nondeterministic algorithm turns it into a deterministic algorithm. Thus a nondeterministic algorithm corresponds to a class of deterministic ones that share common invariants, but differ in the order in which steps are executed. The correctness of a nondeterministic algorithm implies the correctness of all its sequential instances. Thus it is good algorithm design practice to develop a correct nondeterministic algorithm first, then turn it into a deterministic one by ordering execution of its steps with the goal of efficiency in mind.
Deterministic sequential algorithms come in a variety of forms depending on the choice of primitive (assignment or swap), data representation ('to' or 'from'), and technique. We focus on the last of these and consider two techniques: cycle rotation and cycle clipping. Cycle rotation follows naturally from the idea of decomposing a permutation into cycles and processing one cycle at a time, using temporary storage for a single element. It fits the 'from' representation somewhat more efficiently than the 'to' representation, as the latter requires a swap of two elements where the former uses an assignment. Cycle clipping uses the primitive 'swap two elements' so effectively as a step toward executing a permutation that it needs no temporary storage for elements. Because no temporary storage is tied up, it is not necessary to finish processing one cycle before starting on the next one: elements can be clipped from their cycles in any order. Clipping works efficiently with either representation, but is easier to understand with 'to'. We present cycle rotation with 'from' and cycle clipping with 'to' and leave the other two algorithms as exercises.
Cycle rotation
A search for an in-place algorithm naturally leads to the idea of processing a permutation one cycle at a time: every element we place at its destination bumps another one, but we avoid holding an unbounded number of bumped elements in temporary storage by rotating each cycle, one element at a time. This works best using the 'from' representation. The following loop rotates the cycle that passes through an arbitrary index i:
Rotate the cycle starting at index i, updating f:
j := i; { initialize a two-pronged fork to travel along the cycle }
p := f[j]; { p is j's predecessor in the cycle }
A := D[j]; { save a single element in an auxiliary variable A }
while p ≠ i do { D[j] := D[p]; f[j] := j; j := p; p := f[j] } ;
D[j] := A; { reinsert the saved element into the former cycle … }
f[j] := j; { … but now it is a fixed point }
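A runnable Python sketch of cycle rotation follows. It wraps the loop above in an iteration over all start indices and skips indices that are already fixed points; the function name and the unused cell 0 are assumptions of this illustration. As in the text, the 'from' array is destroyed.

```python
def permute_by_rotation(d: list, f: list) -> None:
    """Rearrange d[1..n] in place so that new d[i] = old d[f[i]].

    f is the 'from' array; it ends up as the identity. Index 0 is unused.
    """
    n = len(d) - 1
    for i in range(1, n):            # 1 .. n - 1
        if f[i] != i:                # skip cycles of length 1
            j = i
            p = f[j]                 # p is j's predecessor in the cycle
            saved = d[j]             # the single element held aside
            while p != i:
                d[j] = d[p]
                f[j] = j             # d[j] is now final
                j = p
                p = f[j]
            d[j] = saved
            f[j] = j

d = [None, "a", "b", "c", "d", "e"]
f = [0, 3, 5, 4, 1, 2]
permute_by_rotation(d, f)
print(d[1:])   # ['c', 'e', 'd', 'a', 'b']
```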
This code works trivially for a cycle of length 1, where p = f[i] = i guards the body of the loop from ever being executed. The statement f[j] := j in the loop is unnecessary for rotating the cycle. Its purpose is to identify an element that has been placed at its final destination, so this code can be iterated for 1 ≤ i ≤ n to yield an in-place permutation algorithm. For the sake of efficiency we add two details: (1) We avoid unnecessary movements A := D[j]; D[j] := A of a possibly voluminous element by guarding cycles of length 1 with the test 'i ≠ f[i]', and (2) we terminate the iteration at n - 1 on the grounds that when n - 1 elements of a permutation are in their correct place, the n-th one is also. Using the code above, this leads to
for i := 1 to n - 1 do if i ≠ f[i] then rotate the cycle starting at index i, updating f
Exercise
Implement cycle rotation using the 'to' representation. Hint: Use the swap primitive rather than element assignment.
Cycle clipping
Cycle clipping is the key to elegant in-place permutation using the 'to' representation. At each step, we clip an arbitrary element d out of an arbitrary cycle of length > 1, thus reducing the latter's length by 1. As shown in Exhibit 9.5, we place d at its destination, where it forms a cycle of length 1 that needs no further processing. The element it displaces, c, can find a (temporary) home in the cell vacated by d. It is probably out of place there, but no more so than it was at its previous home; its time will come to be relocated to its final destination. Since we have permuted elements, we must update the permutation array to reflect accurately the permutation yet to be performed. This is a local operation in the vicinity of the two elements that were swapped, somewhat like tightening a belt by one notch: all but two of the elements in the clipped cycle remain unaffected. The Exhibit below shows an example. In order to execute the permutation (1 4 3) (2 5), we clip d from its cycle (1 4 3) by placing d at its destination D[3], thus bumping c into the vacant cell D[4]. This amounts to representing the cycle (1 4 3) as a product of two shorter cycles: the swap (3 4), which can be done right away, and the cycle (1 4) to be executed later. The cycle (2 5) remains unaffected. The ovals in Exhibit 9.5 indicate that corresponding entries of D and t are moved together. Exhibit 9.6 shows what happens to a cycle clipped by a swap
// { t[i], D[i] :=: t[t[i]], D[t[i]] }
Exhibit 9.5: Clipping one element out of a cycle of a permutation.
Exhibit 9.6: Effect of a swap caused by the condition i ≠ t[i].
Cycles of length 1 are left alone, and the absence of cycles of length > 1 signals termination. Thus the following condition ⇒ action statement, iterated as long as the condition i ≠ t[i] can be met, executes a permutation represented in the array t:
∃ i: i ≠ t[i] ⇒ // { t[i], D[i] :=: t[t[i]], D[t[i]] }
We use the multiple swap operator // { :=: } with the meaning: evaluate all four expressions using the original values of all the variables involved, then perform all four assignments simultaneously. It can be implemented using six single assignments and two auxiliary variables, one of type 1 .. n, the other of type 'elt'. Each swap places (at least) one element into its final position, say j, where it is guarded from any further swaps by virtue of j = t[j]. Thus the nondeterministic algorithm above executes at most n - 1 swaps: When n - 1 elements are in final position, the n-th one is also.
The conditions on i can be checked in any order, as long as they are checked exhaustively, for example:
{ (0) (1 ≤ j < 0) ⇒ j = t[j] }
for i := 1 to n - 1 do
  { (1) (1 ≤ j < i) ⇒ j = t[j] }
  while i ≠ t[i] do // { t[i], D[i] :=: t[t[i]], D[t[i]] }
  { (2) (1 ≤ j ≤ i) ⇒ j = t[j] }
{ (3) (1 ≤ j ≤ n - 1) ⇒ j = t[j] }
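The deterministic version of cycle clipping can be sketched in Python as follows. The multiple swap is spelled out with an auxiliary variable k that holds the original value of t[i]; the function name, the swap counter it returns, and the unused cell 0 are assumptions of this illustration.

```python
def permute_by_clipping(d: list, t: list) -> int:
    """Rearrange d[1..n] in place so that old d[i] ends up in d[t[i]].

    t is the 'to' array; it ends up as the identity. Index 0 is unused.
    Returns the number of swaps performed.
    """
    n = len(d) - 1
    swaps = 0
    for i in range(1, n):            # 1 .. n - 1
        while t[i] != i:
            k = t[i]                 # original t[i], fixed before swapping
            d[i], d[k] = d[k], d[i]  # the element from d[i] is now final
            t[i], t[k] = t[k], t[i]  # t[k] == k guards it from now on
            swaps += 1
    return swaps

d = [None, "a", "b", "c", "d", "e"]
t = [0, 4, 5, 1, 3, 2]
print(permute_by_clipping(d, t))   # 3
print(d[1:])                       # ['c', 'e', 'd', 'a', 'b']
```

Fixing k before the swaps matters: writing the exchange directly in terms of t[t[i]] would read t[i] after it has already changed.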
For each value of i, i is the leftmost position of the cycle that passes through i. As the while loop reduces this cycle to cycles of length 1, all swaps involve i and t[i] > i, as asserted by the invariant (1) (1 ≤ j < i) ⇒ j = t[j], which precedes the while loop. At completion of the while loop, the assertion is strengthened to include i, as stated in invariant (2) (1 ≤ j ≤ i) ⇒ j = t[j]. This reestablishes (1) for the next higher value of i. The vacuously true assertion (0) serves as the basis of this proof by induction. The final assertion (3) is just a restatement of assertion (2) for the last value of i. Since t[1] … t[n] is a permutation of 1 … n, (3) implies that n = t[n].
Exercise: cycle clipping using the 'from' representation
The nondeterministic algorithm expressed as a multiple assignment
// (1 ≤ i ≤ n) { D[i] := D[f[i]] }
is equally valid for the 'from' representation as its analog
// (1 ≤ i ≤ n) { D[t[i]] := D[i] }
was for the 'to' representation. But in contrast to the latter, the former cannot be translated into a simple iteration of the condition ⇒ action statement:
∃ i: i ≠ f[i] ⇒ // { f[i], D[i] :=: f[f[i]], D[f[i]] }
Why not? Can you salvage the idea of cycle clipping using the 'from' representation?
Exercises
1. Write two functions that implement sequential search, one with sentinel as shown in the first section, "Sequential search", the other without sentinel. Measure and compare their running time on random arrays of various sizes.
2. Measure and compare the running times of sequential search and binary search on random arrays of size n, for n = 1 to n = 100. Sequential search is obviously faster for small values of n, and binary search for large n, but where is the crossover? Explain your observations.
10. Strings
searching for patterns in a string
finite-state machine
Most programming languages support simple operations on strings (e.g. comparison, concatenation, extraction, searching). Searching for a specified pattern in a string (text) is the computational kernel of most string processing operations. Several efficient algorithms have been developed for this potentially time-consuming operation. The approach presented here is very general; it allows searching for a pattern that consists not only of a single string, but a set of strings. The cardinality of this set influences the storage space needed, but not the time. It leads us to the concept of a finite-state machine (fsm).
Recognizing a pattern consisting of a single string
Problem: Given a (long) string z = z_1 z_2 … z_n of n characters and a (usually much shorter) string p = p_1 p_2 … p_m of m characters (the pattern), find all (nonoverlapping) occurrences of p in z. By sliding a window of length m from left to right along z and examining most characters z_i m times we solve the problem using m · n comparisons. By constructing a finite-state machine from the pattern p it suffices to examine each character z_i exactly once, as shown in Exhibit 10.1. Each state corresponds to a prefix of the pattern, starting with the empty prefix and ending with the complete pattern. The input symbols are the input characters z_1, z_2, … , z_n of z. In the j-th step the input character z_j leads from a state corresponding to the prefix p_1 p_2 … p_i to
- the state with prefix p_1 p_2 … p_i p_(i+1) if z_j = p_(i+1)
- a different state (often the empty prefix, λ) if z_j ≠ p_(i+1)
Example
p = barbara (Exhibit 10.1).
Exhibit 10.1: State diagram showing some of the transitions. All other state transitions lead back to the initial state.
Notice that the pattern 'barbara', although it sounds repetitive, cannot overlap with any part of itself. Constructing a finite-state machine for such a pattern is straightforward. But consider a self-overlapping pattern such as 'barbar', or 'abracadabra', or 'xx', where the first k > 0 characters are identical with the last k: The text 'barbarbar' contains two overlapping occurrences of the pattern 'barbar', and 'xxxx' contains three occurrences of 'xx'. A finite-state machine constructed in an analogous fashion as the one used for 'barbara' always finds the first of several overlapping occurrences but might miss some of the later ones. As an exercise, construct finite-state machines that detect all occurrences of self-overlapping patterns.
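One possible way to build such a machine, shown here as a hedged Python sketch rather than as the book's construction, is by brute force: from the state for prefix p_1 … p_i, the input character c leads to the longest prefix of p that is a suffix of p_1 … p_i c. The function names are hypothetical, and the sketch reports every occurrence, overlapping ones included.

```python
def build_fsm(p: str, alphabet: set) -> list:
    """delta[state][ch] = length of the longest prefix of p that is a
    suffix of p[:state] + ch. State k means 'the last k characters read
    match p[:k]'."""
    m = len(p)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            s = p[:state] + ch
            k = min(m, len(s))
            while k > 0 and not s.endswith(p[:k]):
                k -= 1
            delta[state][ch] = k
    return delta

def occurrences(p: str, z: str) -> list:
    """Start positions (0-based) of all occurrences of p in z."""
    delta = build_fsm(p, set(p) | set(z))
    state, hits = 0, []
    for j, ch in enumerate(z):       # each character is examined once
        state = delta[state][ch]
        if state == len(p):
            hits.append(j - len(p) + 1)
    return hits

print(occurrences("barbar", "barbarbar"))   # [0, 3]
print(occurrences("xx", "xxxx"))            # [0, 1, 2]
```

Building the table this way is slow (it retries prefixes one by one), but the search itself reads each text character exactly once.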
Recognizing a set of strings: a finite-state-machine interpreter
Finite-state machines (fsm, also called "finite automata") are typically used to recognize patterns that consist of a set of strings. An adequate treatment of this more general problem requires introducing some concepts and terminology widely used in computer science.
Given a finite set A of input symbols, the alphabet, A* denotes the (infinite) set of all (finite) strings over A, including the nullstring λ. Any subset L ⊆ A*, finite or infinite, is called a set of strings, or a language, over A. Recognizing a language L refers to the ability to examine any string z ∈ A*, one symbol at a time from left to right, and to decide whether or not z ∈ L.
A deterministic finite-state machine M is essentially given by a finite set S of states, a finite alphabet A of input symbols, and a transition function f: S × A → S. The state diagram depicts the states and the inputs, which lead from one state to another; thus a finite-state machine maps strings over A into sequences of states.
When treating any specific problem, it is typically useful to expand this minimal definition by specifying one or more of the following additional concepts. An initial state s_0 ∈ S, a subset F ⊆ S of final or accepting states, a finite alphabet B of output symbols and an output function g: S → B, which can be used to assign certain actions to the states in S. We use the concepts of initial state s_0 and of accepting states F to define the notion "recognizing a set of strings":
A set L ⊆ A* of strings is recognized or accepted by the finite-state machine M = (S, A, f, s_0, F) iff all the strings in L, and no others, lead M from s_0 to some state s ∈ F.
Example
Exhibit 10.3 shows the state diagram of a finite-state machine that recognizes parameter lists as defined by the syntax diagrams in Exhibit 10.2. L (letter) stands for a character a .. z, D (digit) for a digit 0 .. 9.
Exhibit 10.2: Syntax diagram of simple parameter lists.
Exhibit 10.3: State diagram of finite-state machine to accept parameter lists. The starting state is '1', the single accepting state is '8'.
A straightforward implementation of a finite-state machine interpreter uses a transition matrix T to represent the state diagram. From the current state s the input symbol c leads to the next state T[s, c]. It is convenient to introduce an error state that captures all illegal transitions. The transition matrix T corresponding to Exhibit 10.3 looks as follows:
L represents a character a .. z.
D represents a digit 0 .. 9.
! represents all characters that are not explicitly mentioned.

         ' '  (   )   :   ,   ;   L   D   !
    0     0   0   0   0   0   0   0   0   0    error state
    1     1   2   0   0   0   0   0   0   0    skip blank
    2     2   0   0   0   0   0   3   0   0    left parenthesis read
    3     4   0   0   5   2   0   3   3   0    reading variable identifier
    4     4   0   0   5   2   0   0   0   0    skip blank
    5     5   0   0   0   0   0   6   0   0    colon read
    6     7   0   8   0   0   2   6   6   0    reading type identifier
    7     7   0   8   0   0   2   0   0   0    skip blank
    8     8   0   0   0   0   0   0   0   0    right parenthesis read
The following is a suitable environment for programming a finite-state-machine interpreter:
const nstate = 8; { number of states, without error state }
type state = 0 .. nstate; { 0 = error state, 1 = initial state }
  inchar = ' ' .. '_'; { 64 consecutive ASCII characters }
  tmatrix = array[state, inchar] of state;
var T: tmatrix;
After initializing the transition matrix T, the procedure 'silentfsm' interprets the finite-state machine defined by T. It processes the sequence of input characters and jumps around in the state space, but it produces no output.
procedure silentfsm(var T: tmatrix);
var s: state; c: inchar;
begin
  s := 1; { initial state }
  while s ≠ 0 do { read(c); s := T[s, c] }
end;
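To see such an interpreter at work, here is a small Python sketch for the parameter-list example. It is an illustration under stated assumptions: the matrix entries are filled in from the state descriptions (error state, skip blank, left parenthesis read, and so on), the input is a finite string instead of a character stream, and the names COLUMNS, column and accepts are hypothetical.

```python
COLUMNS = " ():,;LD!"    # blank, ( ) : , ; letter, digit, anything else

# T[s][col] = next state; state 0 is the error state, 8 the accepting state.
T = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],   # 0 error state
    [1, 2, 0, 0, 0, 0, 0, 0, 0],   # 1 skip blank
    [2, 0, 0, 0, 0, 0, 3, 0, 0],   # 2 left parenthesis read
    [4, 0, 0, 5, 2, 0, 3, 3, 0],   # 3 reading variable identifier
    [4, 0, 0, 5, 2, 0, 0, 0, 0],   # 4 skip blank
    [5, 0, 0, 0, 0, 0, 6, 0, 0],   # 5 colon read
    [7, 0, 8, 0, 0, 2, 6, 6, 0],   # 6 reading type identifier
    [7, 0, 8, 0, 0, 2, 0, 0, 0],   # 7 skip blank
    [8, 0, 0, 0, 0, 0, 0, 0, 0],   # 8 right parenthesis read
]

def column(c: str) -> int:
    """Map an input character to its column in T."""
    if "a" <= c <= "z":
        return COLUMNS.index("L")
    if "0" <= c <= "9":
        return COLUMNS.index("D")
    if c in " ():,;":
        return COLUMNS.index(c)
    return COLUMNS.index("!")

def accepts(text: str) -> bool:
    s = 1                            # initial state
    for c in text:
        s = T[s][column(c)]
        if s == 0:                   # error state: no way back
            return False
    return s == 8

print(accepts("(a, b: integer; c1: real)"))   # True
print(accepts("(a b: t)"))                    # False
```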
The simple structure of 'silentfsm' can be employed for a useful finite-state-machine interpreter in which initialization, error condition, input processing and transitions in the state space are handled by procedures or functions 'initfsm', 'alive', 'processinput', and 'transition' which have to be implemented according to the desired behavior. The terminating procedure 'terminate' should print a message on the screen that confirms the correct termination of the input or shows an error condition.
procedure fsmsim(var T: tmatrix);
var … ;
begin
  initfsm;
  while alive do { processinput; transition };
  terminate
end;
Exercise: finite-state recognizer for multiples of 3
Consider the set of strings over the alphabet {0, 1} that represent multiples of 3 when interpreted as binary numbers, such as: 0, 00, 11, 00011, 110. Design two finite-state machines for recognizing this set:
Left to right: $M_{lr}$ reads the strings from most significant bit to least significant.
Right to left: $M_{rl}$ reads the strings from least significant bit to most significant.
Solution
Left to right: Let $r_k$ be the number represented by the k leftmost bits, and let b be the (k + 1)-st bit, interpreted as an integer. Then $r_{k+1} = 2 \cdot r_k + b$. The states correspond to $r_k \bmod 3$ (Exhibit 10.4). Starting state and accepting state: '0'.
Exhibit 10.4: Finite-state machine computes remainder modulo 3 left to right.
Right to left: $r_{k+1} = b \cdot 2^k + r_k$. Show by induction that the powers of 2 are alternatingly congruent to 1 and 2 modulo 3 (i.e. $2^k \bmod 3 = 1$ for k even, $2^k \bmod 3 = 2$ for k odd). Thus we need a modulo 2 counter, which appears in Exhibit 10.5 as two rows of three states each. Starting state: 0. Accepting states: 0 and 0'.
Exhibit 10.5: Finite-state machine computes remainder modulo 3 right to left.
Exercises and programming projects
1. Draw the state diagram of several finite-state machines, each of which searches a string z for all occurrences of an interesting pattern with repetitive parts, such as 'abaca' or 'Caracas'.
2. Draw the state diagram of finite-state machines that detect all occurrences of a self-overlapping pattern such as 'abracadabra', 'barbar', or 'xx'.
3. Finite-state recognizer for various days:
Design a finite-state machine for automatic recognition of the set of nine words:
'monday', 'tuesday', 'wednesday', 'thursday',
'friday', 'saturday', 'sunday', 'day', 'daytime'
in a text. The underlying alphabet consists of the lowercase letters 'a' .. 'z' and the blank. Draw the state diagram of the finite-state machine; identify the initial state and indicate accepting states by a double circle. It suffices to recognize membership in the set without recognizing each word individually.
4. Implementation of a pattern recognizer:
Some useful procedures and functions require no parameters, hence most programming languages incorporate the concept of an empty parameter list. There are two reasonable syntax conventions about how to write the headers of parameterless procedures and functions:
(1) procedure p; function f: T;
(2) procedure p(); function f(): T;
Examples: Pascal uses convention (1); Modula-2 allows both (1) and (2) for procedures, but only (2) for function procedures.
For each convention (1) and (2), modify the syntax diagram in Exhibit 10.2 to allow empty parameter lists, and draw the state diagrams of the corresponding finite-state machines.
5. Standard Pascal defines parameter lists by means of the syntax diagram shown in Exhibit 10.6.
Exhibit 10.6: Syntax diagram for standard Pascal parameter lists.
Draw a state diagram for the corresponding finite-state machine. For brevity's sake, consider the reserved words 'function', 'var' and 'procedure' to be atomic symbols rather than strings of characters.
11. Matrices and graphs: transitive closure
atomic versus structured objects
directed versus undirected graphs
transitive closure
adjacency and connectivity matrix
boolean matrix multiplication
efficiency of an algorithm, asymptotic notation
Warshall's algorithm
weighted graph
minimum spanning tree
In any systematic presentation of data objects, it is useful to distinguish primitive or atomic objects from composite or structured objects. In each of the preceding chapters we have seen both types: A bit, a character, or an identifier is usually considered primitive; a word of bits, a string of characters, an array of identifiers is naturally treated as composite. Before proceeding to the most common primitive objects of computation, numbers, let us discuss one of the most important types of structured objects, matrices. Even when matrices are filled with the simplest of primitive objects, bits, they generate interesting problems and useful algorithms.
Paths in a graph
Syntax diagrams and state diagrams are examples of a type of object that abounds in computer science: A graph consists of nodes or vertices, and of edges or arcs that connect a pair of nodes. Nodes and edges often have additional information attached to them, such as labels or numbers. If we wish to treat graphs mathematically, we need a definition of these objects.
Directed graph. Let N be the set of n elements {1, 2, … , n} and E a binary relation: $E \subseteq N \times N$, also denoted by an arrow, $\to$. Consider N to be the set of nodes of a directed graph G, and E the set of arcs (directed edges). A directed graph G may be represented by its adjacency matrix A (Exhibit 11.1), an $n \times n$ boolean matrix whose elements A[i, j] determine the existence of an arc from i to j:
A[i, j] = true iff $i \to j$.
An arc is a path of length 1. From A we can derive all paths of any length. This leads to a relation denoted by a double arrow, $\Rightarrow$, called the transitive closure of E:
$i \Rightarrow j$ iff there exists a path from i to j (i.e. a sequence of arcs $i \to i_1$, $i_1 \to i_2$, $i_2 \to i_3$, … , $i_k \to j$). We accept paths of length 0 (i.e. $i \Rightarrow i$ for all i). This relation is represented by a matrix $C = A^*$ (Exhibit 11.1):
C[i, j] = true iff $i \Rightarrow j$.
C stands for connectivity or reachability matrix; $C = A^*$ is also called transitive hull or transitive closure, since it is the smallest transitive relation that "encloses" E.
Exhibit 11.1: Example of a directed graph with its adjacency and connectivity matrix.
(Undirected) graph. If the relation $E \subseteq N \times N$ is symmetric [i.e. for every ordered pair (i, j) of nodes it also contains the opposite pair (j, i)] we can identify the two arcs (i, j) and (j, i) with a single edge, the unordered pair (i, j). Books on graph theory typically start with the definition of undirected graphs (graphs, for short), but we treat them as a special case of directed graphs because the latter occur much more often in computer science. Whereas graphs are based on the concept of an edge between two nodes, directed graphs embody the concept of one-way arcs leading from a node to another one.
Boolean matrix multiplication
Let A, B, C be $n \times n$ boolean matrices defined by
type nnboolean = array[1 .. n, 1 .. n] of boolean;
var A, B, C: nnboolean;
The boolean matrix multiplication $C = A \cdot B$ is defined as
$C[i, j] = \mathrm{OR}_{1 \le k \le n} \, (A[i, k] \text{ and } B[k, j])$
and implemented by
procedure mmb(var a, b, c: nnboolean);
var i, j, k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do begin
      c[i, j] := false;
      for k := 1 to n do c[i, j] := c[i, j] or (a[i, k] and b[k, j]) (*)
    end
end;
Remark: Remember (in the section "Pascal and its dialects: Lingua franca of computer science") that we usually assume the boolean operations 'or' and 'and' to be conditional (i.e. their arguments are evaluated only as far as necessary to determine the value of the expression). An extension of this simple idea leads to an alternative way of coding boolean matrix multiplication that speeds up the innermost loop above for large values of n. Explain why the following code is equivalent to (*):
k := 1;
while not c[i, j] and (k <= n) do { c[i, j] := a[i, k] and b[k, j]; k := k + 1 }
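The early-exit idea of the remark can be tried out in a few lines of Python. This is only a sketch under assumptions of our own: matrices are lists of lists of booleans with 0-based indices, the function returns a new matrix instead of filling a var parameter, and the built-in `any` plays the role of the while loop because it stops at the first k for which `a[i][k] and b[k][j]` is true.

```python
def mmb(a, b):
    """Boolean matrix product of two n x n matrices given as lists of lists."""
    n = len(a)
    c = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # any() evaluates lazily: it stops as soon as one term is true,
            # just like the conditional while loop in the remark.
            c[i][j] = any(a[i][k] and b[k][j] for k in range(n))
    return c
```

With `a = [[False, True], [False, False]]` and `b = [[False, False], [True, False]]`, the only true entry of the product is `c[0][0]`, which corresponds to the two-arc path from node 0 via node 1 back to node 0.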
Multiplication also defines powers, and this gives us a first solution to the problem of computing the transitive closure. If $A^{L+1}$ denotes the (L + 1)-th power of A, the formula
$A^{L+1}[i, j] = \mathrm{OR}_{1 \le k \le n} \, (A^{L}[i, k] \text{ and } A[k, j])$
has a clear interpretation: There exists a path of length L + 1 from i to j iff, for some node k, there exists a path of length L from i to k and a path of length 1 (a single arc) from k to j. Thus $A^2$ represents all paths of length 2; in general, $A^L$ represents all paths of length L, for $L \ge 1$:
$A^L[i, j]$ = true iff there exists a path of length L from i to j.
Rather than dealing directly with the adjacency matrix A, it is more convenient to construct the matrix A' = A or I. The identity matrix I has the values 'true' along the diagonal, 'false' everywhere else. Thus in A' all diagonal elements A'[i, i] = true. Then $(A')^L$ describes all paths of length $\le L$ (instead of exactly equal to L), for $L \ge 0$. Therefore, the transitive closure is $A^* = (A')^{n-1}$.
The efficiency of an algorithm is often measured by the number of "elementary" operations that are executed on a given data set. The execution time of an elementary operation [e.g. the binary boolean operators (and, or) used above] does not depend on the operands. To estimate the number of elementary operations performed in boolean matrix multiplication as a function of the matrix size n, we concentrate on the leading terms and neglect the lesser terms. Let us use asymptotic notation in an intuitive way; it is defined formally in Part IV.
The number of operations (and, or) executed by procedure 'mmb' when multiplying two boolean $n \times n$ matrices is $\Theta(n^3)$ since each of the nested loops is iterated n times. Hence the cost for computing $(A')^{n-1}$ by repeatedly multiplying with A' is $\Theta(n^4)$. This algorithm can be improved to $\Theta(n^3 \cdot \log n)$ by repeatedly squaring: $(A')^2, (A')^4, (A')^8, \ldots, (A')^k$, where k is the smallest power of 2 with $k \ge n - 1$. It is not necessary to compute exactly $(A')^{n-1}$. Instead of $(A')^{13}$, for example, it suffices to compute $(A')^{16}$, the next higher power of 2, which contains all paths of length at most 16. In a graph with 14 nodes, this set is equal to the set of all paths of length at most 13.
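The repeated-squaring scheme can be sketched as follows. The Python below is an illustration, not the book's program: `mmb` and `closure_by_squaring` are names chosen here, nodes are numbered from 0, and matrices are lists of lists of booleans.

```python
def mmb(a, b):
    """Boolean matrix product of two n x n matrices."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closure_by_squaring(adj):
    """Connectivity matrix obtained by squaring A' = A or I."""
    n = len(adj)
    # A' has 'true' on the diagonal, so powers describe paths of length <= L.
    power = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    length = 1                    # power currently equals (A')^length
    while length < n - 1:         # stop at the first power of 2 >= n - 1
        power = mmb(power, power)
        length *= 2
    return power
```

For the chain 0 → 1 → 2 → 3 the loop squares twice, reaching $(A')^4$, and the result has `True` exactly on and above the diagonal.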
Warshall's algorithm
In search of a faster algorithm we consider other ways of iterating over the set of all paths. Instead of iterating over paths of growing length, we iterate over an increasing number of nodes that may be used along a path from node i to node j. This idea leads to an elegant algorithm due to Warshall [War 62]:
Compute a sequence of matrices $B_0, B_1, B_2, \ldots, B_n$:
$B_0[i, j] = A'[i, j]$ = true iff i = j or $i \to j$.
$B_1[i, j]$ = true iff $i \Rightarrow j$ using at most node 1 along the way.
$B_2[i, j]$ = true iff $i \Rightarrow j$ using at most nodes 1 and 2 along the way.
…
$B_k[i, j]$ = true iff $i \Rightarrow j$ using at most nodes 1, 2, … , k along the way.
The matrices $B_0, B_1, \ldots$ express the existence of paths that may touch an increasing number of nodes along the way from node i to node j; thus $B_n$ talks about unrestricted paths and is the connectivity matrix $C = B_n$.
An iteration step $B_{k-1} \to B_k$ is computed by the formula
$B_k[i, j] = B_{k-1}[i, j] \text{ or } (B_{k-1}[i, k] \text{ and } B_{k-1}[k, j])$.
The cost for performing one step is $\Theta(n^2)$, the cost for computing the connectivity matrix is therefore $\Theta(n^3)$. A comparison of the formula for Warshall's algorithm with the formula for matrix multiplication shows that the n-ary 'OR' has been replaced by a binary 'or'.
At first sight, the following procedure appears to execute the algorithm specified above, but a closer look reveals that it executes something else: the assignment in the innermost loop computes new values that are used immediately, instead of the old ones.
procedure warshall(var a: nnboolean);
var i, j, k: integer;
begin
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        a[i, j] := a[i, j] or (a[i, k] and a[k, j])
        { this assignment mixes values of the old and new matrix }
end;
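For experimentation, the same triple loop can be written in Python. The sketch below makes a few assumptions that are not in the text: nodes are numbered from 0, the input is copied rather than modified, and the diagonal is set to true first so that the starting matrix corresponds to A' = A or I.

```python
def warshall(adj):
    """Connectivity matrix of a directed graph given by its adjacency matrix."""
    n = len(adj)
    # Start from A' = A or I, i.e. accept paths of length 0.
    c = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    for k in range(n):            # node k may now be used along the way
        for i in range(n):
            for j in range(n):
                # Old and new values are mixed here, as in the procedure above.
                c[i][j] = c[i][j] or (c[i][k] and c[k][j])
    return c
```

On a graph consisting of the cycle 0 → 1 → 2 → 0 plus an isolated node 3, every node of the cycle reaches every other node of the cycle, while node 3 reaches only itself.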
A more thorough examination, however, shows that this "naively" programmed procedure computes the correct result in-place more efficiently than would direct application of the formulas for the matrices $B_k$. We encourage you to verify that the replacement of old values by new ones leaves intact all values needed for later steps; that is, show that the following equalities hold:
$B_k[i, k] = B_{k-1}[i, k]$ and $B_k[k, j] = B_{k-1}[k, j]$.
Exercise: distances in a directed graph, Floyd's algorithm
Modify Warshall's algorithm so that it computes the shortest distance between any pair of nodes in a directed graph where each arc is assigned a length $\ge 0$. We assume that the data is given in an $n \times n$ array of reals, where d[i, j] is the length of the arc from node i to node j. If no arc exists, then d[i, j] is set to ∞, a constant that is the largest real number that can be represented on the given computer. Write a procedure 'dist' that works on an array d of type
type nnreal = array[1 .. n, 1 .. n] of real;
Think of the meaning of the boolean operations 'and' and 'or' in Warshall's algorithm, and find arithmetic operations that play an analogous role for the problem of computing distances. Explain your reasoning in words and pictures.
Solution
The following procedure 'dist' implements Floyd's algorithm [Flo 62]. We assume that the length of a nonexistent arc is ∞, that x + ∞ = ∞, and that min(x, ∞) = x for all x.
procedure dist(var d: nnreal);
var i, j, k: integer;
begin
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        d[i, j] := min(d[i, j], d[i, k] + d[k, j])
end;
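A Python rendering of 'dist' makes the analogy visible: 'or' becomes `min` and 'and' becomes `+`. The sketch assumes 0-based node numbers, uses `math.inf` for ∞ (which already satisfies x + ∞ = ∞ and min(x, ∞) = x), and returns a new matrix instead of overwriting its argument; the diagonal entries of the example are set to 0 by our own choice.

```python
import math

INF = math.inf  # length of a nonexistent arc


def dist(d):
    """All-pairs shortest distances for an n x n matrix of arc lengths."""
    n = len(d)
    d = [row[:] for row in d]     # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Either keep the known distance or go through node k.
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d
```

For the arcs 0 → 1 (length 4), 1 → 2 (length 1) and 2 → 0 (length 2), the distance from 0 to 2 becomes 5 and the distance from 1 to 0 becomes 3.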
Exercise: shortest paths
In addition to the distance d[i, j] of the preceding exercise, we wish to compute a shortest path from i to j (i.e. one that realizes this distance). Extend the solution above and write a procedure 'shortestpath' that returns its result in an array 'next' of type:
type nnn = array[1 .. n, 1 .. n] of 0 .. n;
next[i, j] contains the next node after i on a shortest path from i to j, or 0 if no such path exists.
Solution
procedure shortestpath(var d: nnreal; var next: nnn);
var i, j, k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      if d[i, j] <> ∞ then next[i, j] := j else next[i, j] := 0;
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if d[i, k] + d[k, j] < d[i, j] then
          { d[i, j] := d[i, k] + d[k, j]; next[i, j] := next[i, k] }
end;
It is easy to prove that next[i, j] = 0 at the end of the algorithm iff d[i, j] = ∞ (i.e. there is no path from i to j).
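To see how the 'next' array is meant to be used, here is a Python sketch of 'shortestpath' together with a small helper that follows the entries from i to j. The helper `path` is not part of the exercise; it and the use of `None` in place of 0 (because nodes are numbered from 0 here) are assumptions made for this illustration.

```python
import math

INF = math.inf


def shortest_paths(d):
    """Return (distances, next) for an n x n matrix of arc lengths."""
    n = len(d)
    d = [row[:] for row in d]
    # next[i][j] = j if the arc i -> j exists, otherwise "no path known yet".
    nxt = [[j if d[i][j] != INF else None for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    nxt[i][j] = nxt[i][k]   # first step is the one toward k
    return d, nxt


def path(nxt, i, j):
    """List of nodes on a shortest path from i to j, or [] if none exists."""
    if nxt[i][j] is None:
        return []
    nodes = [i]
    while i != j:
        i = nxt[i][j]
        nodes.append(i)
    return nodes
```

With the three arcs 0 → 1 (4), 1 → 2 (1), 2 → 0 (2), calling `path(nxt, 1, 0)` walks 1, 2, 0.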
Minimum spanning tree in a graph
Consider a weighted graph G = (V, E, w), where $V = \{v_1, \ldots, v_n\}$ is the set of vertices, $E = \{e_1, \ldots, e_m\}$ is the set of edges, each edge $e_i$ is an unordered pair $(v_j, v_k)$ of vertices, and $w: E \to \mathbb{R}$ assigns a real number to each edge, which we call its weight. We consider only connected graphs G, in the sense that any pair $(v_j, v_k)$ of vertices is connected by a sequence of edges. In the following example, the edges are labeled with their weight (Exhibit 11.2).
Exhibit 11.2: Example of a minimum spanning tree.
A tree T is a connected graph that contains no circuits: any pair $(v_j, v_k)$ of vertices in T is connected by a unique sequence of edges. A spanning tree of a graph G is a subgraph T of G, given by its set of edges $E_T \subseteq E$, that is a tree and satisfies the additional condition of being maximal, in the sense that no edge in $E \setminus E_T$ can be added to T without destroying the tree property. Observation: a connected graph G has at least one spanning tree. The weight of a spanning tree is the sum of the weights of all its edges. A minimum spanning tree is a spanning tree of minimal weight. In Exhibit 11.2, the bold edges form the minimal spanning tree.
Consider the following two algorithms:
Grow:
$E_T := \emptyset$; { initialize to empty set }
while T is not a spanning tree do
  $E_T := E_T \cup$ {a min cost edge that does not form a circuit when added to $E_T$}
Shrink:
$E_T := E$; { initialize to set of all edges }
while T is not a spanning tree do
  $E_T := E_T \setminus$ {a max cost edge that leaves T connected after its removal}
Claim: The "growing algorithm" and "shrinking algorithm" determine a minimum spanning tree.
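The 'Grow' scheme can be made concrete in several ways. One common realization, usually credited to Kruskal, scans the edges in order of increasing weight and keeps an edge whenever it does not close a circuit. The Python sketch below follows that idea; the union-find bookkeeping used for the circuit test is a technique brought in for this example and is not described in the text, and the vertex numbering 0 .. n-1 and the edge format `(weight, u, v)` are likewise assumptions.

```python
def grow(n, edges):
    """Edges of a minimum spanning tree of a connected graph.

    n     -- number of vertices, numbered 0 .. n-1
    edges -- iterable of (weight, u, v) triples
    """
    parent = list(range(n))       # union-find forest over the vertices

    def find(x):
        # Representative of the component containing x (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):         # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                      # (u, v) does not form a circuit
            parent[ru] = rv               # merge the two components
            tree.append((w, u, v))
    return tree
```

On a connected graph the loop collects exactly n - 1 edges; the sorting step dominates the running time.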
If T is a spanning tree of G and $e = (v_j, v_k) \notin E_T$, we define Ckt(e, T), "the circuit formed by adding e to T", as the set of edges in $E_T$ that form a path from $v_j$ to $v_k$. In the example of Exhibit 11.2 with the spanning tree shown in bold edges we obtain $\mathrm{Ckt}((v_4, v_5), T) = \{(v_4, v_1), (v_1, v_2), (v_2, v_5)\}$.
Exercise
Show that for each edge $e \notin E_T$ there exists exactly one such circuit. Show that for any $e \notin E_T$ and any $t \in \mathrm{Ckt}(e, T)$ the graph formed by $(E_T \setminus \{t\}) \cup \{e\}$ is still a spanning tree.
A local minimum spanning tree of G is a spanning tree T with the property that there exist no two edges $e \notin E_T$, $t \in \mathrm{Ckt}(e, T)$ with $w(e) < w(t)$.
Consider the following 'exchange algorithm', which computes a local minimum spanning tree:
Exchange:
T := any spanning tree;
while there exists $e \notin E_T$, $t \in \mathrm{Ckt}(e, T)$ with $w(e) < w(t)$ do
  $E_T := (E_T \setminus \{t\}) \cup \{e\}$; { exchange }
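Theorem: A local minimum spanning tree for a graph G is a minimum spanning tree.
For the proof of this theorem we need:
Lemma: If T' and T'' are arbitrary spanning trees for G, $T' \ne T''$, then there exist $e'' \notin E_{T'}$, $e' \notin E_{T''}$, such that $e'' \in \mathrm{Ckt}(e', T'')$ and $e' \in \mathrm{Ckt}(e'', T')$.
Proof: Since T' and T'' are spanning trees for G and $T' \ne T''$, there exists $e'' \in E_{T''} \setminus E_{T'}$. Assume that $\mathrm{Ckt}(e'', T') \subseteq E_{T''}$. Then e'' and the edges in Ckt(e'', T') form a circuit in T'' that contradicts the assumption that T'' is a tree. Hence there must be at least one $e' \in \mathrm{Ckt}(e'', T') \setminus E_{T''}$.
Assume that for all $e' \in \mathrm{Ckt}(e'', T') \setminus E_{T''}$ we have $e'' \notin \mathrm{Ckt}(e', T'')$. Then the edge e'', the edges of Ckt(e'', T') that belong to $E_{T''}$, and the edges of the paths Ckt(e', T'') for all $e' \in \mathrm{Ckt}(e'', T') \setminus E_{T''}$ together contain a circuit in T'' that contradicts the proposition that T'' is a tree. Hence there must be at least one $e' \in \mathrm{Ckt}(e'', T') \setminus E_{T''}$ with $e'' \in \mathrm{Ckt}(e', T'')$.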
Proof of the Theorem: Assume that T' is a local minimum spanning tree. Let T'' be a minimum spanning tree. If $T' \ne T''$ the lemma implies the existence of $e' \in \mathrm{Ckt}(e'', T') \setminus E_{T''}$ and $e'' \in \mathrm{Ckt}(e', T'') \setminus E_{T'}$.
If $w(e') < w(e'')$, the graph defined by the edges $(E_{T''} \setminus \{e''\}) \cup \{e'\}$ is a spanning tree with lower weight than T''. Since T'' is a minimum spanning tree, this is impossible and it follows that
$w(e') \ge w(e'')$. (*)
If $w(e') > w(e'')$, the graph defined by the edges $(E_{T'} \setminus \{e'\}) \cup \{e''\}$ is a spanning tree with lower weight than T'. Since T' is a local minimum spanning tree, this is impossible and it follows that
$w(e') \le w(e'')$. (**)
From (*) and (**) it follows that $w(e') = w(e'')$ must hold. The graph defined by the edges $(E_{T''} \setminus \{e''\}) \cup \{e'\}$ is still a spanning tree that has the same weight as T''. We replace T'' by this new minimum spanning tree and continue the replacement process. Since T' and T'' have only finitely many edges the process will terminate and T'' will become equal to T'. This proves that T' is a minimum spanning tree.
The theorem implies that the tree computed by 'Exchange' is a minimum spanning tree.
Exercises
1. Consider how to extend the transitive closure algorithm based on boolean matrix multiplication so that it computes (a) distances and (b) a shortest path.
2. Prove that the algorithms 'Grow' and 'Shrink' compute local minimum spanning trees. Thus they are minimum spanning trees by the theorem of the section entitled "Minimum spanning tree in a graph".
12. Integers
integers and their operations
Euclidean algorithm
Sieve of Eratosthenes
large integers
modular arithmetic
Chinese remainder theorem
random numbers and their generators
Operations on integers
Five basic operations account for the lion's share of integer arithmetic:
+   −   ·   div   mod
The product 'x · y', the quotient 'x div y', and the remainder 'x mod y' are related through the following div-mod identity:
(1) $(x \operatorname{div} y) \cdot y + (x \bmod y) = x$ for $y \ne 0$.
Many programming languages provide these five operations, but unfortunately, 'mod' tends to behave differently not only between different languages but also between different implementations of the same language. How come we have not learned in school what the remainder of a division is?
The div-mod identity, a cornerstone of number theory, defines 'mod' assuming that all the other operations are defined. It is mostly used in the context of nonnegative integers $x \ge 0$, $y > 0$, where everything is clear, in particular the convention $0 \le x \bmod y < y$. One half of the domain of integers consists of negative numbers, and there are good reasons for extending all five basic operations to the domain of all integers (with the possible exception of y = 0), such as:
- Any operation with an undefined result hinders the portability and testing of programs: if the "forbidden" operation does get executed by mistake, the computation may get into nonrepeatable states. Example: from a practical point of view it is better not to leave 'x div 0' undefined, as is customary in mathematics, but to define the result as '= overflow', a feature typically supported in hardware.
- Some algorithms that we usually consider in the context of nonnegative integers have natural extensions into the domain of all integers (see the following sections on 'gcd' and modular number representations).
Unfortunately, the attempt to extend 'mod' to the domain of integers runs into the problem mentioned above: How should we define 'div' and 'mod'? Let's follow the standard mathematical approach of listing desirable properties these operations might possess. In addition to the "sacred" div-mod identity (1) we consider:
(2) Symmetry of div: $(-x) \operatorname{div} y = x \operatorname{div} (-y) = -(x \operatorname{div} y)$.
The most plausible way to extend 'div' to negative numbers.
(3) A constraint on the possible values assumed by 'x mod y', which, for y > 0, reduces to the convention of nonnegative remainders:
$0 \le x \bmod y < y$.
This is important because a standard use of 'mod' is to partition the set of integers into y residue classes. We consider a weak and a strict requirement:
(3') Number of residue classes = |y|: for given y and varying x, 'x mod y' assumes exactly |y| distinct values.
(3'') In addition, we ask for nonnegative remainders: $0 \le x \bmod y < |y|$.
Pondering the consequences of these desiderata, we soon realize that 'div' cannot be extended to negative arguments by means of symmetry. Even the relatively innocuous case of positive denominator y > 0 makes it impossible to preserve both (2) and (3''), as the following failed attempt shows:
$((-3) \operatorname{div} 2) \cdot 2 + ((-3) \bmod 2) \overset{?}{=} -3$    Preserving (1)
$(-(3 \operatorname{div} 2)) \cdot 2 + 1 \overset{?}{=} -3$    and using (2) and (3'')
$(-1) \cdot 2 + 1 \ne -3$    … fails!
Even the weak condition (3'), which we consider essential, is incompatible with (2). For y = −2, it follows from (1) and (2) that there are three residue classes modulo (−2): x mod (−2) yields the values 1, 0, −1; for example,
1 mod (−2) = 1, 0 mod (−2) = 0, (−1) mod (−2) = −1.
This does not go with the fact that 'x mod 2' assumes only the two values 0, 1. Since a reasonable partition into residue classes is more important than the superficially appealing symmetry of 'div', we have to admit that (2) was just wishful thinking.
:ithout giving any reasons+ N-nu 73aO Hsee the chapter E8educing a task to given primitivesL programming
motionI de%ines ;mod; by means o% the div#mod identity H1I as %ollo5s&
) mod y V ) U y K ) Q y+ i% y f 0L ) mod 0 V )L
Thus he implicitly de%ines ) div y V ) Q y+ 5here 6+ the E%loorE o% 6+ denotes the largest integer ` 6L the EceilingE
6 denotes the smallest integer Z 6. -nuth e)tends the domain o% ;mod; even %urther by de%ining E) mod 0 V )E.
:ith the e)ception o% this special case y V 0+ -nuth;s de%inition satis%ies H3;I& !umber o% residue classes V \y\. The
de%inition does not satis%y H3EI+ but a slightly more complicated condition. /or given y f 0+ 5e have 0 ` ) mod y e y+
i% y d 0L and 0 Z ) mod y d y+ i% y e 0. -nuth;s de%inition o% ;div; and ;mod; has the added advantage that it holds %or
real numbers as 5ell+ 5here ;mod; is a use%ul operation %or e)pressing the periodic behavior o% %unctions Ne.g. tan )
V tan H) mod IO.
")ercise& another de%inition o% ;div; and ;mod;
1. .ho5 that the de%inition
in con3unction 5ith the div#mod identity H1I meets the strict re0uirement H3EI.
101
.olution
")ercise
/ill out comparable tables o% values %or -nuth;s de%inition o% ;div; and ;mod;.
.olution
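Tables of this kind can be generated mechanically. The following Python sketch is one way to do so; the function names and the chosen range of x are ours, and the sketch relies on the fact that Python's integer operator `//` rounds toward minus infinity, which coincides with the floor in Knuth's definition for $y \ne 0$.

```python
def knuth_div(x, y):
    """x div y defined as floor(x / y), for y != 0."""
    return x // y                 # Python's // already takes the floor


def knuth_mod(x, y):
    """x mod y = x - y * floor(x / y), with the special case x mod 0 = x."""
    return x if y == 0 else x - y * (x // y)


def table(y, xs=range(-4, 5)):
    """Rows (x, x div y, x mod y) for a fixed nonzero y."""
    return [(x, knuth_div(x, y), knuth_mod(x, y)) for x in xs]


if __name__ == "__main__":
    for y in (2, -2):
        print(f"y = {y}")
        for x, q, r in table(y):
            print(f"  x = {x:2d}   x div y = {q:2d}   x mod y = {r:2d}")
```

For y = 2 the remainders are 0 and 1; for y = −2 they are 0 and −1, in line with the condition $0 \ge x \bmod y > y$ for negative y.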

The Euclidean algorithm
A famous algorithm for computing the greatest common divisor (gcd) of two natural numbers appears in Book 7 of Euclid's Elements (ca. 300 BC). It is based on the identity gcd(u, v) = gcd(u − v, v), which can be used for u > v to reduce the size of the arguments, until the smaller one becomes 0.
We use these properties of the greatest common divisor of two integers u and v > 0:
gcd(u, 0) = u    By convention this also holds for u = 0.
gcd(u, v) = gcd(v, u)    Permutation of arguments, important for the termination of the following procedure.
gcd(u, v) = gcd(v, u − q · v)    For any integer q.
The formulas above translate directly into a recursive procedure:
function gcd(u, v: integer): integer;
begin
  if v = 0 then return(u) else return(gcd(v, u mod v))
end;
A test for the relative size of u and v is unnecessary. If initially u < v, the first recursive call permutes the two arguments, and thereafter the first argument is always larger than the second.
This simple and concise solution has a relatively high implementation cost. A stack, introduced to manage the recursive procedure calls, consumes space and time. In addition to the operations visible in the code (test for equality, assignment, and 'mod'), hidden stack maintenance operations are executed. There is an equally concise iterative version that requires a bit more thinking and writing, but is significantly more efficient:
function gcd(u, v: integer): integer;
var r: integer;
begin
  while v <> 0 do { r := u mod v; u := v; v := r };
  return(u)
end;
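Both versions carry over to Python almost word for word. The sketch below assumes nonnegative arguments; the names `gcd_recursive` and `gcd_iterative` are chosen here only to keep the two variants apart.

```python
def gcd_recursive(u, v):
    """Greatest common divisor via gcd(u, v) = gcd(v, u mod v)."""
    return u if v == 0 else gcd_recursive(v, u % v)


def gcd_iterative(u, v):
    """Same computation without recursion."""
    while v != 0:
        u, v = v, u % v           # replaces the temporary variable r
    return u
```

Calling either function with (12, 18) first swaps the arguments, since 12 mod 18 = 12, and then reduces (18, 12) to (12, 6) to (6, 0).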
The prime number sieve of Eratosthenes
The oldest and best-known algorithm of type sieve is named after Eratosthenes (ca. 200 BC). A set of elements is to be separated into two classes, the "good" ones and the "bad" ones. As is often the case in life, bad elements are easier to find than good ones. A sieve process successively eliminates elements that have been recognized as bad; each element eliminated helps in identifying further bad elements. Those elements that survive the epidemic must be good.
Sieve algorithms are often applicable when there is a striking asymmetry in the complexity or length of the proofs of the two assertions "p is a good element" and "p is a bad element". This theme occurs prominently in the complexity theory of problems that appear to admit only algorithms whose time requirement grows faster than polynomially in the size of the input (NP completeness). Let us illustrate this asymmetry in the case of prime numbers, for which Eratosthenes' sieve is designed. In this analogy, "prime" is "good" and "nonprime" is "bad".
A prime is a positive integer greater than 1 that is divisible only by 1 and itself. Thus primes are defined in terms of their lack of an easily verified property: a prime has no factors other than the two trivial ones. To prove that 1 675 307 419 is not prime, it suffices to exhibit a pair of factors:
1 675 307 419 = 1 234 567 · 1 357.
This verification can be done by hand. The proof that $2^{17} - 1$ is prime, on the other hand, is much more elaborate. In general (without knowledge of any special property this particular number might have) one has to verify, for each and every number that qualifies as a candidate factor, that it is not a factor. This is obviously more time consuming than a mere multiplication.
Exhibiting factors through multiplication is an example of what is sometimes called a "one-way" or "trapdoor" function: the function is easy to evaluate (just one multiplication), but its inverse is hard. In this context, the inverse of multiplication is not division, but rather factorization. Much of modern cryptography relies on the difficulty of factorization.
The prime number sieve of Eratosthenes works as follows. We mark the smallest prime, 2, and erase all of its multiples within the desired range 1 .. n. The smallest remaining number must be prime; we mark it and erase its multiples. We repeat this process for all numbers up to $\sqrt{n}$: If an integer c < n can be factored, c = a · b, then at least one of the factors is $< \sqrt{n}$.
{ sieve of Eratosthenes marks all the primes in 1 .. n }
const n = … ;
var sieve: packed array [2 .. n] of boolean;
  p, sqrtn, i: integer;
…
begin
  for i := 2 to n do sieve[i] := true; { initialize the sieve }
  sqrtn := trunc(sqrt(n));
  { it suffices to consider as divisors the numbers up to sqrt(n) }
  p := 2;
  while p <= sqrtn do begin
    i := p * p;
    while i <= n do { sieve[i] := false; i := i + p };
    repeat p := p + 1 until sieve[p];
  end;
end;
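The sieve is short enough to try out directly. The Python sketch below mirrors the program above with a few liberties of our own: the list is indexed from 0 so that entries 0 and 1 exist and are marked nonprime, `math.isqrt` (available from Python 3.8) replaces `trunc(sqrt(n))`, and the function returns the list of primes rather than leaving the result in a global array.

```python
import math


def primes_up_to(n):
    """List of all primes in 1 .. n, found with the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]           # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:                      # p survived, hence p is prime
            # Multiples below p * p were erased by smaller primes already.
            for i in range(p * p, n + 1, p):
                sieve[i] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]
```

For n = 30 only the primes 2, 3 and 5 are used as divisors, because 5 is the largest integer not exceeding the square root of 30.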
Large integers
The range of numbers that can be represented directly in hardware is typically limited by the word length of the computer. For example, many small computers have a word length of 16 bits and thus limit integers to the range $-2^{15} \le a < +2^{15} = 32768$. When the built-in number system is insufficient, a variety of software techniques are used to extend its range. They differ greatly with respect to their properties and intended applications, but all of them come at an additional cost in memory and, above all, in the time required for performing arithmetic operations. Let us mention the most common techniques.
Double-length or double-precision integers. Two words are used to hold an integer, which squares the available range as compared to integers stored in one word. For a 16-bit computer we get 32-bit integers, for a 32-bit computer we get 64-bit integers. Operations on double-precision integers are typically slower by a factor of 2 to 4.
Variable precision integers. The idea above is extended to allocate as many words as necessary to hold a given integer. This technique is used when the size of intermediate results that arise during the course of a computation is unpredictable. It calls for list processing techniques to manage memory. The time of an operation depends on the size of its arguments: linearly for addition, mostly quadratically for multiplication.
Packed BCD integers. This is a compromise between double precision and variable precision that comes from commercial data processing. The programmer defines the maximal size of every integer variable used, typically by giving the maximal number of decimal digits that may be needed to express it. The compiler allocates an array of bytes to this variable that contains the following information: maximal length, current length, sign, and the digits. The latter are stored in BCD (binary-coded decimal) representation: a decimal digit is coded in 4 bits, two of them are packed into a byte. Packed BCD integers are expensive in space because most of the time there is unused allocated space; and even more so in time, due to digit-by-digit arithmetic. They are unsuitable for lengthy scientific/technical computations, but OK for I/O-intensive data processing applications.
Modular number systems: the poor man's large integers
Modular arithmetic is a special-purpose technique with a narrow range of applications, but is extremely efficient where it applies, typically in combinatorial and number-theoretic problems. It handles addition, and particularly multiplication, with unequaled efficiency, but lacks equally efficient algorithms for division and comparison. Certain combinatorial problems that require high precision can be solved without divisions and with few comparisons; for these, modular numbers are unbeatable.
Chinese Remainder Theorem: Let $m_1, m_2, \ldots, m_k$ be pairwise relatively prime positive integers, called moduli. Let $m = m_1 \cdot m_2 \cdot \ldots \cdot m_k$ be their product. Given k nonnegative integers $r_1, r_2, \ldots, r_k$, called residues, with $0 \le r_i < m_i$ for $1 \le i \le k$, there exists exactly one integer r, $0 \le r < m$, such that $r \bmod m_i = r_i$ for $1 \le i \le k$.
The Chinese remainder theorem is used to represent integers in the range $0 \le r < m$ uniquely as k-tuples of their residues modulo $m_i$. We denote this number representation by
$r \sim [r_1, r_2, \ldots, r_k]$.
The practicality of modular number systems is based on the following fact: The arithmetic operations (+, −, ·) on integers r in the range $0 \le r < m$ are represented by the same operations, applied componentwise to k-tuples $[r_1, r_2, \ldots, r_k]$. A modular number system replaces a single +, −, or · in a large range by k operations of the same type in small ranges.
If $r \sim [r_1, r_2, \ldots, r_k]$, $s \sim [s_1, s_2, \ldots, s_k]$, $t \sim [t_1, t_2, \ldots, t_k]$, then:
$(r + s) \bmod m = t \iff (r_i + s_i) \bmod m_i = t_i$ for $1 \le i \le k$,
$(r - s) \bmod m = t \iff (r_i - s_i) \bmod m_i = t_i$ for $1 \le i \le k$,
$(r \cdot s) \bmod m = t \iff (r_i \cdot s_i) \bmod m_i = t_i$ for $1 \le i \le k$.
Example
m1 V 2 and m2 V @+ hence m V m1 K m2 V 2 K @ V 10. $n the %ollo5ing table the numbers r in the range 0 .. 9 are
represented as pairs modulo 2 and modulo @.
Let r V 2 and s V 3+ hence r K s V >. $n modular representation& r k N0+ 2O+ s k N1+ 3O+ hence r K s k N0+ 1O.
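The componentwise arithmetic described above can be tried out with a minimal Python sketch. The function names (`to_residues`, `add`, `mul`, `from_residues`) are illustrative assumptions, not part of the text, and the conversion back to an ordinary integer is done by brute-force search, which is only sensible for tiny moduli such as 2 and 5.

```python
from math import prod

def to_residues(r, moduli):
    """Represent r by its residues with respect to each modulus."""
    return [r % m for m in moduli]

def add(rs, ss, moduli):
    """Componentwise addition of two residue tuples."""
    return [(a + b) % m for a, b, m in zip(rs, ss, moduli)]

def mul(rs, ss, moduli):
    """Componentwise multiplication of two residue tuples."""
    return [(a * b) % m for a, b, m in zip(rs, ss, moduli)]

def from_residues(residues, moduli):
    """Brute-force inverse: find the unique r in 0 .. m-1 with these residues.

    Assumes the moduli are pairwise relatively prime, so exactly one r matches.
    """
    for r in range(prod(moduli)):
        if to_residues(r, moduli) == list(residues):
            return r
    raise ValueError("no integer with these residues")

moduli = [2, 5]
r = to_residues(2, moduli)          # [0, 2]
s = to_residues(3, moduli)          # [1, 3]
product = mul(r, s, moduli)         # [0, 1]
print(product, from_residues(product, moduli))
```

The search in `from_residues` also illustrates why conversion back to decimal notation is one of the expensive operations of a modular number system.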
A useful modular number system is formed by the moduli
$m_1 = 99$, $m_2 = 100$, $m_3 = 101$, hence $m = m_1 \cdot m_2 \cdot m_3 = 999900$.
Nearly a million integers in the range $0 \le r < 999900$ can be represented. The conversion of a decimal number to its modular form is easily computed by hand by adding and subtracting pairs of digits as follows:
r mod 99: Add pairs of digits, and take the resulting sum mod 99.
r mod 100: Take the least significant pair of digits.
r mod 101: Alternatingly add and subtract pairs of digits, starting with the least significant pair, and take the result mod 101.
The largest integer produced by operations on components is $100^2 \approx 2^{13}$; it is smaller than $2^{15} = 32768 \approx 32\mathrm{k}$ and thus causes no overflow on a computer with 16-bit arithmetic.
Example
$r = 123456$
$r \bmod 99 = (56 + 34 + 12) \bmod 99 = 3$
$r \bmod 100 = 56$
$r \bmod 101 = (56 - 34 + 12) \bmod 101 = 34$
$r \sim [3, 56, 34]$
$s = 654321$
$s \bmod 99 = (21 + 43 + 65) \bmod 99 = 30$
$s \bmod 100 = 21$
$s \bmod 101 = (21 - 43 + 65) \bmod 101 = 43$
$s \sim [30, 21, 43]$
$r + s \sim [3, 56, 34] + [30, 21, 43] = [33, 77, 77]$
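The hand conversion by pairs of digits can be checked with a short Python sketch. The helper names `digit_pairs` and `to_modular` are assumptions made for this illustration; the arithmetic follows the three rules stated above for the moduli 99, 100, and 101.

```python
def digit_pairs(r):
    """Split a non-negative integer into pairs of decimal digits,
    least significant pair first (123456 -> [56, 34, 12])."""
    pairs = []
    while r > 0:
        pairs.append(r % 100)
        r //= 100
    return pairs

def to_modular(r):
    """Residues of r with respect to the moduli 99, 100, 101."""
    pairs = digit_pairs(r)
    r99 = sum(pairs) % 99                      # 100 = 99 + 1: add the pairs
    r100 = pairs[0] if pairs else 0            # least significant pair
    # 100 = 101 - 1: alternate signs, starting with + at the lowest pair
    r101 = sum(p if i % 2 == 0 else -p for i, p in enumerate(pairs)) % 101
    return [r99, r100, r101]

r = to_modular(123456)
s = to_modular(654321)
total = [(a + b) % m for a, b, m in zip(r, s, (99, 100, 101))]
print(r, s, total)
```

As a cross-check, the componentwise sum agrees with the direct conversion of 123456 + 654321 = 777777.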
Modular arithmetic has some shortcomings: division, comparison, overflow detection, and conversion to decimal notation trigger intricate computations.
Exercise: Fibonacci numbers and modular arithmetic
The sequence of Fibonacci numbers
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …
is defined by
$x_0 = 0$, $x_1 = 1$, $x_n = x_{n-1} + x_{n-2}$ for $n \ge 2$.
Write (a) a recursive function (b) an iterative function that computes the n-th element of this sequence. Using modular arithmetic, compute Fibonacci numbers up to $10^8$ on a computer with 16-bit integer arithmetic, where the largest integer is $2^{15} - 1 = 32767$.
(c) Using moduli $m_1 = 999$, $m_2 = 1000$, $m_3 = 1001$, what is the range of the integers that can be represented uniquely by their residues $[r_1, r_2, r_3]$ with respect to these moduli?
(d) Describe in words and formulas how to compute the triple $[r_1, r_2, r_3]$ that uniquely represents a number $r$ in this range.
(e) Modify the function in (b) to compute Fibonacci numbers in modular arithmetic with the moduli 999, 1000, and 1001. Use the declaration
type triple = array [1 .. 3] of integer;
and write the procedure
procedure modfib(n: integer; var r: triple);
Solution
(a) function fib(n: integer): integer;
    begin
      if n <= 1 then return(n) else return(fib(n - 1) + fib(n - 2))
    end;
(b) function fib(n: integer): integer;
    var p, q, r, i: integer;
    begin
      if n <= 1 then return(n)
      else begin
        p := 0; q := 1;
        for i := 2 to n do { r := p + q; p := q; q := r };
        return(r)
      end
    end;
(c) The range is 0 .. m − 1 with $m = m_1 \cdot m_2 \cdot m_3 = 999\,999\,000$.
(d) $r = d_1 \cdot 1\,000\,000 + d_2 \cdot 1000 + d_3$ with $0 \le d_1, d_2, d_3 \le 999$
    $1\,000\,000 = 999\,999 + 1 = 1001 \cdot 999 + 1$
    $1000 = 999 + 1 = 1001 - 1$
    $r_1 = r \bmod 999 = (d_1 + d_2 + d_3) \bmod 999$
    $r_2 = r \bmod 1000 = d_3$
    $r_3 = r \bmod 1001 = (d_1 - d_2 + d_3) \bmod 1001$
(e) procedure modfib(n: integer; var r: triple);
    var p, q: triple;
        i, j: integer;
    begin
      if n <= 1 then
        for j := 1 to 3 do r[j] := n
      else begin
        for j := 1 to 3 do { p[j] := 0; q[j] := 1 };
        for i := 2 to n do begin
          { the moduli 999, 1000, 1001 are written as 998 + j }
          for j := 1 to 3 do r[j] := (p[j] + q[j]) mod (998 + j);
          p := q; q := r
        end
      end
    end;
Random numbers
The colloquial meaning of the term at random often implies "unpredictable". But random numbers are used in scientific/technical computing in situations where unpredictability is neither required nor desirable. What is needed in simulation, in sampling, and in the generation of test data is not unpredictability but certain statistical properties. A random number generator is a program that generates a sequence of numbers that passes a number of specified statistical tests. Additional requirements include: it runs fast and uses little memory; it is portable to computers that use a different arithmetic; the sequence of random numbers generated can be reproduced (so that a test run can be repeated under the same conditions).
In practice, random numbers are generated by simple formulas. The most widely used class, linear congruential generators, given by the formula
$r_{i+1} = (a \cdot r_i + c) \bmod m$
are characterized by three integer constants: the multiplier $a$, the increment $c$, and the modulus $m$. The sequence is initialized with a seed $r_0$.
All these constants must be chosen carefully. Consider, as a bad example, a formula designed to generate random days in the month of February:
$r_0 = 0$, $r_{i+1} = (2 \cdot r_i + 1) \bmod 28$.
It generates the sequence 0, 1, 3, 7, 15, 3, 7, 15, 3, … . Since $0 \le r_i < m$, each generator of the form above generates a sequence with a prefix of length $< m$ which is followed by a period of length $\le m$. In the example, the prefix 0, 1 of length 2 is followed by a period 3, 7, 15 of length 3. Usually we want a long period. Results from number theory assert that a period of length $m$ is obtained if the following conditions are met:
m is chosen as a prime number.
(a − 1) is a multiple of m.
m does not divide c.
Example
$r_0 = 0$, $r_{i+1} = (8 \cdot r_i + 1) \bmod 7$
generates a sequence: 0, 1, 2, 3, 4, 5, 6, 0, … with a period of length 7.
Shall we accept this as a sequence of random integers, and if not, why not? Should we prefer the sequence 4, 1, 6, 2, 3, 0, 5, 4, … ?
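The split of a linear congruential sequence into a prefix and a period can be made visible with a small Python sketch. The function name `lcg_prefix_and_period` is an assumption for this illustration; it simply iterates the formula until a value repeats.

```python
def lcg_prefix_and_period(a, c, m, seed):
    """Iterate r -> (a * r + c) mod m from the seed until a value repeats.

    Returns (prefix, period): the values seen before the cycle starts,
    and the values of one full cycle.
    """
    first_seen = {}          # value -> index of its first occurrence
    sequence = []
    r = seed
    while r not in first_seen:
        first_seen[r] = len(sequence)
        sequence.append(r)
        r = (a * r + c) % m
    start = first_seen[r]    # the cycle begins where r was first seen
    return sequence[:start], sequence[start:]

# The "February" generator from the text: short prefix, very short period.
print(lcg_prefix_and_period(2, 1, 28, 0))
# The generator with a = 8, c = 1, m = 7: no prefix, full period 7.
print(lcg_prefix_and_period(8, 1, 7, 0))
```

A full period alone says nothing about quality: the second generator simply counts 0, 1, 2, …, 6, which is the point of the question raised above.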
For each application of random numbers, the programmer/analyst has to identify the important statistical properties required. Under normal circumstances these include:
No periodicity over the length of the sequence actually used. Example: to generate a sequence of 100 random weekdays ∈ {Su, Mo, …, Sat}, do not pick a generator with modulus 7, which can generate a period of length at most 7; pick one with a period much longer than 100.
A desired distribution, most often the uniform distribution. If the range 0 .. m − 1 is partitioned into $k$ equally sized intervals $I_1, I_2, \ldots, I_k$, the numbers generated should be uniformly distributed among these intervals; this must be the case not only at the end of the period (this is trivially so for a generator with maximal period m), but for any initial part of the sequence.
Many well-known statistical tests are used to check the quality of random number generators. Examples are the run test (the lengths of monotonically increasing and monotonically decreasing subsequences must occur with the right frequencies); the gap test (given a test interval called the "gap", how many consecutively generated numbers fall outside?); and the permutation test (partition the sequence into subsequences of $t$ elements; there are $t!$ possible relative orderings of elements within a subsequence; each of these orderings should occur about equally often).
Exercise: visualization of random numbers
Write a program that lets its user enter the constants $a$, $c$, $m$, and the seed $r_0$ for a linear congruential generator, then displays the numbers generated as dots on the screen: A pair of consecutive random numbers is interpreted as the (x, y)-coordinates of the dot. You will observe that most generators you enter have obvious flaws: our visual system is an excellent detector of regular patterns, and most regularities correspond to undesirable statistical properties.
The point made above is substantiated in [PM 88].
The following simple random number generator and some of its properties are easily memorized:
$r_0 = 1$, $r_{i+1} = 125 \cdot r_i \bmod 8192$.
1. $8192 = 2^{13}$, hence the remainder mod 8192 is represented by the 13 least significant bits.
2. 125 = 127 − 2 = (1111101) in binary representation.
3. Arithmetic can be done with 16-bit integers without overflow and without regard to the representation of negative numbers.
4. The numbers $r_k$ generated are exactly those in the range $0 \le r_k < 8192$ with $r_k \bmod 4 = 1$ (i.e. the period has length $2^{11} = 2048$).
5. Its statistical properties are described in [Kru 69]; [Knu 81] contains the most comprehensive treatment of the theory of random number generators.
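Property 4 can be confirmed experimentally. The following Python sketch (the function name `generate` is an assumption) produces the first values of this generator with ordinary unbounded integers, so it says nothing about the 16-bit overflow claim in property 3; it only checks the period and the residue class mod 4.

```python
def generate(count, seed=1):
    """First `count` values of r(i+1) = 125 * r(i) mod 8192, starting at the seed."""
    values = []
    r = seed
    for _ in range(count):
        values.append(r)
        r = (125 * r) % 8192
    return values

values = generate(2049)
period = values[:2048]
print(len(set(period)))                  # number of distinct values in one period
print(all(v % 4 == 1 for v in period))   # every value is congruent to 1 mod 4
print(values[2048] == values[0])         # the sequence returns to the seed
```

The 2048 values of one period are exactly the numbers 1, 5, 9, …, 8189.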
As a conclusion of this brief introduction, remember an important rule of thumb:
Never choose a random number generator at random!
Exercises
1. Work out the details of implementing double-precision, variable-precision, and BCD integer arithmetic, and estimate the time required for each operation as compared to the time of the same operation in single precision. For variable precision and BCD, introduce the length L of the representation as a parameter.
2. The least common multiple (lcm) of two integers $u$ and $v$ is the smallest positive integer that is a multiple of $u$ and $v$. Design an algorithm to compute lcm(u, v).
3. The prime decomposition of a natural number $n > 0$ is the (unique) multiset $PD(n) = [p_1, p_2, \ldots, p_k]$ of primes $p_i$ whose product is $n$. A multiset differs from a set in that elements may occur repeatedly (e.g. PD(12) = [2, 2, 3]). Design an algorithm to compute PD(n) for a given $n > 0$.
4. Work out the details of modular arithmetic with moduli 9, 10, 11.
5. Among the 95 linear congruential random number generators given by the formula $r_{i+1} = a \cdot r_i \bmod m$, with prime modulus $m = 97$ and $1 < a < 97$, find out how many get disqualified "at first sight" by a simple visual test. Consider that the period of these RNGs is at most 97.
13. Reals
floating-point numbers and their properties
pitfalls of numeric computation
Horner's method
bisection
Newton's method
Floating-point numbers
Real numbers, those declared to be of type REAL in a programming language, are represented as floating-point numbers on most computers. A floating-point number $z$ is represented by a (signed) mantissa $m$ and a (signed) exponent $e$ with respect to a base $b$: $z = \pm m \cdot b^{\pm e}$ (e.g. $z = +0.11 \cdot 2^{-1}$). This section presents a very brief introduction to floating-point arithmetic. We recommend [Gol91] as a comprehensive survey.
Floating-point numbers can only approximate real numbers, and in important ways, they behave differently. The major difference is due to the fact that any floating-point number system is a finite number system, as the mantissa $m$ and the exponent $e$ lie in a bounded range. Consider, as a simple example, the following number system:
$z = \pm 0.b_1 b_2 \cdot 2^{\pm e}$, where $b_1$, $b_2$, and $e$ may take the values 0 and 1.
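A brute-force enumeration shows how small this number system really is. The Python sketch below is an illustration under the assumption that each representation consists of five binary choices (sign of the mantissa, $b_1$, $b_2$, sign of the exponent, $e$); exact fractions avoid any rounding in the demonstration itself.

```python
from fractions import Fraction
from itertools import product

def all_values():
    """Value of every representation +-0.b1b2 * 2^(+-e) with b1, b2, e in {0, 1}."""
    values = []
    for sign, b1, b2, exp_sign, e in product((1, -1), (0, 1), (0, 1), (1, -1), (0, 1)):
        mantissa = Fraction(b1, 2) + Fraction(b2, 4)   # binary fraction 0.b1b2
        values.append(sign * mantissa * Fraction(2) ** (exp_sign * e))
    return values

values = all_values()
print(len(values))                      # number of representations
print(len(set(values)))                 # number of distinct numbers
print(sorted(set(values)))
```

The gap between the number of representations and the number of distinct values comes from the two zeros of the exponent, the many spellings of zero, and shifted mantissas such as 0.10 · 2^0 = 0.01 · 2^1.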
The number representation is not unique: The same real number may have many different representations, arranged in the following table by numerical value (lines) and constant exponent (columns).
value    e = +1              e = ±0              e = −1
1.5      +0.11 · 2^{+1}
1.0      +0.10 · 2^{+1}
0.75                         +0.11 · 2^{±0}
0.5      +0.01 · 2^{+1}      +0.10 · 2^{±0}
0.375                                            +0.11 · 2^{−1}
0.25                         +0.01 · 2^{±0}      +0.10 · 2^{−1}
0.125                                            +0.01 · 2^{−1}
0.       +0.00 · 2^{+1}      +0.00 · 2^{±0}      +0.00 · 2^{−1}
The table is symmetric for negative numbers. Notice the cluster of representable numbers around zero. There are only 15 different numbers, but $2^5 = 32$ different representations.
Exercise: a floating-point number system
Consider floating-point numbers represented in a 6-bit "word" as follows: The four bits $b\,b_2 b_1 b_0$ represent a signed mantissa, the two bits $e\,e_0$ a signed exponent to the base 2. Every number has the form $x = b\,b_2 b_1 b_0 \cdot 2^{e\,e_0}$. Both the exponent and the mantissa are integers represented in 2's complement form. This means that the integer values −2 .. 1 are assigned to the four different representations $e\,e_0$ as shown:
v     e   e0
0     0   0
1     0   1
−2    1   0
−1    1   1
1. Complete the following table of the values of the mantissa and their representation, and write down a formula to compute $v$ from $b\,b_2 b_1 b_0$.
v     b   b2  b1  b0
0     0   0   0   0
1     0   0   0   1
…
7     0   1   1   1
−8    1   0   0   0
…
−1    1   1   1   1
2. How many different number representations are there in this floating-point system?
3. How many different numbers are there in this system? Draw all of them on an axis, each number with all its representations.
On a byte-oriented machine, floating-point numbers are often represented by 4 bytes = 32 bits: 24 bits for the signed mantissa, 8 bits for the signed exponent. The mantissa $m$ is often interpreted as a fraction $0 \le m < 1$, whose precision is bounded by 23 bits; the 8-bit exponent permits scaling within the range $2^{-128} \le 2^e \le 2^{127}$. Because 32- and 64-bit floating-point number systems are so common, often coexisting on the same hardware, these number systems are often identified with "single precision" and "double precision", respectively. In recent years an IEEE standard format for single-precision floating-point numbers has emerged, along with standards for higher precisions: double, single extended, and double extended.
The following example shows the representation of the number
$+1.011110 \ldots 0 \cdot 2^{-54}$
in the IEEE format:
Some dangers
Floating-point computation is fraught with problems that are hard to analyze and control. Unexpected results abound, as the following examples show. The first two use a binary floating-point number system with a signed 2-bit mantissa and a signed 1-bit exponent. Representable numbers lie in the range
$-0.11 \cdot 2^{+1} \le z \le +0.11 \cdot 2^{+1}$.
Example: $y + x = y$ and $x \ne 0$
It suffices to choose $|x|$ small as compared to $|y|$; for example,
$x = 0.01 \cdot 2^{-1}$, $y = 0.10 \cdot 2^{+1}$.
The addition forces the mantissa of $x$ to be shifted to the right until the exponents are equal (i.e. $x$ is represented as $0.0001 \cdot 2^{+1}$). Even if the sum is computed correctly as $0.1001 \cdot 2^{+1}$ in an accumulator of double length, storing the result in memory will force rounding: $x + y = 0.10 \cdot 2^{+1} = y$.
Example: Addition is not associative: $(x + y) + z \ne x + (y + z)$
The following values for $x$, $y$, and $z$ assign different values to the left and right sides.
Left side: $(0.10 \cdot 2^{+1} + 0.10 \cdot 2^{-1}) + 0.10 \cdot 2^{-1} = 0.10 \cdot 2^{+1}$
Right side: $0.10 \cdot 2^{+1} + (0.10 \cdot 2^{-1} + 0.10 \cdot 2^{-1}) = 0.11 \cdot 2^{+1}$
A useful rule of thumb helps prevent the loss of significant digits: Add the small numbers before adding the large ones.
Example: $((x + y)^2 - x^2 - 2xy) / y^2 = 1$?
Let's evaluate this expression for large $|x|$ and small $|y|$ in a floating-point number system with five decimal digits.
x = 100.00, y = .01000
x + y = 100.01
$(x + y)^2$ = 10002.0001, rounded to five digits yields 10002.
$x^2$ = 10000.
$(x + y)^2 - x^2$ = 2.???? (four digits have been lost!)
2xy = 2.0000
$(x + y)^2 - x^2 - 2xy$ = 2.???? − 2.0000 = 0.?????
Now five digits have been lost, and the result is meaningless.
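The loss of digits can be reproduced with Python's `decimal` module, which lets us round every intermediate result to five significant digits. This is a sketch under assumptions: the function name `cancellation_demo` is invented here, and round-half-even is chosen as the rounding rule, which the text does not specify.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

def cancellation_demo(x="100.00", y="0.01000", digits=5):
    """Evaluate ((x + y)^2 - x^2 - 2xy) / y^2, rounding every intermediate
    result to `digits` significant decimal digits."""
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN)
    x, y = Decimal(x), Decimal(y)
    s = ctx.add(x, y)                     # 100.01
    s2 = ctx.multiply(s, s)               # 10002.0001 rounds to 10002
    x2 = ctx.multiply(x, x)               # 10000
    two_xy = ctx.multiply(ctx.multiply(Decimal(2), x), y)   # 2.0000
    numerator = ctx.subtract(ctx.subtract(s2, x2), two_xy)  # all digits cancel
    y2 = ctx.multiply(y, y)
    return ctx.divide(numerator, y2)

print(cancellation_demo())            # five digits: the quotient collapses to zero
print(cancellation_demo(digits=12))   # with enough digits the identity holds
```

With five digits the expression evaluates to 0 instead of 1; carrying twelve digits is enough to recover the mathematically correct value for these inputs.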
Example: numerical instability
Recurrence relations for sequences of numbers are prone to the phenomenon of numerical instability. Consider the sequence
$x_0 = 1.0$, $x_1 = 0.5$, $x_{n+1} = 2.5 \cdot x_n - x_{n-1}$.
We first solve this linear recurrence relation in closed form by trying $x_i = r^i$ for $r \ne 0$. This leads to $r^{n+1} = 2.5 \cdot r^n - r^{n-1}$, and to the quadratic equation $0 = r^2 - 2.5 \cdot r + 1$, with the two solutions $r = 2$ and $r = 0.5$.
The general solution of the recurrence relation is a linear combination:
$x_i = a \cdot 2^i + b \cdot 2^{-i}$.
The starting values $x_0 = 1.0$ and $x_1 = 0.5$ determine the coefficients $a = 0$ and $b = 1$, and thus the sequence is given exactly as $x_i = 2^{-i}$. If the sequence $x_i = 2^{-i}$ is computed by the recurrence relation above in a floating-point number system with one decimal digit, the following may happen:
$x_2 = 2.5 \cdot 0.5 - 1 = 0.2$ (rounding the exact value 0.25),
$x_3 = 2.5 \cdot 0.2 - 0.5 = 0$ (represented exactly with one decimal digit),
$x_4 = 2.5 \cdot 0 - 0.2 = -0.2$ (represented exactly with one decimal digit),
$x_5 = 2.5 \cdot (-0.2) - 0 = -0.5$ (represented exactly with one decimal digit),
$x_6 = 2.5 \cdot (-0.5) - (-0.2) = -1.05$ (exact) $= -1.0$ (rounded),
$x_7 = 2.5 \cdot (-1) - (-0.5) = -2.0$ (represented exactly with one decimal digit),
$x_8 = 2.5 \cdot (-2) - (-1) = -4.0$ (represented exactly with one decimal digit).
As soon as the first rounding error has occurred, the computed sequence changes to the alternative solution $x_i = a \cdot 2^i$, as can be seen from the doubling of consecutive computed values.
Exercise: floating-point number systems and calculations
(a) Consider a floating-point number system with two ternary digits $t_1$, $t_2$ in the mantissa, and a ternary digit $e$ in the exponent to the base 3. Every number in this system has the form $x = .t_1 t_2 \cdot 3^e$, where $t_1$, $t_2$, and $e$ assume a value chosen among {0, 1, 2}. Draw a diagram that shows all the different numbers in this system, and for each number, all of its representations. How many representations are there? How many different numbers?
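(b) Recall the series
$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots$
which holds for $|x| < 1$, for example,
$\frac{1}{1 - \frac{1}{2}} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$
Use this formula to express 1/0.7 as a series of powers.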
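Returning to the numerical instability example earlier in this section, the recurrence $x_{n+1} = 2.5 \cdot x_n - x_{n-1}$ can be replayed in Python with every result rounded to one digit after the decimal point. This is a sketch under assumptions: the function name `unstable_sequence` is invented, each step is computed exactly and rounded once, and ties are rounded to the even digit so that 0.25 becomes 0.2 and −1.05 becomes −1.0 as in the worked example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def unstable_sequence(n):
    """x0 .. xn of x(k+1) = 2.5 * x(k) - x(k-1), rounding each new value
    to one digit after the decimal point."""
    one_digit = Decimal("0.1")
    xs = [Decimal("1.0"), Decimal("0.5")]
    while len(xs) <= n:
        exact = Decimal("2.5") * xs[-1] - xs[-2]   # exact in decimal arithmetic
        xs.append(exact.quantize(one_digit, rounding=ROUND_HALF_EVEN))
    return xs

for i, x in enumerate(unstable_sequence(8)):
    print(i, x, Decimal(2) ** -i)   # computed value next to the exact 2^(-i)
```

After the single rounding error in $x_2$, the computed values drift away from $2^{-i}$ and soon double in magnitude at every step, which is the signature of the parasitic solution $a \cdot 2^i$.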
Horner's method
A polynomial of n-th degree (e.g. n = 3) is usually represented in the form
$a_3 \cdot x^3 + a_2 \cdot x^2 + a_1 \cdot x + a_0$
but is better evaluated in nested form,
$((a_3 \cdot x + a_2) \cdot x + a_1) \cdot x + a_0$.
The first formula needs $n$ multiplications of the form $a_i \cdot x^i$ and, in addition, $n - 1$ multiplications to compute the powers of $x$. The second formula needs only $n$ multiplications in total: The powers of $x$ are obtained for free as a side effect of the coefficient multiplications.
The following procedure assumes that the (n+1) coefficients $a_i$ are stored in a sufficiently large array a of type 'coeff':
type coeff = array[0 .. m] of real;
function horner(var a: coeff; n: integer; x: real): real;
var i: integer; h: real;
begin
  h := a[n];
  for i := n - 1 downto 0 do h := h * x + a[i];
  return(h)
end;
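For readers who want to run the nested evaluation, here is the same idea as a minimal Python sketch. It assumes the coefficients are passed as a list `a` with `a[i]` holding $a_i$, so the degree is implied by the list length rather than given as a separate parameter.

```python
def horner(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[n]*x^n in nested form,
    using one multiplication and one addition per coefficient."""
    h = a[-1]                          # start with the leading coefficient
    for coefficient in reversed(a[:-1]):
        h = h * x + coefficient
    return h

# 3x^2 + 2x + 1 at x = 2
print(horner([1, 2, 3], 2))
```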
Bisection
Bisection is an iterative method for solving equations of the form $f(x) = 0$. Assuming that the function $f: R \to R$ is continuous in the interval $[a, b]$ and that $f(a) \cdot f(b) < 0$, a root of the equation $f(x) = 0$ (a zero of $f$) must lie in the interval $[a, b]$ (Exhibit 13.1). Let $m$ be the midpoint of this interval. If $f(m) = 0$, $m$ is a root. If $f(m) \cdot f(a) < 0$, a root must be contained in $[a, m]$, and we proceed with this subinterval; if $f(m) \cdot f(b) < 0$, we proceed with $[m, b]$. Thus at each iteration the interval of uncertainty that must contain a root is half the size of the interval produced in the previous iteration. We iterate until the interval is smaller than the tolerance within which the root must be determined.
Exhibit 13.1: As in binary search, bisection excludes half of the interval under consideration at every step.
function bisect(function f: real; a, b: real): real;
const epsilon = 1E-6;
var m: real; faneg: boolean;
begin
  faneg := f(a) < 0.0;
  repeat
    m := (a + b) / 2.0;
    if (f(m) < 0.0) = faneg then a := m else b := m
  until abs(a - b) < epsilon;
  return(m)
end;
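The same loop can be written as a short Python sketch. The name `bisect_root` and the keyword parameter `epsilon` are choices made for this illustration; the logic mirrors the procedure above, including the trick of remembering only whether $f(a)$ is negative.

```python
def bisect_root(f, a, b, epsilon=1e-6):
    """Find a root of f in [a, b], assuming f is continuous and
    f(a) and f(b) have opposite signs."""
    fa_negative = f(a) < 0.0
    while True:
        m = (a + b) / 2.0
        if (f(m) < 0.0) == fa_negative:
            a = m        # f(m) has the same sign as f(a): root lies in [m, b]
        else:
            b = m        # otherwise the root lies in [a, m]
        if abs(a - b) < epsilon:
            return m

root = bisect_root(lambda x: x * x - 2.0, 0.0, 2.0)
print(root)   # an approximation of the square root of 2
```

Each pass halves the interval, so reaching a tolerance of $10^{-6}$ from an interval of length 2 takes about 21 iterations.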
A sequence $x_1, x_2, x_3, \ldots$ converging to $x$ converges linearly if there exist a constant $c < 1$ and an index $i_0$ such that for all $i > i_0$: $|x_{i+1} - x| \le c \cdot |x_i - x|$. An algorithm is said to converge linearly if the sequence of approximations constructed by this algorithm converges linearly. In a linearly convergent algorithm each iteration adds a constant number of significant bits. For example, each iteration of bisection halves the interval of uncertainty (i.e. adds one bit of precision to the result). Thus bisection converges linearly with $c = 0.5$. A sequence $x_1, x_2, x_3, \ldots$ converges quadratically if there exist a constant $c$ and an index $i_0$ such that for all $i > i_0$: $|x_{i+1} - x| \le c \cdot |x_i - x|^2$.
Newton's method for computing the square root
Newton's method for solving equations of the form $f(x) = 0$ is an example of an algorithm with quadratic convergence. Let $f: R \to R$ be a continuous and differentiable function. An approximation $x_{i+1}$ is obtained from $x_i$ by approximating $f(x)$ in the neighborhood of $x_i$ by its tangent at the point $(x_i, f(x_i))$, and computing the intersection of this tangent with the x-axis (Exhibit 13.2). Hence
$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$
Exhibit 13.2: Newton's iteration approximates a curve locally by a tangent.
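Newton's method is not guaranteed to converge (Exercise: construct counterexamples), but when it converges, it does so quadratically and therefore very fast, since each iteration doubles the number of significant bits.
To compute the square root $x = \sqrt{a}$ of a real number $a > 0$ we consider the function $f(x) = x^2 - a$ and solve the equation $x^2 - a = 0$. With $f'(x) = 2 \cdot x$ we obtain the iteration formula:
$x_{i+1} = x_i - \frac{x_i^2 - a}{2 \cdot x_i} = \frac{1}{2} \left( x_i + \frac{a}{x_i} \right)$
The formula that relates $x_i$ and $x_{i+1}$ can be transformed into an analogous formula that determines the propagation of the relative error:
$R_i = \frac{x_i - \sqrt{a}}{\sqrt{a}}$
Since
$x_i = \sqrt{a} \cdot (1 + R_i)$
we obtain for the relative error:
$1 + R_{i+1} = \frac{1}{2} \left( (1 + R_i) + \frac{1}{1 + R_i} \right)$
Using
$(1 + R_i) + \frac{1}{1 + R_i} - 2 = \frac{R_i^2}{1 + R_i}$
we get a recurrence relation for the relative error:
$R_{i+1} = \frac{R_i^2}{2 \cdot (1 + R_i)}$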
If we start with $x_0 > 0$, it follows that $1 + R_0 > 0$. Hence we obtain
$R_1 > R_2 > R_3 > \cdots > 0$.
As soon as $R_i$ becomes small (i.e. $R_i \ll 1$), we have $1 + R_i \approx 1$, and we obtain
$R_{i+1} \approx 0.5 \cdot R_i^2$
Newton's method converges quadratically as soon as $x_i$ is close enough to the true solution. With a bad initial guess $R_i \gg 1$ we have, on the other hand, $1 + R_i \approx R_i$, and we obtain $R_{i+1} \approx 0.5 \cdot R_i$ (i.e. the computation appears to converge linearly until $R_i \ll 1$ and proper quadratic convergence starts).
Thus it is highly desirable to start with a good initial approximation $x_0$ and get quadratic convergence right from the beginning. We assume normalized binary floating-point numbers (i.e. $a = m \cdot 2^e$ with $0.5 \le m < 1$). A good approximation of $\sqrt{a}$ is obtained by choosing any mantissa $c$ with $0.5 \le c < 1$ and halving the exponent:
$x_0 = c \cdot 2^{e/2}$ if $e$ is even, $x_0 = c \cdot 2^{(e+1)/2}$ if $e$ is odd.
In order to construct this initial approximation $x_0$, the programmer needs read and write access not only to a "real number" but also to its components, the mantissa and exponent, for example, by procedures such as
procedure mantissa(z: real): integer;
procedure exponent(z: real): integer;
procedure buildreal(mant, exp: integer): real;
Today's programming languages often lack such facilities, and the programmer is forced to use backdoor tricks to construct a good initial approximation. If $x_0$ can be constructed by halving the exponent, we obtain the following upper bounds for the relative error:
$R_1 < 2^{-2}$, $R_2 < 2^{-5}$, $R_3 < 2^{-11}$, $R_4 < 2^{-23}$, $R_5 < 2^{-47}$, $R_6 < 2^{-95}$.
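Python exposes the mantissa and exponent of a float through `math.frexp` and `math.ldexp`, so the exponent-halving start value can be tried directly. The sketch below rests on assumptions: the names `initial_guess` and `newton_sqrt` are invented here, odd exponents are rounded up before halving, and the default mantissa c = 0.7071 is just one admissible choice in the range 0.5 ≤ c < 1.

```python
import math

def initial_guess(a, c=0.7071):
    """Start value c * 2^(e/2), where a = m * 2^e with 0.5 <= m < 1.
    Odd exponents are rounded up before halving (an assumption of this sketch)."""
    _, e = math.frexp(a)                  # frexp returns (m, e) with 0.5 <= m < 1
    half = e // 2 if e % 2 == 0 else (e + 1) // 2
    return math.ldexp(c, half)            # c * 2^half

def newton_sqrt(a, iterations=6):
    """Approximate the square root of a > 0 by Newton's iteration
    x <- (x + a / x) / 2, starting from the exponent-halving guess."""
    x = initial_guess(a)
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x

for a in (2.0, 50.0, 1e-6, 12345.678):
    print(a, newton_sqrt(a), math.sqrt(a))
```

Because the start value is already within a factor of two of the root, a handful of iterations reaches the precision of a 64-bit float for these inputs.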
It is remarkable that four iterations suffice to compute an exact square root for 32-bit floating-point numbers, where 23 bits are used for the mantissa, one bit for the sign and eight bits for the exponent, and that six iterations will do for a "number cruncher" with a word length of 64 bits. The starting value $x_0$ can be further optimized by choosing $c$ carefully. It can be shown that the optimal value of $c$ for computing the square root of a real number is $c = 1/\sqrt{2} \approx 0.707$.
Exercise: square root
Consider a floating-point number system with two decimal digits in the mantissa: Every number has the form $x = \pm .d_1 d_2 \cdot 10^{\pm e}$.
(a) How many different number representations are there in this system?
(b) How many different numbers are there in this system? Show your reasoning.
(c) Compute $\sqrt{.50 \cdot 10^2}$ in this number system using Newton's method with a starting value $x_0 = 10$. Show every step of the calculation. Round the result of any operation to two digits immediately.
Solution
(a) A number representation contains two sign bits and three decimal digits, hence there are $2^2 \cdot 10^3 = 4000$ distinct number representations in this system.
(b) There are three sources of redundancy:
1. Multiple representations of zero
2. Exponent +0 equals exponent −0
3. Shifted mantissa: $\pm .d0 \cdot 10^{\pm e} = \pm .0d \cdot 10^{\pm e + 1}$
A detailed count reveals that there are 3439 different numbers.
Zero has $2^2 \cdot 10 = 40$ representations, all of the form $\pm .00 \cdot 10^{\pm e}$, with two sign bits and one decimal digit $e$ to be freely chosen. Therefore, $r_1 = 39$ must be subtracted from 4000.
If $e = 0$, then $\pm .d_1 d_2 \cdot 10^{+0} = \pm .d_1 d_2 \cdot 10^{-0}$. We assume furthermore that $d_1 d_2 \ne 00$. The case $d_1 d_2 = 00$ has been covered above. Then there are $2 \cdot 99$ such pairs. Therefore, $r_2 = 198$ must be subtracted from 4000.
If $d_2 = 0$, then $\pm .d_1 0 \cdot 10^{\pm e} = \pm .0 d_1 \cdot 10^{\pm e + 1}$. The case $d_1 = 0$ has been treated above. Therefore, we assume that $d_1 \ne 0$. Since $\pm e$ can assume the 18 different values −9, −8, … , −1, +0, +1, … , +8, there are $2 \cdot 9 \cdot 18$ such pairs. Therefore, $r_3 = 324$ must be subtracted from 4000.
There are $4000 - r_1 - r_2 - r_3 = 3439$ different numbers in this system.
(c) Computing Newton's square root algorithm:
$x_0 = 10$
$x_1 = .50 \cdot (10 + 50/10) = .50 \cdot (10 + 5) = .50 \cdot 15 = 7.5$
$x_2 = .50 \cdot (7.5 + 50/7.5) = .50 \cdot (7.5 + 6.6) = .50 \cdot 14 = 7$
$x_3 = .50 \cdot (7 + 50/7) = .50 \cdot (7 + 7.1) = .50 \cdot 14 = 7$
Exercises
1. Write up all the distinct numbers in the floating-point system with number representations of the form $z = 0.b_1 b_2 \cdot 2^{e_1 e_2}$, where $b_1$, $b_2$ and $e_1$, $e_2$ may take the values 0 and 1, and mantissa and exponent are represented in 2's complement notation.
2. Provide simple numerical examples to illustrate floating-point arithmetic violations of mathematical identities.
14. Straight lines and circles
intersection of two line segments
degenerate configurations
clipping
digitized lines and circles
Bresenham's algorithms
braiding straight lines
Points are the simplest geometric objects; straight lines and line segments come next. Together, they make up the lion's share of all primitive objects used in two-dimensional geometric computation (e.g. in computer graphics). Using these two primitives only, we can approximate any curve and draw any picture that can be mapped onto a discrete raster. If we do so, most queries about complex figures get reduced to basic queries about points and line segments, such as: is a given point to the left, to the right, or on a given line? Do two given line segments intersect? As simple as these questions appear to be, they must be handled efficiently and carefully. Efficiently because these basic primitives of geometric computations are likely to be executed millions of times in a single program run. Carefully because the ubiquitous phenomenon of degenerate configurations easily traps the unwary programmer into overflow or meaningless results.
Intersection
The problem of deciding whether two line segments intersect is unexpectedly tricky, as it requires a consideration of three distinct nondegenerate cases, as well as half a dozen degenerate ones. Starting with degenerate objects, we have cases where one or both of the line segments degenerate into points. The code below assumes that line segments of length zero have been eliminated. We must also consider nondegenerate objects in degenerate configurations, as illustrated in Exhibit 14.1. Line segments A and B intersect (strictly). C and D, and E and F, do not intersect; the intersection point of the infinitely extended lines lies on C in the first case, but lies neither on E nor on F in the second case. The next three cases are degenerate: G and H intersect barely (i.e. in an endpoint); I and J overlap (i.e. they intersect in infinitely many points); K and L do not intersect. Careless evaluation of these last two cases is likely to generate overflow.
Exhibit 14.1: Cases to be distinguished for the segment intersection problem.
Computing the intersection point of the infinitely extended lines is a naive approach to this decision problem that leads to a three-step process:
1. Check whether the two line segments are parallel (a necessary precaution before attempting to compute the intersection point). If so, we have a degenerate configuration that leads to one of three special cases: not collinear, collinear nonoverlapping, collinear overlapping.
2. Compute the intersection point of the extended lines (this step is still subject to numerical problems for lines that are almost parallel).
3. Check whether this intersection point lies on both line segments.
If all we want is a yes/no answer to the intersection question, we can save the effort of computing the intersection point and obtain a simpler and more robust procedure based on the following idea: two line segments intersect strictly iff the two endpoints of each line segment lie on opposite sides of the infinitely extended line of the other segment.
Let L be a line given by the equation $h(x, y) = a \cdot x + b \cdot y + c = 0$, where the coefficients have been normalized such that $a^2 + b^2 = 1$. For a line L given in this Hessian normal form, and for any point $p = (x, y)$, the function $h$ evaluated at $p$ yields the signed distance between $p$ and L: $h(p) > 0$ if $p$ lies on one side of L, $h(p) < 0$ if $p$ lies on the other side, and $h(p) = 0$ if $p$ lies on L. A line segment is usually given by its endpoints $(x_1, y_1)$ and $(x_2, y_2)$, and the Hessian normal form of the infinitely extended line L that passes through $(x_1, y_1)$ and $(x_2, y_2)$ is
$h(x, y) = \frac{(y_2 - y_1) \cdot (x - x_1) - (x_2 - x_1) \cdot (y - y_1)}{L_{12}} = 0$
where
$L_{12} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} > 0$
is the length of the line segment, and $h(x, y)$ is the distance of $p = (x, y)$ from L. Two points $p$ and $q$ lie on opposite sides of L iff $h(p) \cdot h(q) < 0$ (Exhibit 14.2). $h(p) = 0$ or $h(q) = 0$ signals a degenerate configuration. Among these, $h(p) = 0$ and $h(q) = 0$ iff the segment $(p, q)$ is collinear with L.
Exhibit 14.2: Segment s, its extended line L, and distance to points p, q as computed by function h.
type point = record x, y: real end;
     segment = record p1, p2: point end;
function d(s: segment; p: point): real;
{ computes h(p) for the line L determined by s }
var dx, dy, L12: real;
begin
  dx := s.p2.x - s.p1.x; dy := s.p2.y - s.p1.y;
  L12 := sqrt(dx * dx + dy * dy);
  return((dy * (p.x - s.p1.x) - dx * (p.y - s.p1.y)) / L12)
end;
To optimize the intersection function, we recall the assumption $L_{12} > 0$ and notice that we do not need the actual distance, only its sign. Thus the function d used below avoids computing $L_{12}$. The function 'intersect' begins by checking whether the two line segments are collinear, and if so, tests them for overlap by intersecting the intervals obtained by projecting the line segments onto the x-axis (or onto the y-axis, if the segments are vertical). Two intervals $[a, b]$ and $[c, d]$ intersect iff $\min(a, b) \le \max(c, d)$ and $\min(c, d) \le \max(a, b)$. This condition could be simplified under the assumption that the representation of segments and intervals is ordered "from left to right" (i.e. for interval $[a, b]$ we have $a \le b$). We do not assume this, as line segments often have a natural direction and cannot be "turned around".
function d(s: segment; p: point): real;
begin
  return((s.p2.y - s.p1.y) * (p.x - s.p1.x) - (s.p2.x - s.p1.x) * (p.y - s.p1.y))
end;
function overlap(a, b, c, d: real): boolean;
begin return((min(a, b) <= max(c, d)) and (min(c, d) <= max(a, b))) end;
function intersect(s1, s2: segment): boolean;
var d11, d12, d21, d22: real;
begin
  d11 := d(s1, s2.p1); d12 := d(s1, s2.p2);
  if (d11 = 0) and (d12 = 0) then { s1 and s2 are collinear }
    if s1.p1.x = s1.p2.x then { vertical }
      return(overlap(s1.p1.y, s1.p2.y, s2.p1.y, s2.p2.y))
    else { not vertical }
      return(overlap(s1.p1.x, s1.p2.x, s2.p1.x, s2.p2.x))
  else begin { s1 and s2 are not collinear }
    d21 := d(s2, s1.p1); d22 := d(s2, s1.p2);
    return((d11 * d12 <= 0) and (d21 * d22 <= 0))
  end
end;
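The sign-based test translates almost line for line into Python. In this sketch a segment is assumed to be a pair of (x, y) tuples, and the function `side` plays the role of the function d above (the signed distance without the division by the segment length); both conventions are choices made for the illustration.

```python
def side(s, p):
    """Sign tells on which side of the line through segment s the point p lies;
    zero means p is on the line."""
    (x1, y1), (x2, y2) = s
    return (y2 - y1) * (p[0] - x1) - (x2 - x1) * (p[1] - y1)

def overlap(a, b, c, d):
    """Do the intervals with endpoints a, b and c, d (in any order) intersect?"""
    return min(a, b) <= max(c, d) and min(c, d) <= max(a, b)

def intersect(s1, s2):
    """Do two segments of nonzero length share at least one point?"""
    d11, d12 = side(s1, s2[0]), side(s1, s2[1])
    if d11 == 0 and d12 == 0:                       # collinear segments
        if s1[0][0] == s1[1][0]:                    # vertical: compare y-intervals
            return overlap(s1[0][1], s1[1][1], s2[0][1], s2[1][1])
        return overlap(s1[0][0], s1[1][0], s2[0][0], s2[1][0])
    d21, d22 = side(s2, s1[0]), side(s2, s1[1])
    return d11 * d12 <= 0 and d21 * d22 <= 0

print(intersect(((0, 0), (2, 2)), ((0, 2), (2, 0))))   # crossing diagonals
print(intersect(((0, 0), (1, 0)), ((2, 0), (3, 0))))   # collinear, disjoint
```

With integer coordinates the comparisons with zero are exact; with floating-point input the near-degenerate cases discussed in the text can still give the wrong answer.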
In addition to the degeneracy issues we have addressed, there are numerical issues of near-degeneracy that we only mention. The length $L_{12}$ is a condition number (i.e. an indicator of the computation's accuracy). As Exhibit 14.3 suggests, it may be numerically impossible to tell on which side of a short line segment L a distant point p lies.
Exhibit 14.3: A point's distance from a segment amplifies the error of the "which side" computation.
Conclusion: A geometric algorithm must check for degenerate configurations explicitly; the code that handles configurations "in general position" will not handle degeneracies.
lipping
The 5idespread use o% 5indo5s on graphic screens makes clipping one o% the most %re0uently e)ecuted
operations& ,iven a rectangular 5indo5 and a con%iguration in the plane+ dra5 that part o% the con%iguration 5hich
lies 5ithin the 5indo5. 'ost con%igurations consist o% line segments+ so 5e sho5 ho5 to clip a line segment given
by its endpoints H)1+ y1I and H)2+ y2I into a 5indo5 given by its %our corners 5ith coordinates ale%t+ rightb atop+
bottomb.
The position o% a point in relation to the 5indo5 is described by %our boolean variables& ll Hto the le%t o% the le%t
borderI+ rr Hto the right o% the right borderI+ bb Hbelo5 the lo5er borderI+ tt Habove the upper borderI&
type wcode . set of *ll, rr, , tt+&
A point inside the 5indo5 has the code ll V rr V bb V tt V %alse+ abbreviated 0000 H")hibit 1<.<I.
")hibit 1<.<& The clipping 5indo5 partitions the plane into nine regions.
The procedure 'classify' determines the position of a point in relation to the window:
  procedure classify(x, y: real; var c: wcode);
  begin
    c := ∅; { empty set }
    if x < left then c := [ll] elsif x > right then c := [rr];
    if y < bottom then c := c ∪ [bb] elsif y > top then c := c ∪ [tt]
  end;
The procedure 'clip' computes the endpoints of the clipped line segment and calls the procedure 'showLine' to draw it:
  procedure clip(x1, y1, x2, y2: real);
  var c, c1, c2: wcode; x, y: real; outside: boolean;
  begin { clip }
    classify(x1, y1, c1); classify(x2, y2, c2); outside := false;
    while (c1 ≠ ∅) or (c2 ≠ ∅) do
      if c1 ∩ c2 ≠ ∅ then
        { line segment lies completely outside the window }
        { c1 := ∅; c2 := ∅; outside := true }
      else begin
        c := c1;
        if c = ∅ then c := c2;
        if ll ∈ c then { segment intersects left }
          { y := y1 + (y2 - y1) · (left - x1) / (x2 - x1); x := left }
        elsif rr ∈ c then { segment intersects right }
          { y := y1 + (y2 - y1) · (right - x1) / (x2 - x1); x := right }
        elsif bb ∈ c then { segment intersects bottom }
          { x := x1 + (x2 - x1) · (bottom - y1) / (y2 - y1); y := bottom }
        elsif tt ∈ c then { segment intersects top }
          { x := x1 + (x2 - x1) · (top - y1) / (y2 - y1); y := top };
        if c = c1 then { x1 := x; y1 := y; classify(x, y, c1) }
        else { x2 := x; y2 := y; classify(x, y, c2) }
      end;
    if not outside then showLine(x1, y1, x2, y2)
  end; { clip }
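As a hedged illustration, the same classify-and-clip loop can be written in Python. The window is passed as a tuple (left, right, bottom, top), and instead of calling a drawing routine the function returns the clipped endpoints, or None when the segment lies outside; these interface choices are assumptions made for this sketch only.

```python
def classify(x, y, win):
    """Return the set of window borders that the point (x, y) lies beyond."""
    left, right, bottom, top = win
    c = set()
    if x < left:
        c.add("ll")
    elif x > right:
        c.add("rr")
    if y < bottom:
        c.add("bb")
    elif y > top:
        c.add("tt")
    return c


def clip(x1, y1, x2, y2, win):
    """Clip a segment to the window; return (x1, y1, x2, y2) or None if nothing is visible."""
    left, right, bottom, top = win
    c1, c2 = classify(x1, y1, win), classify(x2, y2, win)
    while c1 or c2:
        if c1 & c2:                    # both endpoints beyond the same border
            return None
        c = c1 or c2                   # pick an endpoint that is still outside
        if "ll" in c:
            y = y1 + (y2 - y1) * (left - x1) / (x2 - x1); x = left
        elif "rr" in c:
            y = y1 + (y2 - y1) * (right - x1) / (x2 - x1); x = right
        elif "bb" in c:
            x = x1 + (x2 - x1) * (bottom - y1) / (y2 - y1); y = bottom
        else:                          # "tt" in c
            x = x1 + (x2 - x1) * (top - y1) / (y2 - y1); y = top
        if c is c1:
            x1, y1 = x, y
            c1 = classify(x, y, win)
        else:
            x2, y2 = x, y
            c2 = classify(x, y, win)
    return (x1, y1, x2, y2)
```

Each pass through the loop moves one outside endpoint onto a window border, so the loop ends after at most a few iterations.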
Drawing digitized lines
A raster graphics screen is an integer grid of pixels, each of which can be turned on or off. Euclidean geometry does not apply directly to such a discretized plane. Any designer using a CAD system will prefer Euclidean geometry to a discrete geometry as a model of the world. The problem of how to approximate the Euclidean plane by an integer grid turns out to be a hard question: How do we map Euclidean geometry onto a digitized space in such a way as to preserve the rich structure of geometry as much as possible? Let's begin with simple instances: How do you map a straight line onto an integer grid, and how do you draw it efficiently? Exhibit 14.5 shows reasonable examples.
Exhibit 14.5: Digitized lines look like staircases.
Consider the slope m = (y2 - y1) / (x2 - x1) of a segment with endpoints p1 = (x1, y1) and p2 = (x2, y2). If |m| ≤ 1 we want one pixel blackened on each x coordinate; if |m| ≥ 1, one pixel on each y coordinate; these two requirements are consistent for diagonals with |m| = 1. Consider the case |m| ≤ 1. A unit step in x takes us from point (x, y) on the line to (x + 1, y + m). So for each x between x1 and x2 we paint the pixel (x, y) closest to the mathematical line according to the formula y = round(y1 + m · (x - x1)). For the case |m| > 1, we reverse the roles of x and y, taking a unit step in y and incrementing x by 1/m. The following procedure draws line segments with |m| ≤ 1 using unit steps in x.
  procedure line(x1, y1, x2, y2: integer);
  var x, sx: integer; m: real;
  begin
    PaintPixel(x1, y1);
    if x1 ≠ x2 then begin
      x := x1; sx := sgn(x2 - x1); m := (y2 - y1) / (x2 - x1);
      while x ≠ x2 do
        { x := x + sx; PaintPixel(x, round(y1 + m · (x - x1))) }
    end
  end;
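For experimentation, the procedure above might be transcribed into Python as follows. This is a sketch under assumptions: pixels are collected in a list rather than painted, and rounding is taken to be "round half up" via `math.floor(v + 0.5)`, because Python's built-in `round` rounds halves to even.

```python
import math


def naive_line(x1, y1, x2, y2):
    """Pixels of a segment with |slope| <= 1, taking unit steps in x."""
    pixels = [(x1, y1)]
    if x1 != x2:
        sx = 1 if x2 > x1 else -1
        m = (y2 - y1) / (x2 - x1)      # floating-point slope
        x = x1
        while x != x2:
            x += sx
            # Assumed rounding rule: halves are rounded upward.
            pixels.append((x, math.floor(y1 + m * (x - x1) + 0.5)))
    return pixels
```

Every pixel costs a floating-point multiplication and a rounding step, which is the cost the incremental method avoids.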
This straightforward implementation has a number of disadvantages. First, it uses floating-point arithmetic to compute integer coordinates of pixels, a costly process. In addition, rounding errors may prevent the line from being reversible: reversibility means that we paint the same pixels, in reverse order, if we call the procedure with the two endpoints interchanged. Reversibility is desirable to avoid the following blemishes: that a line painted twice, from both ends, looks thicker than other lines; worse yet, that painting a line from one end and erasing it from the other leaves spots on the screen. A weaker constraint, which is only concerned with the result and not the process of painting, is easy to achieve but is less useful.
Weak reversibility is most easily achieved by ordering the points p1 and p2 lexicographically by x and y coordinates, drawing every line from left to right, and vertical lines from bottom to top. This solution is inadequate for animation, where the direction of drawing is important, and the sequence in which the pixels are painted is determined by the application: drawing the trajectory of a falling apple from the bottom up will not do. Thus interactive graphics needs the stronger constraint.
Efficient algorithms, such as Bresenham's [Bre 65], avoid floating-point arithmetic and expensive multiplications through incremental computation: Starting with the current point p1, a next point is computed as a function of the current point and of the line segment parameters. It turns out that only a few additions, shifts, and comparisons are required. In the following we assume that the slope m of the line satisfies |m| ≤ 1. Let
Δx = x2 - x1, sx = sign(Δx), Δy = y2 - y1, sy = sign(Δy).
Assume that the pixel (x, y) is the last that has been determined to be the closest to the actual line, and we now want to decide whether the next pixel to be set is (x + sx, y) or (x + sx, y + sy). Exhibit 14.6 depicts the case sx = 1 and sy = 1.
Exhibit 14.6: At the next coordinate x + sx, we identify and paint the pixel closest to the line.
Let t denote the absolute value of the difference between y and the point with abscissa x + sx on the actual line. Then t is given by
t = sy · (y1 + (Δy / Δx) · (x + sx - x1) - y).
The value of t determines the pixel to be drawn:
t < 1/2: (x + sx, y); t > 1/2: (x + sx, y + sy); t = 1/2: both candidates are equally close, and the tie must be broken.
As the following example shows, reversibility is not an automatic consequence of the geometric fact that two points determine a unique line, regardless of correct rounding or the order in which the two endpoints are presented. A problem arises when two grid points are equally close to the straight line (Exhibit 14.7).
Exhibit 14.7: Breaking the tie among equidistant grid points.
If the tie is not broken in a consistent manner (e.g. by always taking the upper grid point), the resulting algorithm fails to be reversible.
All the variables introduced in this problem range over the integers, but the ratio Δy / Δx appears to introduce rational expressions. This is easily remedied by multiplying everything by Δx. We define the decision variable d as
d = |Δx| · (2 · t - 1) = sx · Δx · (2 · t - 1). (∗)
Let d_i denote the decision variable which determines the pixel (x^(i), y^(i)) to be drawn in the i-th step. Substituting t and inserting x = x^(i-1) and y = y^(i-1) in (∗) we obtain
d_i = sx · sy · (2 · Δx · y1 + 2 · (x^(i-1) + sx - x1) · Δy - 2 · Δx · y^(i-1) - Δx · sy)
and
d_{i+1} = sx · sy · (2 · Δx · y1 + 2 · (x^(i) + sx - x1) · Δy - 2 · Δx · y^(i) - Δx · sy).
Subtracting d_i from d_{i+1}, we get
d_{i+1} - d_i = sx · sy · (2 · (x^(i) - x^(i-1)) · Δy - 2 · Δx · (y^(i) - y^(i-1))).
Since x^(i) - x^(i-1) = sx, we obtain
d_{i+1} = d_i + 2 · sy · Δy - 2 · sx · Δx · sy · (y^(i) - y^(i-1)).
If d_i < 0, or d_i = 0 and sy = -1, then y^(i) = y^(i-1), and therefore
d_{i+1} = d_i + 2 · |Δy|.
If d_i > 0, or d_i = 0 and sy = 1, then y^(i) = y^(i-1) + sy, and therefore
d_{i+1} = d_i + 2 · |Δy| - 2 · |Δx|.
This iterative computation of d_{i+1} from the previous d_i lets us select the pixel to be drawn. The initial starting value for d_1 is found by evaluating the formula for d_i, knowing that (x^(0), y^(0)) = (x1, y1). Then we obtain
d_1 = 2 · |Δy| - |Δx|.
The arithmetic needed to evaluate these formulas is minimal: addition, subtraction and left shift (multiplication by 2). The following procedure implements this algorithm; it assumes that the slope of the line is between -1 and 1.
  procedure BresenhamLine(x1, y1, x2, y2: integer);
  var dx, dy, sx, sy, d, x, y: integer;
  begin
    dx := |x2 - x1|; sx := sgn(x2 - x1);
    dy := |y2 - y1|; sy := sgn(y2 - y1);
    d := 2 · dy - dx; x := x1; y := y1;
    PaintPixel(x, y);
    while x ≠ x2 do begin
      if (d > 0) or ((d = 0) and (sy = 1)) then { y := y + sy; d := d - 2 · dx };
      x := x + sx; d := d + 2 · dy;
      PaintPixel(x, y)
    end
  end;
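A Python transcription makes it easy to check reversibility by brute force. The sketch below assumes slopes between -1 and 1, as the procedure does, and returns the pixels as a list instead of painting them; `sgn` is a small helper defined here for the example.

```python
def sgn(v):
    """Sign of an integer: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def bresenham_line(x1, y1, x2, y2):
    """Pixels from (x1, y1) to (x2, y2), in drawing order, for |slope| <= 1."""
    dx, sx = abs(x2 - x1), sgn(x2 - x1)
    dy, sy = abs(y2 - y1), sgn(y2 - y1)
    d = 2 * dy - dx                    # initial decision variable
    x, y = x1, y1
    pixels = [(x, y)]
    while x != x2:
        # Ties (d == 0) always go to the upper grid point.
        if d > 0 or (d == 0 and sy == 1):
            y += sy
            d -= 2 * dx
        x += sx
        d += 2 * dy
        pixels.append((x, y))
    return pixels
```

Only integer additions, subtractions and doublings occur in the loop. Drawing a line from either end visits the same pixels in opposite order, because the tie is broken toward the upper grid point in both directions.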
The riddle of the braiding straight lines
Two straight lines in a plane intersect in at most one point, right? Important geometric algorithms rest on this well-known theorem of Euclidean geometry and would have to be reexamined if it were untrue. Is this theorem true for computer lines, that is, for data objects that represent and approximate straight lines to be processed by a program? Perhaps yes, but mostly no.
Yes. It is possible, of course, to program geometric problems in such a way that every pair of straight lines has at most, or exactly, one intersection point. This is most readily achieved through symbolic computation. For example, if the intersection of L1 and L2 is denoted by an expression 'Intersect(L1, L2)' that is never evaluated but simply combined with other expressions to represent a geometric construction, we are free to postulate that 'Intersect(L1, L2)' is a point.
No. For reasons of efficiency, most computer applications of geometry require the immediate numerical evaluation of every geometric operation. This calculation is done in a discrete, finite number system, in which case the theorem need not be true. This fact is most easily seen if we work with a discrete plane of pixels, and we represent a straight line by the set of all pixels touched by an ideal mathematical line. Exhibit 14.8 shows three digitized straight lines in such a square grid model of plane geometry. Two of the lines intersect in a common interval of three pixels, whereas two others have no pixel in common, even though they obviously intersect.
Exhibit 14.8: Two intersecting lines may share none, one, or more pixels.
With floating-point arithmetic the situation is more complicated; but the fact remains that the Euclidean plane is replaced by a discrete set of points embedded in the plane: all those points whose coordinates are representable in the particular number system being used. Experience with numerical computation, and the hazards of rounding errors, suggests that the question "In how many points can two straight lines intersect?" admits the following answers:
There is no intersection: the mathematically correct intersection cannot be represented in the number system.
A set of points that lie close to each other: for example, an interval.
Overflow aborts the calculation before a result is computed, even if the correct result is representable in the number system being used.
Exercise: two lines intersect in how many points?
Construct examples to illustrate these phenomena when using floating-point arithmetic. Choose a suitable system G of floating-point numbers and two distinct straight lines
a_i · x + b_i · y + c_i = 0 with a_i, b_i, c_i ∈ G, i = 1, 2,
such that, when all operations are performed in G:
(a) There is no point whose coordinates x, y ∈ G satisfy both linear equations.
(b) There are many points whose coordinates x, y ∈ G satisfy both linear equations.
(c) There is exactly one point whose coordinates x, y ∈ G satisfy both linear equations, but the straightforward computation of x and y leads to overflow.
(d) As a consequence of (a) it follows that the definition "two lines intersect ⇔ they share a common point" is inappropriate for numerical computation. Formulate a numerically meaningful definition of the statement "two line segments intersect".
Exercise (b) may suggest that the points shared by two lines are neighbors. Pictorially, if the slopes of the two lines are almost identical, we expect to see a blurred, elongated intersection. We will show that worse things may happen: two straight lines may intersect in arbitrarily many points, and these points are separated by intervals in which the two lines alternate in lying on top of each other. Computer lines may be braided! To understand this phenomenon, we need to clarify some concepts: What exactly is a straight line represented on a computer? What is an intersection?
There is no one answer, there are many! Consider the analogy of the mathematical concept of real numbers, defined by axioms. When we approximate real numbers on a computer, we have a choice of many different number systems (e.g. various floating-point number systems, rational arithmetic with variable precision, interval arithmetic). These systems are typically not defined by means of axioms, but rather in terms of concrete representations of the numbers and algorithms for executing the operations on these numbers. Similarly, a computer line will be defined in terms of a concrete representation (e.g. two points, a point and a slope, or a linear expression). All we obtain depends on the formulas we use and on the basic arithmetic to operate on these representations. The notion of a straight line can be formalized in many different ways, and although these are likely to be mathematically equivalent, they will lead to data objects with different behavior when evaluated numerically. Performing an operation consists of evaluating a formula. Substituting a formula by a mathematically equivalent one may lead to results that are topologically different, because equivalent formulas may exhibit different sensitivities toward rounding errors.
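A tiny Python illustration of this last point, assuming IEEE double-precision floats (which is what Python's `float` uses): two formulas that are equal over the real numbers, differing only in the order of additions, produce different results. The function names are invented for this example.

```python
def sum_left_to_right(a, b, c):
    """Evaluate (a + b) + c in floating-point arithmetic."""
    return (a + b) + c


def sum_right_to_left(a, b, c):
    """Evaluate a + (b + c): mathematically the same value, rounded differently."""
    return a + (b + c)


left = sum_left_to_right(0.1, 0.2, 0.3)
right = sum_right_to_left(0.1, 0.2, 0.3)
print(left == right)   # the two "equivalent" formulas disagree
```

If such a value feeds a comparison, as in a "which side" test, the two formulas can lead a geometric program down different branches.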
Consider a computer that has only integer arithmetic, i.e. we use only the operations +, -, ·, div. Let Z be the set of integers. Two straight lines g_i (i = 1, 2) are given by the following equations:
a_i · x + b_i · y + c_i = 0 with a_i, b_i, c_i ∈ Z; b_i ≠ 0.
We consider the problem of whether two given straight lines intersect in a given point x0. We use the following method: Solve the equations for y [i.e. y = E1(x) and y = E2(x)] and test whether E1(x0) is equal to E2(x0).
Is this method suitable? First, we need the following definitions:
x ∈ Z is a turn for the pair (E1, E2) iff
sign(E1(x) - E2(x)) ≠ sign(E1(x + 1) - E2(x + 1)).
An algorithm for the intersection problem is correct iff there are at most two turns.
The intuitive idea behind this definition is the recognition that rounding errors may force us to deal with an intersection interval rather than a single intersection point; but we wish to avoid separate intervals. The definition above partitions the x-axis into at most three disjoint intervals such that in the left interval the first line lies above or below the second line, in the middle interval the lines "intersect", and in the right interval we have the complementary relation of the left one (Exhibit 14.9).
Exhibit 14.9: Desirable consistency condition for intersection of nearly parallel lines.
Consider the straight lines:
3 · x - 5 · y + 40 = 0 and 2 · x - 3 · y + 20 = 0
which lead to the evaluation formulas
y = (3 · x + 40) / 5 and y = (2 · x + 20) / 3.
Our naive approach compares the expressions
(3 · x + 40) div 5 and (2 · x + 20) div 3.
Using the definitions it is easy to calculate that the turns are
7, 8, 10, 11, 12, 14, 15, 22, 23, 25, 26, 27, 29, 30.
The straight lines have become step functions that intersect many times. They are braided (Exhibit 14.10).
Exhibit 14.10: Braiding straight lines violate the consistency condition of Exhibit 14.9.
Exercise: show that the straight lines
x - 2 · y = 0
k · x - (2 · k + 1) · y = 0 for any integer k > 0
have 2 · k - 1 turns in the first quadrant.
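The turns of the braided pair (3 · x + 40) div 5 and (2 · x + 20) div 3 can be recomputed mechanically. The following Python sketch applies the definition of a turn directly; the function names are chosen for this example, and Python's `//` is used for 'div', which agrees with it for the nonnegative operands that occur here.

```python
def sign(v):
    """Sign of an integer: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def turns(e1, e2, xs):
    """All x in xs at which sign(e1 - e2) changes between x and x + 1."""
    return [x for x in xs
            if sign(e1(x) - e2(x)) != sign(e1(x + 1) - e2(x + 1))]


def e1(x):
    return (3 * x + 40) // 5           # 3x - 5y + 40 = 0 solved for y with 'div'


def e2(x):
    return (2 * x + 20) // 3           # 2x - 3y + 20 = 0 solved for y with 'div'


print(turns(e1, e2, range(0, 40)))
```

The same `turns` function can be pointed at other pairs of evaluation formulas to look for braiding.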
Is braiding due merely to integer arithmetic? Certainly not: rounding errors also occur in floating-point arithmetic, and we can construct even more pathological behavior. As an example, consider a floating-point arithmetic with a two-decimal-digit mantissa. We solve each line equation for y, evaluate the result, and truncate intermediate results immediately to two decimal places. Consider the straight lines (Exhibit 14.11)
4.3 · x - 8.3 · y = 0,
1.4 · x - 2.7 · y = 0.
Exhibit 14.11: Example to be verified by manual computation.
These examples were constructed by intersecting straight lines with almost the same slope, a numerically ill-conditioned problem. While working with integer arithmetic, we made the mistake of using the error-prone 'div' operator. The comparison of rational expressions does not require division.
Let a1 · x + b1 · y + c1 = 0 and a2 · x + b2 · y + c2 = 0 be two straight lines. To find out whether they intersect at x0, we have to check whether the equality
(-c1 - a1 · x0) / b1 = (-c2 - a2 · x0) / b2
holds. This is equivalent to b2 · c1 - b1 · c2 = x0 · (a2 · b1 - a1 · b2).
The last formula can be evaluated without error if sufficiently large integer arguments are allowed. Another way to evaluate this formula without error is to limit the size of the operands. For example, if a_i, b_i, c_i, and x0 are n-digit binary numbers, it suffices to be able to represent 3n-digit binary numbers and to compute with n-digit and 2n-digit binary numbers.
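The division-free test is short enough to write down in full. In this Python sketch each line is given as a tuple (a, b, c) of integers, a representation chosen for the example; Python integers have unbounded size, so the products are exact.

```python
def intersects_at(line1, line2, x0):
    """Exact test whether a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0 meet at abscissa x0."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    # Cross-multiplied form of (-c1 - a1*x0)/b1 = (-c2 - a2*x0)/b2; no division needed.
    return b2 * c1 - b1 * c2 == x0 * (a2 * b1 - a1 * b2)


g1 = (3, -5, 40)   # 3x - 5y + 40 = 0
g2 = (2, -3, 20)   # 2x - 3y + 20 = 0
print([x for x in range(0, 41) if intersects_at(g1, g2, x)])
```

For the two lines that were braided under 'div', the exact test reports a single intersection abscissa.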
These examples demonstrate that programming even a simple geometric problem can cause unexpected difficulties. Numerical computation forces us to rethink and redefine elementary geometric concepts.
Digitized circles
The concepts, problems and techniques we have discussed in this chapter are not at all restricted to dealing with straight lines; they have their counterparts for any kind of digitized spatial object. Straight lines, defined by linear formulas, are the simplest nontrivial spatial objects and thus best suited to illustrate problems and solutions. In this section we show that the incremental drawing technique generalizes in a straightforward manner to more complex objects such as circles.
The basic parameters that define a circle are the center coordinates (xc, yc) and the radius r. To simplify the presentation we first consider a circle with radius r centered around the origin. Such a circle is given by the equation
x² + y² = r².
Efficient algorithms for drawing circles, such as Bresenham's [Bre 77], avoid floating-point arithmetic and expensive multiplications through incremental computation: A new point is computed depending on the current point and on the circle parameters. Bresenham's circle algorithm was conceived for use with pen plotters and therefore generates all points on a circle centered at the origin by incrementing all the way around the circle. We present a modified version of his algorithm which takes advantage of the eight-way symmetry of a circle. If (x, y) is a point on the circle, we can easily determine seven other points lying on the circle (Exhibit 14.12). We consider only the 45° segment of the circle shown in the figure by incrementing from x = 0 to x = y = r / √2, and use eight-way symmetry to display points on the entire circle.
Exhibit 14.12: Eightfold symmetry of the circle.
Assume that the pixel p = (x, y) is the last that has been determined to be closest to the actual circle, and we now want to decide whether the next pixel to be set is p1 = (x + 1, y) or p2 = (x + 1, y - 1). Since we restrict ourselves to the 45° circle segment shown above, these pixels are the only candidates. Now define
d' = (x + 1)² + y² - r²
d'' = (x + 1)² + (y - 1)² - r²
which are the differences between the squared distances from the center of the circle to p1 (or p2) and to the actual circle. If |d'| ≤ |d''|, then p1 is closer (or equidistant) to the actual circle; if |d'| > |d''|, then p2 is closer. We define the decision variable d as
d = d' + d''. (∗)
We will show that the rule
If d ≤ 0 then select p1 else select p2.
correctly selects the pixel that is closest to the actual circle. Exhibit 14.13 shows a small part of the pixel grid and illustrates the various possible ways [(1) to (5)] in which the actual circle may intersect the vertical line at x + 1 in relation to the pixels p1 and p2.
Exhibit 14.13: For a given octant of the circle, if pixel p is lit, only two other pixels p1 and p2 need be examined.
In cases (1) and (2) p2 lies inside, p1 inside or on the circle, and we therefore obtain d' ≤ 0 and d'' < 0. Now d < 0, and applying the rule above will lead to the selection of p1. Since |d'| ≤ |d''| this selection is correct. In case (3) p1 lies outside and p2 inside the circle and we therefore obtain d' > 0 and d'' < 0. Applying the rule above will lead to the selection of p1 if d ≤ 0, and p2 if d > 0. This selection is correct since in this case d ≤ 0 is equivalent to |d'| ≤ |d''|. In cases (4) and (5) p1 lies outside, p2 outside or on the circle and we therefore obtain d' > 0 and d'' ≥ 0. Now d > 0, and applying the rule above will lead to the selection of p2. Since |d'| > |d''| this selection is correct.
Let d_i denote the decision variable that determines the pixel (x^(i), y^(i)) to be drawn in the i-th step. Starting with (x^(0), y^(0)) = (0, r) we obtain
d_1 = 3 - 2 · r.
If d_i ≤ 0, then (x^(i), y^(i)) = (x^(i-1) + 1, y^(i-1)), and therefore
d_{i+1} = d_i + 4 · x^(i-1) + 6.
If d_i > 0, then (x^(i), y^(i)) = (x^(i-1) + 1, y^(i-1) - 1), and therefore
d_{i+1} = d_i + 4 · (x^(i-1) - y^(i-1)) + 10.
This iterative computation of d_{i+1} from the previous d_i lets us select the correct pixel to be drawn in the (i + 1)-th step. The arithmetic needed to evaluate these formulas is minimal: addition, subtraction, and left shift (multiplication by 4). The following procedure 'BresenhamCircle', which implements this algorithm, draws a circle with center (xc, yc) and radius r. It uses the procedure 'CirclePoints' to display points on the entire circle. In the cases x = y or r = 1 'CirclePoints' draws each of four pixels twice. This causes no problem on a raster display.
  procedure BresenhamCircle(xc, yc, r: integer);
    procedure CirclePoints(x, y: integer);
    begin
      PaintPixel(xc + x, yc + y); PaintPixel(xc - x, yc + y);
      PaintPixel(xc + x, yc - y); PaintPixel(xc - x, yc - y);
      PaintPixel(xc + y, yc + x); PaintPixel(xc - y, yc + x);
      PaintPixel(xc + y, yc - x); PaintPixel(xc - y, yc - x)
    end;
  var x, y, d: integer;
  begin
    x := 0; y := r; d := 3 - 2 · r;
    while x < y do begin
      CirclePoints(x, y);
      if d < 0 then d := d + 4 · x + 6
      else { d := d + 4 · (x - y) + 10; y := y - 1 };
      x := x + 1
    end;
    if x = y then CirclePoints(x, y)
  end;
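For a quick check of the circle procedure, here is a Python transcription. As an assumption of this sketch, the pixels are collected in a set rather than painted, so pixels that 'CirclePoints' would paint twice appear only once.

```python
def bresenham_circle(xc, yc, r):
    """Set of pixels of the digitized circle with center (xc, yc) and radius r."""
    pixels = set()

    def circle_points(x, y):
        # Eight-way symmetry: swap the coordinates and flip the signs.
        for px, py in ((x, y), (y, x)):
            for sx in (1, -1):
                for sy in (1, -1):
                    pixels.add((xc + sx * px, yc + sy * py))

    x, y, d = 0, r, 3 - 2 * r
    while x < y:
        circle_points(x, y)
        if d < 0:
            d += 4 * x + 6             # keep y
        else:
            d += 4 * (x - y) + 10      # step y down by one
            y -= 1
        x += 1
    if x == y:
        circle_points(x, y)
    return pixels
```

Only one octant is computed; the other seven are mirrored, so the loop runs for roughly r / √2 steps.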
1. Design and implement an efficient geometric primitive which determines whether two aligned rectangles (i.e. rectangles with sides parallel to the coordinate axes) intersect.
2. Design and implement a geometric primitive
  function inTriangle(t: triangle; p: point): …;
which takes a triangle t given by its three vertices and a point p and returns a ternary value: p is inside t, p is on the boundary of t, p is outside t.
3. Use the function 'intersect' of the section "Intersection" and 'inTriangle' above to program a
  function SegmentIntersectsTriangle(s: segment; t: triangle): …;
to check whether segment s and triangle t share common points. 'SegmentIntersectsTriangle' returns a ternary value: yes, degenerate, no. List all distinct cases of degeneracy that may occur, and show how your code handles them.
4. Implement Bresenham's incremental algorithms for drawing digitized straight lines and circles.
5. Two circles (x', y', r') and (x'', y'', r'') are given by the coordinates of their center and their radius. Find effective formulas for deciding the three-way question whether (a) the circles intersect as lines, (b) the circles intersect as disks, or (c) neither. Avoid the square-root operation whenever possible.
Part IV: Complexity of problems and algorithms
Fundamental issues of computation
A successful search for better and better algorithms naturally leads to the question "Is there a best algorithm?", whereas an unsuccessful search leads one to ask apprehensively: "Is there an algorithm (of a certain type) to solve this problem?" These questions turned out to be difficult and fertile. Historically, the question about the existence of an algorithm came first, and led to the concepts of computability and decidability in the 1930s. The question about a "best" algorithm led to the development of complexity theory in the 1960s.
The study of these fundamental issues of computation requires a mathematical arsenal that includes mathematical logic, discrete mathematics, probability theory, and certain parts of analysis, in particular asymptotics. We introduce a few of these topics, mostly by example, and illustrate the use of mathematical techniques of algorithm analysis on the important problem of sorting.
15. Computability and complexity
algorithm
computability
RISC: Reduced Instruction Set Computer
Almost nothing is computable.
The halting problem is undecidable.
complexity of algorithms and problems
Strassen's matrix multiplication
Models of computation: the ultimate RISC
Algorithm and computability are originally intuitive concepts. They can remain intuitive as long as we only want to show that some specific result can be computed by following a specific algorithm. Almost always an informal explanation suffices to convince someone with the requisite background that a given algorithm computes a specified result. We have illustrated this informal approach throughout Part III. Everything changes if we wish to show that a desired result is not computable. The question arises immediately: "What tools are we allowed to use?" Everything is computable with the help of an oracle that knows the answers to all questions. The attempt to prove negative results about the nonexistence of certain algorithms forces us to agree on a rigorous definition of algorithm.
The question "What can be computed by an algorithm, and what cannot?" was studied intensively during the 1930s by Emil Post (1897-1954), Alan Turing (1912-1954), Alonzo Church (1903-1995), and other logicians. They defined various formal models of computation, such as production systems, Turing machines, and recursive functions, to capture the intuitive concept of "computation by the application of precise rules". All these different formal models of computation turned out to be equivalent. This fact greatly strengthens Church's thesis that the intuitive concept of algorithm is formalized correctly by any one of these mathematical systems.
We will not define any of these standard models of computation. They all share the trait that they were designed to be conceptually simple: their primitive operations are chosen to be as weak as possible, as long as they retain their property of being universal computing systems in the sense that they can simulate any computation performed on any other machine. It usually comes as a surprise to novices that the set of primitives of a universal computing machine can be so simple as long as these machines possess two essential ingredients: unbounded memory and unbounded time.
Most simulations of a powerful computer on a simple one share three characteristics: each is straightforward in principle, involves laborious coding in practice, and explodes the space and time requirements of a computation. The weakness of the primitives, desirable from a theoretical point of view, has the consequence that as simple an operation as integer addition becomes an exercise in programming.
The model of computation used most often in algorithm analysis is significantly more powerful than a Turing machine in two respects: (1) its memory is not a tape, but an array, and (2) in one primitive operation it can deal with numbers of arbitrary size. This model of computation is called random access machine, abbreviated as RAM. A RAM is essentially a random access memory, also abbreviated as RAM, of unbounded capacity, as suggested in Exhibit 15.1. The memory consists of an infinite array of memory cells, addressed 0, 1, 2, … . Each cell can hold a number, say an integer, of arbitrary size, as the arrow pointing to the right suggests.
Exhibit 15.1: RAM - unbounded address space, unbounded cell size.
A RAM has an arithmetic unit and is driven by a program. The meaning of the word random is that any memory cell can be accessed in unit time (as opposed to a tape memory, say, where access time depends on distance). A further crucial assumption in the RAM model is that an arithmetic operation (+, -, ·, /) also takes unit time, regardless of the size of the numbers involved. This assumption is unrealistic in a computation where numbers may grow very large, but it is often useful. As is the case with all models, the responsibility for using them properly lies with the user. To give the reader the flavor of a model of computation, we define a RAM whose architecture is rather similar to real computers, but is unrealistically simple.
The ultimate RISC
RISC stands for Reduced Instruction Set Computer, a machine that has only a few types of instructions built into the hardware. What is the minimum number of instructions a computer needs to be universal? In theory, one.
Consider a stored-program computer of the "von Neumann type" where data and program are stored in the same memory (John von Neumann, 1903-1957). Let the random access memory (RAM) be "doubly infinite": There is a countable infinity of memory cells addressed 0, 1, … , each of which can hold an integer of arbitrary size, or an instruction. We assume that the constant 1 is hardwired into memory cell 1; from 1 any other integer can be constructed. There is a single type of "three-address instruction" which we call "subtract, test and jump", abbreviated as
  STJ x, y, z
where x, y, and z are addresses. Its semantics is equivalent to
  STJ x, y, z    x := x - y; if x ≤ 0 then goto z;
x, y, and z refer to cells Cx, Cy, and Cz. The contents of Cx and Cy are treated as data (an integer); the contents of Cz, as an instruction (Exhibit 15.2).
Exhibit 15.2: Stored program computer: data and instructions share the memory. In the figure, cell 13 holds the instruction STJ 0, 0, 14; executing instruction 13 sets cell 0 to 0, and increments the program counter.
Since this RISC has just one type of instruction, we waste no space on an op-code field. An instruction contains three addresses, each of which is an unbounded integer. In theory, fortunately, three unbounded integers can be packed into the same space required for a single unbounded integer. In the following exercise, this simple idea leads to a well-known technique introduced into mathematical logic by Kurt Gödel (1906-1978).
Exercise: Gödel numbering
(a) Motel Infinity has a countable infinity of rooms numbered 0, 1, 2, … . Every room is occupied, so the sign claims "No Vacancy". Convince the manager that there is room for one more person.
(b) Assume that a memory cell in our RISC stores an integer as a sign bit followed by a sequence d0, d1, d2, … of decimal digits, least significant first. Devise a scheme for storing three addresses in one cell.
(c) Show how a sequence of positive integers i1, i2, … , in of arbitrary length n can be encoded in a single natural number j: Given j, the sequence can be uniquely reconstructed. Gödel's solution:
j = 2^(i1) · 3^(i2) · 5^(i3) · … · p_n^(in), where p_k denotes the k-th prime.
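The prime-power encoding of part (c) can be sketched in Python as follows. It encodes a sequence of positive integers as the product of the first n primes raised to those integers and decodes by repeated division; the function names and the simple trial-division prime generator are choices made for this illustration.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for a small demonstration)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def godel_encode(seq):
    """Encode positive integers i1, ..., in as 2**i1 * 3**i2 * ... * p_n**in."""
    j = 1
    for p, i in zip(primes(), seq):
        j *= p ** i
    return j


def godel_decode(j):
    """Recover the sequence by reading off the exponents of successive primes."""
    seq = []
    for p in primes():
        if j == 1:
            break
        i = 0
        while j % p == 0:
            j //= p
            i += 1
        seq.append(i)
    return seq
```

Unique reconstruction rests on unique prime factorization; because all entries are positive, decoding can stop as soon as the remaining quotient is 1.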
Basic program %ragments
This computer is best understood by considering program %ragments %or simple tasks. These %ragments
implement simple operations+ such as setting a variable to a given constant+ or the assignment operator+ that are
given as primitives in most programming languages. 2rogramming these %ragments naturally leads us to introduce
basic concepts o% assembly language+ in particular symbolic and relative addressing.
.et the content o% cell 0 to 0&
.T 0+ 0+ .]1
:hatever the current content o% cell 0+ subtract it %rom itsel% to obtain the integer 0. This instruction resides at
some address in memory+ 5hich 5e abbreviate as ;.;+ read as Ethe current value o% the program counterE. ;.]1; is the
ne)t address+ so regardless o% the outcome o% the test+ control %lo5s to the ne)t instruction in memory.
a &V b+ 5here a and b are symbolic addresses. *se a temporary variable t&
.T t+ t+ .]1 ? t :A 2 @
.T t+ b+ .]1 ? t :A Bb @
.T a+ a+ .]1? a :A 2 @
.T a+ t+ .]1 ? a :A Bt$ so now a A b @
")ercise& a program library
HaI :rite 8$.C programs %or a&V b ] c+ a &V b K c+ a &V b div c+ a &V b mod c+ a &V \b\+ a & V minHb+ cI+ a &V gcdHb+
cI.
0
1
2
13
14
.
.
.
0
1
STJ 0, 0, 14
program counter
Executing instruction 13 sets
cell 0 to 0, and increments
the program counter.
(b) Show how this RISC can compute with rational numbers represented by a pair [a, b] of integers denoting numerator and denominator.
(c) (Advanced) Show that this RISC is universal, in the sense that it can simulate any computation done by any other computer.
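The program fragments above can be tried out with a small simulator. The following is a minimal sketch, not the machine's definition: it assumes that STJ x, y, z subtracts the content of cell y from cell x and then jumps to z if the result is not positive (otherwise it falls through to the next instruction). Because every fragment above jumps to '.+1', the direction of the test does not affect the result. The names run_stj and assign are hypothetical, and the program is kept in a separate list rather than in the data memory for brevity.

```python
def run_stj(program, memory, start=0):
    """Run a list of (x, y, z) STJ instructions until the program counter leaves the list.

    Assumed semantics: memory[x] := memory[x] - memory[y];
    jump to z if the result is <= 0, otherwise continue at pc + 1.
    """
    pc = start
    while 0 <= pc < len(program):
        x, y, z = program[pc]
        memory[x] = memory[x] - memory[y]
        pc = z if memory[x] <= 0 else pc + 1
    return memory


def assign(a, b, t, base=0):
    """The four-instruction fragment for a := b, loaded at address `base`."""
    return [
        (t, t, base + 1),  # t := 0
        (t, b, base + 2),  # t := -b
        (a, a, base + 3),  # a := 0
        (a, t, base + 4),  # a := -t, so now a = b
    ]


mem = {"a": 7, "b": 42, "t": -3}
run_stj(assign("a", "b", "t"), mem)
print(mem["a"], mem["b"])
```

The same simulator can serve as a test bench for the library programs of exercise (a), provided the assumed branch condition is adapted to the machine's actual definition.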
The exercise of building up a RISC program library for elementary functions provides the same experience as the equivalent exercise for Turing machines, but leads to the goal much faster, since the primitive STJ is much more powerful than the primitives of a Turing machine.
The purpose of this section is to introduce the idea that conceptually simple models of computation are as powerful, in theory, as much more complex models, such as a high-level programming language. The next two sections demonstrate results of an opposite nature: Universal computers, in the sense we have just introduced, are subject to striking limitations, even if we remove any limit on the memory and time they may use. We prove the existence of noncomputable functions and show that the "halting problem" is undecidable.
The theory of computability was developed in the 1930s, and greatly expanded in the 1950s and 1960s. Its basic ideas have become part of the foundation that any computer scientist is expected to know. Computability theory is not directly useful. It is based on the concept "computable in principle" but offers no concept of a "feasible computation". Feasibility, rather than "possible in principle", is the touchstone of computer science. Since the 1960s, a theory of the complexity of computation is being developed, with the goal of partitioning the range of computability into complexity classes according to time and space requirements. This theory is still in full development and breaking new ground, in particular in the area of concurrent computation. We have used some of its concepts throughout Part III and continue to illustrate these ideas with simple examples and surprising results.
Almost nothing is computable
Consider as a model of computation any programming language, with the fictitious feature that it is implemented on a machine with infinite memory and no operational time limits. Nevertheless we reach the conclusion that "almost nothing is computable". This follows simply from the observation that there are fewer programs than problems to be solved (functions to be computed). Both the number of programs and the number of functions are infinite, but the latter is an infinity of higher cardinality.
A programming language L is defined over an alphabet $A = \{a_1, a_2, \ldots, a_k\}$ of k characters. The set of programs in L is a subset of the set $A^*$ of all strings over A. $A^*$ is countable, and so is its subset L, as it is in one-to-one correspondence with the natural numbers under the following mapping:
1. Generate all strings in $A^*$ in order of increasing length and, in case of equal length, in lexicographic order.
2. Erase all strings that do not represent a program according to the syntax rules of L.
3. Enumerate the remaining strings in the originally given order.
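The three steps of this mapping can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the "syntax rule" used below (a string counts as a program if it starts with 'a' and ends with 'b') is a toy stand-in for the syntax rules of a real language L.

```python
from itertools import count, islice, product


def all_strings(alphabet):
    """Step 1: yield every string over `alphabet`, shorter strings first, ties in lexicographic order."""
    letters = sorted(alphabet)
    for length in count(0):
        for chars in product(letters, repeat=length):
            yield "".join(chars)


def enumerate_programs(alphabet, is_program):
    """Steps 2 and 3: drop non-programs, number the rest 1, 2, 3, ... in the order generated."""
    accepted = (s for s in all_strings(alphabet) if is_program(s))
    return zip(count(1), accepted)


def toy_rule(s):
    """Hypothetical syntax rule, used only for illustration."""
    return s.startswith("a") and s.endswith("b")


print(list(islice(enumerate_programs("ab", toy_rule), 4)))
```

Every program receives a finite number because only finitely many strings precede it in the generated order, which is all that countability requires.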
Among all programs in L we consider only those which compute a (partial) function from the set $N = \{1, 2, 3, \ldots\}$ of natural numbers into N. This can be recognized by their heading; for example,
function f(x: N): N;
As this is a subset of L, there exist only countably many such programs.
However, there are uncountably many functions $f: N \to N$, as Georg Cantor (1845–1918) proved by his famous diagonalization argument. It starts by assuming the opposite, that the set $\{f \mid f: N \to N\}$ is countable, then derives a contradiction. If there were only a countable number of such functions, we could enumerate all of them according to the following scheme:
         1        2        3        4      . . .
f1     f1(1)    f1(2)    f1(3)    f1(4)    . . .
f2     f2(1)    f2(2)    f2(3)    f2(4)    . . .
f3     f3(1)    f3(2)    f3(3)    f3(4)    . . .
f4     f4(1)    f4(2)    f4(3)    f4(4)    . . .
.
.
.
Construct a function $g: N \to N$, $g(i) = f_i(i) + 1$, which is obtained by adding 1 to the diagonal elements in the scheme above. Hence g is different from each $f_i$, at least for the argument i: $g(i) \ne f_i(i)$. Therefore, our assumption that we have enumerated all functions $f: N \to N$ is wrong. Since there exists only a countable infinity of programs, but an uncountable infinity of functions, almost all functions are noncomputable.
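The diagonal construction is easy to express as a program that takes any claimed enumeration and returns a function missing from it. The sketch below assumes the enumeration is given as a function that maps i to the i-th listed function; the sample enumeration $f_i(x) = i \cdot x$ is purely hypothetical and stands in for an arbitrary list.

```python
def diagonal(enumeration):
    """Return g with g(i) = f_i(i) + 1, where enumeration(i) is the i-th listed function."""
    return lambda i: enumeration(i)(i) + 1


def example_enumeration(i):
    """Hypothetical enumeration used only for illustration: f_i(x) = i * x."""
    return lambda x: i * x


g = diagonal(example_enumeration)
for i in range(1, 6):
    f_i = example_enumeration(i)
    print(i, f_i(i), g(i))  # g(i) differs from f_i(i) in every row
```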
The halting problem is undecidable
If we could predict, for any program P executed on any data set D, whether P terminates or not (i.e. whether it will get into an infinite loop), we would have an interesting and useful technique. If this prediction were based on rules that prescribe exactly how the pair (P, D) is to be tested, we could write a program H for it. A fundamental result of computability theory states that under reasonable assumptions about the model of computation, such a halting program H cannot exist.
Consider a programming language L that contains the constructs we will use: mainly recursive procedures and procedure parameters. Consider all procedures P in L that have no parameters, a property that can be recognized from the heading
procedure P;
This simplifies the problem by avoiding any data dependency of termination.
Assume that there exists a program H in L that takes as argument any parameterless procedure P in L and decides whether P halts or loops (i.e. runs indefinitely):
H(P) = true if P halts, H(P) = false if P loops.
Consider the behavior of the following parameterless procedure X:
procedure X;
begin while H(X) do; end;
Consider the reference of X to itself; this trick corresponds to the diagonalization in the previous example. Consider further the loop
while H(X) do;
which is infinite if H(X) returns true (i.e. exactly when X should halt) and terminates if H(X) returns false (i.e. exactly when X should run forever). This trick corresponds to the change of the diagonal $g(i) = f_i(i) + 1$. We obtain:
By definition of H: if X halts then H(X) = true; if X loops then H(X) = false.
By construction of X: if H(X) = true then X loops; if H(X) = false then X halts.
The fiendishly crafted program X traps H in a web of contradictions. We blame the weakest link in the chain of reasoning that leads to this contradiction, namely the unsupported assumption of the existence of a halting program H. This proves that the halting problem is undecidable.
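The trap can be replayed against any concrete candidate for H. The sketch below is an illustration under simplifying assumptions: a candidate h is an ordinary function that returns the same boolean every time it is asked about the same procedure, and the names make_x and refute are hypothetical. Whatever h answers about X, the answer turns out to be wrong.

```python
def make_x(h):
    """Build the parameterless procedure X for a claimed halting test `h`."""
    def x():
        while h(x):  # loops forever exactly when h predicts that x halts
            pass
    return x


def refute(h):
    """Describe how X contradicts the claimed halting test `h`."""
    x = make_x(h)
    if h(x):
        # Running x would never return, so we only report the prediction.
        return "h says X halts, but X would loop forever"
    x()  # h(x) is false, so the loop is never entered and x returns at once
    return "h says X loops, but X just halted"


print(refute(lambda p: False))
print(refute(lambda p: True))
```

The two sample candidates are trivial, but the same reasoning applies to any h, however elaborate its analysis of the procedure it is given.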
Computable, yet unknown
In the preceding two sections we have illustrated the limitations of computability: clearly stated questions, such as the halting problem, are undecidable. This means that the halting question cannot be answered, in general, by any computation no matter how extensive in time and space. There are, of course, lots of individual halting questions that can be answered, asserting that a particular program running on a particular data set terminates, or fails to do so. To illuminate this key concept of theoretical computer science further, the following examples will highlight a different type of practical limitation of computability.
Computable or decidable is a concept that naturally involves one algorithm and a denumerably infinite set of problems, indexed by a parameter, say n. Is there a uniform procedure that will solve any one problem in the infinite set? For example, the "question" (really a denumerable infinity of questions) "Can a given integer n > 2 be expressed as the sum of two primes?" is decidable because there exists the algorithm 's2p' that will answer any single instance of this question:
function s2p(n: integer): boolean;
{ for n > 2, s2p(n) returns true if n is the sum of two primes, false otherwise }
  function p(k: integer): integer;
  { for k > 0, p(k) returns the k-th prime: p(1) = 2, p(2) = 3, p(3) = 5, … }
  end;
begin
  for all i, j such that p(i) < n and p(j) < n do
    if p(i) + p(j) = n then return(true);
  return(false);
end; { s2p }
So the general question "Is any given integer the sum of two primes?" is solved readily by a simple program. A single related question, however, is much harder: "Is every even integer > 2 the sum of two primes?" Let's try:
4 = 2 + 2, 6 = 3 + 3, 8 = 5 + 3, 10 = 7 + 3 = 5 + 5, 12 = 7 + 5,
14 = 11 + 3 = 7 + 7, 16 = 13 + 3 = 11 + 5, 18 = 13 + 5 = 11 + 7,
20 = 17 + 3 = 13 + 7, 22 = 19 + 3 = 17 + 5 = 11 + 11,
24 = 19 + 5 = 17 + 7 = 13 + 11, 26 = 23 + 3 = 19 + 7 = 13 + 13,
28 = 23 + 5 = 17 + 11, 30 = 23 + 7 = 19 + 11 = 17 + 13,
32 = 29 + 3 = 19 + 13, 34 = 31 + 3 = 29 + 5 = 23 + 11 = 17 + 17,
36 = 31 + 5 = 29 + 7 = 23 + 13 = 19 + 17.
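The experiment is easy to repeat mechanically. The following sketch is one possible rendering of 's2p', not the only one: instead of a function p(k) that returns the k-th prime, it tests candidates with a hypothetical helper is_prime based on trial division, and it collects all representations rather than stopping at the first.

```python
def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def prime_pairs(n):
    """All ways to write n as p + q with primes p >= q, largest p first."""
    return [(p, n - p) for p in range(n - 2, (n - 1) // 2, -1)
            if is_prime(p) and is_prime(n - p)]


def s2p(n):
    """True if n > 2 is the sum of two primes, false otherwise."""
    return len(prime_pairs(n)) > 0


for n in range(4, 37, 2):
    print(n, "=", " = ".join(f"{p} + {q}" for p, q in prime_pairs(n)))
```

Running the loop over a larger range answers many individual instances, but no finite run can settle the single question about all even integers.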
A bit of experimentation suggests that the number of distinct representations as a sum of two primes increases as the target integer grows. Christian Goldbach (1690–1764) had the good fortune of stating the plausible conjecture "yes" to a problem so hard that it has defied proof or counterexample for nearly three centuries.
One might ask: Is the Goldbach conjecture decidable? The straight answer is that the concept of decidability does not apply to a single yes/no question such as Goldbach's conjecture. Asking for an algorithm that tells us whether the conjecture is true or false is meaninglessly trivial. Of course, there is such an algorithm! If the Goldbach conjecture is true, the algorithm that says 'yes' decides. If the conjecture is false, the algorithm that says 'no' will do the job. The problem that we don't know which algorithm is the right one is quite compatible with saying that one of those two is the right algorithm. If we package two trivial algorithms into one, we get the following trivial algorithm for deciding Goldbach's conjecture:
function GoldbachOracle(): boolean;
begin return(GoldbachIsRight) end;
Notice that 'GoldbachOracle' is a function without arguments, and 'GoldbachIsRight' is a boolean constant, either true or false. Occasionally, the stark triviality of the argument above is concealed so cleverly under technical jargon as to sound profound. Watch out to see through the following plot.
Let us call an even integer > 2 that is not a sum of two primes a counterexample. None have been found as yet, but we can certainly reason about them, whether they exist or not. Define the
function G(k: cardinal): boolean;
as follows:
G(k) = true if the number of counterexamples is at most k, false otherwise.
Goldbach's conjecture is equivalent to G(0) = true. The (implausible) rival conjecture that there is exactly one counterexample is equivalent to G(0) = false, G(1) = true. Although we do not know the value of G(k) for any single k, the definition of G tells us a lot about this artificial function, namely:
if G(i) = true for any i, then G(k) = true for all k > i.
With such a strong monotonicity property, how can G look?
1. If Goldbach is right, then G is a constant: G(k) = true for all k.
2. If there are a finite number i of exceptions, then G is a step function:
G(k) = false for k < i, G(k) = true for k ≥ i.
3. If there is an infinite number of exceptions, then G is again a constant:
G(k) = false for all k.
Each of the infinitely many functions listed above is obviously computable. Hence G is computable. The value of G(0) determines truth or falsity of Goldbach's conjecture. Does that help us settle this time-honored mathematical puzzle? Obviously not. All we have done is to rephrase the honest statement with which we started this section, "The answer is yes or no, but I don't know which" by the circuitous "The answer can be obtained by evaluating a computable function, but I don't know which one".
Multiplication of complex numbers
Let us turn our attention from noncomputable functions and undecidable problems to very simple functions that are obviously computable, and ask about their complexity: How many primitive operations must be executed in evaluating a specific function? As an example, consider arithmetic operations on real numbers to be primitive, and consider the product z of two complex numbers x and y:
$x = x_1 + i \cdot x_2$ and $y = y_1 + i \cdot y_2$,
$x \cdot y = z = z_1 + i \cdot z_2$.
The complex product is defined in terms of operations on real numbers as follows:
$z_1 = x_1 \cdot y_1 - x_2 \cdot y_2$,
$z_2 = x_1 \cdot y_2 + x_2 \cdot y_1$.
It appears that one complex multiplication requires four real multiplications and two real additions/subtractions. Surprisingly, it turns out that multiplications can be traded for additions. We first compute three intermediate variables using one multiplication for each, and then obtain z by additions and subtractions:
$p_1 = (x_1 + x_2) \cdot (y_1 + y_2)$,
$p_2 = x_1 \cdot y_1$,
$p_3 = x_2 \cdot y_2$,
$z_1 = p_2 - p_3$, $z_2 = p_1 - p_2 - p_3$.
This evaluation of the complex product requires only 3 real multiplications, but 5 real additions/subtractions. This trade of one multiplication for three additions may not look like a good deal in practice, because many computers have arithmetic chips with fast multiplication circuitry. In theory, however, the trade is clearly favorable. The cost of an addition grows linearly in the number of digits, whereas the cost of a multiplication using the standard method grows quadratically. The key idea behind this algorithm is that "linear combinations of k products of sums can generate more than k products of simple terms". Let us exploit this idea in a context where it makes a real difference.
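The three-multiplication scheme translates directly into code. The sketch below represents a complex number by its real and imaginary parts; the function name is hypothetical.

```python
def complex_product_3mult(x1, x2, y1, y2):
    """Return (z1, z2) with z1 + i*z2 = (x1 + i*x2) * (y1 + i*y2), using 3 real multiplications."""
    p1 = (x1 + x2) * (y1 + y2)
    p2 = x1 * y1
    p3 = x2 * y2
    return p2 - p3, p1 - p2 - p3


print(complex_product_3mult(1, 2, 3, 4))  # the product (1 + 2i) * (3 + 4i)
```

Expanding $p_1 - p_2 - p_3 = x_1 \cdot y_2 + x_2 \cdot y_1$ confirms that the second component agrees with the definition of $z_2$.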
Complexity of matrix multiplication
The complexity of an algorithm is given by its time and space requirements. Time is usually measured by the number of operations executed, space by the number of variables needed at any one time (for input, intermediate results, and output). For a given algorithm it is often easy to count the number of operations performed in the worst and in the best case; it is usually difficult to determine the average number of operations performed (i.e. averaged over all possible input data). Practical algorithms often have time complexities of the order $O(\log n)$, $O(n)$, $O(n \cdot \log n)$, $O(n^2)$, and space complexity of the order $O(n)$, where n measures the size of the input data.
The complexity of a problem is defined as the minimal complexity of all algorithms that solve this problem. It is almost always difficult to determine the complexity of a problem, since all possible algorithms must be considered, including those yet unknown. This may lead to surprising results that disprove obvious assumptions.
The complexity of an algorithm is an upper bound for the complexity of the problem solved by this algorithm. An algorithm is a witness for the assertion: You need at most this many operations to solve this problem. A specific algorithm never provides a lower bound on the complexity of a problem; it cannot extinguish the hope for a more efficient algorithm. Occasionally, algorithm designers engage in races lasting decades that result in (theoretically) faster and faster algorithms for solving a given problem. Volker Strassen started such a race with his 1969 paper "Gaussian Elimination Is Not Optimal" [Str 69], where he showed that matrix multiplication requires fewer operations than had commonly been assumed necessary. The race has not yet ended.
The obvious way to multiply two $n \times n$ matrices uses three nested loops, each of which is iterated n times, as we saw in a transitive hull algorithm in the chapter "Matrices and graphs: transitive closure". The fact that the obvious algorithm for matrix multiplication is of time complexity $\Theta(n^3)$, however, does not imply that the matrix multiplication problem is of the same complexity.
Strassen's matrix multiplication
The standard algorithm for multiplying two $n \times n$ matrices needs $n^3$ scalar multiplications and $n^2 \cdot (n - 1)$ additions; for the case of $2 \times 2$ matrices, eight multiplications and four additions. Seven scalar multiplications suffice if we accept 18 additions/subtractions.
Evaluate seven expressions, each of which is a product of sums:
$p_1 = (a_{11} + a_{22}) \cdot (b_{11} + b_{22})$,
$p_2 = (a_{21} + a_{22}) \cdot b_{11}$,
$p_3 = a_{11} \cdot (b_{12} - b_{22})$,
$p_4 = a_{22} \cdot (-b_{11} + b_{21})$,
$p_5 = (a_{11} + a_{12}) \cdot b_{22}$,
$p_6 = (-a_{11} + a_{21}) \cdot (b_{11} + b_{12})$,
$p_7 = (a_{12} - a_{22}) \cdot (b_{21} + b_{22})$.
The elements of the product matrix are computed as follows:
$r_{11} = p_1 + p_4 - p_5 + p_7$,
$r_{12} = p_3 + p_5$,
$r_{21} = p_2 + p_4$,
$r_{22} = p_1 - p_2 + p_3 + p_6$.
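The seven products and four result formulas can be checked on a small example. The sketch below handles only the $2 \times 2$ case with scalar entries, stored as nested lists; the function name is hypothetical.

```python
def strassen_2x2(a, b):
    """Multiply two 2 x 2 matrices (nested lists) using the seven products p1..p7."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    p1 = (a11 + a22) * (b11 + b22)
    p2 = (a21 + a22) * b11
    p3 = a11 * (b12 - b22)
    p4 = a22 * (-b11 + b21)
    p5 = (a11 + a12) * b22
    p6 = (-a11 + a21) * (b11 + b12)
    p7 = (a12 - a22) * (b21 + b22)
    return [[p1 + p4 - p5 + p7, p3 + p5],
            [p2 + p4, p1 - p2 + p3 + p6]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In every product the factor taken from a stands to the left of the factor taken from b, so the same code works when the entries are themselves matrices, as long as '+', '-' and '*' are defined for them.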
This algorithm does not rely on the commutativity of scalar multiplication. Hence it can be generalized to $n \times n$ matrices using the divide-and-conquer principle. For reasons of simplicity consider n to be a power of 2 (i.e. $n = 2^k$); for other values of n, imagine padding the matrices with rows and columns of zeros up to the next power of 2. An $n \times n$ matrix is partitioned into four $n/2 \times n/2$ matrices:
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
The product of two $n \times n$ matrices by Strassen's method requires seven (not eight) multiplications and 18 additions/subtractions of $n/2 \times n/2$ matrices. For large n, the work required for the 18 additions is negligible compared to the work required for even a single multiplication (why?); thus we have saved one multiplication out of eight, asymptotically at no cost.
Each $n/2 \times n/2$ matrix is again partitioned recursively into four $n/4 \times n/4$ matrices; after $\log_2 n$ partitioning steps we arrive at $1 \times 1$ matrices for which matrix multiplication is the primitive scalar multiplication. Let T(n) denote the number of scalar arithmetic operations used by Strassen's method for multiplying two $n \times n$ matrices. For n > 1, T(n) obeys the recursive equation
$$T(n) = 7 \cdot T(n/2) + 18 \cdot (n/2)^2.$$
If we are only interested in the leading term of the solution, the constants 7 and 2 justify omitting the quadratic term (because $7 > 2^2$, the seven recursive multiplications dominate the quadratic cost of the additions), thus obtaining
$$T(n) \approx 7 \cdot T(n/2) \approx 7^2 \cdot T(n/4) \approx \cdots \approx 7^{\log_2 n} \cdot T(1) = n^{\log_2 7} \cdot T(1) \approx n^{2.81}.$$
Thus the number of primitive operations required to multiply two $n \times n$ matrices using Strassen's method is proportional to $n^{2.81}$, a statement that we abbreviate as "Strassen's matrix multiplication takes time $\Theta(n^{2.81})$".
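The operation count can also be evaluated numerically. The sketch below assumes the recurrence $T(n) = 7 \cdot T(n/2) + 18 \cdot (n/2)^2$ with $T(1) = 1$, which follows from counting seven half-size multiplications and 18 half-size additions per step, and compares it with the $n^3 + n^2 \cdot (n - 1)$ operations of the standard algorithm. The function names are hypothetical.

```python
import math


def strassen_ops(n):
    """T(n) = 7 * T(n/2) + 18 * (n/2)^2 with T(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    half = n // 2
    return 7 * strassen_ops(half) + 18 * half * half


def standard_ops(n):
    """n^3 scalar multiplications plus n^2 * (n - 1) scalar additions."""
    return n ** 3 + n * n * (n - 1)


for k in (1, 5, 10, 15):
    n = 2 ** k
    ratio_to_leading_term = strassen_ops(n) / n ** math.log2(7)
    print(n, strassen_ops(n), standard_ops(n), round(ratio_to_leading_term, 3))
```

The last column stays bounded as n grows, which is what "proportional to $n^{\log_2 7}$" means; the constant of proportionality is not 1.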
Does this asymptotic improvement lead to a more efficient program in practice? Probably not, as the ratio
$$\frac{n^3}{n^{\log_2 7}} \approx n^{0.19} \approx \sqrt[5]{n}$$
grows too slowly to be of practical importance: For $n \approx 1000$, for example, we have $\sqrt[5]{1024} = 4$ (remember: $2^{10} = 1024$). A factor of 4 is not to be disdained, but there are many ways to win or lose a factor of 4. Trading an algorithm with simple code, such as straightforward matrix multiplication, for another that requires more elaborate bookkeeping, such as Strassen's, can easily result in a fourfold increase of the constant factor that measures the time it takes to execute the body of the innermost loop.
Exercises
1. Prove that the set of all ordered pairs of integers is countably infinite.
2. A recursive function is defined by a finite set of rules that specify the function in terms of variables, nonnegative integer constants, increment ('+1'), the function itself, or an expression built from these by composition of functions. As an example, consider Ackermann's function defined as $A(n) = A_n(n)$ for $n \ge 1$, where $A_k(n)$ is determined by
$A_k(1) = 2$ for $k \ge 1$,
$A_1(n) = A_1(n-1) + 2$ for $n \ge 2$,
$A_k(n) = A_{k-1}(A_k(n-1))$ for $k \ge 2$, $n \ge 2$.
(a) Calculate A(1), A(2), A(3), A(4).
(b) Prove that
$A_k(2) = 4$ for $k \ge 1$,
$A_1(n) = 2 \cdot n$ for $n \ge 1$,
$A_2(n) = 2^n$ for $n \ge 1$,
$A_3(n) = 2^{A_3(n-1)}$ for $n \ge 2$.
(c) Define the inverse of Ackermann's function as
$\alpha(n) = \min\{m : A(m) \ge n\}$.
Show that $\alpha(n) \le 3$ for $n \le 16$, that $\alpha(n) \le 4$ for n at most a "tower" of 65536 2's, and that $\alpha(n) \to \infty$ as $n \to \infty$.
3. Complete Strassen's algorithm by showing how to multiply $n \times n$ matrices when n is not an exact power of 2.
4. Assume that you can multiply $3 \times 3$ matrices using k multiplications. What is the largest k that will lead to an asymptotic improvement over Strassen's algorithm?
5. A permutation matrix P is an $n \times n$ matrix that has exactly one '1' in each row and each column; all other entries are '0'. A permutation matrix can be represented by an array
var a: array[1 .. n] of integer;
as follows: a[i] = j if the i-th row of P contains a '1' in the j-th column.
6. Prove that the product of two permutation matrices is again a permutation matrix.
7. Design an algorithm that multiplies in time $\Theta(n)$ two permutation matrices given in the array representation above, and stores the result in this same array representation.
16. The mathematics of algorithm analysis
worst-case and average performance of an algorithm
growth rate of a function
asymptotics: $O()$, $\Omega()$, $\Theta()$
asymptotic behavior of sums
solution techniques for recurrence relations
asymptotic performance of divide-and-conquer algorithms
average number of inversions and average distance in a permutation
trees and their properties
Growth rates and orders of magnitude
To understand a specific algorithm, it is useful to ask and answer the following questions, usually in this order: What is the problem to be solved? What is the main idea on which this algorithm is based? Why is it correct? How efficient is it?
The variety of problems is vast, and so is the variety of "main ideas" that lead one to design an algorithm and establish its correctness. True, there are general algorithmic principles or schemas which are problem-independent, but these rarely suffice: Interesting algorithms typically exploit specific features of a problem, so there is no unified approach to understanding the logic of algorithms. Remarkably, there is a unified approach to the efficiency analysis of algorithms, where efficiency is measured by a program's time and storage requirements. This is remarkable because there is great variety in (1) sets of input data and (2) environments (computers, operating systems, programming languages, coding techniques), and these differences have a great influence on the run time and storage consumed by a program. These two types of differences are overcome as follows.
Different sets of input data: worst-case and average performance
The most important characteristic of a set of data is its size, measured in terms of any unit convenient to the problem at hand. This is typically the number of primitive objects in the data, such as bits, bytes, integers, or any monotonic function thereof, such as the magnitude of an integer. Examples: For sorting, the number n of elements is natural; for square matrices, the number n of rows and columns is convenient; it is a monotonic function (square root) of the actual size $n^2$ of the data. An algorithm may well behave very differently on different data sets of equal size n: among all possible configurations of given size n some will be favorable, others less so. Both the worst-case data set of size n and the average over all data sets of size n provide well-defined and important measures of efficiency. Example: When sorting data sets about whose order nothing is known, average performance is well characterized by averaging run time over all n! permutations of the n elements.
Different environments: focus on growth rate and ignore constants
The work performed by an algorithm is expressed as a function of the problem size, typically measured by size n of the input data. By focusing on the growth rate of this function but ignoring specific constants, we succeed in losing a lot of detail information that changes wildly from one computing environment to another, while retaining some essential information that is remarkably invariant when moving a computation from a micro- to a supercomputer, from machine language to Pascal, from amateur to professional programmer. The definition of general measures for the complexity of problems and for the efficiency of algorithms is a major achievement of computer science. It is based on the notions of asymptotic time and space complexity. Asymptotics renounces exact measurement but states how the work grows as the problem size increases. This information often suffices to distinguish efficient algorithms from inefficient ones. The asymptotic behavior of an algorithm is described by the $O()$, $\Omega()$, $\Theta()$, and $o()$ notations. To determine the amount of work to be performed by an algorithm we count operations that take constant time (independently of n) and data objects that require constant storage space. The time required by an addition, comparison, or exchange of two numbers is typically independent of how many numbers we are processing; so is the storage requirement for a number.
Assume that the time required by four algorithms A1, A2, A3, and A4 is $\log_2 n$, $n$, $n \cdot \log_2 n$, and $n^2$, respectively. The following table shows that for sizes of data sets that frequently occur in practice, from $n \approx 10^3$ to $10^6$, the difference in growth rate translates into large numerical differences:
n              A1 = log2 n   A2 = n         A3 = n · log2 n         A4 = n^2
2^5 = 32       5             2^5 = 32       5 · 2^5 = 160           2^10 ≈ 10^3
2^10 = 1024    10            2^10 ≈ 10^3    10 · 2^10 ≈ 10^4        2^20 ≈ 10^6
2^20 ≈ 10^6    20            2^20 ≈ 10^6    20 · 2^20 ≈ 2 · 10^7    2^40 ≈ 10^12
For a specific algorithm these functions are to be multiplied by a constant factor proportional to the time it takes to execute the body of the innermost loop. When comparing different algorithms that solve the same problem, it may well happen that one innermost loop is 10 times faster or slower than another. It is rare that this difference approaches a factor of 100. Thus for $n \approx 1000$ an algorithm with time complexity $\Theta(n \cdot \log n)$ will almost always be much more efficient than an algorithm with time complexity $\Theta(n^2)$. For small n, say n = 32, an algorithm of time complexity $\Theta(n^2)$ may be more efficient than one of complexity $\Theta(n \cdot \log n)$ (e.g. if its constant is 10 times smaller).
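A few lines of code make the interplay between growth rate and constant factor concrete. The sketch below is an illustration under an assumed cost model: the quadratic algorithm is given an inner loop that is 10 times cheaper than that of the $n \cdot \log n$ algorithm. The function names and constants are hypothetical.

```python
import math


def cost_n_log_n(n, constant=1.0):
    """Assumed cost model: constant * n * log2(n)."""
    return constant * n * math.log2(n)


def cost_quadratic(n, constant=1.0):
    """Assumed cost model: constant * n^2."""
    return constant * n * n


for n in (32, 1024, 2 ** 20):
    quadratic = cost_quadratic(n, constant=0.1)  # inner loop 10 times cheaper
    n_log_n = cost_n_log_n(n, constant=1.0)
    winner = "quadratic" if quadratic < n_log_n else "n log n"
    print(n, round(n_log_n), round(quadratic), winner)
```

Under this model the quadratic algorithm wins only for the smallest size; the tenfold advantage in the constant is used up well before n reaches 1000.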
When we wish to predict exactly how many seconds and bytes a program needs, asymptotic analysis is still useful but is only a small part of the work. We now have to go back over the formulas and keep track of all the constant factors discarded in cavalier fashion by the $O()$ notation. We have to assign numbers to the time consumed by scores of primitive $O(1)$ operations. It may be sufficient to estimate the time-consuming primitives, such as floating-point operations; or it may be necessary to include those that are hidden by a high-level programming language and answer questions such as: How long does an array access a[i, j] take? A procedure call? Incrementing the index i in a loop "for i := 0 to n"?
Asymptotics
Asymptotics is a technique used to estimate and compare the growth behavior of functions. Consider the function
$$f(x) = \frac{1 + x^2}{x}.$$
f(x) is said to behave like x for $x \to \infty$ and like 1/x for $x \to 0$. The motivation for such a statement is that both x and 1/x are intuitively simpler, more easily understood functions than f(x). A complicated function is unlike any simpler one across its entire domain, but it usually behaves like a simpler one as x approaches some particular value. Thus all asymptotic statements include the qualifier $x \to x_0$. For the purpose of algorithm analysis we are interested in the behavior of functions for large values of their argument, and all our definitions below assume $x \to \infty$.
The asymptotic behavior of functions is described by the $O()$, $\Omega()$, $\Theta()$, and $o()$ notations, as in $f(x) \in O(g(x))$. Each of these notations assigns to a given function g the set of all functions that are related to g in a well-defined way. Intuitively, $O()$, $\Omega()$, $\Theta()$, and $o()$ are used to compare the growth of functions, as $\le$, $\ge$, $=$, and $<$ are used to compare numbers. $O(g)$ is the set of all functions that are $\le g$ in a precise technical sense that corresponds to the intuitive notion "grows no faster than g". The definition involves some technicalities signaled by the preamble $\exists c > 0, \exists x_0 \in X, \forall x \ge x_0$. It says that we ignore constant factors and initial behavior and are interested only in a function's behavior from some point on. $N_0$ is the set of nonnegative integers, $R_0$ the set of nonnegative reals. In the following definitions X stands for either $N_0$ or $R_0$. Let $g: X \to X$.
Definition of $O()$, "big oh":
$$O(g) := \{f: X \to X \mid \exists c > 0, \exists x_0 \in X, \forall x \ge x_0 : f(x) \le c \cdot g(x)\}.$$
We say that $f \in O(g)$, or that f grows at most as fast as g(x) for $x \to \infty$.
Definition of $\Omega()$, "omega":
$$\Omega(g) := \{f: X \to X \mid \exists c > 0, \exists x_0 \in X, \forall x \ge x_0 : f(x) \ge c \cdot g(x)\}.$$
We say that $f \in \Omega(g)$, or that f grows at least as fast as g(x) for $x \to \infty$.
Definition of $\Theta()$, "theta":
$$\Theta(g) := O(g) \cap \Omega(g).$$
We say that $f \in \Theta(g)$, or that f has the same growth rate as g(x) for $x \to \infty$.
Definition of $o()$, "small oh":
$$o(g) := \{f: X \to X \mid \lim_{x \to \infty} f(x) / g(x) = 0\}.$$
We say that $f \in o(g)$, or that f grows slower than g(x) for $x \to \infty$.
Notation: Most of the literature uses = in place of our $\in$, such as in $x = O(x^2)$. If you do so, just remember that this = has none of the standard properties of an equality relation: it is neither symmetric nor transitive. Thus $O(x^2) = x$ is not used, and from $x = O(x^2)$ and $x^2 = O(x^2)$ it does not follow that $x = x^2$. The key to avoiding confusion is the insight that $O()$ is not a function but a set of functions.
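A short worked example may help to see how the constants c and $x_0$ are chosen. The function $f(x) = 3x^2 + 5x$ is an arbitrary choice made for illustration, and the witnesses below are one possible choice among many.

```latex
% Claim: f(x) = 3x^2 + 5x is in Theta(x^2), and x is in o(x^2).
\begin{align*}
  \text{Upper bound: } & 3x^2 + 5x \le 4x^2 \iff 5x \le x^2 \iff x \ge 5,
      && \text{so } c = 4,\ x_0 = 5 \text{ shows } f \in O(x^2). \\
  \text{Lower bound: } & 3x^2 + 5x \ge 3x^2 \text{ for all } x \ge 0,
      && \text{so } c = 3,\ x_0 = 0 \text{ shows } f \in \Omega(x^2). \\
  \text{Hence: } & f \in O(x^2) \cap \Omega(x^2) = \Theta(x^2). \\
  \text{Small oh: } & \lim_{x \to \infty} \frac{x}{x^2} = \lim_{x \to \infty} \frac{1}{x} = 0,
      && \text{so } x \in o(x^2).
\end{align*}
```

The same f is not in $o(x^2)$, since $f(x) / x^2$ tends to 3 rather than to 0.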
Summation formulas
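$\log_2$ denotes the logarithm to the base 2, ln the natural logarithm to the base e.
The asymptotic behavior of a sum can be derived by comparing the sum to an integral that can be evaluated in closed form. Let f(x) be a monotonically increasing, integrable function. Then
$$\int_{x_0}^{x_n} f(x)\,dx$$
is bounded below and above by sums (Exhibit 16.1):
$$\sum_{i=0}^{n-1} f(x_i) \cdot (x_{i+1} - x_i) \;\le\; \int_{x_0}^{x_n} f(x)\,dx \;\le\; \sum_{i=0}^{n-1} f(x_{i+1}) \cdot (x_{i+1} - x_i).$$
[Figure: graph of f(x) over the x-axis, with the subdivision points $x_0, x_1, x_2, \ldots$ marked.]
Exhibit 16.1: Bounding a definite integral by lower and upper sums.
Letting $x_i = i + 1$, this inequality becomes
$$\sum_{i=1}^{n} f(i) \;\le\; \int_{1}^{n+1} f(x)\,dx \;\le\; \sum_{i=2}^{n+1} f(i),$$
so
$$\int_{1}^{n+1} f(x)\,dx - f(n+1) + f(1) \;\le\; \sum_{i=1}^{n} f(i) \;\le\; \int_{1}^{n+1} f(x)\,dx. \qquad (*)$$
Example
By substituting
$$f(x) = x^k \quad \text{and} \quad \int x^k\,dx = \frac{x^{k+1}}{k+1}$$
with k > 0 in (∗) we obtain
$$\frac{(n+1)^{k+1}}{k+1} - \frac{1}{k+1} - (n+1)^k + 1 \;\le\; \sum_{i=1}^{n} i^k \;\le\; \frac{(n+1)^{k+1}}{k+1} - \frac{1}{k+1},$$
and therefore
$$\sum_{i=1}^{n} i^k = \frac{n^{k+1}}{k+1} + g(n) \quad \text{with } g(n) \in O(n^k).$$
Example
By substituting
$$f(x) = \ln x \quad \text{and} \quad \int \ln x\,dx = x \cdot \ln x - x$$
in (∗) we obtain
$$(n+1) \cdot \ln(n+1) - n - \ln(n+1) \;\le\; \sum_{i=1}^{n} \ln i \;\le\; (n+1) \cdot \ln(n+1) - n,$$
and therefore
$$\sum_{i=1}^{n} \log_2 i = (n+1) \cdot \log_2(n+1) - \frac{n}{\ln 2} + g(n) \quad \text{with } g(n) \in O(\log n).$$
Example
By substituting
$$f(x) = x \cdot \ln x \quad \text{and} \quad \int x \cdot \ln x\,dx = \frac{x^2}{2} \cdot \ln x - \frac{x^2}{4}$$
in (∗) we obtain
$$\sum_{i=1}^{n} i \cdot \log_2 i = \frac{n^2}{2} \cdot \log_2 n - \frac{n^2}{4 \cdot \ln 2} + g(n)$$
with $g(n) \in O(n \cdot \log n)$.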
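The integral bounds are easy to check numerically. The sketch below assumes the bound in the form used for monotonically increasing f: the sum $f(1) + \ldots + f(n)$ lies between $\int_1^{n+1} f(x)\,dx - f(n+1) + f(1)$ and $\int_1^{n+1} f(x)\,dx$. The function name integral_bounds is hypothetical, and the antiderivative must be supplied by the caller.

```python
import math


def integral_bounds(f, antiderivative, n):
    """Lower and upper bound on f(1) + ... + f(n) for monotonically increasing f."""
    integral = antiderivative(n + 1) - antiderivative(1)
    return integral - f(n + 1) + f(1), integral


for n in (10, 1000):
    lower, upper = integral_bounds(math.log, lambda x: x * math.log(x) - x, n)
    total = sum(math.log(i) for i in range(1, n + 1))
    print(n, round(lower, 3), round(total, 3), round(upper, 3))
```

The gap between the two bounds is $f(n+1) - f(1)$, which for $f = \ln$ is only $\ln(n+1)$; this is why the remainder term in the logarithmic example is so small.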
Recurrence relations
A homogeneous linear recurrence relation with constant coefficients is of the form
$$x_n = a_1 \cdot x_{n-1} + a_2 \cdot x_{n-2} + \ldots + a_k \cdot x_{n-k}$$
where the coefficients $a_i$ are independent of n and the initial values $x_1, x_2, \ldots, x_k$ are specified. There is a general technique for solving linear recurrence relations with constant coefficients, that is, for determining $x_n$ as a function of n. We will demonstrate this technique for the Fibonacci sequence which is defined by the recurrence
$$x_n = x_{n-1} + x_{n-2}, \quad x_0 = 0, \quad x_1 = 1.$$
We seek a solution of the form
$$x_n = c \cdot r^n$$
with constants c and r to be determined. Substituting this into the Fibonacci recurrence relation yields
$$c \cdot r^n = c \cdot r^{n-1} + c \cdot r^{n-2}$$
or
$$c \cdot r^{n-2} \cdot (r^2 - r - 1) = 0.$$
This equation is satisfied if either c = 0 or r = 0 or $r^2 - r - 1 = 0$. We obtain the trivial solution $x_n = 0$ for all n if c = 0 or r = 0. More interestingly, $r^2 - r - 1 = 0$ for
$$r_1 = \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad r_2 = \frac{1 - \sqrt{5}}{2}.$$
The sum of two solutions of a homogeneous linear recurrence relation is obviously also a solution, and it can be shown that any linear combination of solutions is again a solution. Therefore, the most general solution of the Fibonacci recurrence has the form
$$x_n = c_1 \cdot r_1^n + c_2 \cdot r_2^n$$
where $c_1$ and $c_2$ are determined as solutions of the linear equations derived from the initial conditions:
$$x_0 = 0 \;\Rightarrow\; c_1 + c_2 = 0, \qquad x_1 = 1 \;\Rightarrow\; c_1 \cdot r_1 + c_2 \cdot r_2 = 1,$$
which yield
$$c_1 = \frac{1}{\sqrt{5}}, \quad c_2 = -\frac{1}{\sqrt{5}}.$$
The complete solution for the Fibonacci recurrence relation is therefore
$$x_n = \frac{1}{\sqrt{5}} \cdot \left( \frac{1 + \sqrt{5}}{2} \right)^n - \frac{1}{\sqrt{5}} \cdot \left( \frac{1 - \sqrt{5}}{2} \right)^n.$$
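The closed form can be compared with the recurrence directly. The sketch below evaluates $x_n = (r_1^n - r_2^n) / \sqrt{5}$, where $r_1$ and $r_2$ are the two roots of $r^2 - r - 1 = 0$, in floating-point arithmetic and rounds the result, so it is reliable only for moderate n; the function names are hypothetical.

```python
import math


def fib_recurrence(n):
    """x_n = x_{n-1} + x_{n-2} with x_0 = 0, x_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """(r1^n - r2^n) / sqrt(5), with r1 and r2 the roots of r^2 - r - 1 = 0."""
    root5 = math.sqrt(5)
    r1, r2 = (1 + root5) / 2, (1 - root5) / 2
    return (r1 ** n - r2 ** n) / root5


for n in (0, 1, 2, 10, 30):
    print(n, fib_recurrence(n), round(fib_closed_form(n)))
```

Since $|r_2| < 1$, the second term vanishes quickly, and $x_n$ is simply the integer nearest to $r_1^n / \sqrt{5}$.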
Recurrence relations that are not linear with constant coefficients have no general solution techniques comparable to the one discussed above. General recurrence relations are solved (or their solutions are approximated or bounded) by trial-and-error techniques. If the trial and error is guided by some general technique, it will yield at least a good estimate of the asymptotic behavior of the solution of most recurrence relations.
Example
Consider the recurrence relation
$$x_n = \frac{2}{n} \cdot \sum_{i=0}^{n-1} x_i + a \cdot n + b \qquad (**)$$
with a > 0 and b > 0, which appears often in the average-case analysis of algorithms and data structures. When we know from the interpretation of this recurrence that its solution is monotonically nondecreasing, a systematic trial-and-error process leads to the asymptotic behavior of the solution. The simplest possible try is a constant, $x_n = c$. Substituting this into (∗∗) leads to
$$c \overset{?}{=} 2 \cdot c + a \cdot n + b,$$
so $x_n = c$ is not a solution. Since the left-hand side $x_n$ is smaller than an average of previous values on the right-hand side, the solution of this recurrence relation must grow faster than c. Next, we try a linear function $x_n = c \cdot n$:
$$c \cdot n \overset{?}{=} \frac{2}{n} \cdot \sum_{i=0}^{n-1} c \cdot i + a \cdot n + b = (c + a) \cdot n - c + b.$$
At this stage of the analysis it suffices to focus on the leading terms of each side: $c \cdot n$ on the left and $(c + a) \cdot n$ on the right. The assumption a > 0 makes the right side larger than the left, and we conclude that a linear function also grows too slowly to be a solution of the recurrence relation. A new attempt with a function that grows yet faster, $x_n = c \cdot n^2$, leads to
$$c \cdot n^2 \overset{?}{=} \frac{2}{n} \cdot \sum_{i=0}^{n-1} c \cdot i^2 + a \cdot n + b = \frac{2}{3} \cdot c \cdot n^2 + (a - c) \cdot n + \frac{c}{3} + b.$$
Comparing the leading terms on both sides, we find that the left side is now larger than the right, and conclude that a quadratic function grows too fast. Having bounded the growth rate of the solution from below and above, we try functions whose growth rate lies between that of a linear and a quadratic function, such as $x_n = c \cdot n^{1.5}$. A more sophisticated approach considers a family of functions of the form $x_n = c \cdot n^{1+\varepsilon}$ for any $\varepsilon > 0$: All of them grow too fast. This suggests $x_n = c \cdot n \cdot \log_2 n$, which gives
$$c \cdot n \cdot \log_2 n \overset{?}{=} \frac{2}{n} \cdot \sum_{i=1}^{n-1} c \cdot i \cdot \log_2 i + a \cdot n + b = \frac{2 \cdot c}{n} \cdot \left( \frac{n^2}{2} \cdot \log_2 n - \frac{n^2}{4 \cdot \ln 2} + g(n) \right) + a \cdot n + b = c \cdot n \cdot \log_2 n + \left( a - \frac{c}{2 \cdot \ln 2} \right) \cdot n + b + h(n)$$
with $g(n) \in O(n \cdot \log n)$ and $h(n) \in O(\log n)$. To match the linear terms on each side, we must choose c such that
$$a - \frac{c}{2 \cdot \ln 2} = 0,$$
or $c = a \cdot \ln 4 \approx 1.386 \cdot a$. Hence we now know that the solution to the recurrence relation (∗∗) has the form
$$x_n = (\ln 4) \cdot a \cdot n \cdot \log_2 n + g(n) \quad \text{with } g(n) \in O(n).$$
Asymptotic performance of divide-and-conquer algorithms
We illustrate the power of the techniques developed in previous sections by analyzing the asymptotic performance not of a specific algorithm, but rather, of an entire class of divide-and-conquer algorithms. In "Divide and conquer recursion" we presented the following schema for divide-and-conquer algorithms that partition the set of data into two parts:
A(D): if simple(D) then return(A0(D))
      else 1. divide: partition D into D1 and D2;
           2. conquer: R1 := A(D1); R2 := A(D2);
           3. combine: return(merge(R1, R2));
Assume further that the data set D can always be partitioned into two halves, D1 and D2, at every level of recursion. Two comments are appropriate:
1. For repeated halving to be possible it is not necessary that the size $n$ of the data set D be a power of 2, $n = 2^k$. It is not important that D be partitioned into two exact halves; approximate halves will do. Imagine padding any data set D whose size is not a power of 2 with dummy elements, up to the next power of 2. Dummies can always be found that do not disturb the real computation: for example, by replicating elements or by appending sentinels. Padding is usually just a conceptual trick that may help in understanding the process, but need not necessarily generate any additional data.
2. Whether or not the divide step is guaranteed to partition D into two approximate halves, on the other hand, depends critically on the problem and on the data structures used. Example: Binary search in an ordered array partitions D into halves by probing the element at the midpoint; the same idea is impractical in a linked list because the midpoint is not directly accessible.
Under our assumption of halving, the time complexity $T(n)$ of algorithm A applied to data D of size $n$ satisfies the recurrence relation
$$T(n) = 2 \cdot T\left(\frac{n}{2}\right) + f(n)$$
where $f(n)$ is the sum of the partitioning or splitting time and the "stitching time" required to merge two solutions of size $n/2$ into a solution of size $n$. Repeated substitution yields
$$T(n) = 2 \cdot T\left(\frac{n}{2}\right) + f(n) = 4 \cdot T\left(\frac{n}{4}\right) + 2 \cdot f\left(\frac{n}{2}\right) + f(n) = \dots = n \cdot T(1) + \sum_{k=0}^{\log_2 n - 1} 2^k \cdot f\left(\frac{n}{2^k}\right)$$
The term $n \cdot T(1)$ expresses the fact that every data item gets looked at; the second term sums up the splitting and stitching time. Three typical cases occur:
(a) Constant time splitting and merging $f(n) = c$ yields
$T(n) = (T(1) + c) \cdot n - c$.
Example: Find the maximum of n numbers.
(b) Linear time splitting and merging $f(n) = a \cdot n + b$ yields
$T(n) = a \cdot n \cdot \log_2 n + (T(1) + b) \cdot n - b$.
Examples: Mergesort, quicksort.
(c) Expensive splitting and merging: $n \in o(f(n))$ yields
$T(n) = n \cdot T(1) + O(f(n) \cdot \log n)$
and therefore rarely leads to interesting algorithms.
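The closed forms for cases (a) and (b) can be checked numerically. The following Python sketch is illustrative only: the function name `T`, the sample constants, and the restriction to n being a power of 2 are assumptions made for this example. It evaluates the recurrence T(n) = 2 · T(n/2) + f(n) directly and compares the result with the exact values (T(1) + c) · n − c and a · n · log2 n + (T(1) + b) · n − b.

```python
def T(n, f, t1):
    """Evaluate T(n) = 2*T(n/2) + f(n) with T(1) = t1, for n a power of 2."""
    if n == 1:
        return t1
    return 2 * T(n // 2, f, t1) + f(n)


def check_closed_forms(t1=3, a=5, b=7, c=2, max_k=10):
    """Compare the recurrence with the closed forms for n = 2**k (sample constants)."""
    for k in range(max_k + 1):
        n = 2 ** k
        # Case (a): constant-time splitting and merging, f(n) = c.
        if T(n, lambda m: c, t1) != (t1 + c) * n - c:
            return False
        # Case (b): linear-time splitting and merging, f(n) = a*n + b.
        if T(n, lambda m: a * m + b, t1) != a * n * k + (t1 + b) * n - b:
            return False
    return True
```

Calling `check_closed_forms()` with other constants is a quick way to convince oneself that only the term a · n · log2 n depends on the cost of splitting and merging in case (b).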
Permutations
Inversions
Let $(a_k: 1 \le k \le n)$ be a permutation of the integers 1 .. n. A pair $(a_i, a_j)$, $1 \le i < j \le n$, is called an inversion iff $a_i > a_j$. What is the average number of inversions in a permutation? Consider all permutations in pairs; that is, with any permutation A:
$a_1 = x_1; a_2 = x_2; \dots; a_n = x_n$
consider its reverse A', which contains the elements of A in reverse order:
$a_1 = x_n; a_2 = x_{n-1}; \dots; a_n = x_1$.
In one of these two permutations $x_i$ and $x_j$ are in the correct order; in the other, they form an inversion. Since there are $n \cdot (n - 1) / 2$ pairs of elements $(x_i, x_j)$ with $1 \le i < j \le n$ there are, on average,
$$inv_{average} = \frac{1}{2} \cdot \frac{n \cdot (n - 1)}{2} = \frac{n^2 - n}{4}$$
inversions.
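The pairing argument can be confirmed by brute force for small n. The Python sketch below is only an illustration (the function names are hypothetical): it counts inversions by definition and averages over all n! permutations, which should give (n² − n)/4.

```python
from itertools import permutations


def inversions(p):
    """Count pairs (i, j) with i < j and p[i] > p[j]."""
    return sum(
        1
        for i in range(len(p))
        for j in range(i + 1, len(p))
        if p[i] > p[j]
    )


def average_inversions(n):
    """Average number of inversions over all n! permutations of 1 .. n."""
    perms = list(permutations(range(1, n + 1)))
    return sum(inversions(p) for p in perms) / len(perms)
```

Because the enumeration visits n! permutations, this check is practical only for small n (say n ≤ 8).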
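Average distance
Let $(a_k: 1 \le k \le n)$ be a permutation of the natural numbers from 1 to n. The distance of the element $a_i$ from its correct position is $|a_i - i|$. The total distance of all elements from their correct positions is
$$\sum_{i=1}^{n} |a_i - i|.$$
Therefore, the average total distance (i.e. the average over all $n!$ permutations) is
$$\frac{1}{n!} \sum_{\text{all permutations } (a_k)} \; \sum_{i=1}^{n} |a_i - i|.$$
Let $1 \le i \le n$ and $1 \le j \le n$. Consider all permutations for which $a_i$ is equal to $j$. Since there are $(n - 1)!$ such permutations, we obtain
$$\sum_{\text{all permutations with } a_i = j} |a_i - i| = (n - 1)! \cdot |j - i|.$$
Therefore,
$$\frac{1}{n!} \sum_{\text{all permutations } (a_k)} \; \sum_{i=1}^{n} |a_i - i| = \frac{(n-1)!}{n!} \sum_{i=1}^{n} \sum_{j=1}^{n} |i - j| = \frac{2}{n} \sum_{i=1}^{n} \sum_{j=1}^{i-1} (i - j) = \frac{n^2 - 1}{3};$$
the average distance of an element $a_i$ from its correct position is therefore
$$\frac{1}{n} \cdot \frac{n^2 - 1}{3} = \frac{n}{3} - \frac{1}{3 n} \approx \frac{n}{3}.$$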
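As a sanity check on the average total distance, the following Python sketch (hypothetical function names, exact arithmetic via `fractions.Fraction`) enumerates all permutations of 1 .. n for small n; the result should equal (n² − 1)/3.

```python
from fractions import Fraction
from itertools import permutations


def total_distance(p):
    """Sum of |a_i - i| over all positions i = 1 .. n of the permutation p."""
    return sum(abs(a - i) for i, a in enumerate(p, start=1))


def average_total_distance(n):
    """Exact average of the total distance over all n! permutations of 1 .. n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(total_distance(p) for p in perms), len(perms))
```

Dividing the result by n gives the average distance of a single element, which approaches n/3 as n grows.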
Trees
Trees are ubiquitous in discrete mathematics and computer science, and this section summarizes some of the basic concepts, terminology, and results. Although trees come in different versions, in the context of algorithms and data structures, "tree" almost always means an ordered rooted tree. An ordered rooted tree is either empty or it consists of a node, called a root, and a sequence of k ordered subtrees $T_1, T_2, \dots, T_k$ (Exhibit 16.2). The nodes of an ordered tree that have only empty subtrees are called leaves or external nodes; the other nodes are called internal nodes (Exhibit 16.3). The roots of the subtrees attached to a node are its children, and this node is their parent.
Exhibit 16.2: Recursive definition of a rooted, ordered tree.
The level of a node is defined recursively. The root of a tree is at level 0. The children of a node at level t are at level t + 1. The level of a node is the length of the path from the root of the tree to this node. The height of a tree is defined as the maximum level of all leaves. The path length of a tree is the sum of the levels of all its nodes (Exhibit 16.3).
Exhibit 16.3: A tree of height = 4 and path length = 35.
A binary tree is an ordered tree whose nodes have at most two children. A 0-2 binary tree is a tree in which every node has zero or two children but not one. A 0-2 tree with n leaves has exactly n − 1 internal nodes. A binary tree of height h is called complete (completely balanced) if it has $2^{h+1} - 1$ nodes (Exhibit 16.4). A binary tree of height h is called almost complete if all its leaves are on levels h − 1 and h, and all leaves on level h are as far left as possible (Exhibit 16.4).
Exhibit 16.4: Examples of well-balanced binary trees.
Exercises
1. Suppose that we are comparing implementations of two algorithms on the same machine. For inputs of size n, the first algorithm runs in $9 \cdot n^2$ steps, while the second algorithm runs in $81 \cdot n \cdot \log_2 n$ steps. Assuming that the steps in both algorithms take the same time, for which values of n does the first algorithm beat the second algorithm?
2. What is the smallest value of n such that an algorithm whose running time is $256 \cdot n^2$ runs faster than an algorithm whose running time is $2^n$ on the same machine?
3. For each of the following functions $f_i(n)$, determine a function $g(n)$ such that $f_i(n) \in \Theta(g(n))$. The function $g(n)$ should be as simple as possible.
$f_1(n) = 0.001 \cdot n^7 + n^2 + 2 \cdot n$
$f_2(n) = n \cdot \log n + \log n + 1234 \cdot n$
$f_3(n) = 5 \cdot n \cdot \log n + n^2 \cdot \log n + n^2$
$f_4(n) = 5 \cdot n \cdot \log n + n^3 + n^2 \cdot \log n$
4. Prove formally that $1024 \cdot n^2 + 5 \cdot n \in \Theta(n^2)$.
5. Give an asymptotically tight bound for the following summation:
6. Find the most general solutions to the following recurrence relations.
7. Solve the recurrence $T(n) = 2 \cdot T(\sqrt{n}) + \log_2 n$. Hint: Make a change of variables $m = \log_2 n$.
8. Compute the number of inversions and the total distance for the permutation (3 1 2 4).
17. Sorting and its complexity
What is sorting?
basic ideas and intrinsic complexity
insertion sort
selection sort
merge sort
distribution sort
a lower bound $\Omega(n \cdot \log n)$
Quicksort
Sorting in linear time?
sorting networks
What is sorting? How difficult is it?
The problem
Assume that S is a set of n elements $x_1, x_2, \dots, x_n$ drawn from a domain U, on which a total order ≤ is defined (i.e. a relation that satisfies the following axioms):
≤ is reflexive (i.e. $\forall x \in U: x \le x$)
≤ is antisymmetric (i.e. $\forall x, y \in U: x \le y \wedge y \le x \Rightarrow x = y$)
≤ is transitive (i.e. $\forall x, y, z \in U: x \le y \wedge y \le z \Rightarrow x \le z$)
≤ is total (i.e. $\forall x, y \in U: x \le y \vee y \le x$)
Sorting is the process of generating a sequence
$$x_{i_1}, x_{i_2}, \dots, x_{i_n}$$
such that $(i_1, i_2, \dots, i_n)$ is a permutation of the integers from 1 to n and
$$\forall k, 1 \le k \le n - 1: \quad x_{i_k} \le x_{i_{k+1}}$$
holds. Phrased abstractly, sorting is the problem of finding a specific permutation (or one among a few permutations, when distinct elements may have equal values) out of $n!$ possible permutations of the n given elements. Usually, the set S of elements to be sorted will be given in a data structure; in this case, the elements of S are ordered implicitly by this data structure, but not necessarily according to the desired order ≤. Typical sorting problems assume that S is given in an array or in a sequential file (magnetic tape), and the result is to be generated in the same structure. We characterize elements by their position in the structure (e.g. A[i] in the array A, or by the value of a pointer in a sequential file). The access operations provided by the underlying data structure determine what sorting algorithms are possible.
Algorithms
Most sorting algorithms are refinements of the following idea:
while ∃ (i, j): i < j ∧ A[i] > A[j] do A[i] :=: A[j];
where :=: denotes the exchange operator. Even sorting algorithms that do not explicitly exchange pairs of elements, or do not use an array as the underlying data structure, can usually be thought of as conforming to the schema above. An insertion sort, for example, takes one element at a time and inserts it in its proper place among those already sorted. To find the correct place of insertion, we can think of a ripple effect whereby the new element successively displaces (exchanges position with) all those larger than itself.
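The schema above can be simulated directly. The Python sketch below is an illustration under stated assumptions: the nondeterministic choice is replaced by a random choice among all inverted pairs, and the names `inverted_pairs` and `exchange_sort` are hypothetical. It terminates because exchanging an inverted pair always reduces the number of inversions.

```python
import random


def inverted_pairs(a):
    """All index pairs (i, j) with i < j and a[i] > a[j]."""
    return [
        (i, j)
        for i in range(len(a))
        for j in range(i + 1, len(a))
        if a[i] > a[j]
    ]


def exchange_sort(a, rng=random):
    """Repeatedly exchange some inverted pair until none is left."""
    a = list(a)
    while True:
        pairs = inverted_pairs(a)
        if not pairs:
            return a
        i, j = rng.choice(pairs)  # stands in for the nondeterministic choice
        a[i], a[j] = a[j], a[i]
```

The sketch recomputes all inverted pairs in every round, so it is far from efficient; it only shows that any sequence of permitted exchanges ends in the sorted order.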
As the schema above shows, two types of operations are needed in order to sort:
collecting information about the order of the given elements
ordering the elements (e.g. by exchanging a pair)
When designing an efficient algorithm we seek to economize the number of operations of both types: We try to avoid collecting redundant information, and we hope to move an element as few times as possible. The nondeterministic algorithm given above lets us perform any one of a number of exchanges at a given time, regardless of their usefulness. For example, in sorting the sequence
$x_1 = 5, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 1$
the nondeterministic algorithm permits any of seven exchanges
$x_1$ :=: $x_i$ for $2 \le i \le 5$ and $x_j$ :=: $x_5$ for $2 \le j \le 4$.
We might have reached the state shown above by following an exotic sorting technique that sorts "from the middle toward both ends", and we might know at this time that the single exchange $x_1$ :=: $x_5$ will complete the sort. The nondeterministic algorithm gives us no handle to express and use this knowledge.
The attempt to economize work forces us to depart from nondeterminacy and to impose a control structure that carefully sequences the operations to be performed so as to make maximal use of the information gained so far. The resulting algorithms will be more complex and difficult to understand. It is useful to remember, though, that sorting is basically a simple problem with a simple solution and that all the acrobatics in this chapter are due to our quest for efficiency.
Intrinsic complexity
There are obvious limits to how much we can economize. In the absence of any previously acquired information, it is clear that each element must be inspected and, in general, moved at least once. Thus we cannot hope to get away with fewer than $\Omega(n)$ primitive operations. There are less obvious limits; we mention two of them here.
1. If information is collected by asking binary questions only (any question that may receive one of two answers, e.g. a yes/no question, or a comparison of two elements that yields either ≤ or >), then about $n \cdot \log_2 n$ questions are necessary in general, as will be proved in the section "A lower bound $\Omega(n \cdot \log n)$". Thus in this model of computation, sorting requires time $\Omega(n \cdot \log n)$.
2. In addition to collecting information, one must rearrange the elements. In the section "Permutations" in chapter 16, we have shown that in a permutation the average distance of an element from its correct position is approximately n/3. Therefore elements have to move an average distance of approximately n/3 elements to end up at their destination. Depending on the access operations of the underlying storage structure, an element can be moved to its correct position in a single step of average length n/3, or in n/3 steps of average length 1. If elements are rearranged by exchanging adjacent elements only, then on average $\Omega(n^2)$ moving operations are required. Therefore, short steps are insufficient to obtain an efficient $\Theta(n \cdot \log n)$ sorting algorithm.
Practical aspects of sorting
Records instead of elements. We discuss sorting assuming only that the elements to be sorted are drawn from a totally ordered domain. In practice these elements are just the keys of records that contain additional data associated with the key: for example,
type recordtype = record
  key: keytype; { totally ordered by ≤ }
  data: anytype
end;
We use the relational operators =, <, ≤ to compare keys, but in a given programming language, say Pascal, these may be undefined on values of type keytype. In general, they must be replaced by procedures: for example, when comparing strings with respect to the lexicographic order.
If the key field is only a small part of a large record, the exchange operation :=:, interpreted literally, becomes an unnecessarily costly copy operation. This can be avoided by leaving the record (or just its data field) in place, and only moving a small surrogate record consisting of a key and a pointer to its associated record.
Sort generators. On many systems, particularly in the world of commercial data processing, you may never need to write a sorting program, even though sorting is a frequently executed operation. Sorting is taken care of by a sort generator, a program akin to a compiler; it selects a suitable sorting algorithm from its repertoire and tailors it to the problem at hand, depending on parameters such as the number of elements to be sorted, the resources available, the key type, or the length of the records.
Partially sorted sequences. The algorithms we discuss ignore any order that may exist in the sequence to be sorted. Many applications call for sorting files that are almost sorted, for example, the case where a sorted master file is updated with an unsorted transaction file. Some algorithms take advantage of any order present in the input data; their time complexity varies from $O(n)$ for almost sorted files to $O(n \cdot \log n)$ for randomly ordered files.
Types of sorting algorithms
Two important classes of incremental sorting algorithms create order by processing each element in turn and placing it in its correct position. These classes, insertion sorts and selection sorts, are best understood as maintaining two disjoint, mutually exhaustive structures called 'sorted' and 'unsorted'.
Initialize: 'sorted' := ∅; 'unsorted' := {x1, x2, … , xn};
Loop: for i := 1 to n do
  move an element from 'unsorted' to its correct place in 'sorted';
The following illustrations show 'sorted' and 'unsorted' sharing an array[1 .. n]. In this case the boundary between 'sorted' and 'unsorted' is represented by an index i that increases as more elements become ordered. The important distinction between the two types of sorting algorithms emerges from the question: In which of the two structures is most of the work done? Insertion sorts remove the first or most easily accessible element from 'unsorted' and search through 'sorted' to find its proper place. Selection sorts search through 'unsorted' to find the next element to be appended to 'sorted'.
Insertion sort
The i-th step inserts the i-th element into the sorted sequence of the first (i − 1) elements (Exhibit 17.1).
Exhibit 17.1: Insertion sorts move an easily accessed element to its correct place.
Selection sort
The i-th step selects the smallest among the n − i + 1 elements not yet sorted, and moves it to the i-th position (Exhibit 17.2).
Exhibit 17.2: Selection sorts search for the correct element to move to an easily accessed place.
Insertion and selection sorts repeatedly search through a large part of the entire data to find the proper place of insertion or the proper element to be moved. Efficient search requires random access, hence these sorting techniques are used primarily for internal sorting in central memory.
Merge sort
Merge sorts process (sub)sequences of elements in unidirectional order and thus are well suited for external sorting on secondary storage media that provide sequential access only, such as magnetic tapes; or random access to large blocks of data, such as disks. Merge sorts are also efficient for internal sorting. The basic idea is to merge two sorted sequences of elements, called runs, into one longer sorted sequence. We read each of the input runs, and write the output run, starting with small elements and ending with the large ones. We keep comparing the smallest of the remaining elements on each input run, and append the smaller of the two to the output run, until both input runs are exhausted (Exhibit 17.3).
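The merge step described above can be sketched in a few lines of Python. This is an illustration, not the book's code: the runs are modeled as Python lists rather than tapes, and the function name `merge` is chosen for this example. Both inputs are read strictly from front to back, which is what makes the technique suitable for sequential media.

```python
def merge(run1, run2):
    """Merge two sorted runs into one sorted run, reading each input sequentially."""
    out = []
    i = j = 0
    # Compare the smallest remaining element of each run; append the smaller one.
    while i < len(run1) and j < len(run2):
        if run1[i] <= run2[j]:
            out.append(run1[i])
            i += 1
        else:
            out.append(run2[j])
            j += 1
    # One run is exhausted: copy the rest of the other.
    out.extend(run1[i:])
    out.extend(run2[j:])
    return out
```

Taking the element from `run1` when the two candidates are equal keeps the merge stable, which matters when records with equal keys must retain their relative order.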
Exhibit 17.3: Merge sorts exploit order already present.
The processor shown at left in Exhibit 17.4 reads two tapes, A and B. Tape A contains runs 1 and 2; tape B contains runs 3 and 4. The processor merges runs 1 and 3 into the single run 1 ∪ 3 on tape C, and runs 2 and 4 into the single run 2 ∪ 4 on tape D. In a second merge step, the processor shown at the right reads tapes C and D and merges the two runs 1 ∪ 3 and 2 ∪ 4 into one run, 1 ∪ 3 ∪ 2 ∪ 4.
Exhibit 17.4: Two merge steps in sequence.
Distribution sort
Distribution sorts process the representation of an element as a value in a radix number system and use primitive arithmetic operations such as "extract the k-th digit". These sorts do not compare elements directly. They introduce a different model of computation than the sorts based on comparisons, exchanges, insertions, and deletions that we have considered thus far. As an example, consider numbers with at most three digits in radix 4 representation. In a first step these numbers are distributed among four queues according to their least significant digit, and the queues are concatenated in increasing order. The process is repeated for the middle digit, and finally for the leftmost, most significant digit, as shown in Exhibit 17.5.
Exhibit 17.5: Distribution sorts use the radix representation of keys to organize elements in buckets.
We have now seen the basic ideas on which all sorting algorithms are built. It is more important to understand these ideas than to know dozens of algorithms based on them. To appreciate the intricacy of sorting, you must understand some algorithms in detail: we begin with simple ones that turn out to be inefficient.
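Of the four ideas, the distribution sort is the one least like the others, so a small sketch may help. The Python code below follows the radix-4 example: keys with at most three radix-4 digits (0 .. 63) are distributed among four queues, one pass per digit, starting with the least significant. The function name and the use of `collections.deque` as the queue are assumptions of this illustration.

```python
from collections import deque


def distribution_sort(keys, radix=4, digits=3):
    """Sort non-negative integers below radix**digits without comparing keys."""
    keys = list(keys)
    for d in range(digits):  # least significant digit first
        queues = [deque() for _ in range(radix)]
        for k in keys:
            digit = (k // radix ** d) % radix  # "extract the d-th digit"
            queues[digit].append(k)
        # Concatenate the queues in increasing order of the digit value.
        keys = [k for q in queues for k in q]
    return keys
```

Each pass preserves the order established by the previous passes among keys that share the current digit; this is why the passes must run from the least to the most significant digit.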
Simple sorting algorithms that work in time $\Theta(n^2)$
If you invent your own sorting technique without prior study of the literature, you will probably "discover" a well-known inefficient algorithm that works in time $O(n^2)$, requires time $\Omega(n^2)$ in the worst case, and thus is of time complexity $\Theta(n^2)$. Your algorithm might be similar to one described below.
Consider in-place algorithms that work on an array declared as
var A: array[1 .. n] of elt;
and place the elements in ascending order. Assume that the comparison operators are defined on values of type elt. Let $c_{best}$, $c_{average}$, and $c_{worst}$ denote the number of comparisons, and $e_{best}$, $e_{average}$, and $e_{worst}$ the number of exchange operations performed in the best, average, and worst case, respectively. Let $inv_{average}$ denote the average number of inversions in a permutation.
Insertion sort (Exhibit 17.6)
Let −∞ denote a constant ≤ any key value. The smallest value in the domain often serves as a sentinel −∞.
Exhibit 17.6: Straight insertion propagates a ripple-effect across the sorted part of the array.
A[0] := −∞;
for i := 2 to n do begin
  j := i;
  while A[j] < A[j − 1] do { A[j] :=: A[j − 1]; { exchange }
                             j := j − 1 }
end;
This straight insertion sort is a $\Theta(n)$ algorithm in the best case and a $\Theta(n^2)$ algorithm in the average and worst cases. In the program above, the point of insertion is found by a linear search interleaved with exchanges. A binary search is possible but does not improve the time complexity in the average and worst cases, since the actual insertion still requires a linear-time ripple of exchanges.
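The best-case and worst-case behavior of straight insertion can be observed by instrumenting the algorithm. The following Python sketch is an illustration only: it uses `float("-inf")` as the sentinel in position 0 and returns the counts of comparisons and exchanges alongside the sorted list (the function name and return shape are assumptions of this example).

```python
def insertion_sort(a):
    """Straight insertion with a sentinel; returns (sorted list, comparisons, exchanges)."""
    a = [float("-inf")] + list(a)  # a[0] holds the sentinel, keys live in a[1..n]
    comparisons = exchanges = 0
    for i in range(2, len(a)):
        j = i
        while True:
            comparisons += 1
            if not a[j] < a[j - 1]:
                break  # a[j] has reached its place; the sentinel stops the scan at j = 1
            a[j], a[j - 1] = a[j - 1], a[j]  # exchange
            exchanges += 1
            j -= 1
    return a[1:], comparisons, exchanges
```

Each exchange removes exactly one inversion, so the number of exchanges equals the number of inversions of the input: zero for sorted input, n · (n − 1)/2 for input in descending order.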
Selection sort (Exhibit 17.7)
Exhibit 17.7: Straight selection scans the unsorted part of the array.
for i := 1 to n − 1 do begin
  minindex := i; minkey := A[i];
  for j := i + 1 to n do
    if A[j] < minkey then { minkey := A[j]; minindex := j };
  A[i] :=: A[minindex] { exchange }
end;
The number of comparisons is $c_{best} = c_{average} = c_{worst} = \sum_{i=1}^{n-1} (n - i) = (n^2 - n)/2$. The sum in the formula for the number of comparisons reflects the structure of the two nested for loops. The body of the inner loop is executed the same number of times for each of the three cases. Thus this straight selection sort is of time complexity $\Theta(n^2)$.
A lower bound $\Omega(n \cdot \log n)$
A straightforward counting argument yields a lower bound on the time complexity of any sorting algorithm that collects information about the ordering of the elements by asking only binary questions. A binary question has a two-valued answer: yes or no, true or false. A comparison of two elements, x ≤ y, is the most obvious example, but the following theorem holds for binary questions in general.
Theorem: Any sorting algorithm that collects information by asking binary questions only executes at least
$$n \cdot \log_2 (n + 1) - \frac{n}{\ln 2}$$
binary questions both in the worst case, and averaged over all $n!$ permutations. Thus the average and worst-case time complexity of such an algorithm is $\Omega(n \cdot \log n)$.
Proof: A sorting algorithm of the type considered here can be represented by a binary decision tree. Each internal node in such a tree represents a binary question, and each leaf corresponds to a result of the decision process. The decision tree must distinguish each of the $n!$ possible permutations of the input data from all the others, and thus must have at least $n!$ leaves, one for each permutation.
Example: The decision tree shown in Exhibit 17.8 collects the information necessary to sort three elements, x, y and z, by comparisons between two elements.
Exhibit 17.8: The decision tree shows the possible $n!$ outcomes when sorting n elements.
The average number of binary questions needed by a sorting algorithm is equal to the average depth of the leaves of this decision tree. The lemma following this theorem will show that in a binary tree with k leaves the average depth of the leaves is at least $\log_2 k$. Therefore, the average depth of the leaves corresponding to the $n!$ permutations is at least $\log_2 n!$. Since
$$\log_2 n! \ge n \cdot \log_2 (n + 1) - \frac{n}{\ln 2}$$
it follows that on average at least
$$n \cdot \log_2 (n + 1) - \frac{n}{\ln 2}$$
binary questions are needed, that is, the time complexity of each such sorting algorithm is $\Omega(n \cdot \log n)$ in the average, and therefore also in the worst case.
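To get a feeling for the numbers, the bound n · log2(n + 1) − n / ln 2 can be compared with log2 n!, the quantity it approximates from below. The Python sketch below is illustrative; the function names are hypothetical.

```python
import math


def min_questions(n):
    """ceil(log2 n!): a binary decision tree with n! leaves has at least this height."""
    return math.ceil(math.log2(math.factorial(n)))


def theorem_bound(n):
    """The lower bound n * log2(n + 1) - n / ln 2 on the number of binary questions."""
    return n * math.log2(n + 1) - n / math.log(2)


def bound_holds(n_max):
    """Check theorem_bound(n) <= log2(n!) for all n from 1 to n_max."""
    return all(
        theorem_bound(n) <= math.log2(math.factorial(n))
        for n in range(1, n_max + 1)
    )
```

For three elements `min_questions(3)` is 3, matching the height of a decision tree that sorts x, y and z by pairwise comparisons.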
Lemma: In a binary tree with k leaves the average depth of the leaves is at least $\log_2 k$.
Proof: Suppose that the lemma is not true, and let T be the counterexample with the smallest number of nodes. T cannot consist of a single node because the lemma is true for such a tree. If the root r of T has only one child, the subtree T' rooted at this child would contain the k leaves of T, which have an even smaller average depth in T' than in T. Since T was the counterexample with the smallest number of nodes, such a T' cannot exist. Therefore, the root r of T must have two children, and there must be $k_L > 0$ leaves in the left subtree and $k_R > 0$ leaves in the right subtree of r ($k_L + k_R = k$). Since T was chosen minimal, the $k_L$ leaves in the left subtree must have an average depth of at least $\log_2 k_L$, and the $k_R$ leaves in the right subtree must have an average depth of at least $\log_2 k_R$. Therefore, the average depth of all k leaves in T must be at least
$$\frac{k_L}{k} \cdot \log_2 k_L + \frac{k_R}{k} \cdot \log_2 k_R + 1 \qquad (*)$$
It is easy to see that (*) assumes its minimum value if $k_L = k_R$. Since (*) has the value $\log_2 k$ if $k_L = k_R = k/2$ we have found a contradiction to our assumption that the lemma is false.
Quicksort
Quicksort (C. A. R. Hoare, 1962) [Hoa 62] combines the powerful algorithmic principle of divide-and-conquer with an efficient way of moving elements using few exchanges. The divide phase partitions the array into two disjoint parts: the "small" elements on the left and the "large" elements on the right. The conquer phase sorts each part separately. Thanks to the work of the divide phase, the merge phase requires no work at all to combine two partial solutions. Quicksort's efficiency depends crucially on the expectation that the divide phase cuts two sizable subarrays rather than merely slicing off an element at either end of the array (Exhibit 17.9).
Exhibit 17.9: Quicksort partitions the array into the "small" elements on the left and the "large" elements on the right.
We choose an arbitrary threshold value m to define "small" as ≤ m, and "large" as ≥ m, thus ensuring that any "small element" ≤ any "large element". We partition an arbitrary subarray A[L .. R] to be sorted by executing a left-to-right scan (incrementing an index i) "concurrently" with a right-to-left scan (decrementing j) (Exhibit 17.10). The left-to-right scan pauses at the first element A[i] ≥ m, and the right-to-left scan pauses at the first element A[j] ≤ m. When both scans have paused, we exchange A[i] and A[j] and resume the scans. The partition is complete when the two scans have crossed over with j < i. Thereafter, quicksort is called recursively for A[L .. j] and A[i .. R], unless one or both of these subarrays consists of a single element and thus is trivially sorted. Example of partitioning (m = 16):
25 23  3 16  4  7 29  6
 i                    j
 6 23  3 16  4  7 29 25
    i              j
 6  7  3 16  4 23 29 25
          i  j
 6  7  3  4 16 23 29 25
          j  i
Exhibit 17.10: Scanning the array concurrently from left to right and from right to left.
Although the threshold value m appeared arbitrary in the description above, it must meet criteria of correctness and efficiency. Correctness: if either the set of elements ≤ m or the set of elements ≥ m is empty, quicksort fails to terminate. Thus we require that $\min(x_i) \le m \le \max(x_i)$. Efficiency requires that m be close to the median.
How do we find the median of n elements? The obvious answer is to sort the elements and pick the middle one, but this leads to a chicken-and-egg problem when trying to sort in the first place. There exist sophisticated algorithms that determine the exact median of n elements in time $O(n)$ in the worst case [BFPRT 72]. The multiplicative constant might be large, but from a theoretical point of view this does not matter. The elements are partitioned into two equal-sized halves, and quicksort runs in time $O(n \cdot \log n)$ even in the worst case. From a practical point of view, however, it is not worthwhile to spend much effort in finding the exact median when there are much cheaper ways of finding an acceptable approximation. The following techniques have all been used to pick a threshold m as a "guess at the median":
An array element in a fixed position such as A[(L + R) div 2]. Warning: stay away from either end, A[L] or A[R], as these thresholds lead to poor performance if the elements are partially sorted.
An array element in a random position: a simple technique that yields good results.
The median of three or five array elements in fixed or random positions.
The average between the smallest and largest element. This requires a separate scan of the entire array in the beginning; thereafter, the average for each subarray can be calculated during the previous partitioning process.
The recursive procedure 'rqs' is a possible implementation of quicksort. The function 'guessmedian' must yield a threshold that lies on or between the smallest and largest of the elements to be sorted. If an array element is used as the threshold, the procedure 'rqs' should be changed in such a way that after finishing the partitioning process this element is in its final position between the left and right parts of the array.
procedure rqs (L, R: 1 .. n); { sorts A[L], … , A[R] }
var i, j: 0 .. n + 1;
  procedure partition;
  var m: elt;
  begin { partition }
    m := guessmedian (L, R);
    { min(A[L], … , A[R]) ≤ m ≤ max(A[L], … , A[R]) }
    i := L; j := R;
    repeat
      { A[L], … , A[i − 1] ≤ m ≤ A[j + 1], … , A[R] }
      while A[i] < m do i := i + 1;
      { A[L], … , A[i − 1] ≤ m ≤ A[i] }
      while m < A[j] do j := j − 1;
      { A[j] ≤ m ≤ A[j + 1], … , A[R] }
      if i ≤ j then begin
        A[i] :=: A[j]; { exchange }
        { i ≤ j ⇒ A[i] ≤ m ≤ A[j] }
        i := i + 1; j := j − 1
        { A[L], … , A[i − 1] ≤ m ≤ A[j + 1], … , A[R] }
      end
      else
        { i > j ⇒ i = j + 1 ⇒ exit }
    until i > j
  end; { partition }
begin { rqs }
  partition;
  if L < j then rqs(L, j);
  if i < R then rqs(i, R)
end; { rqs }
An initial call 'rqs(1, n)' with n > 1 guarantees that L < R holds for each recursive call.
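For readers who want to run the procedure, here is a Python transcription. It is a sketch under stated assumptions, not the book's code: indices are 0-based, `partition` takes the threshold m as a parameter and returns the final scan positions (i, j), and the threshold is guessed as the element in the fixed middle position (one of the techniques listed above).

```python
def partition(a, lo, hi, m):
    """Partition a[lo..hi] around threshold m; returns the crossed indices (i, j)."""
    i, j = lo, hi
    while i <= j:
        while a[i] < m:  # left-to-right scan pauses at a[i] >= m
            i += 1
        while m < a[j]:  # right-to-left scan pauses at a[j] <= m
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]  # exchange
            i += 1
            j -= 1
    return i, j


def rqs(a, lo, hi):
    """Recursive quicksort of a[lo..hi] (inclusive bounds), in place."""
    m = a[(lo + hi) // 2]  # guessed median: element in a fixed middle position
    i, j = partition(a, lo, hi, m)
    if lo < j:
        rqs(a, lo, j)
    if i < hi:
        rqs(a, i, hi)
```

Applied to the example array with m = 16, `partition` reproduces the final row of the trace above: the scans stop with j at the element 4 and i at the element 16.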
An iterative implementation of quicksort is given by the following procedure, 'iqs', which sorts the whole array A[1 .. n]. The boundaries of the subarrays to be sorted are maintained on a stack.
procedure iqs;
const stacklength = … ;
type stackelement = record L, R: 1 .. n end;
var i, j, L, R, s: 0 .. n + 1;
  stack: array[1 .. stacklength] of stackelement;
  procedure partition; { same as in rqs }
  end; { partition }
begin { iqs }
  s := 1; stack[1].L := 1; stack[1].R := n;
  repeat
    L := stack[s].L; R := stack[s].R; s := s − 1;
    repeat
      partition;
      if j − L < R − i then begin
        if i < R then { s := s + 1; stack[s].L := i;
                        stack[s].R := R };
        R := j
      end
      else begin
        if L < j then { s := s + 1; stack[s].L := L;
                        stack[s].R := j };
        L := i
      end
    until L ≥ R
  until s = 0
end; { iqs }
After partitioning, 'iqs' pushes the bounds of the larger part onto the stack, thus making sure that part will be sorted later, and sorts the smaller part first. Thus the length of the stack is bounded by $\log_2 n$.
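The effect of sorting the smaller part first can be observed by recording the stack depth. The Python sketch below is an illustrative, 0-based transcription of 'iqs' (a Python list serves as the stack, and the middle element is used as the guessed median, both assumptions of this example); it returns the maximum number of entries the stack ever held.

```python
def iqs(a):
    """Iterative quicksort of the list a, in place; returns the maximum stack depth."""
    if len(a) < 2:
        return 0
    stack = [(0, len(a) - 1)]
    max_depth = 1
    while stack:
        lo, hi = stack.pop()
        while lo < hi:
            m = a[(lo + hi) // 2]  # guessed median
            i, j = lo, hi
            while i <= j:  # partition, same as in rqs
                while a[i] < m:
                    i += 1
                while m < a[j]:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            if j - lo < hi - i:  # left part is smaller: defer the right part
                if i < hi:
                    stack.append((i, hi))
                hi = j
            else:  # right part is smaller or equal: defer the left part
                if lo < j:
                    stack.append((lo, j))
                lo = i
            max_depth = max(max_depth, len(stack))
    return max_depth
```

Since the part that continues to be sorted is at most half as long as the part just partitioned, each deferred entry corresponds to one halving, which is why the depth stays logarithmic even for unfavorable inputs.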
For very small arrays, the overhead of managing a stack makes quicksort less efficient than simpler $O(n^2)$ algorithms, such as an insertion sort. A practically efficient implementation of quicksort might switch to another sorting technique for subarrays of size up to 10 or 20. [Sed 78] is a comprehensive discussion of how to optimize quicksort.
Analysis for three cases: best, "typical", and worst
Consider a quicksort algorithm that chooses a guessed median that differs from any of the elements to be sorted and thus partitions the array into two parts, one with k elements, the other with n − k elements. The work q(n) required to sort n elements satisfies the recurrence relation
$$q(n) = q(k) + q(n - k) + a \cdot n + b \qquad (*)$$
The constant b measures the cost of calling quicksort for the array to be sorted. The term $a \cdot n$ covers the cost of partitioning, and the terms q(k) and q(n − k) correspond to the work involved in quicksorting the two subarrays. Most quicksort algorithms partition the array into three parts: the "small" left part, the single array element used to guess the median, and the "large" right part. Their work is expressed by the equation
$$q(n) = q(k) + q(n - k - 1) + a \cdot n + b$$
We analyze equation (*); it is close enough to the second equation to have the same asymptotic solution. Quicksort's behavior in the best and worst cases is easy to analyze, but the average over all permutations is not. Therefore, we analyze another average which we call the typical case.
Quicksort's best-case behavior is obtained if we guess the correct median that partitions the array into two equal-sized subarrays. For simplicity's sake the following calculation assumes that n is a power of 2, but this assumption does not affect the solution. Then (*) can be rewritten as
$$q(n) = 2 \cdot q\left(\frac{n}{2}\right) + a \cdot n + b$$
We use this recurrence equation to calculate
$$q\left(\frac{n}{2}\right) = 2 \cdot q\left(\frac{n}{4}\right) + a \cdot \frac{n}{2} + b$$
and substitute on the right-hand side to obtain
$$q(n) = 2 \cdot \left( 2 \cdot q\left(\frac{n}{4}\right) + a \cdot \frac{n}{2} + b \right) + a \cdot n + b = 4 \cdot q\left(\frac{n}{4}\right) + 2 \cdot a \cdot n + 3 \cdot b$$
Repeated substitution yields
$$q(n) = n \cdot q(1) + a \cdot n \cdot \log_2 n + (n - 1) \cdot b = a \cdot n \cdot \log_2 n + (q(1) + b) \cdot n - b$$
The constant q(1), which measures quicksort's work on a trivially sorted array of length 1, and b, the cost of a single procedure call, do not affect the dominant term $n \cdot \log_2 n$. The constant factor a in the dominant term can be estimated by analyzing the code of the procedure 'partition'. When these details do not matter, we summarize: Quicksort's time complexity in the best case is $\Theta(n \cdot \log n)$.
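Quicksort's worst-case behavior occurs when one of the two subarrays consists of a single element after each
partitioning. In this case equation (*) becomes
\[ q(n) = q(n - 1) + q(1) + a \cdot n + b \]
We use this recurrence equation to calculate
\[ q(n - 1) = q(n - 2) + q(1) + a \cdot (n - 1) + b \]
and substitute on the right-hand side to obtain
\[ q(n) = q(n - 2) + 2 \cdot q(1) + a \cdot n + a \cdot (n - 1) + 2 \cdot b \]
Repeated substitution yields
\[ q(n) = n \cdot q(1) + a \cdot \left( \frac{n (n + 1)}{2} - 1 \right) + (n - 1) \cdot b \]
Therefore the time complexity of quicksort in the worst case is Θ(n²).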
For the analysis of quicksort's typical behavior we make the plausible assumption that the array is equally likely
to get partitioned between any two of its elements: For all k, 1 ≤ k < n, the probability that the array A is partitioned
into the subarrays A[1 .. k] and A[k + 1 .. n] is 1 / (n − 1). Then the average work to be performed by quicksort is
expressed by the recurrence relation
\[ q(n) = \frac{1}{n - 1} \sum_{k=1}^{n-1} \bigl( q(k) + q(n - k) \bigr) + a \cdot n + b = \frac{2}{n - 1} \sum_{k=1}^{n-1} q(k) + a \cdot n + b \]
This recurrence relation approximates the recurrence relation discussed in chapter 16 well enough to have the
same solution
\[ q(n) = (\ln 4) \cdot a \cdot n \cdot \log_2 n + g(n) \quad \text{with } g(n) \in O(n) \]
Since ln 4 ≈ 1.386, quicksort's asymptotic behavior in the typical case is only about 40% worse than in the best
case, and remains in Θ(n · log n). [Sed 77] is a thorough analysis of quicksort.
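The factor ln 4 can be made plausible numerically by evaluating the typical-case recurrence directly. The following Python sketch is an illustration only: the function name `typical_work` and the choice a = 1, b = 0, q(1) = 0 are assumptions made to isolate the dominant term.

```python
import math


def typical_work(n_max, a=1.0, b=0.0, q1=0.0):
    """Return a list q with q[n] = 2/(n-1) * (q[1] + ... + q[n-1]) + a*n + b."""
    q = [0.0, q1]        # q[0] is unused padding
    prefix = q1          # running sum q(1) + ... + q(n-1)
    for n in range(2, n_max + 1):
        q_n = 2.0 * prefix / (n - 1) + a * n + b
        q.append(q_n)
        prefix += q_n
    return q


q = typical_work(1 << 12)
for e in (4, 8, 12):
    n = 1 << e
    # Ratio of typical work to the best-case dominant term a * n * log2(n).
    print(n, q[n] / (n * math.log2(n)))
```

The printed ratios grow slowly toward ln 4 ≈ 1.386 from below; the gap is the lower-order term g(n), which shrinks only like 1 / log n relative to the dominant term.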
Merging and merge sorts
The internal sorting algorithms presented so far require direct access to each element. This is reflected in our
analyses by treating an array access A[i], or each exchange A[i] :=: A[j], as a primitive operation whose cost is
constant (independent of n). This assumption is not valid for elements stored on secondary storage devices such as
magnetic tapes or disks. A better assumption that mirrors the realities of external sorting is that the elements to be
sorted are stored as a sequential file f. The file is accessed through a file pointer which, at any given time, provides
direct access to a single element. Accessing other elements requires repositioning of the file pointer. Sequential files
may permit the pointer to advance in one direction only, as in the case of Pascal files, or to move backward and
forward. In either case, our theoretical model assumes that the time required for repositioning the pointer is
proportional to the distance traveled. This assumption obviously favors algorithms that process (compare,
exchange) pairs of adjacent elements, and penalizes algorithms such as quicksort that access elements in random
positions.
The following external sorting algorithm is based on the merge sort principle. To make optimal use of the
available main memory, the algorithm first creates initial runs; a run is a sorted subsequence of elements
f[i], f[i+1], … , f[j] stored consecutively in file f, with f[k] ≤ f[k+1] for all k with i ≤ k ≤ j − 1. Assume that a
buffer of capacity m elements is available in main memory to create initial runs of length m (perhaps less for the
last run). In processing the r-th run, r = 0, 1, … , we read the m elements f[r·m+1], f[r·m+2], … , f[r·m+m] into
memory, sort them internally, and write the sorted sequence to a modified file f, which may or may not reside in
the same physical storage area as the original file f. This new file f is partially sorted into runs: f[k] ≤ f[k+1] for
all k with r · m + 1 ≤ k < r · m + m.
At this point we need two files, g and h, in addition to the file f, which contains the initial runs. In a copy phase
we distribute the initial runs by copying half of them to g, the other half to h. In the subsequent merge phase each
run of g is merged with exactly one run of h, and the resulting new run of double length is written onto f (Exhibit
17.11). After the first cycle, consisting of a copy phase followed by a merge phase, f contains half as many runs as it
did before. After log2(n / m) cycles f contains one single run, which is the sorted sequence of all elements.
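The copy-merge cycle can be simulated in main memory to check the cycle count. In this Python sketch, lists stand in for the sequential files f, g, and h. The names `merge` and `external_merge_sort` are hypothetical, and distributing the runs alternately to g and h is an assumption, since the text only requires that each file receive half of them.

```python
def merge(run_g, run_h):
    """Merge two sorted runs by scanning each strictly from front to back."""
    out, i, j = [], 0, 0
    while i < len(run_g) and j < len(run_h):
        if run_h[j] < run_g[i]:
            out.append(run_h[j])
            j += 1
        else:
            out.append(run_g[i])
            i += 1
    out.extend(run_g[i:])
    out.extend(run_h[j:])
    return out


def external_merge_sort(f, m):
    """Return (sorted list, number of copy-merge cycles) for buffer size m."""
    # Initial runs: read m elements at a time and sort them internally.
    runs = [sorted(f[s:s + m]) for s in range(0, len(f), m)]
    cycles = 0
    while len(runs) > 1:
        g, h = runs[0::2], runs[1::2]                  # copy phase
        merged = [merge(x, y) for x, y in zip(g, h)]   # merge phase
        if len(g) > len(h):                            # odd run out: copied unchanged
            merged.append(g[-1])
        runs = merged
        cycles += 1
    return (runs[0] if runs else []), cycles
```

With 16 elements and a buffer of m = 2 there are 8 initial runs, and the sketch needs 3 = log2(16 / 2) cycles.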
Exhibit 17.11: Each copy-merge cycle halves the number of runs and doubles their lengths.
Exercise: a merge sort in main memory
Consider the following procedure that sorts the array A:
const n = … ;
var A: array[1 .. n] of integer;
…
procedure sort (L, R: 1 .. n);
var m: 1 .. n;
  procedure combine;
  var B: array [1 .. n] of integer;
      i, j, k: 1 .. n;
  begin { combine }
    i := L; j := m + 1;
    for k := L to R do
      if (i > m) cor ((j ≤ R) cand (A[j] < A[i])) then
        { B[k] := A[j]; j := j + 1 }
      else
        { B[k] := A[i]; i := i + 1 } ;
    for k := L to R do A[k] := B[k]
  end; { combine }
begin { sort }
  if L < R then
    { m := (L + R) div 2; sort(L, m); sort(m + 1, R); combine }
end; { sort }
The boolean operators 'cand' and 'cor' are conditional: the second operand is evaluated only if the first does not
already determine the result. The procedure is initially called by
sort(1, n);
(a) Draw a picture to show how 'sort' works on an array of eight elements.
(b) Write down a recurrence relation to describe the work done in sorting n elements.
(c) Determine the asymptotic time complexity by solving this recurrence relation.
(d) Assume that 'sort' is called for m subarrays of equal size, not just for two. How does the asymptotic time
complexity change?
Solution
(a) 'sort' depends on the algorithmic principle of divide and conquer. After dividing an array into a left and a
right subarray whose numbers of elements differ by at most one, 'sort' calls itself recursively on these two
subarrays. After these two calls are finished, the procedure 'combine' merges the two sorted subarrays
A[L .. m] and A[m + 1 .. R] together in B. Finally, B is copied to A. An example is shown in Exhibit 17.12.
Exhibit 17.12: Sorting an array by using a divide-and-conquer scheme.
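(b) The work w(n) performed while sorting n elements satisfies
\[ w(n) = 2 \cdot w(n/2) + a \cdot n + b \qquad (*) \]
The first term describes the cost of the two recursive calls of 'sort', the term a · n is the cost of merging the
two sorted subarrays, and the constant b is the cost of calling 'sort' for the array.
(c) If
\[ w(n/2) = 2 \cdot w(n/4) + a \cdot n/2 + b \]
is substituted in (*), we obtain
\[ w(n) = 4 \cdot w(n/4) + 2 \cdot a \cdot n + 3 \cdot b \]
Continuing this substitution process results in
\[ w(n) = n \cdot w(1) + a \cdot n \cdot \log_2 n + (n - 1) \cdot b \]
Since w(1) is constant, the time complexity of 'sort' is Θ(n · log n).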
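(d) If 'sort' is called recursively for m subarrays of equal size, the cost w'(n) satisfies a recurrence of the form
\[ w'(n) = m \cdot w'(n/m) + c_m \cdot n + b \]
where c_m · n is the cost of merging m sorted subarrays and c_m depends only on m. Solving this recursive
equation shows that, for fixed m, the time complexity does not change [i.e. it is Θ(n · log n)].
Is it possible to sort in linear time?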
The lower bound Ω(n · log n) has been derived for sorting algorithms that gather information about the ordering
of the elements by binary questions and nothing else. This lower bound need not apply in other situations.
Example 1: sorting a permutation of the integers from 1 to n
If we know that the elements to be sorted are a permutation of the integers 1 .. n, it is possible to sort in time
Θ(n) by storing element i in the array element with index i.
Example 2: sorting elements from a finite domain
Assume that the elements to be sorted are samples from a finite domain W = 1 .. w. Then it is possible to sort in
time Θ(n) if gaps between the elements are allowed (Exhibit 17.13). The gaps can be closed in time Θ(w).
Exhibit 17.13: Sorting elements from a finite domain in linear time.
Do these examples contradict the lower bound Ω(n · log n)? No, because in these examples the information
about the ordering of elements is obtained by asking questions more powerful than binary questions: namely,
n-valued questions in Example 1 and w-valued questions in Example 2.
A k-valued question is equivalent to log2 k binary questions. When this "exchange rate" is taken into
consideration, the theoretical time complexities of the two sorting techniques above are Θ(n · log n) and
Θ(n · log w), respectively, thus conforming to the lower bound in the section "A lower bound Ω(n · log n)".
Sorting algorithms that sort in linear time (expected linear time, but not in the worst case) are described in the
literature under the terms bucket sort, distribution sort, and radix sort.
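Examples 1 and 2 above can be sketched in a few lines of Python. The function names are hypothetical. Using one counter per domain value, so that repeated samples are preserved, is an assumption that goes slightly beyond the text; a counter of 0 plays the role of a gap.

```python
def sort_permutation(a):
    """Example 1: a is a permutation of 1..n; element i goes to index i."""
    result = [None] * len(a)
    for x in a:
        result[x - 1] = x      # one n-valued "question": where does x belong?
    return result


def place_in_domain(a, w):
    """Example 2, first step: distribute samples from 1..w, leaving gaps."""
    slots = [0] * (w + 1)      # slots[v] = number of samples equal to v
    for x in a:
        slots[x] += 1          # one w-valued "question" per element
    return slots


def close_gaps(slots):
    """Example 2, second step: scan the domain once and drop the gaps."""
    out = []
    for v in range(1, len(slots)):
        out.extend([v] * slots[v])
    return out
```

For instance, `close_gaps(place_in_domain([5, 2, 9, 2, 7], 10))` scans the ten slots once and yields the samples in ascending order. Neither function ever compares two elements with each other, which is why the comparison-based lower bound does not apply.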
Sorting networks
The sorting algorithms above are designed to run on a sequential machine in which all operations, such as
comparisons and exchanges, are performed one at a time with a single processor. If algorithms are to be efficient,
they need to be rethought when the ground rules for their execution change: when the theoretician uses another
model of computation, or when they are executed on a computer with a different architecture. This is particularly
true of the many different types of multiprocessor architectures that have been built or conceived. When many
processors are available to share the workload, questions of how to distribute the work among them, how to
synchronize their operation, and how to transport data, prevail. It is not our intention to discuss sorting on
general-purpose parallel machines. We wish to illustrate the point that algorithms must be redesigned when the
model of computation changes. For this purpose a discussion of special-purpose sorting networks suffices. The
"processors" in a sorting network are merely comparators: Their only function is to compare the values on two
input wires and switch them onto two output wires such that the smaller is on top, the larger at the bottom
(Exhibit 17.14).
Exhibit 17.14: Building block of sorting networks.
Comparators are arranged into a network in which n wires enter at the left and n wires exit at the right, as
Exhibit 17.15 shows, where each vertical connection joining a pair of wires represents a comparator. The illustration
also shows what happens to four input elements, chosen to be 4, 1, 3, 2 in this example, as they travel from left to
right through the network.
Exhibit 17.15: A comparator network that fails to sort. The output of each
comparator performing an exchange is shown in the ovals.
A network of comparators is a sorting network if it sorts every input configuration. We consider an input
configuration to consist of distinct elements, so that without loss of generality we may regard it as one of the n!
permutations of the sequence (1, 2, … , n). A network that sorts every duplicate-free configuration will also sort
any configuration containing duplicates.
The comparator network above correctly sorts many of its 4! = 24 input configurations, but it fails on the
sequence (4, 1, 3, 2). Hence it is not a sorting network. It is evident that a network with a sufficient number of
comparators in the right places will sort correctly, but as the example above shows, it is not immediately evident
what number suffices or how the comparators should be placed. The network in Exhibit 17.16 shows that five
comparators, arranged judiciously, suffice to sort four elements.
Exhibit 17.16: Five comparators suffice to sort four elements.
How can we tell if a given network sorts successfully? Exhaustive testing is feasible for small networks such as
the one above, where we can trace the flow of all 4! = 24 input configurations. Networks with a regular structure
usually admit a simpler correctness proof. For this example, we observe that c1, c2, and c3 place the smallest element
on the top wire. Similarly, c1, c2, and c4 place the largest on the bottom wire. This leaves the middle two elements on
the middle two wires, which c5 then puts into place.
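Exhaustive testing of a small network is easy to mechanize. In the Python sketch below a comparator is a pair of wire indices (top, bottom), with wire 0 on top. The layout `FIVE` is an assumption: it is one arrangement of c1 .. c5 that matches the correctness argument above, not necessarily the drawing in the exhibit. The function names are hypothetical.

```python
from itertools import permutations


def run_network(network, values):
    """Send the values through the comparators from left to right."""
    wires = list(values)
    for top, bottom in network:
        if wires[top] > wires[bottom]:   # smaller value leaves on the upper wire
            wires[top], wires[bottom] = wires[bottom], wires[top]
    return wires


def is_sorting_network(network, n):
    """Trace all n! duplicate-free input configurations."""
    return all(run_network(network, p) == sorted(p)
               for p in permutations(range(1, n + 1)))


# Assumed layout: c1, c2 compare within the top and bottom pairs, c3 brings
# the minimum to the top, c4 the maximum to the bottom, c5 orders the middle.
FIVE = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

print(is_sorting_network(FIVE, 4))             # all 24 configurations
print(run_network(FIVE[:4], (4, 1, 3, 2)))     # the same network without c5
```

Dropping c5 leaves the two middle wires unordered for the input (4, 1, 3, 2), which shows that none of the five comparators in this layout is redundant at that position.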
What design principles might lead us to create large sorting networks guaranteed to be correct? Sorting
algorithms designed for a sequential machine cannot, in general, be mapped directly into network notation,
because the network is a more restricted model of computation: Whereas most sequential sorting algorithms make
comparisons based on the outcome of previous comparisons, a sorting network makes the same comparisons for all
input configurations. The same fundamental algorithm design principles useful when designing sequential
algorithms also apply to parallel algorithms.
Divide-and-conquer. Place two sorting networks for n wires next to each other, and combine them into a sorting
network for 2 · n wires by appending a merge network to merge their outputs. In sequential computation merging
is simple because we can choose the most useful comparison depending on the outcome of previous comparisons.
The rigid structure of comparator networks makes merging networks harder to design.
Incremental algorithm. We place an n-th wire next to a sorting network with n − 1 wires, and either precede or
follow the network by a "ladder" of comparators that tie the extra wire into the existing network, as shown in the
following figures. This leads to designs that mirror the straight insertion and selection algorithms in the section
"Simple sorting algorithms that work in time Θ(n²)".
Insertion sort. With the top n − 1 elements sorted, the element on the bottom wire trickles into its correct place.
Induction yields the expanded diagram on the right in Exhibit 17.17.
Exhibit 17.17: Insertion sort leads by induction to the sorting network on the right.
Selection sort. The maximum element first trickles down to the bottom, then the remaining elements are sorted.
The expanded diagram is on the right in Exhibit 17.18.
Exhibit 17.18: Selection sort leads by induction to the sorting network on the right.
Comparators can be shifted along their pair of wires so as to reduce the number of stages, provided that the
topology of the network remains unchanged. This compression reduces both insertion and selection sort to the
triangular network shown in Exhibit 17.19. Thus we see that the distinction between insertion and selection was
more a distinction of sequential order of operations than one of data flow.
Exhibit 17.19: Shifting comparators reduces the number of stages.
Any number of comparators that are aligned vertically require only a single unit of time. The compressed
triangular network has O(n²) comparators, but its time complexity, measured in stages, is 2 · n − 3 ∈ O(n). There
are networks with better asymptotic behavior, but they are rather exotic [Knu 73b].
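The effect of shifting comparators can be checked with a small Python sketch. It builds the insertion-sort network on n wires and then places each comparator in the earliest stage in which both of its wires are free. The names `insertion_network`, `stage_count`, and `sorts_all` are hypothetical, and the greedy placement is an assumption about how the compression is carried out.

```python
from itertools import permutations


def insertion_network(n):
    """Comparators (i, i + 1), 0-based, in insertion-sort order."""
    net = []
    for last in range(1, n):                  # wire `last` joins the sorted top part
        for i in range(last - 1, -1, -1):     # its element trickles upward
            net.append((i, i + 1))
    return net


def stage_count(network, n):
    """Number of stages after shifting every comparator as far left as possible."""
    ready = [0] * n        # last stage in which each wire is occupied
    depth = 0
    for a, b in network:
        s = max(ready[a], ready[b]) + 1
        ready[a] = ready[b] = s
        depth = max(depth, s)
    return depth


def sorts_all(network, n):
    """Exhaustive check over all n! inputs (feasible only for small n)."""
    for p in permutations(range(n)):
        w = list(p)
        for a, b in network:
            if w[a] > w[b]:
                w[a], w[b] = w[b], w[a]
        if w != sorted(w):
            return False
    return True


for n in range(2, 7):
    net = insertion_network(n)
    print(n, len(net), stage_count(net, n))
```

For n ≥ 2 the sketch counts n · (n − 1) / 2 comparators arranged in 2 · n − 3 stages, so the number of comparators grows quadratically while the number of stages grows only linearly.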
Exercises
1. Implement insertion sort, selection sort, merge sort, and quicksort and animate the sorting process for each
of these algorithms: for example, as shown in the snapshots in "Algorithm animation". Compare the
number of comparisons and exchange operations needed by the algorithms for different input
configurations.
2. What is the smallest possible depth of a leaf in a decision tree for a sorting algorithm?
3. Show that 2 · n − 1 comparisons are necessary in the worst case to merge two sorted arrays containing n
elements each.
4. The most obvious method of systematically interchanging the out-of-order pairs of elements in an array
var A: array[1 .. n] of elt;
is to scan adjacent pairs of elements from bottom to top (imagine that the array is drawn vertically, with
A[1] at the top and A[n] at the bottom) repeatedly, interchanging those found out of order:
for i := 1 to n − 1 do
  for j := n downto i + 1 do
    if A[j − 1] > A[j] then A[j − 1] :=: A[j];
This technique is known as bubble sort, since smaller elements "bubble up" to the top.
(a) Explain by words, figures, and an example how bubble sort works. Show that this algorithm sorts
correctly.
(b) Determine the exact number of comparisons and exchange operations that are performed by bubble
sort in the best, average, and worst case.
(c) What is the worst-case time complexity of this algorithm?
5. A sorting algorithm is called stable if it preserves the original order of equal elements. Which of the sorting
algorithms discussed in this chapter are stable?
6. Assume that quicksort chooses the threshold m as the first element of the sequence to be sorted. Show that
the running time of such a quicksort algorithm is Θ(n²) when the input array is sorted in nonincreasing or
nondecreasing order.
7. Find a worst-case input configuration for a quicksort algorithm that chooses the threshold m as the median
of the first, middle, and last elements of the sequence to be sorted.
8. Array A contains m and array B contains n different integers which are not necessarily ordered:
const m = … ; { length of array A }
      n = … ; { length of array B }
var A: array[1 .. m] of integer;
    B: array[1 .. n] of integer;
A duplicate is an integer that is contained in both A and B. Problem: How many duplicates are there in A
and B?
(a) Determine the time complexity of the brute-force algorithm that compares each integer contained in
one array to all integers in the other array.
(b) Write a more efficient
function duplicates: integer;
Your solution may rearrange the integers in the arrays.
(c) What is the worst-case time complexity of your improved algorithm?
Part V: Data structures
The tools of bookkeeping
When thinking of algorithms we emphasize a dynamic sequence of actions: "Take this and do that, then that,
then … ." In human experience, "take" is usually a straightforward operation, whereas "do" means work. In
programming, on the other hand, there are lots of interesting examples where "do" is nothing more complex than
incrementing a counter or setting a bit; but "take" triggers a long, sophisticated search. Why do we need fancy data
structures at all? Why can't we just spread out the data on a desk top? Everyday experience does not prepare us to
appreciate the importance of data structure; it takes programming experience to see that algorithms are nothing
without data structures. The algorithms presented so far were carefully chosen to require only the simplest of data
structures: static arrays. The geometric algorithms of Part VI, on the other hand, and lots of other useful
algorithms, depend on sophisticated data structures for their efficiency.
The key insight in understanding data structures is the recognition that an algorithm in execution is, at all times,
in some state, chosen from a potentially huge state space. The state records such vital information as what steps
have already been taken with what results, and what remains to be done. Data structures are the bookkeepers that
record all this state information in a tidy manner so that any part can be accessed and updated efficiently. The
remarkable fact is that there are a relatively small number of standard data structures that turn out to be useful in
the most varied types of algorithms and problems, and constitute essential knowledge for any programmer.
The literature on data structures. Whereas one can present some algorithms without emphasizing data
structures, as we did in Part III, it appears pointless to discuss data structures without some of the typical
algorithms that use them; at the very least, access and update algorithms form a necessary part of any data
structure. Accordingly, a new data structure is typically published in the context of a particular new algorithm. Only
later, as one notices its general applicability, it may find its way into textbooks. The data structures that have
become standard today can be found in many books, such as [AHU 83], [CLR 90], [GB 91], [HS 82], [Knu 73a],
[Knu 73b], [Meh 84a], [Meh 84c], [RND 77], [Sam 90a], [Sam 90b], [Tar 83], and [Wir 86].
18. What is a data structure?
data structures for manual use (e.g. edge-notched cards)
general-purpose data structures
abstract data types specify functional properties only
data structures include access and maintenance algorithms and their implementation
performance criteria and measures
asymptotics
Data structures old and new
The discipline of data structures, as a systematic body of knowledge, is truly a creation of computer science. The
question of how best to organize data was a lot simpler to answer in the days before the existence of computers: the
organization had to be simple, because there was no automatic device that could have processed an elaborate data
structure, and there is no human being with enough patience to do it. Consider two examples.
1. Manual files and catalogs, as used in business offices and libraries, exhibit several distinct organizing
principles, such as sequential and hierarchical order and cross-references. From today's point of view,
however, manual files are not well-defined data structures. For good reasons, people did not rigorously
define those aspects that we consider essential when characterizing a data structure: what constraints are
imposed on the data, both on the structure and its content; what operations the data structure must
support; what constraints these operations must satisfy. As a consequence, searching and updating a
manual file is not typically a process that can be automated: It requires common sense, and perhaps even
expert training, as is the case for a library catalog.
2. In manual computing (with pencil and paper or a nonprogrammable calculator) the algorithm is the focus
of attention, not the data structure. Most frequently, the person computing writes data (input, intermediate
results, output) in any convenient place within his field of vision, hoping to find them again when he needs
them. Occasionally, to facilitate highly repetitive computations (such as income tax declarations), someone
designs a form to prompt the user, one operation at a time, to write each data item into a specific field. Such
a form specifies both an algorithm and a data structure with considerable formality. Compared to the
general-purpose data structures we study in this chapter, however, such forms are highly special purpose.
Edge-notched cards are perhaps the most sophisticated data structures ever designed for manual use. Let us
illustrate them with the example of a database of English words organized so as to help in solving crossword
puzzles. We write one word per card and index it according to which vowels it contains and which ones it does not
contain. Across the top row of the card we punch 10 holes labeled A, E, I, O, U, ¬A, ¬E, ¬I, ¬O, ¬U. When a word,
say ABACA, exhibits a given vowel, such as A, we cut a notch above the hole for A; when it does not, such as E, we
cut a notch above the hole for ¬E (pronounced "not E"). Exhibit 18.1 shows the encoding of the words BEAUTIFUL,
EXETER, OMAHA, OMEGA. For example, we search for words that contain at least one E, but no U, by sticking
two needles through the pack of cards at the holes E and ¬U. EXETER and OMEGA will drop out. In principle it is
easy to make this sample database more powerful by including additional attributes, such as "A occurs exactly
once", "A occurs exactly twice", "A occurs as the first letter in the word", and so on. In practice, a few dozen
attributes and thousands of cards will stretch this mechanical implementation of a multikey data structure to its
limits of feasibility.
Exhibit 18.1: Encoding of different words in edge-notched cards.
In contrast to data structures suitable for manual processing, those developed for automatic data processing can
be complex. Complexity is not a goal in itself, of course, but it may be an unavoidable consequence of the search for
efficiency. Efficiency, as measured by processing time and memory space required, is the primary concern of the
discipline of data structures. Other criteria, such as simplicity of the code, play a role, but the first question to be
asked when evaluating a data structure that supports a specified set of operations is typically: How much time and
space does it require?
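The needle search on edge-notched cards described earlier in this section translates directly into set operations. In this Python sketch a card is represented by the set of holes that carry a notch. The names `notches` and `needle_query` are hypothetical.

```python
VOWELS = "AEIOU"


def notches(word):
    """Holes notched on the card: 'A' if the word has an A, otherwise '¬A'."""
    return {v if v in word else "¬" + v for v in VOWELS}


def needle_query(cards, holes):
    """Cards that drop off the needles: those notched at every needled hole."""
    wanted = set(holes)
    return [word for word in cards if wanted <= notches(word)]


cards = ["BEAUTIFUL", "EXETER", "OMAHA", "OMEGA"]
print(needle_query(cards, ["E", "¬U"]))   # at least one E, but no U
```

The query goes through every card once, just as the needles pass through the whole pack. That linear scan is what limits the mechanical version to a few thousand cards.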
In contrast to the typical situation of manual computing (consideration of the algorithm comes first, data gets
organized only as needed), programmed computing typically proceeds in the opposite direction: First we define the
organization of the data rigorously, and from this the structure of the algorithm follows. Thus algorithm design is
often driven by data structure design.
The range of data structures studied
We present generally useful data structures along with the corresponding query, update, and maintenance
algorithms; and we develop concepts and techniques designed to organize a vast body of knowledge into a coherent
whole. Let us elaborate on both of these goals.
"Generally useful" refers to data structures that occur naturally in many applications. They are relatively simple
from the point of view of the operations they support: tables and queues of various types are typical examples.
These basic data structures are the building blocks from which an applications programmer may construct more
elaborate structures tailored to her particular application. Although our collection of specific data structures is
rather small, it covers the great majority of techniques an applications programmer is likely to need.
We develop a unified scheme for understanding many data structures as special cases of general concepts. This
includes:
The separation of abstract data types, which specify only functional properties, from data structures, which
also involve aspects of implementation
The classification of all data structures into three major types: implicit data structures, lists, and address
computation
A rough assessment of the performance of data structures based on the asymptotic analysis of time and
memory requirements
The simplest and most common assumption about the elements to be stored in a data structure is that they
belong to a domain on which a total order ≤ is defined. Examples: integers ordered by magnitude, a character set
with its alphabetic order, character strings of bounded length ordered lexicographically. We assume that each
element in a domain requires as much storage as any other element in that domain; in other words, that a data
structure manages memory fragments of fixed size. Data objects of greatly variable size or length, such as fragments
of text, are typically not considered to be "elements"; instead, they are broken into constituent pieces of fixed size,
each of which becomes an element of the data structure.
The elements stored in a data structure are often processed according to the order ≤ defined on their domain.
The topic of sorting, which we surveyed in "Sorting and its complexity", is closely related to the study of data
structures: Indeed, several sorting algorithms appear "for free" in "List structures", because every structure that
implements the abstract data type dictionary leads to a sorting algorithm by successive insertion of elements,
followed by a traversal.
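The remark that a dictionary yields a sorting algorithm "for free" can be illustrated with the simplest tree-based dictionary. The Python sketch below uses an unbalanced binary search tree; this choice and the names `Node`, `insert`, `traverse`, and `dictionary_sort` are assumptions made for illustration. Without balancing, the worst case (for example, already sorted input) costs quadratic time.

```python
class Node:
    """One element of an (unbalanced) binary search tree."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Insert key and return the root; equal keys go to the right subtree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def traverse(root):
    """In-order traversal: visits the keys in ascending order."""
    out, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out


def dictionary_sort(items):
    """Sort by successive insertion followed by a traversal."""
    root = None
    for x in items:
        root = insert(root, x)
    return traverse(root)
```

Replacing the tree by any other dictionary implementation that supports ordered traversal gives a different sorting algorithm with the same two-phase structure.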
Performance criteria and measures
The design of data structures is dominated by considerations of efficiency, specifically with respect to time and
memory. But efficiency is a multifaceted quality not easily defined and measured. As a scientific discipline, the
study of data structures is not directly concerned with the number of microseconds, machine cycles, or bytes
required by a specific program processing a given set of data on a particular system. It is concerned with general
statements from which an expert practitioner can predict concrete outcomes for a specific processing task. Thus,
measuring run times and memory usage is not the typical way to evaluate data structures. We need concepts and
notations for expressing the performance of an algorithm independently of machine speed, memory size,
programming language, and operating system, and a host of other details that vary from run to run.
The solution to this problem emerged over the past two decades as the discipline of computational complexity
was developed. In this theory, algorithms are "executed" on some "mathematical machine", carefully designed to be
as simple as possible to reflect the bare essentials of a problem. The machine makes available certain primitive
operations, and we measure "time" by counting how many of those are executed. For a given algorithm and all the
data sets it accepts as input, we analyze the number of primitive operations executed as a function of the size of the
data. We are often interested in the worst case, that is, a data set of given size that causes the algorithm to run as
long as possible, and the average case, the run time averaged over all data sets of a given size.
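Counting primitive operations instead of measuring seconds can be tried on a very small example. The Python sketch below counts element comparisons in a sequential search. The function names are hypothetical, and averaging over successful searches only, with each position equally likely, is an assumption about what "all data sets" means here.

```python
def sequential_search(data, target):
    """Return (index or -1, number of element comparisons executed)."""
    comparisons = 0
    for i, x in enumerate(data):
        comparisons += 1            # the primitive operation being counted
        if x == target:
            return i, comparisons
    return -1, comparisons


def worst_and_average(n):
    """Worst and average comparison counts over all successful searches."""
    data = list(range(n))
    counts = [sequential_search(data, t)[1] for t in data]
    return max(counts), sum(counts) / n


print(worst_and_average(9))
```

For n elements the counts are n in the worst case and (n + 1) / 2 on the average, independent of the machine on which the search runs.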
Among the many different mathematical machines that have been defined in the theory of computation, data
structures are evaluated almost exclusively with respect to a theoretical random access machine (RAM). A RAM is
essentially a memory with as many locations as needed, each of which can hold a data element, such as an integer,
or a real number; and a processing unit that can read from any one or two locations, operate on their content, and
write the result back into a third location, all in one time unit. This model is rather close to actual sequential
computers, except that it incorporates no bounds on the memory size, either in terms of the number of locations or
the size of the content of a location. It implies, for example, that a multiplication of two very large numbers
requires no more time than 2 · 3 does. This assumption is unrealistic for certain problems, but is an excellent one
for most program runs that fit in central memory and do not require variable-precision arithmetic or
variable-length data elements. The point is that the programmer has to understand the model and its assumptions,
and bears responsibility for applying it judiciously.
In this model, time and memory requirements are expressed as functions of input data size, and thus comparing
the performance of two data structures is reduced to comparing functions. Asymptotics has proven to be just the
right tool for this comparison: sharp enough to distinguish different growth rates, blunt enough to ignore constant
factors that differ from machine to machine.
As an example of the concise descriptions made possible by asymptotic operation counts, the following table
evaluates several implementations for the abstract data type 'dictionary'. The four operations 'find', 'insert', 'delete',
and 'next' (with respect to the order ≤) exhibit different asymptotic time requirements for the different
implementations. The student should be able to explain and derive this table after studying this part of the book.
          Ordered array   Linear list   Balanced tree   Hash table
find      O(log n)        O(n)          O(log n)        O(1) (a)
next      O(1)            O(1)          O(log n)        O(n)
insert    O(n)            O(n)          O(log n)        O(1) (a)
delete    O(n)            O(n)          O(log n)        O(1) (b)
(a) On the average, but not necessarily in the worst case
(b) Deletions are possible but may degrade performance
Exercise
1. Describe the manual data structures that have been developed to organize libraries (e.g. catalogs that allow
users to get access to the literature in their field of interest, or circulation records, which keep track of who
has borrowed what book). Give examples of queries that can be answered by these data structures.
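As a pointer toward deriving the dictionary table in this chapter, its "ordered array" column can be sketched in Python. The class name `OrderedArrayDictionary` is hypothetical, and the use of the standard `bisect` module for binary search is an implementation choice, not the book's.

```python
import bisect


class OrderedArrayDictionary:
    """Dictionary kept as a sorted array of distinct keys."""

    def __init__(self):
        self._keys = []

    def find(self, key):
        """Binary search: O(log n) comparisons."""
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def insert(self, key):
        """Locating the slot is O(log n), but shifting elements is O(n)."""
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        """Closing the gap left by the deleted key is O(n)."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def next(self, key):
        """Smallest stored key greater than `key`, or None.

        Once the position of `key` is known, the successor is simply the
        adjacent array element, which is the O(1) entry in the table;
        locating `key` first costs a binary search in this sketch.
        """
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None
```

The array makes 'find' and 'next' cheap because order is stored implicitly in the positions, and it makes 'insert' and 'delete' expensive for the same reason.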
19. Abstract data types
data abstraction
abstract data types as a tool to describe the functional behavior of data structures
examples of abstract data types: stack, fifo queue, priority queue, dictionary, string
Concepts: What and why?
A data structure organizes the data to be processed in such a way that the relations among the data elements are
reflected and the operations to be performed on the data are supported. How these goals can be achieved efficiently
is the central issue in data structures and a major concern of this book. In this chapter, however, we ask not how
but what. In particular, we ask: what is the exact functional behavior a data structure must exhibit to be called a
stack, a queue, or a dictionary or table?
There are several reasons for seeking a formal functional specification for common data structures. The primary
motivation is increased generality through abstraction; specifically, to separate input/output behavior from
implementation, so that the implementation can be changed without affecting any program that uses a particular
data type. This goal led to the earlier introduction of the concept of type in programming languages: the type real is
implemented differently on different machines, but usually a program using reals does not require modification
when run on another machine. A secondary motivation is the ability to prove general theorems about all data
structures that exhibit certain properties, thus avoiding the need to verify the theorem in each instance. This goal is
akin to the one that sparked the development of algebra: from the axioms that define a field, we prove theorems
that hold equally true for real or complex numbers as well as quaternions.
The primary motivation can be further explained by calling on an analogy between data and programs. All
programming languages support the concept of procedural abstraction: operations or algorithms are isolated in
procedures, thus making it easy to replace or change them without affecting other parts of the program. Other
program parts do not know how a certain operation is realized; they know only how to call the corresponding
procedure and what effect the procedure call will have. Modern programming languages increasingly support the
analogous concept of data abstraction or data encapsulation: the organization of data is encapsulated (e.g. in a
module or a package) so that it is possible to change the data structure without having to change the whole
program.
The secondary motivation for formal specification of data types remains an unrealized goal: although abstract
data types are an active topic for theoretical research, it is difficult today to make the case that any theorem of use
to programmers has been proved.
An abstract data type consists of a domain from which the data elements are drawn, and a set of operations.
The specification of an abstract data type must identify the domain and define each of the operations. Identifying
and describing the domain is generally straightforward. The definition of each operation consists of a syntactic and
a semantic part. The syntactic part, which corresponds to a procedure heading, specifies the operation's name and
the type of each operand. We present the syntax of each operation in mathematical function notation, specifying its
domain and range. The semantic part attaches a meaning to each operation: what values it produces or what effect
it has on its environment. We specify the semantics of abstract data types algebraically by axioms from which other
properties may be deduced. This formal approach has the advantage that the operations are defined rigorously for
any domain with the required properties. A formal description, however, does not always appeal to intuition, and
often forces us to specify details that we might prefer to ignore. When every detail matters, on the other hand, a
formal specification is superior to a precise specification in natural language; the latter tends to become
cumbersome and difficult to understand, as it often takes many words to avoid ambiguity.
In this chapter we consider the abstract data types stack, first-in-first-out queue, priority queue, and dictionary.
For each of these data types, there is an ideal, unbounded version, and several versions that reflect the realities of
finite machines. From a theoretical point of view we only need the ideal data types, but from a practical point of
view, that doesn't tell the whole story: in order to capture the different properties a programmer intuitively
associates with the vague concept "stack", for example, we are forced into specifying different types of stacks. In
addition to the ideal unbounded stack, we specify a fixed-length stack which mirrors the behavior of an array
implementation, and a variable-length stack which mirrors the behavior of a list implementation. Similar
distinctions apply to the other data types, but we only specify their unbounded versions.
Let $X$ denote the domain from which the data elements are drawn. Stacks and fifo queues make no assumptions
about $X$; priority queues and dictionaries require that a total order $\le$ be defined on $X$. Let $X^*$ denote the
set of all finite sequences over $X$.
Stack
A stack is also called a last-in-first-out queue, or lifo queue. A brief informal description of the abstract data type
stack (more specifically, unbounded stack, in contrast to the versions introduced later) might merely state that the
following operations are defined on it:
• create: Create a new, empty stack.
• empty: Return true if the stack is empty.
• push: Insert a new element.
• top: Return the element most recently inserted, if the stack is not empty.
• pop: Remove the element most recently inserted, if the stack is not empty.
Exhibit 19.1 helps to clarify the meaning of these words.
Exhibit 19.1: Elements are inserted at and removed from the top of the stack.
A definition that uses conventional mathematical notation to capture the intention of the description above
might define the operations by explicitly showing their effect on the contents of a stack. Let $S = X^*$ be the set of
possible states of a stack, let $s = x_1 x_2 \dots x_k \in S$ be an arbitrary stack state with $k$ elements, and let
$\lambda$ denote the empty state of the stack, corresponding to the null string in $X^*$. Let 'cat' denote string
concatenation. Define the functions
$\mathrm{create}: \to S$
$\mathrm{empty}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{push}: S \times X \to S$
$\mathrm{top}: S - \{\lambda\} \to X$
$\mathrm{pop}: S - \{\lambda\} \to S$
as follows:
$\forall s \in S, \forall x, y \in X$:
$\mathrm{create} = \lambda$
$\mathrm{empty}(\lambda) = \mathrm{true}$
$s \neq \lambda \Rightarrow \mathrm{empty}(s) = \mathrm{false}$
$\mathrm{push}(s, y) = s \ \mathrm{cat} \ y = x_1 x_2 \dots x_k \, y$
$s \neq \lambda \Rightarrow \mathrm{top}(s) = x_k$
$s \neq \lambda \Rightarrow \mathrm{pop}(s) = x_1 x_2 \dots x_{k-1}$
This definition refers explicitly to the contents of the stack. If we prefer to hide the contents and refer only to
operations and their results, we are led to another style of formal definition of abstract data types that expresses the
semantics of the operations by relating them to each other rather than to the explicitly listed contents of a data
structure. This is the commonly used approach to define abstract data types, and we follow it for the rest of this
chapter.
Let $S$ be a set and $s_0 \in S$ a distinguished state. $s_0$ denotes the empty stack, and $S$ is the set of stack
states that can be obtained from the empty stack by performing finite sequences of 'push' and 'pop' operations. The
following functions represent stack operations:
$\mathrm{create}: \to S$
$\mathrm{empty}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{push}: S \times X \to S$
$\mathrm{top}: S - \{s_0\} \to X$
$\mathrm{pop}: S - \{s_0\} \to S$
The semantics of the stack operations is specified by the following axioms:
$\forall s \in S, \forall x \in X$:
(1) $\mathrm{create} = s_0$
(2) $\mathrm{empty}(s_0) = \mathrm{true}$
(3) $\mathrm{empty}(\mathrm{push}(s, x)) = \mathrm{false}$
(4) $\mathrm{top}(\mathrm{push}(s, x)) = x$
(5) $\mathrm{pop}(\mathrm{push}(s, x)) = s$
These axioms can be described in natural language as follows:
(1) 'create' produces a stack in the distinguished state.
(2) The distinguished state is empty.
(3) A stack is not empty after an element has been inserted.
(4) The element most recently inserted is on top of the stack.
(5) 'pop' is the inverse of 'push'.
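The five stack axioms can be checked mechanically against a concrete model. The following is a minimal sketch, assuming a stack state is modeled as an immutable Python tuple with the most recently pushed element last; the names `S0` and `check_axioms` are illustrative and not part of the specification.

```python
# Model of the unbounded stack: a state is a tuple, newest element last.
# S0 and check_axioms are hypothetical names chosen for this sketch.
S0 = ()  # the distinguished empty state s0


def create():
    return S0


def empty(s):
    return s == S0


def push(s, x):
    return s + (x,)


def top(s):
    assert not empty(s), "top is undefined on the empty stack"
    return s[-1]


def pop(s):
    assert not empty(s), "pop is undefined on the empty stack"
    return s[:-1]


def check_axioms(s, x):
    """Check axioms (1) to (5) for one state s and one element x."""
    assert create() == S0             # (1)
    assert empty(S0)                  # (2)
    assert not empty(push(s, x))      # (3)
    assert top(push(s, x)) == x       # (4)
    assert pop(push(s, x)) == s       # (5)
    return True
```

Because the model never mutates a state, each axiom can be written as a plain equality between function results, which is exactly the style of the algebraic specification.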
Notice that 'create' plays a different role from the other stack operations: it is merely a mechanism for causing a
stack to come into existence, and could have been omitted by postulating the existence of a stack in state $s_0$. In any
implementation, however, there is always some code that corresponds to 'create'. Technical note: we could identify
'create' with $s_0$, but we choose to make a distinction between the act of creating a new empty stack and the empty
state that results from this creation; the latter may recur during normal operation of the stack.
Reduced sequences
Any $s \in S$ is obtained from the empty stack $s_0$ by performing a finite sequence of 'push' and 'pop' operations. By
axiom (5) this sequence can be reduced to a sequence that transforms $s_0$ into $s$ and consists of 'push' operations
only.
Example
$s = \mathrm{pop}(\mathrm{push}(\mathrm{pop}(\mathrm{push}(\mathrm{push}(s_0, x), y)), z))$
$= \mathrm{pop}(\mathrm{push}(\mathrm{push}(s_0, x), z))$
$= \mathrm{push}(s_0, x)$
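The reduction by axiom (5) can be sketched as a small procedure. The sketch below assumes that a sequence of operations is given as a Python list, applied to the empty stack from left to right, with each operation encoded as `("push", x)` or `("pop",)`; this encoding and the name `reduce_ops` are assumptions made for illustration.

```python
def reduce_ops(ops):
    """Reduce a sequence of push/pop operations on the empty stack
    to an equivalent sequence that consists of push operations only."""
    reduced = []
    for op in ops:
        if op[0] == "push":
            reduced.append(op)
        elif op[0] == "pop":
            if not reduced:
                raise ValueError("pop applied to the empty stack")
            # Axiom (5): pop(push(s, x)) = s cancels the latest push.
            reduced.pop()
        else:
            raise ValueError("unknown operation")
    return reduced


# The example from the text: push x, push y, pop, push z, pop.
example = [("push", "x"), ("push", "y"), ("pop",), ("push", "z"), ("pop",)]
```

Applied to `example`, the procedure leaves the single operation `("push", "x")`, in agreement with the reduction shown in the text.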
An implementation of a stack may provide the following procedures:
    procedure create(var s: stack);
    function empty(s: stack): boolean;
    procedure push(var s: stack; x: elt);
    function top(s: stack): elt;
    procedure pop(var s: stack);
Any program that uses this data type is restricted to calling these five procedures for creating and
operating on stacks; it is not allowed to use information about the underlying implementation. The
procedures may only be called within the constraints of the specification; for example, 'top' and
'pop' may be called only if the stack is not empty:
    if not empty(s) then pop(s);
The specification above assumes that a stack can grow without a bound; it defines an abstract data type called
unbounded stack. However, any implementation imposes some bound on the size (depth) of a stack: the size of the
underlying array in an array implementation, or the amount of available memory in a list implementation. A
specification of a practical stack must reflect such limitations. The following fixed-length stack describes an
implementation as an array of fixed size m, which limits the maximal stack depth.
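Fixed-length stack
$\mathrm{create}: \to S$
$\mathrm{empty}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{full}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{push}: \{s \in S : \mathrm{not\ full}(s)\} \times X \to S$
$\mathrm{top}: S - \{s_0\} \to X$
$\mathrm{pop}: S - \{s_0\} \to S$
To specify the behavior of the function 'full' we need an internal function
$\mathrm{depth}: S \to \{0, 1, 2, \dots, m\}$
that measures the stack depth, that is, the number of elements currently in the stack. The function 'depth' interacts
with the other functions in the following axioms, which specify the stack semantics:
$\forall s \in S, \forall x \in X$:
$\mathrm{create} = s_0$
$\mathrm{empty}(s_0) = \mathrm{true}$
$\mathrm{not\ full}(s) \Rightarrow \mathrm{empty}(\mathrm{push}(s, x)) = \mathrm{false}$
$\mathrm{depth}(s_0) = 0$
$\mathrm{not\ empty}(s) \Rightarrow \mathrm{depth}(\mathrm{pop}(s)) = \mathrm{depth}(s) - 1$
$\mathrm{not\ full}(s) \Rightarrow \mathrm{depth}(\mathrm{push}(s, x)) = \mathrm{depth}(s) + 1$
$\mathrm{full}(s) = (\mathrm{depth}(s) = m)$
$\mathrm{not\ full}(s) \Rightarrow \mathrm{top}(\mathrm{push}(s, x)) = x$
$\mathrm{not\ full}(s) \Rightarrow \mathrm{pop}(\mathrm{push}(s, x)) = s$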
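Variable-length stack
A stack implemented as a list may overflow at unpredictable moments depending on the contents of the entire
memory, not just of the stack. We specify this behavior by postulating a function 'space-available'. It has no domain
and thus acts as an oracle that chooses its value independently of the state of the stack (if we gave 'space-available' a
domain, this would have to be the set of states of the entire memory).
$\mathrm{create}: \to S$
$\mathrm{empty}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{space\text{-}available}: \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{push}: S \times X \to S$
$\mathrm{top}: S - \{s_0\} \to X$
$\mathrm{pop}: S - \{s_0\} \to S$
$\forall s \in S, \forall x \in X$:
$\mathrm{create} = s_0$
$\mathrm{empty}(s_0) = \mathrm{true}$
$\mathrm{space\text{-}available} \Rightarrow \mathrm{empty}(\mathrm{push}(s, x)) = \mathrm{false}$
$\mathrm{space\text{-}available} \Rightarrow \mathrm{top}(\mathrm{push}(s, x)) = x$
$\mathrm{space\text{-}available} \Rightarrow \mathrm{pop}(\mathrm{push}(s, x)) = s$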
Implementation
We have seen that abstract data types cannot capture our intuitive, vague concept of a stack in one single model.
The rigor enforced by the formal definition makes us aware that there are different types of stacks with different
behavior (quite apart from the issue of the domain type $X$, which specifies what type of elements are to be stored).
This clarity is an advantage whenever we attempt to process abstract data types automatically; it may be a
disadvantage for human communication, because a rigorous definition may force us to (over)specify details.
The different types of stacks that we have introduced are directly related to different styles of implementation.
The fixed-length stack, for example, describes the following implementation:
    const m = … ; { maximum length of a stack }
    type elt = … ;
      stack = record
        a: array[1 .. m] of elt;
        d: 0 .. m; { current depth of stack }
      end;
    procedure create(var s: stack);
    begin s.d := 0 end;
    function empty(s: stack): boolean;
    begin return(s.d = 0) end;
    function full(s: stack): boolean;
    begin return(s.d = m) end;
    procedure push(var s: stack; x: elt); { not to be called if the stack is full }
    begin s.d := s.d + 1; s.a[s.d] := x end;
    function top(s: stack): elt; { not to be called if the stack is empty }
    begin return(s.a[s.d]) end;
    procedure pop(var s: stack); { not to be called if the stack is empty }
    begin s.d := s.d - 1 end;
Since the function 'depth' is not exported (i.e. not made available to the user of this data type), it need not be
provided as a procedure. Instead, we have implemented it as a variable d which also serves as a stack pointer.
Our implementation assumes that the user checks that the stack is not full before calling 'push', and that it is not
empty before calling 'top' or 'pop'. We could, of course, write the procedures 'push', 'top', and 'pop' so as to "protect
themselves" against illegal calls on a full or an empty stack simply by returning an error message to the calling
program. This requires adding a further argument to each of these three procedures and leads to yet other types of
stacks which are formally different abstract data types from the ones we have discussed.
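A self-protecting variant of the fixed-length stack might look as follows. This is a minimal sketch, not the book's implementation: instead of a further argument, each operation reports success through its return value, and the class name `FixedStack` is an assumption made for illustration.

```python
class FixedStack:
    """Fixed-length stack that guards against illegal calls.

    push and pop return True on success and False on an illegal call;
    top returns a pair (ok, value)."""

    def __init__(self, m):
        self.m = m              # maximum depth
        self.a = [None] * m     # storage cells
        self.d = 0              # current depth, also the stack pointer

    def empty(self):
        return self.d == 0

    def full(self):
        return self.d == self.m

    def push(self, x):
        if self.full():
            return False        # illegal call on a full stack
        self.a[self.d] = x
        self.d += 1
        return True

    def top(self):
        if self.empty():
            return (False, None)  # illegal call on an empty stack
        return (True, self.a[self.d - 1])

    def pop(self):
        if self.empty():
            return False        # illegal call on an empty stack
        self.d -= 1
        return True
```

Since 'push' on a full stack is now defined (it leaves the stack unchanged and reports failure), this variant is a different abstract data type from the fixed-length stack specified above.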
First-in-first-out queue
The following operations (Exhibit 19.2) are defined for the abstract data type fifo queue (first-in-first-out
queue):
• empty: Return true if the queue is empty.
• enqueue: Insert a new element at the tail end of the queue.
• front: Return the front element of the queue.
• dequeue: Remove the front element.
Exhibit 19.2: Elements are inserted at the tail and removed from the head of the fifo queue.
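Let $F$ be the set of queue states that can be obtained from the empty queue by performing finite sequences of
'enqueue' and 'dequeue' operations. $f_0 \in F$ denotes the empty queue. The following functions represent fifo queue
operations:
$\mathrm{create}: \to F$
$\mathrm{empty}: F \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{enqueue}: F \times X \to F$
$\mathrm{front}: F - \{f_0\} \to X$
$\mathrm{dequeue}: F - \{f_0\} \to F$
The semantics of the fifo queue operations is specified by the following axioms:
$\forall f \in F, \forall x \in X$:
(1) $\mathrm{create} = f_0$
(2) $\mathrm{empty}(f_0) = \mathrm{true}$
(3) $\mathrm{empty}(\mathrm{enqueue}(f, x)) = \mathrm{false}$
(4) $\mathrm{front}(\mathrm{enqueue}(f_0, x)) = x$
(5) $\mathrm{not\ empty}(f) \Rightarrow \mathrm{front}(\mathrm{enqueue}(f, x)) = \mathrm{front}(f)$
(6) $\mathrm{dequeue}(\mathrm{enqueue}(f_0, x)) = f_0$
(7) $\mathrm{not\ empty}(f) \Rightarrow \mathrm{dequeue}(\mathrm{enqueue}(f, x)) = \mathrm{enqueue}(\mathrm{dequeue}(f), x)$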
Any $f \in F$ is obtained from the empty fifo queue $f_0$ by performing a finite sequence of 'enqueue' and 'dequeue'
operations. By axioms (6) and (7) this sequence can be reduced to a sequence consisting of 'enqueue' operations
only which also transforms $f_0$ into $f$.
Example
$f = \mathrm{dequeue}(\mathrm{enqueue}(\mathrm{dequeue}(\mathrm{enqueue}(\mathrm{enqueue}(f_0, x), y)), z))$
$= \mathrm{dequeue}(\mathrm{enqueue}(\mathrm{enqueue}(\mathrm{dequeue}(\mathrm{enqueue}(f_0, x)), y), z))$
$= \mathrm{dequeue}(\mathrm{enqueue}(\mathrm{enqueue}(f_0, y), z))$
$= \mathrm{enqueue}(\mathrm{dequeue}(\mathrm{enqueue}(f_0, y)), z)$
$= \mathrm{enqueue}(f_0, z)$
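Axioms (6) and (7) can be read as rewrite rules that push every 'dequeue' inward until it disappears. The sketch below assumes that queue expressions are encoded as nested Python tuples; the encoding and the names `F0`, `enq`, `deq`, and `normalize` are assumptions made for illustration.

```python
F0 = ("f0",)  # the empty queue f0


def enq(f, x):
    return ("enq", f, x)


def deq(f):
    return ("deq", f)


def normalize(t):
    """Rewrite a queue expression into one built from enqueue only."""
    if t == F0:
        return F0
    if t[0] == "enq":
        return enq(normalize(t[1]), t[2])
    # t has the form dequeue(g); first bring g into enqueue-only form.
    g = normalize(t[1])
    if g == F0:
        raise ValueError("dequeue is undefined on the empty queue")
    inner, x = g[1], g[2]          # g = enqueue(inner, x)
    if inner == F0:
        return F0                  # axiom (6)
    return enq(normalize(deq(inner)), x)   # axiom (7)


# The example from the text.
example = deq(enq(deq(enq(enq(F0, "x"), "y")), "z"))
```

Normalizing `example` yields `enq(F0, "z")`, the same result as the step-by-step reduction shown in the text.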
An implementation of a fifo queue may provide the following procedures:
    procedure create(var f: fifoqueue);
    function empty(f: fifoqueue): boolean;
    procedure enqueue(var f: fifoqueue; x: elt);
    function front(f: fifoqueue): elt;
    procedure dequeue(var f: fifoqueue);
Priority queue
A priority queue orders the elements according to their value rather than their arrival time. Thus we assume that
a total order $\le$ is defined on the domain $X$. In the following examples, $X$ is the set of integers; a small integer
means high priority. The following operations (Exhibit 19.3) are defined for the abstract data type priority queue:
• empty: Return true if the queue is empty.
• insert: Insert a new element into the queue.
• min: Return the element of highest priority contained in the queue.
• delete: Remove the element of highest priority from the queue.
Exhibit 19.3: An element's priority determines its position in a priority queue.
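Let $P$ be the set of priority queue states that can be obtained from the empty queue by performing finite
sequences of 'insert' and 'delete' operations. The empty priority queue is denoted by $p_0 \in P$. The following functions
represent priority queue operations:
$\mathrm{create}: \to P$
$\mathrm{empty}: P \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{insert}: P \times X \to P$
$\mathrm{min}: P - \{p_0\} \to X$
$\mathrm{delete}: P - \{p_0\} \to P$
The semantics of the priority queue operations is specified by the following axioms. For $x, y \in X$, the function
$\mathrm{MIN}(x, y)$ returns the smaller of the two values.
$\forall p \in P, \forall x \in X$:
(1) $\mathrm{create} = p_0$
(2) $\mathrm{empty}(p_0) = \mathrm{true}$
(3) $\mathrm{empty}(\mathrm{insert}(p, x)) = \mathrm{false}$
(4) $\mathrm{min}(\mathrm{insert}(p_0, x)) = x$
(5) $\mathrm{not\ empty}(p) \Rightarrow \mathrm{min}(\mathrm{insert}(p, x)) = \mathrm{MIN}(x, \mathrm{min}(p))$
(6) $\mathrm{delete}(\mathrm{insert}(p_0, x)) = p_0$
(7) $\mathrm{not\ empty}(p) \Rightarrow \mathrm{delete}(\mathrm{insert}(p, x)) = \begin{cases} p & \text{if } x \le \mathrm{min}(p) \\ \mathrm{insert}(\mathrm{delete}(p), x) & \text{else} \end{cases}$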
Any $p \in P$ is obtained from the empty queue $p_0$ by a finite sequence of 'insert' and 'delete' operations. By axioms
(6) and (7) any such sequence can be reduced to a shorter one that also transforms $p_0$ into $p$ and consists of 'insert'
operations only.
Example
Assume that $x < z$, $y < z$.
$p = \mathrm{delete}(\mathrm{insert}(\mathrm{delete}(\mathrm{insert}(\mathrm{insert}(p_0, x), z)), y))$
$= \mathrm{delete}(\mathrm{insert}(\mathrm{insert}(\mathrm{delete}(\mathrm{insert}(p_0, x)), z), y))$
$= \mathrm{delete}(\mathrm{insert}(\mathrm{insert}(p_0, z), y))$
$= \mathrm{insert}(p_0, z)$
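Axioms (5) and (7) can be tested exhaustively on small queues. The following is a minimal sketch, assuming a priority queue state is modeled as a sorted Python tuple (duplicates allowed); the names `P0`, `pq_min`, `axioms_5_and_7_hold`, and `check_all` are illustrative assumptions.

```python
from itertools import product

P0 = ()  # the empty priority queue p0


def insert(p, x):
    return tuple(sorted(p + (x,)))


def pq_min(p):
    return p[0]      # smallest value means highest priority


def delete(p):
    return p[1:]     # remove the element of highest priority


def axioms_5_and_7_hold(p, x):
    """Check axioms (5) and (7) for a non-empty state p and a value x."""
    a5 = pq_min(insert(p, x)) == min(x, pq_min(p))
    expected = p if x <= pq_min(p) else insert(delete(p), x)
    a7 = delete(insert(p, x)) == expected
    return a5 and a7


def check_all(values=(1, 2, 3), max_size=3):
    """Try every non-empty queue of up to max_size elements over values."""
    for size in range(1, max_size + 1):
        for elems in product(values, repeat=size):
            p = tuple(sorted(elems))
            for x in values:
                if not axioms_5_and_7_hold(p, x):
                    return False
    return True
```

With x = 1, y = 2, z = 3 the example from the text becomes `delete(insert(delete(insert(insert(P0, 1), 3)), 2))`, which evaluates to `insert(P0, 3)` in this model.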
An implementation of a priority queue may provide the following procedures:
    procedure create(var p: priorityqueue);
    function empty(p: priorityqueue): boolean;
    procedure insert(var p: priorityqueue; x: elt);
    function min(p: priorityqueue): elt;
    procedure delete(var p: priorityqueue);
Dictionary
Whereas stacks and fifo queues are designed to retrieve and process elements depending on their order of
arrival, a dictionary (or table) is designed to process elements exclusively by their value (name). A priority queue is
a hybrid: insertion is done according to value, as in a dictionary, and deletion according to position, as in a fifo
queue.
The simplest type of dictionary supports the following operations:
• member: Return true if a given element is contained in the dictionary.
• insert: Insert a new element into the dictionary.
• delete: Remove a given element from the dictionary.
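Let $D$ be the set of dictionary states that can be obtained from the empty dictionary by performing finite
sequences of 'insert' and 'delete' operations. $d_0 \in D$ denotes the empty dictionary. Then the operations can be
represented by functions as follows:
$\mathrm{create}: \to D$
$\mathrm{insert}: D \times X \to D$
$\mathrm{member}: D \times X \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{delete}: D \times X \to D$
The semantics of the dictionary operations is specified by the following axioms:
$\forall d \in D, \forall x, y \in X$:
(1) $\mathrm{create} = d_0$
(2) $\mathrm{member}(d_0, x) = \mathrm{false}$
(3) $\mathrm{member}(\mathrm{insert}(d, x), x) = \mathrm{true}$
(4) $x \neq y \Rightarrow \mathrm{member}(\mathrm{insert}(d, y), x) = \mathrm{member}(d, x)$
(5) $\mathrm{delete}(d_0, x) = d_0$
(6) $\mathrm{delete}(\mathrm{insert}(d, x), x) = \mathrm{delete}(d, x)$
(7) $x \neq y \Rightarrow \mathrm{delete}(\mathrm{insert}(d, x), y) = \mathrm{insert}(\mathrm{delete}(d, y), x)$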
Any $d \in D$ is obtained from the empty dictionary $d_0$ by a finite sequence of 'insert' and 'delete' operations. By
axioms (5), (6), and (7) any such sequence can be reduced to a shorter one that also transforms $d_0$ into $d$ and
consists of 'insert' operations only.
Example
$d = \mathrm{delete}(\mathrm{insert}(\mathrm{insert}(\mathrm{insert}(d_0, x), y), z), y)$
$= \mathrm{insert}(\mathrm{delete}(\mathrm{insert}(\mathrm{insert}(d_0, x), y), y), z)$
$= \mathrm{insert}(\mathrm{delete}(\mathrm{insert}(d_0, x), y), z)$
$= \mathrm{insert}(\mathrm{insert}(\mathrm{delete}(d_0, y), x), z)$
$= \mathrm{insert}(\mathrm{insert}(d_0, x), z)$
This specification allows duplicates to be inserted. However, axiom (6) guarantees that all duplicates are
removed if a delete operation is performed. To prevent duplicates, the following axiom is added to the specification
above:
(8) $\mathrm{member}(d, x) \Rightarrow \mathrm{insert}(d, x) = d$
In this case axiom (6) can be weakened to
(6') $\mathrm{not\ member}(d, x) \Rightarrow \mathrm{delete}(\mathrm{insert}(d, x), x) = d$
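The difference between the two dictionary specifications shows up clearly in a small model. The sketch below assumes a dictionary state is modeled as a Python tuple that records insertions in order; `D0` and `insert_unique` are illustrative names, and `insert_unique` stands for the insert operation of the duplicate-free variant.

```python
D0 = ()  # the empty dictionary d0


def member(d, x):
    return x in d


def insert(d, x):
    return d + (x,)                         # duplicates are allowed


def delete(d, x):
    return tuple(e for e in d if e != x)    # removes every copy of x


def insert_unique(d, x):
    """Insert obeying axiom (8): inserting a member changes nothing."""
    return d if member(d, x) else d + (x,)


d = insert(insert(insert(D0, "x"), "y"), "y")   # 'y' is stored twice
```

In this model `delete(d, "y")` removes both copies of 'y', as axiom (6) demands, while `insert_unique` never creates a second copy, so a single deletion step suffices, as stated by the weakened axiom (6').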
An implementation of a dictionary may provide the following procedures:
    procedure create(var d: dictionary);
    function member(d: dictionary; x: elt): boolean;
    procedure insert(var d: dictionary; x: elt);
    procedure delete(var d: dictionary; x: elt);
In actual programming practice, a dictionary usually supports the additional operations 'find', 'predecessor', and
'successor'. 'find' is similar to 'member' but in addition to a true/false answer, provides a pointer to the element
found. Both 'predecessor' and 'successor' take a pointer to an element e as an argument, and return a pointer to the
element in the dictionary that immediately precedes or follows e, according to the order $\le$. Repeated calls of
'successor' thus process the dictionary in sequential order.
Exercise: extending the abstract data type 'dictionary'
We have defined a dictionary as supporting the three operations 'member', 'insert' and 'delete'. But a dictionary,
or table, usually supports additional operations based on a total ordering $\le$ defined on its domain $X$. Let us add two
operations that take an argument $x \in X$ and deliver its two neighboring elements in the table:
• succ(x): Return the successor of x in the table.
• pred(x): Return the predecessor of x in the table.
The successor of x is defined as the smallest of all the elements in the table which are larger than x, or as $+\infty$ if
none exists. The predecessor is defined symmetrically: the largest of all the elements in the table that are smaller
than x, or $-\infty$. Present a formal specification to describe the behavior of the table.
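Before turning to a formal specification, the informal definitions of 'succ' and 'pred' can be read as executable code. This is a minimal sketch, assuming the table is kept as a sorted Python list of distinct numbers; that representation and the use of floating-point infinity for $\pm\infty$ are assumptions made for illustration.

```python
import bisect
import math


def succ(table, x):
    """Smallest element of the sorted list table that is larger than x,
    or +infinity if none exists."""
    i = bisect.bisect_right(table, x)
    return table[i] if i < len(table) else math.inf


def pred(table, x):
    """Largest element of the sorted list table that is smaller than x,
    or -infinity if none exists."""
    i = bisect.bisect_left(table, x)
    return table[i - 1] if i > 0 else -math.inf


def in_order(table):
    """Visit the table in sequential order by repeated calls of succ."""
    result = []
    x = succ(table, -math.inf)
    while x != math.inf:
        result.append(x)
        x = succ(table, x)
    return result
```

Note that x itself need not be a member of the table: `succ` and `pred` are defined for every value of the domain.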
Solution
Let $T$ be the set of states of the table, and $t_0$ a special state that denotes the empty table. The functions and
axioms are as follows:
$\mathrm{member}: T \times X \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{insert}: T \times X \to T$
$\mathrm{delete}: T \times X \to T$
$\mathrm{succ}: T \times X \to X \cup \{+\infty\}$
$\mathrm{pred}: T \times X \to X \cup \{-\infty\}$
$\forall t \in T, \forall x, y \in X$:
$\mathrm{member}(t_0, x) = \mathrm{false}$
$\mathrm{member}(\mathrm{insert}(t, x), x) = \mathrm{true}$
$x \neq y \Rightarrow \mathrm{member}(\mathrm{insert}(t, y), x) = \mathrm{member}(t, x)$
$\mathrm{delete}(t_0, x) = t_0$
$\mathrm{delete}(\mathrm{insert}(t, x), x) = \mathrm{delete}(t, x)$
$x \neq y \Rightarrow \mathrm{delete}(\mathrm{insert}(t, x), y) = \mathrm{insert}(\mathrm{delete}(t, y), x)$
$-\infty < x < +\infty$
$\mathrm{pred}(t, x) < x < \mathrm{succ}(t, x)$
$\mathrm{succ}(t, x) \neq +\infty \Rightarrow \mathrm{member}(t, \mathrm{succ}(t, x)) = \mathrm{true}$
$\mathrm{pred}(t, x) \neq -\infty \Rightarrow \mathrm{member}(t, \mathrm{pred}(t, x)) = \mathrm{true}$
$x < y, \ \mathrm{member}(t, y), \ y \neq \mathrm{succ}(t, x) \Rightarrow \mathrm{succ}(t, x) < y$
$x > y, \ \mathrm{member}(t, y), \ y \neq \mathrm{pred}(t, x) \Rightarrow y < \mathrm{pred}(t, x)$
Exercise: the abstract data type 'string'
We define the following operations for the abstract data type string:
• empty: Return true if the string is empty.
• append: Append a new element to the tail of the string.
• head: Return the head element of the string.
• tail: Remove the head element of the given string.
• length: Return the length of the string.
• find: Return the index of the first occurrence of a value within the string.
Let $X = \{a, b, \dots, z\}$, and $S$ be the set of string states that can be obtained from the empty string by performing a
finite number of 'append' and 'tail' operations. $s_0 \in S$ denotes the empty string. The operations can be represented
by functions as follows:
$\mathrm{empty}: S \to \{\mathrm{true}, \mathrm{false}\}$
$\mathrm{append}: S \times X \to S$
$\mathrm{head}: S - \{s_0\} \to X$
$\mathrm{tail}: S - \{s_0\} \to S$
$\mathrm{length}: S \to \{0, 1, 2, \dots\}$
$\mathrm{find}: S \times X \to \{0, 1, 2, \dots\}$
Examples:
    empty('abc') = false; append('abc', 'd') = 'abcd'; head('abcd') = 'a';
    tail('abcd') = 'bcd'; length('abcd') = 4; find('abcd', 'b') = 2.
(a) Give the axioms that specify the semantics of the abstract data type 'string'.
(b) The function $\mathrm{hchop}: S \times X \to S$ returns the substring of a string s beginning with the first occurrence
of a given value. Similarly, $\mathrm{tchop}: S \times X \to S$ returns the substring of s beginning with head(s) and ending
with the last occurrence of a given value. Specify the behavior of these operations by additional axioms.
Examples:
    hchop('abcdabc', 'c') = 'cdabc'
    tchop('abcdabc', 'b') = 'abcdab'
(c) The function $\mathrm{cat}: S \times S \to S$ returns the concatenation of two sequences. Specify the behavior of 'cat' by
additional axioms. Example:
    cat('abcd', 'efg') = 'abcdefg'
(d) The function $\mathrm{reverse}: S \to S$ returns the given sequence in reverse order. Specify the behavior of reverse by
additional axioms. Example:
    reverse('abcd') = 'dcba'
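A reference model of the string operations is useful for testing candidate axioms against the examples. The sketch below models a string state as a Python `str`; returning the empty string from `hchop` and `tchop` when the value does not occur is an assumption, since the informal description leaves that case open.

```python
def empty(s):
    return s == ""


def append(s, x):
    return s + x


def head(s):
    return s[0]


def tail(s):
    return s[1:]


def length(s):
    return len(s)


def find(s, x):
    # Positions are counted from 1; 0 means that x does not occur.
    return s.find(x) + 1


def hchop(s, x):
    i = s.find(x)                 # first occurrence, or -1
    return s[i:] if i >= 0 else ""


def tchop(s, x):
    i = s.rfind(x)                # last occurrence, or -1
    return s[: i + 1] if i >= 0 else ""


def cat(s, t):
    return s + t


def reverse(s):
    return s[::-1]
```

An axiom proposed for part (a) to (d) can then be spot-checked by evaluating both of its sides in this model for a handful of strings.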
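Solution
(a) Axioms for the six 'string' operations:
$\forall s \in S, \forall x, y \in X$:
$\mathrm{empty}(s_0) = \mathrm{true}$
$\mathrm{empty}(\mathrm{append}(s, x)) = \mathrm{false}$
$\mathrm{head}(\mathrm{append}(s_0, x)) = x$
$\mathrm{not\ empty}(s) \Rightarrow \mathrm{head}(s) = \mathrm{head}(\mathrm{append}(s, x))$
$\mathrm{tail}(\mathrm{append}(s_0, x)) = s_0$
$\mathrm{not\ empty}(s) \Rightarrow \mathrm{tail}(\mathrm{append}(s, x)) = \mathrm{append}(\mathrm{tail}(s), x)$
$\mathrm{length}(s_0) = 0$
$\mathrm{length}(\mathrm{append}(s, x)) = \mathrm{length}(s) + 1$
$\mathrm{find}(s_0, x) = 0$
$x \neq y, \ \mathrm{find}(s, x) = 0 \Rightarrow \mathrm{find}(\mathrm{append}(s, y), x) = 0$
$\mathrm{find}(s, x) = 0 \Rightarrow \mathrm{find}(\mathrm{append}(s, x), x) = \mathrm{length}(s) + 1$
$\mathrm{find}(s, x) = d > 0 \Rightarrow \mathrm{find}(\mathrm{append}(s, y), x) = d$
(b) Axioms for 'hchop' and 'tchop':
$\forall s \in S, \forall x, y \in X$:
$\mathrm{hchop}(s_0, x) = s_0$
$\mathrm{not\ empty}(s), \ \mathrm{head}(s) = x \Rightarrow \mathrm{hchop}(s, x) = s$
$\mathrm{not\ empty}(s), \ \mathrm{head}(s) \neq x \Rightarrow \mathrm{hchop}(s, x) = \mathrm{hchop}(\mathrm{tail}(s), x)$
$\mathrm{tchop}(s_0, x) = s_0$
$\mathrm{tchop}(\mathrm{append}(s, x), x) = \mathrm{append}(s, x)$
$x \neq y \Rightarrow \mathrm{tchop}(\mathrm{append}(s, y), x) = \mathrm{tchop}(s, x)$
(c) Axioms for 'cat':
$\forall s, s' \in S$:
$\mathrm{cat}(s, s_0) = s$
$\mathrm{not\ empty}(s') \Rightarrow \mathrm{cat}(s, s') = \mathrm{cat}(\mathrm{append}(s, \mathrm{head}(s')), \mathrm{tail}(s'))$
(d) Axioms for 'reverse':
$\forall s \in S$:
$\mathrm{reverse}(s_0) = s_0$
$s \neq s_0 \Rightarrow \mathrm{reverse}(s) = \mathrm{append}(\mathrm{reverse}(\mathrm{tail}(s)), \mathrm{head}(s))$
Exercises
1. Implement two stacks in one array a[1 .. m] in such a way that neither stack overflows unless the total
number of elements in both stacks together is m. The operations 'push', 'top', and 'pop' should run in O(1)
time.
2. A double-ended queue (deque) can grow and shrink at both ends, left and right, using the procedures
'enqueue-left', 'dequeue-left', 'enqueue-right', and 'dequeue-right'. Present a formal specification to
describe the behavior of the abstract data type deque.
3. Extend the abstract data type priority queue by the operation next(x), which returns the element in the
priority queue having the next lower priority than x.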
20. Implicit data structures
implicit data structures describe relationships among data elements implicitly by formulas and declarations
array storage
band matrices
sparse matrices
buffers eliminate temporary speed differences among interacting producer and consumer processes
fifo queue implemented as a circular buffer
priority queue implemented as a heap
heapsort
What is an implicit data structure?
An important aspect of the art of data structure design is the efficient representation of the structural
relationships among the data elements to be stored. Data is usually modeled as a graph, with nodes corresponding
to data elements and links (directed arcs, or bidirectional edges) corresponding to relationships. Relationships
often serve a double purpose. Primarily, they define the semantics of the data and thus allow programs to interpret
the data correctly. This aspect of relationships is highlighted in the database field: for example, in the entity-
relationship model. Secondarily, relationships provide a means of accessing data, by starting at some element and
following an access path that leads to other elements of interest. In studying data structures we are mainly
concerned with the use of relationships for access to data.
When the structure of the data is irregular, or when the structure is highly dynamic (extensively modified at run
time), there is no practical alternative to representing the relationships explicitly. This is the domain of list
structures, presented in the chapter on "List structures". When the structure of the data is static and obeys a regular
pattern, on the other hand, there are alternatives that compress the structural information. We can often replace
many explicit links by a few formulas that tell us where to find the "neighboring" elements. When this approach
works, it saves memory space and often leads to faster programs.
We use the term implicit to denote data structures in which the relationships among data elements are given
implicitly by formulas and declarations in the program; no additional space is needed for these relationships in the
data storage. The best known example is the array. If one looks at the area in which an array is stored, it is
impossible to derive, from its contents, any relationships among the elements without the information that the
elements belong to an array of a given type.
Data structures always go hand in hand with the corresponding procedures for accessing and operating on the
data. This is particularly true for implicit data structures: they simply do not exist independent of their accessing
procedures. Separated from its code, an implicit data structure represents at best an unordered set of data. With the
right code, it exhibits a rich structure, as is beautifully illustrated by the heap at the end of this chapter.
Array storage
A two-dimensional array declared as
    var A: array[1 .. m, 1 .. n] of elt;
is usually written in a rectangular shape:
    A[1, 1]   A[1, 2]   …   A[1, n]
    A[2, 1]   A[2, 2]   …   A[2, n]
    …         …         …   …
    A[m, 1]   A[m, 2]   …   A[m, n]
But it is stored in a linearly addressed memory, typically row by row (as shown below) or column by column (as
in Fortran) in consecutive storage cells, starting at base address b. If an element fits into one cell, we have
    element    address
    A[1, 1]    b
    A[1, 2]    b + 1
    …          …
    A[1, n]    b + n - 1
    A[2, 1]    b + n
    A[2, 2]    b + n + 1
    …          …
    A[2, n]    b + 2 · n - 1
    …          …
    A[m, n]    b + m · n - 1
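If an element of type 'elt' occupies c storage cells, the address $t(i, j)$ of A[i, j] is
$t(i, j) = b + c \cdot (n \cdot (i - 1) + j - 1)$
This linear formula generalizes to k-dimensional arrays declared as
    var A: array[1 .. m1, 1 .. m2, … , 1 .. mk] of elt;
The address $t(i_1, i_2, \dots, i_k)$ of element A[i1, i2, … , ik] is
$t(i_1, i_2, \dots, i_k) = b + c \cdot (i_1 - 1) \cdot m_2 \cdots m_k + c \cdot (i_2 - 1) \cdot m_3 \cdots m_k + \dots + c \cdot (i_{k-1} - 1) \cdot m_k + c \cdot (i_k - 1)$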
The point is that access to an element A[i, j, …] invokes evaluation of a (linear) formula $t(i, j, \dots)$ that tells us
where to find this element. A high-level programming language hides most of the details of address computation,
except when we wish to take advantage of any special structure our matrices may have. The following types of
sparse matrices occur frequently in numerical linear algebra.
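The address computation for the row-by-row layout described above can be sketched in a few lines. The function below assumes 1-based indices, a base address `base`, and `cell` storage cells per element; the function name and parameter names are illustrative assumptions, and the offset is accumulated by Horner's rule rather than by the expanded sum.

```python
def address(index, dims, base=0, cell=1):
    """Row-major address of A[i1, ..., ik] for 1-based indices.

    index: tuple (i1, ..., ik); dims: tuple (m1, ..., mk)."""
    if len(index) != len(dims):
        raise ValueError("index and dims must have the same length")
    offset = 0
    for i, m in zip(index, dims):
        if not 1 <= i <= m:
            raise IndexError("index out of range")
        # Horner's rule: multiply out all later dimensions step by step.
        offset = offset * m + (i - 1)
    return base + cell * offset
```

For a 3 × 4 array starting at address 100 with one cell per element, `address((2, 1), (3, 4), base=100)` gives 104, that is, b + n as in the table of addresses above.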
Band matrices. An $n \times n$ matrix M is called a band matrix of width $2 \cdot b + 1$ (b = 0, 1, …) if $M_{i,j} = 0$ for
all i and j with $|i - j| > b$. In other words, all nonzero elements are located on the main diagonal and in b adjacent
minor diagonals on both sides of the main diagonal. If n is large and b is small, much space is saved by storing M in a
two-dimensional array A with $n \cdot (2 \cdot b + 1)$ cells rather than in an array with $n^2$ cells:
    type bandm = array[1 .. n, -b .. b] of elt;
    var A: bandm;
Each row A[i, ·] stores the nonzero elements of the corresponding row of M, namely the diagonal element $M_{i,i}$,
the b elements to the left of the diagonal
$M_{i,i-b}, M_{i,i-b+1}, \dots, M_{i,i-1}$
and the b elements to the right of the diagonal
$M_{i,i+1}, M_{i,i+2}, \dots, M_{i,i+b}$.
The first and the last b rows of A contain empty cells corresponding to the triangles that stick out from M in
Exhibit 20.1. The elements of M are stored in array A such that A[i, j] contains $M_{i,i+j}$ ($1 \le i \le n$, $-b \le j \le b$).
A total of $b \cdot (b + 1)$ cells in the upper left and lower right of A remain unused. It is not worth saving an additional
$b \cdot (b + 1)$ cells by packing the band matrix M into an array of minimal size, as the mapping becomes irregular and
the formula for calculating the indices of $M_{i,j}$ becomes much more complicated.
Exhibit 20.1: Extending the diagonals with dummy elements gives the
band matrix the shape of a rectangular array.
Exercise: band matrices
(a) Write a procedure add(p, q: bandm; var r: bandm);
which adds two band matrices stored in p and q and stores the result in r.
(b) Write a procedure bmv(p: bandm; v: … ; var w: … );
which multiplies a band matrix stored in p with a vector v of length n and stores the result in w.
Solution
(a) procedure add(p, q: bandm; var r: bandm);
    var i: 1 .. n; j: -b .. b;
    begin
      for i := 1 to n do
        for j := -b to b do
          r[i, j] := p[i, j] + q[i, j]
    end;
(b) type vector = array[1 .. n] of real;
    procedure bmv(p: bandm; v: vector; var w: vector);
    var i: 1 .. n; j: -b .. b;
    begin
      for i := 1 to n do begin
        w[i] := 0.0;
        for j := -b to b do
          if (i + j >= 1) and (i + j <= n) then w[i] := w[i] + p[i, j] * v[i + j]
      end
    end;
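The band storage scheme and the matrix-vector product can also be tried out in a language with 0-based lists. This is a minimal sketch, assuming the band array is a list of n rows with 2·b + 1 cells each, where column offset j (from -b to b) is stored at list position j + b; the helper names `make_band`, `band_get`, and `band_set` are illustrative assumptions.

```python
def make_band(n, b):
    """Allocate n rows of 2*b + 1 cells, all zero."""
    return [[0.0] * (2 * b + 1) for _ in range(n)]


def band_get(A, b, i, k):
    """Return M[i, k] (1-based); elements outside the band are zero."""
    j = k - i
    return A[i - 1][j + b] if abs(j) <= b else 0.0


def band_set(A, b, i, k, value):
    """Store M[i, k] (1-based); only elements inside the band exist."""
    j = k - i
    if abs(j) > b:
        raise IndexError("element lies outside the band")
    A[i - 1][j + b] = value


def bmv(A, b, v):
    """Multiply the band matrix stored in A with the vector v."""
    n = len(A)
    w = [0.0] * n
    for i in range(1, n + 1):
        for j in range(-b, b + 1):
            if 1 <= i + j <= n:        # skip the unused dummy cells
                w[i - 1] += A[i - 1][j + b] * v[i + j - 1]
    return w
```

The index test in `bmv` plays the same role as the condition on i + j in the Pascal procedure: it keeps the dummy cells in the corners of the array out of the computation.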
Sparse matrices. A matrix is called sparse if it consists mostly of zeros. We have seen that sparse matrices of
regular shape can be compressed efficiently using address computation. Irregularly shaped sparse matrices, on the
other hand, do not yield gracefully to compression into a smaller array in such a way that access can be based on
address computation. Instead, the nonzero elements may be stored in an unstructured set of records, where each
record contains the pair ((i, j), A[i, j]) consisting of an index tuple (i, j) and the value A[i, j]. Any element that is
absent from this set is assumed to be zero. As the position of a data element is stored explicitly as an index pair (i,
j), this representation is not an implicit data structure. As a consequence, access to a random element of an
irregularly shaped sparse matrix typically requires searching for it, and thus is likely to be slower than the direct
access to an element of a matrix of regular shape stored in an implicit data structure.
Exercise: triangular matrices
Let A and B be lower-triangular $n \times n$ matrices; that is, all elements above the diagonal are zero:
$A_{i,j} = B_{i,j} = 0$ for $i < j$.
(a) Prove that the inverse (if it exists) and the matrix product of lower-triangular matrices are again
lower-triangular.
(b) Devise a scheme for storing two lower-triangular matrices A and B in one array C of minimal size.
Write a Pascal declaration for C and draw a picture of its contents.
(c) Write two functions
    function A(i, j: 1 .. n): real;
    function B(i, j: 1 .. n): real;
that access C and return the corresponding matrix elements.
(d) Write a procedure that computes A := A · B in place: the entries of A in C are replaced by the entries
of the product A · B. You may use a (small) constant number of additional variables, independent of
the size of A and B.
(e) Same as (d), but using $A := A^{-1} \cdot B$.
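Solution
(a) The inverse of an $n \times n$ matrix exists iff the determinant of the matrix is nonzero. Let A be a lower-
triangular matrix for which the inverse matrix B exists, that is,
$\det(A) = \prod_{i=1}^{n} A_{i,i} \neq 0$
and
$\sum_{k=1}^{n} A_{i,k} \cdot B_{k,j} = \delta_{i,j}$ (which is 1 if $i = j$ and 0 otherwise).
Let $1 \le j \le n$. Then, for $i = 1, 2, \dots, j - 1$ in turn, and using $B_{1,j} = \dots = B_{i-1,j} = 0$ from the previous steps,
$0 = \sum_{k=1}^{i} A_{i,k} \cdot B_{k,j} = A_{i,i} \cdot B_{i,j} \Rightarrow B_{i,j} = 0$
and therefore B is a lower-triangular matrix.
Let A and B be lower-triangular, C := A · B:
$C_{i,j} = \sum_{k=1}^{n} A_{i,k} \cdot B_{k,j} = \sum_{k=j}^{i} A_{i,k} \cdot B_{k,j}$
If $i < j$, this sum is empty and therefore $C_{i,j} = 0$ (i.e. C is lower-triangular).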
(b) A and B can be stored in an array C of size n · (n + 1) as follows (Exhibit 20.2):
const n = … ;
var C: array [0 .. n, 1 .. n] of real;
Exhibit 20.2: A staircase separates two triangular matrices stored in a rectangular array. (graphic does not match)
(c)
function A(i, j: 1 .. n): real;
begin if i < j then return(0.0) else return(C[i, j]) end;
function B(i, j: 1 .. n): real;
begin if i < j then return(0.0) else return(C[n - i, n + 1 - j]) end;
(d) Because the new elements of the result matrix C overwrite the old elements of A, it is important to compute
them in the right order. Specifically, within every row i of C, elements C[i, j] must be computed from left to
right, that is, in increasing order of j.
procedure mult;
var i, j, k: integer; x: real;
begin
  for i := 1 to n do
    for j := 1 to i do begin
      x := 0.0;
      for k := j to i do x := x + A(i, k) * B(k, j);
      C[i, j] := x
    end
end;
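The storage scheme of (b) and (c) and the in-place product of (d) can be tried out in a few lines of Python. This is only a sketch under assumptions that are not part of the text: the helper names make_store, get_a, get_b, set_a and set_b are hypothetical, nested Python lists stand in for the Pascal array, and column 0 is allocated but left unused so that the indices match C[0 .. n, 1 .. n].

```python
def make_store(n):
    """Rows 0..n, columns 0..n; column 0 stays unused to mirror C[0..n, 1..n]."""
    return [[0.0] * (n + 1) for _ in range(n + 1)]

def get_a(c, i, j):
    # A occupies the cells with row >= column.
    return 0.0 if i < j else c[i][j]

def get_b(c, n, i, j):
    # B is stored "upside down" in the cells with column > row.
    return 0.0 if i < j else c[n - i][n + 1 - j]

def set_a(c, i, j, value):        # only meaningful for i >= j
    c[i][j] = value

def set_b(c, n, i, j, value):     # only meaningful for i >= j
    c[n - i][n + 1 - j] = value

def mult(c, n):
    """Replace A by A * B in place; within each row, j increases."""
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            x = 0.0
            for k in range(j, i + 1):
                x += get_a(c, i, k) * get_b(c, n, k, j)
            c[i][j] = x

n = 3
a = [[1.0, 0.0, 0.0], [2.0, 3.0, 0.0], [4.0, 5.0, 6.0]]
b = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
c = make_store(n)
for i in range(1, n + 1):
    for j in range(1, i + 1):
        set_a(c, i, j, a[i - 1][j - 1])
        set_b(c, n, i, j, b[i - 1][j - 1])
mult(c, n)
product = [[get_a(c, i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
stored_b = [[get_b(c, n, i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
```

Computing C[i, j] reads A[i, k] only for k ≥ j, and those entries of row i have not yet been overwritten, which is why the left-to-right order is safe.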
(e)
procedure invertA;
var i, j, k: integer; x: real;
begin
  for i := 1 to n do begin
    for j := 1 to i - 1 do begin
      x := 0.0;
      for k := j to i - 1 do x := x - C[i, k] * C[k, j];
      C[i, j] := x / C[i, i]
    end;
    C[i, i] := 1.0 / C[i, i]
  end
end;
procedure AinvertedmultB;
begin invertA; mult end;
Implementation of the fixed-length fifo queue as a circular buffer
A fifo queue is needed in situations where two processes interact in the following way. A process called producer
generates data for a process called consumer. The processes typically work in bursts: The producer may generate a
lot of data while the consumer is busy with something else; thus the data has to be saved temporarily in a buffer,
from which the consumer takes it as needed. A keyboard driver and an editor are an example of this producer-
consumer interaction. The keyboard driver transfers characters generated by key presses into the buffer, and the
editor reads them from the buffer and interprets them (e.g. as control characters or as text to be inserted). It is
worth remembering, though, that a buffer helps only if two processes work at about the same speed over the long
run. If the producer is always faster, any buffer will overflow; if the consumer is always faster, no buffer is needed. A
buffer can equalize only temporary differences in speeds.
With some knowledge about the statistical behavior of producer and consumer one can usually compute a buffer
size that is sufficient to absorb producer bursts with high probability, and allocate the buffer statically in an array of
fixed size. Among statically allocated buffers, a circular buffer is the natural implementation of a fifo queue.
A circular buffer is an array B, considered as a ring in which the first cell B[0] is the successor of the last cell
B[m - 1], as shown in Exhibit 20.3. The elements are stored in the buffer in consecutive cells between the two pointers
'in' and 'out': 'in' points to the empty cell into which the next element is to be inserted; 'out' points to the cell
containing the next element to be removed. A new element is inserted by storing it in B[in] and advancing 'in' to the
next cell. The element in B[out] is removed by advancing 'out' to the next cell.
Exhibit 20.3: Insertions move the pointer 'in', deletions the pointer 'out' counterclockwise around the array.
Notice that the pointers 'in' and 'out' meet both when the buffer gets full and when it gets empty. Clearly, we
must be able to distinguish a full buffer from an empty one, so as to avoid insertion into the former and removal
from the latter. At first sight it appears that the pointers 'in' and 'out' are insufficient to determine whether a
circular buffer is full or empty. Thus the following implementation uses an additional variable n, which counts how
many elements are in the buffer.
const m = … ; { length of buffer }
type addr = 0 .. m - 1; { index range }
var B: array[addr] of elt; { storage }
  in, out: addr; { access to buffer }
  n: 0 .. m; { number of elements currently in buffer }
procedure create;
begin in := 0; out := 0; n := 0 end;
function empty(): boolean;
begin return(n = 0) end;
function full(): boolean;
begin return(n = m) end;
procedure enqueue(x: elt);
{ not to be called if the queue is full }
begin B[in] := x; in := (in + 1) mod m; n := n + 1 end;
function front(): elt;
{ not to be called if the queue is empty }
begin return(B[out]) end;
procedure dequeue;
begin out := (out + 1) mod m; n := n - 1 end;
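The counted circular buffer can be sketched in Python as follows. This is an illustration under assumptions of our own: the class name CircularBuffer is hypothetical, the pointer 'in' is renamed inp because in is a reserved word in Python, and the "not to be called" preconditions are turned into assertions.

```python
class CircularBuffer:
    """Fixed-length fifo queue; n counts the elements currently stored."""

    def __init__(self, m):
        self.m = m
        self.b = [None] * m   # storage B[0 .. m - 1]
        self.inp = 0          # next free cell
        self.out = 0          # next element to be removed
        self.n = 0

    def empty(self):
        return self.n == 0

    def full(self):
        return self.n == self.m

    def enqueue(self, x):
        assert not self.full()
        self.b[self.inp] = x
        self.inp = (self.inp + 1) % self.m
        self.n += 1

    def front(self):
        assert not self.empty()
        return self.b[self.out]

    def dequeue(self):
        assert not self.empty()
        self.out = (self.out + 1) % self.m
        self.n -= 1


q = CircularBuffer(3)
for x in "abc":
    q.enqueue(x)
# The two pointers coincide although the buffer is full, not empty.
pointers_meet_when_full = q.full() and q.inp == q.out
drained = []
while not q.empty():
    drained.append(q.front())
    q.dequeue()
```

After three insertions into a buffer of length 3 the pointers meet again, so without the counter n the full state could not be told apart from the empty one.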
The producer uses only 'enqueue' and 'full', as it deletes no elements from the circular buffer. The consumer uses
only 'front', 'dequeue', and 'empty', as it inserts no elements.
The state of the circular buffer is described by its contents and the values of 'in', 'out', and n. Since 'in' is changed
only within 'enqueue', only the producer needs write-access to 'in'. Since 'out' is changed only by 'dequeue', only the
consumer needs write-access to 'out'. The variable n, however, is changed by both processes and thus is a shared
variable to which both processes have write-access (Exhibit 20.4 (a)).
Exhibit 20.4:
(a) Producer and consumer both have write-access to shared variable n.
(b) The producer has read/write-access to 'in' and read-only-access to 'out',
the consumer has read/write-access to 'out' and read-only-access to 'in'.
In a concurrent programming environment where several processes execute independently, access to shared
variables must be synchronized. Synchronization is overhead to be avoided if possible. The shared variable n
becomes superfluous (Exhibit 20.4 (b)) if we use the time-honored trick of leaving at least one cell free as a sentinel.
This ensures that 'empty' and 'full', when expressed in terms of 'in' and 'out', can be distinguished. Specifically, we
define 'empty' as in = out, and 'full' as (in + 1) mod m = out. This leads to an elegant and more efficient
implementation of the fixed-length fifo queue by a circular buffer:
const m = … ; { length of buffer }
type addr = 0 .. m - 1; { index range }
  fifoqueue = record
    B: array[addr] of elt; { storage }
    in, out: addr { access to buffer }
  end;
procedure create(var f: fifoqueue);
begin f.in := 0; f.out := 0 end;
function empty(f: fifoqueue): boolean;
begin return(f.in = f.out) end;
function full(f: fifoqueue): boolean;
begin return((f.in + 1) mod m = f.out) end;
procedure enqueue(var f: fifoqueue; x: elt);
{ not to be called if the queue is full }
begin f.B[f.in] := x; f.in := (f.in + 1) mod m end;
function front(f: fifoqueue): elt;
{ not to be called if the queue is empty }
begin return(f.B[f.out]) end;
procedure dequeue(var f: fifoqueue);
begin f.out := (f.out + 1) mod m end;
Implementation of the fixed-length priority queue as a heap
A fixed-length priority queue can be realized by a circular buffer, with elements stored in the cells between 'in'
and 'out', and ordered according to their priority such that 'out' points to the element with highest priority (Exhibit
20.5). In this implementation, the operations 'min' and 'delete' have time complexity O(1), since 'out' points directly
to the element with the highest priority. But insertion requires finding the correct cell corresponding to the priority
of the element to be inserted, and shifting other elements in the buffer to make space. Binary search could achieve
the former task in time O(log n), but the latter requires time O(n).
Exhibit 20.5: Implementing a fixed-length priority queue by a circular buffer.
Shifting elements to make space for a new element costs O(n) time.
Implementing a priority queue as a linear list, with elements ordered according to their priority, does not speed
up insertion: Finding the correct position of insertion still requires time O(n) (Exhibit 20.6).
Exhibit 20.6: Implementing a fixed-length priority queue by a linear list. Finding the correct
position for a new element costs O(n) time.
The heap is an elegant and efficient data structure for implementing a priority queue. It allows the operation
'min' to be performed in time O(1) and allows both 'insert' and 'delete' to be performed in worst-case time O(log n).
A heap is a binary tree that:
obeys a structural property
obeys an order property
is embedded in an array in a certain way
Structure: The binary tree is as balanced as possible; all leaves are at two adjacent levels, and the nodes at the
bottom level are located as far to the left as possible (Exhibit 20.7).
Exhibit 20.7: A heap has the structure of an almost complete binary tree.
Order: The element assigned to any node is ≤ the elements assigned to any children this node may have
(Exhibit 20.8).
Exhibit 20.8: The order property implies that the smallest element is stored at the root.
The order property implies that the smallest element (the one with top priority) is stored in the root. The 'min'
operation returns its value in time O(1), but the most obvious way to delete this element leaves a hole, which takes
time to fill. How can the tree be reorganized so as to retain the structural and the order property? The structural
condition requires the removal of the rightmost node on the lowest level. The element stored there, 13 in our
example, is used (temporarily) to fill the vacuum in the root. The root may now violate the order condition, but the
latter can be restored by sifting 13 down the tree according to its weight (Exhibit 20.9). If the order condition is
violated at any node, the element in this node is exchanged with the smaller of the elements stored in its children;
in our example, 13 is exchanged with 2. This sift-down process continues until the element finds its proper level, at
the latest when it lands in a leaf.
Exhibit 20.9: Rebuilding the order property of the tree in Exhibit 20.8 after 1 has been
removed and 13 has been moved to the root.
Insertion is handled analogously. The structural condition requires that a new node is created on the bottom
level at the leftmost empty slot. The new element (0 in our example) is temporarily stored in this node (Exhibit
20.10). If the parent node now violates the order condition, we restore it by floating the new element upward
according to its weight. If the new element is smaller than the one stored in its parent node, these two elements (in
our example 0 and 6) are exchanged. This sift-up process continues until the element finds its proper level, at the
latest when it surfaces at the root.
Exhibit 20.10: Rebuilding the order property of the tree in Exhibit 20.8 after 0 has
been inserted in a new rightmost node on the lowest level.
The number of steps executed during the sift-up process and the sift-down process is at most equal to the height
of the tree. The structural condition implies that this height is ⌊log2 n⌋. Thus both 'insert' and 'delete' in a heap work
in time O(log n).
A binary tree can be implemented in many different ways, but the special class of trees that meets the structural
condition stated above has a particularly efficient array implementation. A heap is a binary tree that satisfies the
structural and the order condition and is embedded in a linear array in such a way that the children of a node with
index i have indices 2 · i and 2 · i + 1 (Exhibit 20.11). Thus the parent of a node with index j has index j div 2. Any
subtree of a heap is also a heap, although it may not be stored contiguously. The order property for the heap implies
that the elements stored at indices 2 · i and 2 · i + 1 are ≥ the element stored at index i. This order is called the heap
order.
Exhibit 20.11: Embedding the tree of Exhibit 20.8 in a linear array.
The procedure 'restore' is a useful tool for managing a heap. It creates a heap out of a binary tree embedded in a
linear array h that satisfies the structural condition, provided that the two subtrees of the root node are already
heaps. Procedure 'restore' is applied to subtrees of the entire heap whose nodes are stored between the indices L
and R and whose tree structure is defined by the formulas 2 · i and 2 · i + 1.
const m = … ; { length of heap }
type addr = 1 .. m;
var h: array[addr] of elt;
procedure restore(L, R: addr);
var i, j: addr;
begin
  i := L;
  while i ≤ (R div 2) do begin
    if (2 * i < R) cand (h[2 * i + 1] < h[2 * i]) then j := 2 * i + 1 else j := 2 * i;
    if h[j] < h[i] then { h[i] :=: h[j]; i := j } else i := R
  end
end;
Since 'restore' operates along a single path from the root to a leaf in a tree with at most R - L nodes, it works in
time O(log (R - L)).
Creating a heap
An array h can be turned into a heap as follows:
for i := n div 2 downto 1 do restore(i, n);
This is more efficient than repeated insertion of a single element into an existing heap. Since the for loop is
executed n div 2 times, and n - i ≤ n, the time complexity for creating a heap with n elements is O(n · log n). A more
careful analysis shows that the time complexity for creating a heap is O(n).
Heap implementation of the fixed-length priority queue
const m = … ; { maximum length of heap }
type addr = 1 .. m;
  priorityqueue = record
    h: array[addr] of elt; { heap storage }
    n: 0 .. m { current number of elements }
  end;
procedure restore(var h: array[addr] of elt; L, R: addr);
begin … end;
procedure create(var p: priorityqueue);
begin p.n := 0 end;
function empty(p: priorityqueue): boolean;
begin return(p.n = 0) end;
function full(p: priorityqueue): boolean;
begin return(p.n = m) end;
procedure insert(var p: priorityqueue; x: elt);
var i: 1 .. m;
begin
  p.n := p.n + 1; p.h[p.n] := x; i := p.n;
  while (i > 1) cand (p.h[i] < p.h[i div 2]) do
    { p.h[i] :=: p.h[i div 2]; i := i div 2 }
end;
function min(p: priorityqueue): elt;
begin return(p.h[1]) end;
procedure delete(var p: priorityqueue);
begin p.h[1] := p.h[p.n]; p.n := p.n - 1; restore(p.h, 1, p.n) end;
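A runnable counterpart of this priority queue might look as follows in Python. It is a sketch under assumptions of our own, not the authors' code: the class name PriorityQueue and the private method _restore are hypothetical, cell h[0] is left unused so that the children of index i sit at 2 · i and 2 · i + 1 as in the text, and the loop exit "i := R" is written as a break.

```python
class PriorityQueue:
    """Fixed-length min-heap stored in h[1 .. m]; h[0] is unused."""

    def __init__(self, m):
        self.m = m
        self.h = [None] * (m + 1)
        self.n = 0

    def empty(self):
        return self.n == 0

    def full(self):
        return self.n == self.m

    def _restore(self, left, right):
        # Sift the element at index 'left' down within h[left .. right].
        h, i = self.h, left
        while i <= right // 2:
            j = 2 * i
            if j < right and h[j + 1] < h[j]:
                j += 1                      # j is the smaller child
            if h[j] < h[i]:
                h[i], h[j] = h[j], h[i]
                i = j
            else:
                break

    def insert(self, x):
        assert not self.full()
        self.n += 1
        i = self.n
        self.h[i] = x
        while i > 1 and self.h[i] < self.h[i // 2]:   # sift up
            self.h[i], self.h[i // 2] = self.h[i // 2], self.h[i]
            i //= 2

    def min(self):
        assert not self.empty()
        return self.h[1]

    def delete(self):
        assert not self.empty()
        self.h[1] = self.h[self.n]
        self.n -= 1
        self._restore(1, self.n)


pq = PriorityQueue(8)
for x in [5, 3, 8, 1, 9, 2]:
    pq.insert(x)
extracted = []
while not pq.empty():
    extracted.append(pq.min())
    pq.delete()
```

Repeatedly taking 'min' and 'delete' yields the elements in increasing order, which is the behaviour the order property promises.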
Heapsort
The heap is the core of an elegant O(n · log n) sorting algorithm. The following procedure 'heapsort' sorts n
elements stored in the array h into decreasing order.
procedure heapsort(n: addr); { sort elements stored in h[1 .. n] }
var i: addr;
begin { heap creation phase: the heap is built up }
  for i := n div 2 downto 1 do restore(i, n);
  { shift-up phase: elements are extracted from heap in increasing order }
  for i := n downto 2 do { h[i] :=: h[1]; restore(1, i - 1) }
end;
Each of the for loops is executed less than n times, and the time complexity of restore is O(log n). Thus heapsort
always works in time O(n · log n).
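The two phases can be checked with a small Python transcription. It is a sketch under our own assumptions: the array is passed as a parameter rather than being global, h[0] is an unused placeholder so that indices run from 1 to n, and the loop exit of 'restore' is written as a break.

```python
def restore(h, left, right):
    """Sift h[left] down so that h[left .. right] satisfies the heap order."""
    i = left
    while i <= right // 2:
        j = 2 * i
        if j < right and h[j + 1] < h[j]:
            j += 1
        if h[j] < h[i]:
            h[i], h[j] = h[j], h[i]
            i = j
        else:
            break

def heapsort(h, n):
    """Sort h[1 .. n] into decreasing order using a min-heap."""
    for i in range(n // 2, 0, -1):      # heap creation phase
        restore(h, i, n)
    for i in range(n, 1, -1):           # move the current minimum to the end
        h[i], h[1] = h[1], h[i]
        restore(h, 1, i - 1)

data = [None, 4, 1, 3, 9, 7]            # data[0] is unused
heapsort(data, 5)
result = data[1:]
```

Because the smallest remaining element is swapped to the end of the shrinking heap in each step, the array ends up in decreasing order; a max-heap would give increasing order instead.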
1. Block-diagonal matrices are composed of smaller matrices that line up along the diagonal and have 0
elements everywhere else, as shown in Exhibit 20.12. Show how to store an arbitrary block-diagonal matrix
in a minimal storage area, and write down the corresponding address computation formulas.
Exhibit 20.12: Structure of a block-diagonal matrix.
2. Let A be an antisymmetric n × n-matrix (i.e., all elements of the matrix satisfy A[i, j] = -A[j, i]).
(a) What values do the diagonal elements A[i, i] of the matrix have?
(b) How can A be stored in a linear array c of minimal size? What is the size of c?
(c) Write a
function A(i, j: 1 .. n): real;
which returns the value of the corresponding matrix element.
3. Show that the product of two n × n band matrices of width 2 · b + 1 (b = 0, 1, …) is again a band matrix. What is
the width of the product matrix? Write a procedure that computes the product of two band matrices both
having the same width and stores the result as a band matrix of minimal width.
4. Implement a double-ended queue (deque) by a circular buffer.
5. What are the minimum and maximum numbers of elements in a heap of height h?
6. Determine the time complexities of the following operations performed on a heap storing n elements. (a)
Searching any element. (b) Searching the largest element (i.e. the element with lowest priority).
7. Implement heapsort and animate the sorting process, for example as shown in the snapshots in "Algorithm
animation". Compare the number of comparisons and exchange operations needed by heapsort and other
sorting algorithms (e.g. quicksort) for different input configurations.
8. What is the running time of heapsort on an array h[1 .. n] that is already sorted in increasing order? What
about decreasing order?
9. In a k-ary heap, nodes have k children instead of 2 children.
(a) How would you represent a k-ary heap in an array?
(b) What is the height of a k-ary heap in terms of the number of elements n and k?
(c) Implement a priority queue by a k-ary heap. What are the time complexities of the operations 'insert'
and 'delete' in terms of n and k?
21. List structures
static vs dynamic data structures
linear, circular and two-way lists
fifo queue implemented as a linear list
breadth-first and depth-first tree traversal
traversing a binary tree without any auxiliary memory: triple tree traversal algorithm
dictionary implemented as a binary search tree
balanced trees guarantee that dictionary operations can be performed in logarithmic time
height-balanced trees
multiway trees
Lists, memory management, pointer variables
The spectrum of data structures ranges from static objects, such as a table of constants, to dynamic structures,
such as lists. A list is designed so that not only the data values stored in it, but its size and shape can change at run
time, due to insertions, deletions, or rearrangement of data elements. Most of the data structures discussed so far
can change their size and shape to a limited extent. A circular buffer, for example, supports insertion at one end and
deletion at the other, and can grow to a predeclared maximal size. A heap supports deletion at one end and
insertion anywhere into an array. In a list, any local change can be done with an effort that is independent of the
size of the list, provided that we know the memory locations of the data elements involved. The key to meeting this
requirement is the idea of abandoning memory allocation in large contiguous chunks, and instead allocating it
dynamically in the smallest chunk that will hold a given object. Because data elements are stored randomly in
memory, not contiguously, an insertion or deletion into a list does not propagate a ripple effect that shifts other
elements around. An element inserted is allocated anywhere in memory where there is space and tied to other
elements by pointers (i.e. addresses of the memory locations where these elements happen to be stored at the
moment). An element deleted does not leave a gap that needs to be filled as it would in an array. Instead, it leaves
some free space that can be reclaimed later by a memory management process. The element deleted is likely to
break some chains that tie other elements together; if so, the broken chains are relinked according to rules specific
to the type of list used.
Pointers are the language feature used in modern programming languages to capture the equivalent of a
memory address. A pointer value is essentially an address, and a pointer variable ranges over addresses. A pointer,
however, may contain more information than merely an address. In Pascal and other strongly typed languages, for
example, a pointer also references the type definition of the objects it can point to, a feature that enhances the
compiler's ability to check for consistent use of pointer variables.
Let us illustrate these concepts with a simple example: a one-way linear list is a sequence of cells each of
which (except the last) points to its successor. The first cell is the head of the list, the last cell is the tail. Since the
tail has no successor, its pointer is assigned a predefined value 'nil', which differs from the address of any cell.
Access to the list is provided by an external pointer 'head'. If the list is empty, 'head' has the value 'nil'. A cell stores
an element x_i and a pointer to the successor cell (Exhibit 21.1):
type cptr = ^cell;
  cell = record e: elt; next: cptr end;
Exhibit 21.1: A one-way linear list.
Local operations, such as insertion or deletion at a position given by a pointer p, are efficient. For example, the
following statements insert a new cell containing an element y as successor of a cell being pointed at by p (Exhibit
21.2):
new(q); q^.e := y; q^.next := p^.next; p^.next := q;
Exhibit 21.2: Insertion as a local operation.
The successor of the cell pointed at by p is deleted by a single assignment statement (Exhibit 21.3):
p^.next := p^.next^.next;
Exhibit 21.3: Deletion as a local operation.
An insertion or deletion at the head or tail of this list is a special case to be handled separately. To support
insertion at the tail, an additional pointer variable 'tail' may be set to point to the tail element, if it exists.
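The local operations translate almost literally into Python, where object references take the role of pointers and None the role of 'nil'. The sketch below rests on assumptions of our own: the names Cell, insert_after, delete_after, insert_at_head and elements are hypothetical, and reclaiming the deleted cell is left to Python's garbage collector.

```python
class Cell:
    def __init__(self, e, next=None):
        self.e = e
        self.next = next

def insert_after(p, y):
    """Insert y as successor of cell p: constant time, nothing is shifted."""
    p.next = Cell(y, p.next)

def delete_after(p):
    """Unlink the successor of p; assumes that a successor exists."""
    p.next = p.next.next

def insert_at_head(head, y):
    """The special case: the external pointer 'head' itself must change."""
    return Cell(y, head)

def elements(head):
    out = []
    while head is not None:
        out.append(head.e)
        head = head.next
    return out

head = Cell("a", Cell("c"))
insert_after(head, "b")            # a -> b -> c
after_insert = elements(head)
delete_after(head)                 # a -> c
after_delete = elements(head)
head = insert_at_head(head, "z")   # z -> a -> c
```

Insertion at the head returns a new external pointer instead of modifying an existing cell, which illustrates why the text treats it as a separate case.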
A one-way linear list sometimes is handier if the tail points back to the head, making it a circular list. In a
circular list, the head and tail cells are replaced by a single entry cell, and any cell can be reached from any other
without having to start at the external pointer 'entry' (Exhibit 21.4).
Exhibit 21.4: A circular list combines head and tail into a single entry point
In a two-way (or doubly linked) list each cell contains two pointers, one to its successor, the other to its
predecessor. The list can be traversed in both directions. Exhibit 21.5 shows a circular two-way list.
Exhibit 21.5: A circular two-way or doubly-linked list
Exercise: traversal of a singly linked list in both directions
Write a recursive
procedure traverse(p: cptr);
to traverse a singly linked list from the head to the tail and back again. At each visit of a node, call the
procedure visit(p: cptr);
Solve the same problem iteratively without using any additional storage beyond a few local pointers. Your
traversal procedure may modify the structure of the list temporarily.
Solution
(a)
procedure traverse(p: cptr);
begin if p ≠ nil then { visit(p); traverse(p^.next); visit(p) } end;
The initial call of this procedure is
traverse(head);
(b)
procedure traverse(p: cptr);
var o, q: cptr; i: integer;
begin
  for i := 1 to 2 do { forward and back again } begin
    o := nil;
    while p ≠ nil do begin
      visit(p); q := p^.next; p^.next := o;
      o := p; p := q { the fork advances }
    end;
    p := o
  end
end;
Traversal becomes simpler if we let the 'next' pointer of the tail cell point to this cell itself:
procedure traverse(p: cptr);
var o, q: cptr;
begin
  o := nil;
  while p ≠ nil do begin
    visit(p); q := p^.next; p^.next := o;
    o := p; p := q { the fork advances }
  end
end;
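The pointer-reversal idea of solution (b) can be tried out in Python. This is a sketch under our own assumptions: Cell is a hypothetical class with fields e and next, None plays the role of 'nil', and the visit procedure is passed in as a function.

```python
class Cell:
    def __init__(self, e, next=None):
        self.e = e
        self.next = next

def traverse(p, visit):
    """Visit the list forward and back again, using only the pointers o, p, q."""
    for _ in range(2):            # forward, then back again
        o = None
        while p is not None:
            visit(p)
            q = p.next
            p.next = o            # reverse the link just passed
            o = p
            p = q                 # the fork advances
        p = o                     # turn around at the end

head = Cell(1, Cell(2, Cell(3)))
visited = []
traverse(head, lambda cell: visited.append(cell.e))

# The second pass reverses the links once more, restoring the list.
restored = []
cell = head
while cell is not None:
    restored.append(cell.e)
    cell = cell.next
```

Each pass reverses every link, so after two passes the list is back in its original shape, and every cell has been visited twice, as in the recursive version.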
The fifo queue implemented as a one-way list
It is natural to implement a fifo queue as a one-way linear list, where each element points to the next one "in
line". The operation 'dequeue' occurs at the pointer 'head', and 'enqueue' is made fast by having an external pointer
'tail' point to the last element in the queue. A crafty implementation of this data structure involves an empty cell,
called a sentinel, at the tail of the list. Its purpose is to make the list-handling procedures simpler and faster by
making the empty queue look more like all other states of the queue. More precisely, when the queue is empty, the
external pointers 'head' and 'tail' both point to the sentinel rather than having the value 'nil'. The sentinel allows
insertion into the empty queue, and deletion that results in an empty queue, to be handled by the same code that
handles the general case of 'enqueue' and 'dequeue'. The reader should verify our claim that a sentinel simplifies the
code by programming the plausible, but less efficient, procedures which assume that an empty queue is represented
by head = tail = nil.
The queue is empty if and only if 'head' and 'tail' both point to the sentinel (i.e. if head = tail). An 'enqueue'
operation is performed by inserting the new element into the sentinel cell and then creating a new sentinel.
type cptr = ^cell;
  cell = record e: elt; next: cptr end;
  fifoqueue = record head, tail: cptr end;
procedure create(var f: fifoqueue);
begin new(f.head); f.tail := f.head end;
function empty(f: fifoqueue): boolean;
begin return(f.head = f.tail) end;
procedure enqueue(var f: fifoqueue; x: elt);
begin f.tail^.e := x; new(f.tail^.next); f.tail := f.tail^.next end;
function front(f: fifoqueue): elt;
begin return(f.head^.e) end;
procedure dequeue(var f: fifoqueue);
begin f.head := f.head^.next end;
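The sentinel technique carries over directly to Python. The following is a minimal sketch under our own assumptions: the class names Cell and LinkedFifoQueue are hypothetical, object identity ('is') replaces pointer equality, and the preconditions of 'front' and 'dequeue' are checked with assertions.

```python
class Cell:
    def __init__(self):
        self.e = None
        self.next = None

class LinkedFifoQueue:
    """One-way list with an empty sentinel cell at the tail."""

    def __init__(self):
        self.head = Cell()          # the sentinel
        self.tail = self.head

    def empty(self):
        return self.head is self.tail

    def enqueue(self, x):
        self.tail.e = x             # fill the old sentinel ...
        self.tail.next = Cell()     # ... and create a new one
        self.tail = self.tail.next

    def front(self):
        assert not self.empty()
        return self.head.e

    def dequeue(self):
        assert not self.empty()
        self.head = self.head.next

f = LinkedFifoQueue()
f.enqueue("x")                      # insertion into the empty queue
f.enqueue("y")
order = []
while not f.empty():                # the last dequeue empties the queue
    order.append(f.front())
    f.dequeue()
```

Neither 'enqueue' nor 'dequeue' contains a test for the empty queue, which is the simplification the sentinel buys.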
Tree traversal
When we speak of trees in computer science, we usually mean rooted ordered trees: they have a
distinguished node called the root, and the subtrees of any node are ordered. Rooted, ordered trees are best defined
recursively: a tree T is either empty, or it is a tuple (N, T_1, …, T_k), where N is the root of the tree, and T_1, …, T_k is a
sequence of trees. Binary trees are the special case k = 2.
Trees are typically used to organize data or activities in a hierarchy: a top-level data set or activity is composed of
a next level of data or activities, and so on. When one wishes to gather or survey all of the data or activities, it is
necessary to traverse the tree, visiting (i.e. processing) the nodes in some systematic order. The visit at each node
might be as simple as printing its contents or as complicated as computing a function that depends on all nodes in
the tree. There are two major ways to traverse trees: breadth first and depth first.
Breadth-first traversal visits the nodes level by level. This is useful in heuristic search, where a node
represents a partial solution to a problem, with deeper levels representing more complete solutions. Before
pursuing any one solution to a great depth, it may be advantageous to assess all the partial solutions at the present
level, in order to pursue the most promising one. We do not discuss breadth-first traversal further; we merely
suggest the following:
Exercise: breadth-first traversal
Decide on a representation for trees where each node may have a variable number of children. Write a
procedure for breadth-first traversal of such a tree. Hint: use a fifo queue to organize the traversal. The node to be
visited is removed from the head of the queue, and its children are enqueued, in order, at the tail end.
Depth-first traversal always moves to the first unvisited node at the next deeper level, if there is one. It turns
out that depth-first better fits the recursive definition of trees than breadth-first does and orders nodes in ways that
are more often useful. We discuss depth-first for binary trees and leave the generalization to other trees to the
reader. Depth-first can generate three basic orders for traversing a binary tree: preorder, inorder, and
postorder, defined recursively as:
preorder: Visit root, traverse left subtree, traverse right subtree.
inorder: Traverse left subtree, visit root, traverse right subtree.
postorder: Traverse left subtree, traverse right subtree, visit root.
For the tree in Exhibit 21.6 we obtain the orders shown.
Exhibit 21.6: Standard orders defined on a binary tree
An arithmetic expression can be represented as a binary tree by assigning the operands to the leaves and the
operators to the internal nodes. The basic traversal orders correspond to different notations for representing
arithmetic expressions. By traversing the expression tree (Exhibit 21.7) in preorder, inorder, or postorder, we
obtain the prefix, infix, or suffix notation, respectively.
Exhibit 21.7: Standard traversal orders correspond to different notations for arithmetic expressions
A binary tree can be implemented as a list structure in many ways. The most common way uses an external
pointer 'root' to access the root of the tree and represents each node by a cell that contains a field for an element to
be stored, a pointer to the root of the left subtree, and a pointer to the root of the right subtree (Exhibit 21.8). An
empty left or right subtree may be represented by the pointer value 'nil', or by pointing at a sentinel, or, as we shall
see, by a pointer that points to the node itself.
type nptr = ^node;
  node = record e: elt; L, R: nptr end;
var root: nptr;
Exhibit 21.8: Straightforward implementation of a binary tree
The following procedure 'traverse' implements any or all of the three orders preorder, inorder, and postorder,
depending on how the procedures 'visit1', 'visit2', and 'visit3' process the data in the node referenced by the pointer p.
The root of the subtree to be traversed is passed through the formal parameter p. In the simplest case, a visit does
nothing or simply prints the contents of the node.
procedure traverse(p: nptr);
begin
  if p ≠ nil then begin
    visit1(p); { preorder }
    traverse(p^.L);
    visit2(p); { inorder }
    traverse(p^.R);
    visit3(p) { postorder }
  end
end;
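How the three visits produce the three notations can be seen on a small example in Python. The sketch makes assumptions of its own: Node and notation are hypothetical names, the visit procedures are passed as function arguments, and the expression (a - b) * c is an example chosen here, not the one shown in the exhibits.

```python
class Node:
    def __init__(self, e, L=None, R=None):
        self.e, self.L, self.R = e, L, R

def traverse(p, visit1, visit2, visit3):
    if p is not None:
        visit1(p)                              # preorder
        traverse(p.L, visit1, visit2, visit3)
        visit2(p)                              # inorder
        traverse(p.R, visit1, visit2, visit3)
        visit3(p)                              # postorder

def notation(root, which):
    """Record node contents at visit number 'which' (0, 1 or 2) only."""
    out = []
    visits = [lambda p: None] * 3
    visits[which] = lambda p: out.append(p.e)
    traverse(root, *visits)
    return " ".join(out)

# Expression tree for (a - b) * c: operators inside, operands at the leaves.
tree = Node("*", Node("-", Node("a"), Node("b")), Node("c"))
prefix = notation(tree, 0)
infix = notation(tree, 1)
suffix = notation(tree, 2)
```

Note that the plain inorder output loses the parentheses of the original expression, whereas prefix and suffix notation remain unambiguous without them.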
Traversing a tree involves both advancing from the root toward the leaves, and backing up from the leaves
toward the root. Recursive invocations of the procedure 'traverse' build up a stack whose entries contain references
to the nodes for which 'traverse' has been called. These entries provide a means of returning to a node after the
traversal of one of its subtrees has been finished. The bookkeeping done by a stack or equivalent auxiliary structure
can be avoided if the tree to be traversed may be modified temporarily.
The following triple-tree traversal algorithm provides an elegant and efficient way of traversing a binary tree
without using any auxiliary memory (i.e. no stack is used and it is not assumed that a node contains a pointer to its
parent node). The data structure is modified temporarily to retain the information needed to find the way back up
the tree and to restore each subtree to its initial condition after traversal. The triple-tree traversal algorithm
assumes that an empty subtree is encoded not by a 'nil' pointer, but rather by an L (left) or R (right) pointer that
points to the node itself, as shown in Exhibit 21.9.
Exhibit 21.9: Coding of a leaf used in procedure TTT
procedure TTT;
var o, p, q: nptr;
begin
  o := nil; p := root;
  while p ≠ nil do begin
    visit(p);
    q := p^.L;
    p^.L := p^.R; { rotate left pointer }
    p^.R := o; { rotate right pointer }
    o := p;
    p := q
  end
end;
In this procedure the pointers p ("present") and o ("old") serve as a two-pronged fork. The tree is being
traversed by the pointer p and the companion pointer o, which always lags one step behind p. The two pointers
form a two-pronged fork that runs around the tree, starting in the initial condition with p pointing to the root of the
tree, and o = nil. An auxiliary pointer q is needed temporarily to advance the fork. The while loop in 'TTT' is
executed as long as p points to a node in the tree and is terminated when p assumes the value 'nil'. The initial value
of the o pointer gets saved as a temporary value. First it is assigned to the R pointer of the root, later to the L
pointer. Finally, it gets assigned to p, the fork exits from the root of the tree, and the traversal of the tree is
complete. The correctness of this algorithm is proved by induction on the number of nodes in the tree.
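The behaviour of the fork can be observed by running the algorithm on a small tree. The following Python sketch makes assumptions of its own: Node is a hypothetical class whose L and R fields initially point to the node itself (the encoding of an empty subtree), None plays the role of 'nil', and the root and the visit procedure are passed as parameters.

```python
class Node:
    def __init__(self, e):
        self.e = e
        self.L = self      # empty left subtree: pointer to the node itself
        self.R = self      # empty right subtree: likewise

def ttt(root, visit):
    """Triple-tree traversal: no stack and no parent pointers."""
    o, p = None, root
    while p is not None:
        visit(p)
        q = p.L
        p.L = p.R          # rotate left pointer
        p.R = o            # rotate right pointer
        o = p
        p = q

# A root r with two leaf children a and b.
r, a, b = Node("r"), Node("a"), Node("b")
r.L, r.R = a, b
visits = []
ttt(r, lambda node: visits.append(node.e))
pointers_restored = (r.L is a and r.R is b and
                     a.L is a and a.R is a and
                     b.L is b and b.R is b)
```

Every node is visited exactly three times, nine visits for three nodes, and after the traversal all pointers have their original values again.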
Induction hypothesis H: if at the beginning of an iteration of the while loop, the fork pointer p points to the root
of a subtree with n > 0 nodes, and o has a value x that is different from any pointer value inside this subtree, then
after 3 · n iterations the subtree will have been traversed in triple order (visiting each node exactly three times), all
tree pointers in the subtree will have been restored to their original value, and the fork pointers will have been
reversed (i.e. p has the value x and o points to the root of the subtree).
Base of induction: H is true for n = 1.
Proof: The smallest tree we consider has exactly one node, the root alone. Before the while loop is executed for
this subtree, the fork and the tree are in the initial state shown in Exhibit 21.10. Exhibit 21.11 shows the state of the
fork and the tree after each iteration of the while loop. The node is visited in each iteration.
Exhibit 21.10: Initial configuration for traversing a tree consisting of a single node
Exhibit 21.11: Tracing procedure TTT while traversing the smallest tree
Induction step: If H is true for all n, 0 < n ≤ k, H is also true for k + 1.
Proof: Consider a tree T with k + 1 nodes. T consists of a root and k nodes shared among the left and right
subtrees of the root. Each of these subtrees has ≤ k nodes, so we apply the induction hypothesis to each of them.
The following is a highly compressed account of the proof of the induction step, illustrated by Exhibit 21.12.
Consider the tree with k + 1 nodes shown in state 1. The root is a node with three fields; the left and right subtrees
are shown as triangles. The figure shows the typical case when both subtrees are nonempty. If one of the two
subtrees is empty, the corresponding pointer points back to the root; these two cases can be handled similarly to the
case n = 1. The fork starts out with p pointing at the root and o pointing at anything outside the subtree being
traversed. We want to show that the initial state 1 is transformed in 3 · (k + 1) iterations into the final state 6. In the
final state the subtrees are shaded to indicate that they have been correctly traversed; the fork has exited from the
root, with p and o having exchanged values. To show that the algorithm correctly transforms state 1 into state 6, we
consider the intermediate states 2 to 5, and what happens in each transition.
1 2 Cne iteration through the 5hile loop advances the %ork into the le%t subtree and rotates the pointers o% the
root.
2 3 A applied to the le%t subtree o% the root says that this subtree 5ill be correctly traversed+ and the %ork 5ill
e)it %rom the subtree 5ith pointers reversed.
3 < This is the second iteration through the 5hile loop that visits the root. The %ork advances into the right
subtree+ and the pointers o% the root rotate a second time.
< @ A applied to the right subtree o% the root says that this subtree 5ill be correctly traversed+ and the %ork 5ill
e)it %rom the subtree 5ith pointers reversed.
@ > This is the third iteration through the 5hile loop that visits the root. The %ork moves out o% the tree being
traversedL the pointers o% the root rotate a third time and thereby assume their original values.
Exhibit 21.12: Trace of procedure TTT, invoking the induction hypothesis
Exercise: binary trees
Consider a binary tree declared as follows:
type nptr = ^node;
     node = record L, R: nptr end;
var root: nptr;
(a) If a node has no left or right subtree, the corresponding pointer has the value 'nil'. Prove that a binary tree
with n nodes, n > 0, has n + 1 'nil' pointers.
(b) Write a function nodes(…): integer; that returns the number of nodes, and a function depth(…): integer;
that returns the depth of a binary tree. The depth of the root is defined to be 0; the depth of any other node
is the depth of its parent increased by 1. The depth of the tree is the maximum depth of its nodes.
Solution
(a) Each node contains two pointers, for a total of 2 · n pointers in the tree. There is exactly one pointer that
points to each of n − 1 nodes, none points to the root. Thus 2 · n − (n − 1) = n + 1 pointers are 'nil'. This can
also be proved by induction on the number of nodes in the tree.
(b) function nodes(p: nptr): integer;
begin
  if p = nil then
    return(0)
  else
    return(nodes(p^.L) + nodes(p^.R) + 1)
end;
function depth(p: nptr): integer;
begin
  if p = nil then return(−1)
  else return(1 + max(depth(p^.L), depth(p^.R)))
end;
where 'max' is
function max(a, b: integer): integer;
begin if a > b then return(a) else return(b) end;
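The two functions, together with the count of 'nil' pointers from part (a), can be sketched in Python. This is only an illustration: the class `Node`, the helper `nil_pointers`, and the sample `tree` are names assumed here, and `None` plays the role of 'nil'.

```python
class Node:
    """Binary tree node with left and right pointers; None stands for 'nil'."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def nodes(p):
    # An empty tree has no nodes; otherwise count both subtrees plus the root.
    return 0 if p is None else nodes(p.left) + nodes(p.right) + 1

def depth(p):
    # The empty tree has depth -1, so that a single root has depth 0.
    return -1 if p is None else 1 + max(depth(p.left), depth(p.right))

def nil_pointers(p):
    # Every 'nil' pointer is reached exactly once as an empty subtree.
    return 1 if p is None else nil_pointers(p.left) + nil_pointers(p.right)

# Hypothetical sample tree with 4 nodes; the deepest node lies at depth 2.
tree = Node(Node(Node(), None), Node())
print(nodes(tree), depth(tree), nil_pointers(tree))
```

For any nonempty tree built this way, `nil_pointers` should equal `nodes` plus one, in agreement with part (a).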
")ercise& list copying
"%%ective memory management sometimes makes it desirable or necessary to copy a list. /or e)ample+
performance may improve drastically if a list spread over several pages can be compressed into a single page. List
copying involves a traversal of the original concurrently with a traversal of the copy, as the latter is being built up.
(a) Consider binary trees built from nodes of type 'node' and pointers of type 'nptr'. A tree is accessed through
a pointer to the root, which is 'nil' for an empty tree.
type nptr = ^node;
Write a recursive
function cptree(p: nptr): nptr;
to copy a tree given by a pointer p to its root, and return a pointer to the root of the copy.
(b) Consider arbitrary graphs built from nodes of a type similar to the nodes in (a), but they have an additional
pointer field cn, intended to point to the copy of a node:
type node = record e: elt; L, R: nptr; cn: nptr end;
A graph is accessed through a pointer to a node called the origin, and we are only concerned with nodes that can
be reached from the origin; this access pointer is 'nil' for an empty graph. Write a recursive
function cpgraph(p: nptr): nptr;
to copy a graph given by a pointer p to its origin, and return a pointer to the origin of the copy. Use the field cn,
assuming that its initial value is 'nil' in every node of the original graph; set it to 'nil' in every node of the copy.
Solution
(a) function cptree(p: nptr): nptr;
var cp: nptr;
begin
  if p = nil then
    return(nil)
  else begin
    new(cp);
    cp^.e := p^.e; cp^.L := cptree(p^.L); cp^.R := cptree(p^.R);
    return(cp)
  end
end;
(b) function cpgraph(p: nptr): nptr;
var cp: nptr;
begin
  if p = nil then
    return(nil)
  elsif p^.cn ≠ nil then { node has already been copied }
    return(p^.cn)
  else begin
    new(cp); p^.cn := cp; cp^.cn := nil;
    cp^.e := p^.e; cp^.L := cpgraph(p^.L); cp^.R := cpgraph(p^.R);
    return(cp)
  end
end;
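A minimal Python sketch of solution (b) may help to see why the field cn must be set before the recursive calls: a cycle then leads back to a node that is already marked as copied. The class `Node` and the three-node sample graph are assumptions made for illustration.

```python
class Node:
    """Graph node with element e, pointers L and R, and copy pointer cn."""
    def __init__(self, e):
        self.e = e
        self.L = None
        self.R = None
        self.cn = None  # points to the copy once this node has been copied

def cpgraph(p):
    if p is None:
        return None
    if p.cn is not None:      # node has already been copied
        return p.cn
    cp = Node(p.e)            # the new node starts with cn = None
    p.cn = cp                 # record the copy before recursing, so cycles terminate
    cp.L = cpgraph(p.L)
    cp.R = cpgraph(p.R)
    return cp

# Hypothetical sample: node c is shared by a and b, and c points back to a.
a, b, c = Node("a"), Node("b"), Node("c")
a.L, a.R = b, c
b.L = c
c.R = a
copy = cpgraph(a)
```

In the copy, the shared node and the cycle are reproduced once each, and no node of the copy refers to a node of the original.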
")ercise& list copying 5ith constant au)iliary memory
Consider binary trees as in part (a) of the preceding exercise. Memory for the stack implied by the recursion can
be saved by writing an iterative tree copying procedure that uses only a constant amount of auxiliary memory. This
requires a trick, as any depth-first traversal must be able to back up from the leaves toward the root. In the
triple-tree traversal procedure, the return path is temporarily encoded in the tree being traversed. This idea can
again be used here, but there is a simpler solution: The return path is temporarily encoded in the R-fields of the
copy; the L-fields of certain nodes of the copy point back to the corresponding node in the original. Work out the
details of a tree-copying procedure that works with O(1) auxiliary memory.
Exercise: traversing a directed acyclic graph
A directed graph consists of nodes and directed arcs, where each arc leads from one node to another. A directed
graph is acyclic if the arcs form no cycles. One way to ensure that a graph is acyclic is to label nodes with distinct
integers and to draw each arc from a lower number to a higher number. Consider a binary directed acyclic graph,
where each node has two pointer fields, L and R, to represent at most two arcs that lead out of that node. An
example is shown in Exhibit 21.13.
Exhibit 21.13: A rooted acyclic graph.
(a) Write a program to visit every node in a directed acyclic graph reachable from a pointer called 'root'. You
are free to execute procedure 'visit' for each node as often as you like.
(b) Write a program similar to (a) where you are required to execute procedure 'visit' exactly once per node.
Hint: Nodes may need to have additional fields.
Exercise: counting nodes on a square grid
Consider a network superimposed on a square grid: each node is connected to at most four neighbors in the
directions east, north, west, south (Exhibit 21.14):
type nptr = ^node;
     node = record E, N, W, S: nptr; status: boolean end;
var origin: nptr;
Exhibit 21.14: A graph embedded in a square grid.
A 'nil' pointer indicates the absence of a neighbor. Neighboring nodes are doubly linked: if a pointer in node p
points to node q, the reverse pointer of q points to p (e.g., p^.W = q and q^.E = p). The pointer 'origin' is 'nil' or
points to a node. Consider the problem of counting the number of nodes that can be reached from 'origin'. Assume
that the status field of all nodes is initially set to false. How do you use this field? Write a function nn(p: nptr):
integer; to count the number of nodes.
Solution
function nn(p: nptr): integer;
begin
  if p = nil cor p^.status then
    return(0)
  else begin
    p^.status := true;
    return(1 + nn(p^.E) + nn(p^.N) + nn(p^.W) + nn(p^.S))
  end
end;
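The same marking idea can be tried out in Python on a small grid. The class `Node` and the helpers `link_ew` and `link_ns` are hypothetical names introduced for this sketch; Python's short-circuit `or` takes the place of the conditional 'cor'.

```python
class Node:
    """Grid node with four neighbor pointers and a status mark."""
    def __init__(self):
        self.E = self.N = self.W = self.S = None
        self.status = False

def link_ew(west, east):
    # Neighboring nodes are doubly linked.
    west.E, east.W = east, west

def link_ns(north, south):
    north.S, south.N = south, north

def nn(p):
    # 'or' short-circuits, so p.status is never evaluated for p = None.
    if p is None or p.status:
        return 0
    p.status = True  # mark the node so it is counted only once
    return 1 + nn(p.E) + nn(p.N) + nn(p.W) + nn(p.S)

# Hypothetical 2 x 2 grid:  a - b
#                           |   |
#                           c - d
a, b, c, d = Node(), Node(), Node(), Node()
link_ew(a, b)
link_ew(c, d)
link_ns(a, c)
link_ns(b, d)
count = nn(a)
```

Because the marks are left set, a second call on the same grid returns 0; the status fields must be reset before counting again. For large networks the recursion depth grows with the number of nodes, so an explicit stack may be preferable.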
")ercise& counting nodes in an arbitrary net5ork
We generalize the problem above to arbitrary directed graphs, such as that of Exhibit 21.15, where each node
may have any number of neighbors. This graph is represented by a data structure defined by Exhibit 21.16 and the
type definitions below. Each node is linked to an arbitrary number of other nodes.
Exhibit 21.15: An arbitrary (cyclic) directed graph.
Exhibit 21.16: A possible implementation as a list structure.
type nptr = ^node; cptr = ^cell;
     node = record status: boolean; np: nptr; cp: cptr end;
     cell = record np: nptr; cp: cptr end;
var origin: nptr;
The pointer 'origin' has the value 'nil' or points to a node. Consider the problem of counting the number n of
nodes that can be reached from 'origin'. The status field of all nodes is initially set to false. How do you use it? Write
a function nn(p: nptr): integer; that returns n.
Binary search trees
A binary search tree is a binary tree T where each node N stores a data element e(N) from a domain X on
which a total order ≤ is defined, subject to the following order condition: For every node N in T, all elements in the
left subtree L(N) of N are < e(N), and all elements in the right subtree R(N) of N are > e(N). Let x_1, x_2, …, x_n be n
elements drawn from the domain X.
Definition: A binary search tree for x_1, x_2, …, x_n is a binary tree T with n nodes and a one-to-one mapping
between the n given elements and the n nodes, such that
∀ N in T, ∀ N' ∈ L(N), ∀ N" ∈ R(N): e(N') < e(N) < e(N")
Exercise
Show that the following statement is equivalent to this order condition: The inorder traversal of the nodes of T
coincides with the natural order < of the elements assigned to the nodes.
Remark: The order condition can be relaxed to e(N') ≤ e(N) < e(N") to accommodate multiple occurrences of
the same value, with only minor modifications to the statements and algorithms presented in this section. For
simplicity's sake we assume that all values in a tree are distinct.
The order condition permits binary search and thus guarantees a worst-case search time O(h) for a tree of height
h. Trees that are well balanced (in an intuitive sense; see the next section for a definition), that have not
degenerated into linear lists, have a height h = O(log n) and thus support search in logarithmic time.
Basic operations on binary search trees are most easily implemented as recursive procedures. Consider a tree
represented as in the preceding section, with empty subtrees denoted by 'nil'. The following function 'find' searches
for an element x in a subtree pointed to by p. It returns a pointer to a node containing x if the search is successful,
and 'nil' if it is not.
function find(x: elt; p: nptr): nptr;
begin
  if p = nil then return(nil)
  elsif x < p^.e then return(find(x, p^.L))
  elsif x > p^.e then return(find(x, p^.R))
  else { x = p^.e } return(p)
end;
The following procedure 'insert' leaves the tree alone if the element x to be inserted is already stored in the tree.
The parameter p initially points to the root of the subtree into which x is to be inserted.
procedure insert(x: elt; var p: nptr);
begin
  if p = nil then begin new(p); p^.e := x; p^.L := nil; p^.R := nil end
  elsif x < p^.e then insert(x, p^.L)
  elsif x > p^.e then insert(x, p^.R)
end;
Initial call:
insert(x, root);
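The recursive 'find' and 'insert' translate almost line by line into Python. Since Python has no var parameters, this sketch assumes that `insert` returns the (possibly new) root of the subtree, and the caller assigns it back; `Node` and `inorder` are illustrative names.

```python
class Node:
    def __init__(self, e):
        self.e = e
        self.L = None
        self.R = None

def find(x, p):
    """Return the node containing x in the subtree rooted at p, or None."""
    if p is None:
        return None
    if x < p.e:
        return find(x, p.L)
    if x > p.e:
        return find(x, p.R)
    return p  # x == p.e

def insert(x, p):
    """Insert x into the subtree rooted at p and return the subtree's root."""
    if p is None:
        return Node(x)
    if x < p.e:
        p.L = insert(x, p.L)
    elif x > p.e:
        p.R = insert(x, p.R)
    return p  # x already stored: the tree is left alone

def inorder(p):
    return [] if p is None else inorder(p.L) + [p.e] + inorder(p.R)

root = None
for x in [5, 2, 8, 1, 9, 2]:  # the second 2 is a duplicate and is ignored
    root = insert(x, root)
print(inorder(root))
```

The inorder traversal lists the stored elements in increasing order, which is the order condition restated.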
To delete an element x, we first have to find the node N that stores x. If this node is a leaf or semileaf (a node
with only one subtree), it is easily deleted; but if it has two subtrees, it is more efficient to leave this node in place
and to replace its element x by an element found in a leaf or semileaf node, and delete the latter (Exhibit 21.17).
Thus we distinguish three cases:
1. If N has no child, remove N.
2. If N has exactly one child, replace N by this child node.
3. If N has two children, replace x by the largest element y in the left subtree, or by the smallest element z in
the right subtree of N. Either of these elements is stored in a node with at most one child, which is removed
as in case (1) or (2).
Exhibit 21.17: Element x is deleted while preserving its node N. Node N is
filled with a new value y, whose old node is easier to delete.
A sentinel is again the key to an elegant iterative implementation of binary search trees. In a node with no left or
right child, the corresponding pointer points to the sentinel. This sentinel is a node that contains no element; its left
pointer points to the root and its right pointer points to itself. The root, if it exists, can only be accessed through the
left pointer of the sentinel. The empty tree is represented by the sentinel alone (Exhibit 21.18). A typical tree is
shown in Exhibit 21.19.
Exhibit 21.18: The empty binary tree is represented by the sentinel which points to itself.
Exhibit 21.19: A binary tree implemented as a list structure with
sentinel.
The following implementation of a dictionary as a binary search tree uses a sentinel accessed via the variable d:
type nptr = ^node;
     dictionary = nptr;
procedure create(var d: dictionary);
begin { create sentinel } new(d); d^.L := d; d^.R := d end;
function member(d: dictionary; x: elt): boolean;
var p: nptr;
begin
  d^.e := x; { initialize element in sentinel }
  p := d^.L; { point to root, if it exists }
  while x ≠ p^.e do
    if x < p^.e then p := p^.L else { x > p^.e } p := p^.R;
  return(p ≠ d)
end;
Procedure 'find' searches for x. If found, p points to the node containing x, and q to its parent. If not found, p
points to the sentinel and q to the parent-to-be of a new node into which x will be inserted.
procedure find(d: dictionary; x: elt; var p, q: nptr);
begin
  d^.e := x; p := d^.L; q := d;
  while x ≠ p^.e do begin
    q := p;
    if x < p^.e then p := p^.L else { x > p^.e } p := p^.R
  end
end;
procedure insert(var d: dictionary; x: elt);
var p, q: nptr;
begin
  find(d, x, p, q);
  if p = d then begin { x is not yet in the tree }
    new(p); p^.e := x; p^.L := d; p^.R := d;
    if x ≤ q^.e then q^.L := p else { x > q^.e } q^.R := p
  end
end;
procedure delete(var d: dictionary; x: elt);
var p, q, t: nptr;
begin
  find(d, x, p, q);
  if p ≠ d then { x has been found }
    if (p^.L ≠ d) and (p^.R ≠ d) then begin
      { p has left and right children; find largest element in left subtree }
      t := p^.L; q := p;
      while t^.R ≠ d do begin q := t; t := t^.R end;
      if t^.e < q^.e then q^.L := t^.L else { t^.e > q^.e } q^.R := t^.L;
      p^.e := t^.e
    end
    else begin { p has at most one child }
      if p^.L ≠ d then { left child only } p := p^.L
      elsif p^.R ≠ d then { right child only } p := p^.R
      else { p has no children } p := d;
      if x ≤ q^.e then q^.L := p else { x > q^.e } q^.R := p
    end
end;
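The sentinel technique can be exercised with a Python sketch that follows the procedures above. It is a sketch under assumptions: `find` returns the pair (p, q) instead of using var parameters, and the helper `keys` is added only to inspect the tree. Note how storing x in the sentinel guarantees that every search loop terminates, and how the test x ≤ q.e routes the root into the left pointer of the sentinel.

```python
class Node:
    def __init__(self):
        self.e = None
        self.L = self  # a fresh node points to itself, as the sentinel does
        self.R = self

def create():
    return Node()  # the sentinel alone represents the empty tree

def find(d, x):
    """Return (p, q): p is the node holding x (or the sentinel), q its parent."""
    d.e = x  # x is always found, at the latest in the sentinel
    p, q = d.L, d
    while x != p.e:
        q = p
        p = p.L if x < p.e else p.R
    return p, q

def member(d, x):
    p, _ = find(d, x)
    return p is not d

def insert(d, x):
    p, q = find(d, x)
    if p is d:  # x is not yet in the tree
        p = Node()
        p.e, p.L, p.R = x, d, d
        if x <= q.e:
            q.L = p
        else:
            q.R = p

def delete(d, x):
    p, q = find(d, x)
    if p is d:
        return  # x is not in the tree
    if p.L is not d and p.R is not d:
        # Two children: move the largest element of the left subtree into p.
        t, q = p.L, p
        while t.R is not d:
            q, t = t, t.R
        if t.e < q.e:
            q.L = t.L
        else:
            q.R = t.L
        p.e = t.e
    else:
        child = p.L if p.L is not d else p.R  # the sentinel if p has no children
        if x <= q.e:
            q.L = child
        else:
            q.R = child

def keys(d, p=None):
    """Inorder list of the stored elements (inspection helper)."""
    p = d.L if p is None else p
    return [] if p is d else keys(d, p.L) + [p.e] + keys(d, p.R)

d = create()
for x in [5, 2, 8, 1, 3, 9]:
    insert(d, x)
```

Deleting 5 from this sample tree replaces the root element by 3, the largest element of the left subtree, rather than removing the root node itself.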
In the best case of a completely balanced binary search tree for n elements, all leaves are on levels ⌊log2 n⌋ or
⌊log2 n⌋ − 1, and the search tree has the height ⌊log2 n⌋. The cost for performing the 'member', 'insert', or 'delete'
operation is bounded by the longest path from the root to a leaf (i.e. the height of the tree) and is therefore O(log n).
Without any further provisions, a binary search tree can degenerate into a linear list in the worst case. Then the cost
for each of the operations would be O(n).
What is the expected average cost for the search operation in a randomly generated binary search tree?
"Randomly generated" means that each permutation of the n elements to be stored in the binary search tree has the
same probability of being chosen as the input sequence. Furthermore, we assume that the tree is generated by
insertions only. Therefore, each of the n elements is equally likely to be chosen as the root of the tree. Let p_n be the
expected path length of a randomly generated binary search tree storing n elements. Then
As shown in chapter 16 in the section "Recurrence relations", this recurrence relation has the solution
Since the average search time in randomly generated binary search trees, measured in terms of the number of
nodes visited, is p_n / n and ln 4 ≈ 1.386, it follows that the cost is O(log n) and therefore only about 40 per cent
higher than in the case of completely balanced binary search trees.
Balanced trees: general definition
If insertions and deletions occurred at random, and the assumption of the preceding section was realistic, we
could let search trees grow and shrink as they please, incurring a modest increase of 40 per cent in search time over
completely balanced trees. But real data are not random: they are typically clustered, and long runs of
monotonically increasing or decreasing elements occur, often as the result of a previous processing step.
Unfortunately, such deviation from randomness degrades the performance of search trees.
To prevent search trees from degenerating into linear lists, we can monitor their shape and restructure them
into a more balanced shape whenever they have become too skewed. Several classes of balanced search trees
guarantee that each operation 'member', 'insert', and 'delete' can be performed in time O(log n) in the worst case.
Since the work to be done depends directly on the height of the tree, such a class B of search trees must satisfy the
following two conditions (h_T is the height of a tree T, n_T is the number of nodes in T):
Balance condition: ∃ c > 0 ∀ T ∈ B: h_T ≤ c · log2 n_T
Rebalancing condition: If an 'insert' or 'delete' operation, performed on a tree T ∈ B, yields a tree T' ∉ B, it
must be possible to rebalance T' in time O(log n) to yield a tree T" ∈ B.
Example: almost complete trees
The class of almost complete binary search trees satisfies the balance condition but not the rebalancing
condition. In the worst case it takes time O(n) to restructure such a binary search tree (Exhibit 21.20), and if 'insert'
and 'delete' are defined to include any rebalancing that may be necessary, these operations cannot be guaranteed to
run in time O(log n).
Exhibit 21.20: Restructuring: worst case
In the next two sections we present several classes of balanced trees that meet both conditions: the
height-balanced or AVL-trees (G. Adel'son-Vel'skii and E. Landis, 1962) [AL 62] and various multiway trees, such as
B-trees [BM 72, Com 79] and their generalization, (a,b)-trees [Meh 84a].
AVL-trees, with their small nodes that hold a single data element, are used primarily for storing data in main
memory. Multiway trees, with potentially large nodes that hold many elements, are also useful for organizing data
on secondary storage devices, such as disks, that allow direct access to sizable physical data blocks. In this case, a
node is typically chosen to fill a physical data block, which is read or written in one access operation.
Height-balanced trees
Definition: A binary tree is height-balanced if, for each node, the heights of its two subtrees differ by at most
one. Height-balanced search trees are also called AVL-trees. Exhibit 21.21 to Exhibit 21.23 show various AVL-trees,
and one that is not.
Exhibit 21.21: Examples of height-balanced trees
Exhibit 21.22: Example of a tree not height-balanced; the marked node violates the balance condition.
A "most-skewed" AVL-tree T_h is an AVL-tree of height h with a minimal number of nodes. Starting with T_0 and
T_1 shown in Exhibit 21.23, T_h is obtained by attaching T_{h−1} and T_{h−2} as subtrees to a new root.
Exhibit 21.23: Most skewed AVL trees of heights h = 0 through h = 4
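The number of nodes in a most-skewed AVL-tree of height h is given by the recurrence relation
\[ n_h = n_{h-1} + n_{h-2} + 1, \quad n_0 = 1, \quad n_1 = 2. \]
In the section on recurrence relations in the chapter entitled "The mathematics of algorithm analysis", it has
been shown that the recurrence relation
\[ m_h = m_{h-1} + m_{h-2}, \quad m_0 = 0, \quad m_1 = 1 \]
has the solution
\[ m_h = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{h} - \left( \frac{1-\sqrt{5}}{2} \right)^{h} \right] \]
Since n_h = m_{h+3} − 1 we obtain
\[ n_h = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{h+3} - \left( \frac{1-\sqrt{5}}{2} \right)^{h+3} \right] - 1 \]
Since
\[ \left| \frac{1-\sqrt{5}}{2} \right| < 1 \]
it follows that
\[ \left( \frac{1-\sqrt{5}}{2} \right)^{h+3} \to 0 \quad \text{as } h \to \infty \]
and therefore n_h behaves asymptotically as
\[ \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{h+3} \]
Applying the logarithm results in
\[ h \approx \frac{\log_2 n_h}{\log_2 \frac{1+\sqrt{5}}{2}} + \text{const} \approx 1.44 \cdot \log_2 n_h + \text{const} \]
Therefore, the height of a worst-case AVL-tree with n nodes is about 1.44 · log2 n. Thus the class of AVL-trees
satisfies the balance condition, and the 'member' operation can always be performed in time O(log n).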
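The relation between the two recurrences and the resulting height bound can be checked numerically. The following Python sketch uses the illustrative names `min_nodes` (for n_h) and `fib` (for m_h); the constant 1.4405 is a slightly rounded-up value of 1 / log2((1 + √5) / 2), chosen here as an assumption to keep the floating-point comparison safe.

```python
import math

def min_nodes(h):
    """n_h: number of nodes of a most-skewed AVL-tree of height h."""
    a, b = 1, 2  # n_0, n_1
    if h == 0:
        return a
    for _ in range(h - 1):
        a, b = b, a + b + 1  # n_h = n_{h-1} + n_{h-2} + 1
    return b

def fib(k):
    """m_k: Fibonacci numbers with m_0 = 0, m_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

for h in range(8):
    n = min_nodes(h)
    # n_h = m_{h+3} - 1, and h stays below about 1.44 * log2(n_h)
    print(h, n, fib(h + 3) - 1, round(1.4405 * math.log2(n), 2))
```

Each printed row shows that the height h of the sparsest AVL-tree stays below 1.44 · log2 of its node count.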
We now show that the class of AVL-trees also satisfies the rebalancing condition. Thus AVL-trees support
insertion and deletion in time O(log n). Each node N of an AVL-tree has one of the balance properties / (left-
leaning), \ (right-leaning), or − (horizontal), depending on the relative height of its two subtrees.
Two local tree operations, rotation and double rotation, allow the restructuring of height-balanced trees that
have been disturbed by an insertion or deletion. They split a tree into subtrees and rebuild it in a different way.
Exhibit 21.24 shows a node, marked black, that got out of balance, and how a local transformation builds an
equivalent tree (for the same elements, arranged in order) that is balanced. Each of these transformations has a
mirror image that is not shown. The algorithms for insertion and deletion use these rebalancing operations as
described below.
Exhibit 21.24: Two local rebalancing operations
Insertion
A new element is inserted as in the case of a binary search tree. The balance condition of the new node becomes
− (horizontal). Starting at the new node, we walk toward the root of the tree, passing along the message that the
height of the subtree rooted at the current node has increased by one. At each node encountered along this path, an
operation determined by the following rules is performed. These rules depend on the balance condition of the node
before the new element was inserted, and on the direction from which the node was entered (i.e. from its left or
right child).
Rule I1: If the current node has balance condition −, change it to / or \ depending on whether we entered from
the node's left or from its right child. If the current node is the root, terminate; if not, continue to follow the path
upward.
Rule I2: If the current node has balance condition / or \ and is entered from the subtree that was previously
shorter, change the balance condition to − and terminate (the height of the subtree rooted at the current node has
not changed).
Rule I3: If the current node has balance condition / or \ and is entered from the subtree that was previously
taller, the balance condition of the current node is violated and gets restored as follows:
(a) If the last two steps were in the same direction (both from left children, or both from right children), an
appropriate rotation restores all balances and the procedure terminates.
(b) If the last two steps were in opposite directions (one from a left child, the other from a right child), an
appropriate double rotation restores all balances and the procedure terminates.
The initial insertion travels along a path from the root to a leaf, and the rebalancing process travels back up
along the same path. Thus the cost of an insertion in an AVL-tree is O(h), or O(log n) in the worst case. Notice that
an insertion calls for at most one rotation or double rotation, as shown in the example in Exhibit 21.25.
Example
Insert 1, 2, 5, 3, 4, 6, 7 into an initially empty AVL-tree (Exhibit 21.25). The balance condition of a node is shown
below it. Boldfaced nodes violate the balance condition.
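The example sequence can be replayed with a short Python sketch. It is not a transcription of rules I1 to I3: instead of the balance marks /, \ and −, this variant stores the height of each subtree in the node and rebalances on the way back from the recursive insertion, which is a common alternative formulation. All names (`Node`, `rotate_left`, `rotate_right`, `insert`, `inorder`) are assumptions made for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0  # a single node has height 0

def height(node):
    return node.height if node is not None else -1

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def insert(node, key):
    """Insert key into the subtree rooted at node; return the new subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # key already present
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left subtree too tall
        if key > node.left.key:  # opposite directions: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:  # right subtree too tall
        if key < node.right.key:  # opposite directions: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for key in [1, 2, 5, 3, 4, 6, 7]:
    root = insert(root, key)
```

With this insertion order, the sketch ends with a completely balanced tree of height 2 rooted at 4, after rotations triggered by the insertions of 5, 4, 6, and 7.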
Exhibit 21.25: Trace of consecutive insertions and the rebalancings they trigger
Deletion
An element is deleted as in the case of a binary search tree. Starting at the parent of the deleted node, walk
towards the root, passing along the message that the height of the subtree rooted at the current node has decreased
by one. At each node encountered, perform an operation according to the following rules. These rules depend on
the balance condition of the node before the deletion and on the direction from which the current node and its child
were entered.
Rule D1: If the current node has balance condition −, change it to \ or / depending on whether we entered from
the node's left or from its right child, and terminate (the height of the subtree rooted at the current node has not
changed).
Rule D2: If the current node has balance condition / or \ and is entered from the subtree that was previously
taller, change the balance condition to − and continue upward, passing along the message that the subtree rooted at
the current node has been shortened.
Rule D3: If the current node has balance condition / or \ and is entered from the subtree that was previously
shorter, the balance condition is violated at the current node. We distinguish three subcases according to the
balance condition of the other child of the current node (consider also the mirror images of the following
illustrations):
(a) [Figure: rotation of nodes a and b with subtrees X, Y, Z]
An appropriate rotation restores the balance of the current node without changing the height of the subtree
rooted at this node. Terminate.
(b) [Figure: rotation of nodes a and b with subtrees X, Y, Z]
A rotation restores the balance of the current node. Continue upward, passing along the message that the
subtree rooted at the current node has been shortened.
(c) [Figure: double rotation of nodes a, b, c with subtrees W, X, Y, Z]
A double rotation restores the balance of the current node. Continue upward, passing along the message that
the subtree rooted at the current node has been shortened. Similar transformations apply if either X or Y, but not
both, are one level shorter than shown in this figure. If so, the balance conditions of some nodes differ from those
shown, but this has no influence on the total height of the subtree. In contrast to insertion, deletion may require
more than one rotation or double rotation to restore all balances. Since the cost of a rotation or double rotation is
constant, the worst-case cost for rebalancing the tree depends only on the height of the tree, and thus the cost of a
deletion in an AVL-tree is O(log n) in the worst case.
Multiway trees
Nodes in a multiway tree may have a variable number of children. As we are interested in balanced trees, we add
two restrictions. First, we insist that all leaves (the nodes without children) occur at the same depth. Second, we
constrain the number of children of all internal nodes by a lower bound a and an upper bound b. Many varieties of
multiway trees are known; they differ in details, but all are based on similar ideas. For example, (2,3)-trees are
defined by the requirement that all internal nodes have either two or three children. We generalize this concept and
discuss (a,b)-trees.
Definition: Consider a domain X on which a total order ≤ is defined. Let a and b be integers with 2 ≤ a and
2 · a − 1 ≤ b. Let c(N) denote the number of children of node N. An (a,b)-tree is an ordered tree with the following
properties:
All leaves are at the same level
2 ≤ c(root) ≤ b
For all internal nodes N except the root, a ≤ c(N) ≤ b
A node with k children contains k − 1 elements x_1 < x_2 < … < x_{k−1} drawn from X; the subtrees corresponding to
the k children are denoted by T_1, T_2, …, T_k. An (a,b)-tree supports "c(N) search" in the same way that a binary tree
supports binary search, thanks to the following order condition:
y ≤ x_i for all elements y stored in subtrees T_1, …, T_i
x_i < z for all elements z stored in subtrees T_{i+1}, …, T_k
Definition: (a,b)-trees with b = 2 · a − 1 are known as B-trees [BM 72, Com 79].
The algorithms we discuss operate on internal nodes, shown in white in Exhibit 21.26, and ignore the leaves,
shown in black. For the purpose of understanding search and update algorithms, leaves can be considered fictitious
entities used only for counting. In practice, however, things are different. The internal nodes merely constitute a
directory to a file that is stored in the leaves. A leaf is typically a physical storage unit, such as a disk block, that
holds all the records whose key values lie between two (adjacent) elements stored in internal nodes.
Exhibit 21.26: Example of a (3,5)-tree
The number n of elements stored in the internal nodes of an (a,b)-tree of height h is bounded by
and thus
this shows that the class of (a,b)-trees satisfies the balance condition h = O(log n). We show that this class also
meets the rebalancing condition, namely, that (a,b)-trees support insertion and deletion in time O(log n).
Insertion
Insertion of a new element x begins with a search for x that terminates unsuccessfully at a leaf. Let N be the
parent node of this leaf. If N contained fewer than b − 1 elements before the insertion, insert x into N and
terminate. If N was full, we imagine b elements temporarily squeezed into the overflowing node N. Let m be the
median of these b elements, and use m to split N into two: a left node N_L populated by the ⌊(b − 1) / 2⌋ elements
smaller than m, and a right node N_R populated by the ⌈(b − 1) / 2⌉ elements larger than m. The condition
2 · a − 1 ≤ b ensures that ⌊(b − 1) / 2⌋ ≥ a − 1, in other words, that each of the two new nodes contains at least
a − 1 elements.
The median element m is pushed upward into the parent node, where it serves as a separator between the two
new nodes N_L and N_R that now take the place formerly inhabited by N. Thus the problem of insertion into a node at
a given level is replaced by the same problem one level higher in the tree. The new separator element may be
absorbed in a nonfull parent, but if the parent overflows, the splitting process described is repeated recursively. At
worst, the splitting process propagates to the root of the tree, where a new root that contains only the median
element is created. (a,b)-trees grow at the root, and this is the reason for allowing the root to have as few as two
children.
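The splitting step by itself can be sketched in a few lines of Python. The function `split_overflowing` is a hypothetical helper that handles only the sorted list of b elements of the overflowing node; distributing the child pointers and pushing the median into the parent are left out.

```python
def split_overflowing(elements):
    """Split the sorted list of b elements of an overflowing node.

    Returns (left, median, right): left holds the floor((b - 1) / 2) elements
    smaller than the median, right holds the remaining larger elements.
    """
    mid = (len(elements) - 1) // 2
    return elements[:mid], elements[mid], elements[mid + 1:]

# A (3,5)-tree node may hold at most b - 1 = 4 elements; a fifth one overflows it.
left, median, right = split_overflowing([10, 20, 30, 40, 50])
print(left, median, right)

# For every admissible pair (a, b) with 2 * a - 1 <= b, both halves
# keep at least a - 1 elements, as the text claims.
for a in range(2, 7):
    for b in range(2 * a - 1, 2 * a + 4):
        lo, m, hi = split_overflowing(list(range(b)))
        assert min(len(lo), len(hi)) >= a - 1
```

When b is even the two halves differ in size by one; which half receives the extra element is a free design choice, and this sketch gives it to the right node.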
(eletion
(eletion o% an element ) begins by searching %or it. As in the case o% binary search trees+ deletion is easiest at the
bottom o% the tree+ at a node o% ma)imal depth 5hose children are leaves. $% ) is %ound at a higher level o% the tree+
in a node that has internal nodes as children+ ) is the separator bet5een t5o subtrees TL and T8. :e replace ) by
another element 6+ either the largest element in TL or the smallest element in T8+ both o% 5hich are stored in a node
at the bottom o% the tree. A%ter this e)change+ the problem is reduced to deleting an element 6 %rom a node ! at the
deepest level.
$% deletion Ho% ) or 6I leaves ! 5ith at least a U 1 elements+ 5e are done. $% not+ 5e try to restore !;s occupancy
condition by stealing an element %rom an ad3acent sibling node '. $% there is no sibling ' that can spare an
element+ that is+ i% ' is minimally occupied+ ' and ! are merged into a single node L. L contains the a U 2 elements
o% !+ the a U 1 elements o% '+ and the separator bet5een ' and ! 5hich 5as stored in their parent node+ %or a total
o% 2 K Ha U 1I ` b U 1 elements. .ince the parent Ho% the old nodes ' and !+ and o% the ne5 node LI lost an element in
this merger+ the parent may under%lo5. As in the case o% insertion+ this under%lo5 can propagate to the root and
may cause its deletion. Thus Ha+bI#trees gro5 and shrink at the root.
Both insertion and deletion 5ork along a single path %rom the root do5n to a l ea% and HpossiblyI back up. Thus
their time is bounded by CHhI+ or e0uivalently+ by CHlog nI& Ha+bI#trees can be rebalanced in logarithmic time.
9morti:ed cost. The per%ormance o% Ha+bI#trees is better than the 5orst#case analysis above suggests. $t can be
sho5n that the total cost o% an se,uence of s insertions and deletions into an initially empty Ha+bI#tree is linear in
the length s o% the se0uence& 5hereas the 5orst#case cost o% a single operation is CHlog nI+ the amorti&ed cost per
operation is CH1I N'eh D<aO. Amorti6ed cost is a comple)ity measure that involves both an average and a 5orst#case
consideration. The average is taken over all operations in a se0uenceL the 5orst case is taken over all se0uences.
Although any one operation may take time CHlog nI+ 5e are guaranteed that the total o% all s operations in any
se0uence o% length s can be done in time CHsI+ as i% each single operation 5ere done in time CH1I.
")hibit 21.27& A slightly ske5ed H3+@I#tree.
")ercise& insertion and deletion in a H3+@I#tree
.tarting 5ith the H3+@I#tree sho5n in ")hibit 21.27+ per%orm the se0uence o% operations& insert 3D+ delete 10+
delete 12+ delete @0. (ra5 the tree a%ter each operation.
.olution
$nserting 3D causes a lea% and its parent to split H")hibit 21.2DI. (eleting 10 causes under%lo5+ remedied by
borro5ing an element %rom the le%t sibling H")hibit 21.29I. (eleting 12 causes under%lo5 in both a lea% and its
parent+ remedied by merging H")hibit 21.30I. (eleting @0 causes merging at the lea% level and borro5ing at the
parent level H")hibit 21.31I.
Exhibit 21.28: Node splits propagate towards the root
Exhibit 21.29: A deletion is absorbed by borrowing
Exhibit 21.30: Another deletion propagates node merges towards the root
Exhibit 21.31: Node merges and borrowing combined
(2,3)-trees are the special case a = 2, b = 3: each node has two or three children. Exhibit 21.32 omits the leaves. Starting with the tree in state 1 we insert the value 9: the rightmost node at the bottom level overflows and splits, and the median 8 moves up into the parent. The parent also overflows, and the median 6 generates a new root (state 2). The deletion of 1 is absorbed without any rebalancing (state 3). The deletion of 2 causes a node to underflow, remedied by stealing an element from a sibling: 2 is replaced by 3 and 3 is replaced by 4 (state 4). The deletion of 3 triggers the merger of the nodes assigned to 3 and 5; this causes an underflow in their parent, which in turn propagates to the root and results in a tree of reduced height (state 5).
Exhibit 21.32: Tracing insertions and deletions in a (2,3)-tree
As mentioned earlier, multiway trees are particularly useful for managing data on a disk. If each node is allocated to its own disk block, searching for a record triggers as many disk accesses as there are levels in the tree. The depth of the tree is minimized if the maximal fan-out b is maximized. We can pack more elements into a node by shrinking their size. As the records to be stored are normally much larger than their identifying keys, we store keys only in the internal nodes and store entire records in the leaves (which we had considered to be empty until now). Thus the internal nodes serve as an index that assigns to a key value the path to the corresponding leaf.
Exercises
1. Design and implement a list structure for storing a sparse matrix. Your implementation should provide procedures for inserting, deleting, changing, and reading matrix elements.
2. Implement a fifo queue by a circular list using only one external pointer f and a sentinel. f always points to the sentinel and provides access to the head and tail of the queue.
3. Implement a double-ended queue (deque) by a doubly linked list.
4. Binary search trees and sorting: A binary search tree given by the following declarations is used to manage a set of integers:
  type nptr = ^node;
       node = record L, R: nptr; x: integer end;
  var root: nptr;
The empty tree is represented as root = nil.
(a) Draw the result of inserting the sequence 6, 15, 4, 2, 7, 12, 5, 18 into the empty tree.
(b) Write a procedure smallest(var x: integer); which returns the smallest number stored in the tree, and a procedure remove smallest; which deletes it. If the tree is empty both procedures should call a procedure message('tree is empty');
(c) Write a procedure sort; that sorts the numbers stored in var a: array[1 .. n] of integer; by inserting the numbers into a binary search tree, then writing them back to the array in sorted order as it traverses the tree.
(d) Analyze the asymptotic time complexity of 'sort' in a typical and in the worst case.
(e) Does this approach lead to a sorting algorithm of time complexity ( )
5. Extend the implementation of a dictionary as a binary search tree in the "Binary search trees" section to support the operations 'succ' and 'pred' as defined in chapter 19 in the section "Dictionary".
6. Insertion and deletion in AVL-trees: Starting with an empty AVL-tree, insert 1, 2, 5, 6, 7, 8, 9, 3, 4, in this order. Draw the AVL-tree after each insertion. Now delete all elements in the opposite order of insertion (i.e. in last-in-first-out order). Does the AVL-tree go through the same states as during insertion but in reverse order?
7. Implement an AVL-tree supporting the dictionary operations 'insert', 'delete', 'member', 'pred', and 'succ'.
8. Explain how to find the smallest element in an (a,b)-tree and how to find the predecessor of a given element in an (a,b)-tree.
9. Implement a dictionary as a B-tree.
22. Address computation
hashing
perfect hashing
collision resolution methods: separate chaining, coalesced chaining, open addressing (linear probing and double hashing)
deletions degrade performance of a hash table
Performance does not depend on the number of data elements stored but on the load factor of the hash table.
randomization: transform unknown distribution into a uniform distribution
Extendible hashing uses a radix tree to adapt the address range dynamically to the contents to be stored; deletions do not degrade performance.
order-preserving extendible hashing
Concepts and terminology
The term address computation (also hashing, hash coding, scatter storage, or key-to-address transformations) refers to many search techniques that aim to assign an address of a storage cell to any key value x by means of a formula that depends on x only. Assigning an address to x independently of the presence or absence of other key values leads to faster access than is possible with the comparative search techniques discussed in earlier chapters. Although this goal cannot always be achieved, address computation does provide the fastest access possible in many practical situations.
We use the following concepts and terminology (Exhibit 22.1). The home address a of x is obtained by means of a hash function h that maps the key domain X into the address space A [i.e. a = h(x)]. The address range is A = {0, 1, … , m - 1}, where m is the number of storage cells available. The storage cells are represented by an array T[0 .. m - 1], the hash table; T[a] is the cell addressed by a ∈ A. T[h(x)] is the cell where an element with key value x is preferentially stored, but alas, not necessarily.
Exhibit 22.1: The hash function h maps a (typically large) key domain X into a (much smaller) address space A.
Each cell has a capacity of b > 0 elements; b stands for bucket capacity. The number n of elements to be stored is therefore bounded by m · b. Two cases are usefully distinguished, depending on whether the hash table resides on disk or in central memory:
1. Disk or other secondary storage device: Considerations of efficiency suggest that a bucket be identified with a physical unit of transfer, typically a disk block. Such a unit is usually large compared to the size of an element, and thus b > 1.
2. Main memory: Cell size is less important, but the code is simplest if a cell can hold exactly one element (i.e. b = 1).
For simplicity of exposition we assume that b = 1 unless otherwise stated; the generalization to arbitrary b is straightforward.
The key domain X is normally much larger than the number n of elements to be stored and the number m of available cells T[a]. For example, a table used for storing a few thousand identifiers might have as its key domain the set of strings of length at most 10 over the alphabet {'a', 'b', … , 'z', '0', … , '9'}; its cardinality is close to 36^10. Thus in general the function h is many-to-one: Different key values map to the same address.
The content to be stored is a sample from the key domain: It is not under the programmer's control and is usually not even known when the hash function and table size are chosen. Thus we must expect collisions, that is, events where more than b elements to be stored are assigned the same address. Collision resolution methods are designed to handle this case by storing some of the colliding elements elsewhere. The more collisions that occur, the longer the search time. Since the number of collisions is a random event, the search time is a random variable. Hash tables are known for excellent average performance and for terrible worst-case performance, which, one hopes, will never occur.
Address computation techniques support the operations 'find' and 'insert' (and to a lesser extent also 'delete') in expected time O(1). This is a remarkable difference from all other data structures that we have discussed so far, in that the average time complexity does not depend on the number n of elements stored, but on the load factor λ = n / (m · b), or, for the special case b = 1: λ = n / m. Note that 0 ≤ λ ≤ 1.
Before we consider the typical case of a hash table, we illustrate these concepts in two special cases where everything is simple; these represent ideals rarely attainable.
The special case of small key domains
If the number of possible key values is less than or equal to the number of available storage cells, h can map X one-to-one into or onto A. Everything is simple and efficient because collisions never occur. Consider the following example:
X = {'a', 'b', … , 'z'}, A = {0, … , 25}
h(x) = ord(x) - ord('a'); that is,
h('a') = 0, h('b') = 1, h('c') = 2, … , h('z') = 25.
Since h is one-to-one, each key value x is implied by its address h(x). Thus we need not store the key values explicitly, as a single bit (present / absent) suffices:
  var T: array[0 .. 25] of boolean;
  function member(x): boolean;
  begin return(T[h(x)]) end;
  procedure insert(x);
  begin T[h(x)] := true end;
  procedure delete(x);
  begin T[h(x)] := false end;
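As a minimal sketch, the same bit table can be written in Python. The names T, h, member, insert, and delete mirror the procedures above; the choice of a Python list of booleans is an assumption made for illustration.

```python
# Direct addressing for the key domain {'a', ..., 'z'}: one bit per possible key.

def h(x: str) -> int:
    """Map 'a'..'z' one-to-one onto 0..25, as in the text."""
    return ord(x) - ord('a')

T = [False] * 26  # T[h(x)] is True if and only if x is present

def member(x: str) -> bool:
    return T[h(x)]

def insert(x: str) -> None:
    T[h(x)] = True

def delete(x: str) -> None:
    T[h(x)] = False

insert('c')
insert('z')
delete('z')
print(member('c'), member('z'))  # True False
```

Because h is one-to-one, no key is ever stored and no comparison is needed; every operation is a single array access.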
The idea of collision-free address computation can be extended to large key domains through a combination of address computation and list processing techniques, as we will see in the chapter "Metric data structures".
The special case of perfect hashing: table contents known a priori
Certain common applications require storing a set of elements that never changes. The set of reserved words of a programming language is an example; when the lexical analyzer of a compiler extracts an identifier, the first issue to be determined is whether this is a reserved word such as 'begin' or 'while', or whether it is programmer defined. The special case where the table contents are known a priori, and no insertions or deletions occur, is handled more efficiently by special-purpose data structures than by a general dictionary.
If the elements x_1, x_2, … , x_n to be stored are known before the hash table is designed, the underlying key domain is not as important as the set of actually occurring key values. We can usually find a table size m, not much larger than the number n of elements to be stored, and an easily evaluated hash function h that assigns to each x_i a unique address from the address space {0, … , m - 1}. It takes some trial and error to find such a perfect hash function h for a given set of elements, but the benefit of avoiding collisions is well worth the effort: the code that implements a collision-free hash table is simple and fast. A perfect hash function works for a static table only: a single insertion, after h has been chosen, is likely to cause a collision and destroy the simplicity of the concept and efficiency of the implementation. Perfect hash functions should be generated automatically by a program.
The following unrealistically small example illustrates typical approaches to designing a perfect hash table. The task gets harder as the number m of available storage cells is reduced toward the minimum possible, that is, the number n of elements to be stored.
Example
In designing a perfect hash table for the elements 17, 20, 24, 38, and 51, we look for arithmetic patterns. These are most easily detected by considering the binary representations of the numbers to be stored:
      5  4  3  2  1  0   bit position
  17  0  1  0  0  0  1
  20  0  1  0  1  0  0
  24  0  1  1  0  0  0
  38  1  0  0  1  1  0
  51  1  1  0  0  1  1
We observe that the least significant three bits identify each element uniquely. Therefore, the hash function h(x) = x mod 8 maps these five elements collision-free into the address space A = {0, … , 6}, with m = 7 and two empty cells. An attempt to further economize space leads us to observe that the bits in positions 1, 2, and 3, with weights 2, 4, and 8 in the binary number representation, also identify each element uniquely, while ranging over the address space of minimal size A = {0, … , 4}. The function h(x) = (x div 2) mod 8 extracts these three bits and assigns the following addresses:
  X:  17  20  24  38  51
  A:   0   2   4   3   1
A perfect hash table has to store each element explicitly, not just a bit (present/absent). In the example above, the elements 0, 1, 16, 17, 32, 33, … all map into address 0, but only 17 is present in the table. The access function 'member(x)' is implemented as a single statement:
  return ((h(x) ≤ 4) cand (T[h(x)] = x));
The boolean operator 'cand' used here is understood to be the conditional and: Evaluation of the expression proceeds from left to right and stops as soon as its value is determined. In our example, h(x) > 4 suffices to assign 'false' to the expression (h(x) ≤ 4) and (T[h(x)] = x). Thus the 'cand' operator guarantees that the table declared as:
  var T: array[0 .. 4] of element;
is accessed within its index bounds.
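The perfect hash table of this example can be sketched in Python as follows. The marker EMPTY and the restriction to non-negative integer keys are assumptions for illustration; Python's 'and' short-circuits in the same way as 'cand'.

```python
EMPTY = None  # hypothetical marker for an unused cell

def h(x: int) -> int:
    """Extract bits 1, 2, and 3 of x: (x div 2) mod 8."""
    return (x // 2) % 8

T = [EMPTY] * 5                      # addresses 0 .. 4
for key in (17, 20, 24, 38, 51):
    T[h(key)] = key                  # each key lands in its own cell

def member(x: int) -> bool:
    a = h(x)
    # 'and' stops at the first false operand, so T is never
    # indexed with an address outside 0 .. 4.
    return a <= 4 and T[a] == x

print(T)                             # [17, 51, 20, 38, 24]
print(member(17), member(16), member(10))  # True False False
```

The key 16 maps to address 0 but is rejected by the comparison with the stored element; the key 10 maps to address 5 and is rejected by the bounds test alone.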
For table contents of realistic size it is impractical to construct a perfect hash function manually; we need a program to search exhaustively through the large space of functions. The more slack m - n we allow, the denser is the population of perfect functions and the quicker we will find one. [Meh 84a] presents analytical results on the complexity of finding perfect hash functions.
Exercise: perfect hash tables
Design several perfect hash tables for the content {3, 13, 57, 71, 82, 93}.
Solution
Designing a perfect hash table is like answering a question of the type: What is the next element in the sequence 1, 4, 9, … ? There are infinitely many answers, but some are more elegant than others. Consider:
  h                                  3   13   57   71   82   93   Address range
  (x div 3) mod 7                    1    4    5    2    6    3   [1 .. 6]
  x mod 13                           3    0    5    6    4    2   [0 .. 6]
  (x div 4) mod 8                    0    3    6    1    4    7   [0 .. 7]
  if x = 71 then 4 else x mod 7      3    6    1    4    5    2   [1 .. 6]
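A program that searches for a perfect hash function might look like the following sketch. It is restricted, as an assumption for illustration, to the family h(x) = (x div s) mod m that appears in the examples above; the function name find_perfect_hash and the parameter max_divisor are hypothetical. Keys are assumed to be distinct non-negative integers.

```python
from itertools import count

def find_perfect_hash(keys, max_divisor=16):
    """Return (s, m) such that h(x) = (x // s) % m is collision-free on keys.

    Table sizes m are tried in increasing order, starting with the
    minimum m = n, so the slack m - n grows only when necessary.
    """
    n = len(keys)
    for m in count(n):
        for s in range(1, max_divisor + 1):
            addresses = {(x // s) % m for x in keys}
            if len(addresses) == n:      # all addresses are distinct
                return s, m

s, m = find_perfect_hash([17, 20, 24, 38, 51])
print(s, m)  # 1 5
```

For the five elements of the earlier example the search stops at once: h(x) = x mod 5 already maps them collision-free onto {0, … , 4}. The search always terminates, because for any m larger than the largest key, s = 1 maps every key to itself.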
Conventional hash tables: collision resolution
In contrast to the special cases discussed, most applications of address computation present the data structure designer with greater uncertainties and less favorable conditions. Typically, the underlying key domain is much larger than the available address range, and not much is known about the elements to be stored. We may have an upper bound on n, and we may know the probability distribution that governs the random sample of elements to be stored. In setting up a customer list for a local business, for example, the number of customers may be bounded by the population of the town, and the distribution of last names can be obtained from the telephone directory: many names will start with A and S, hardly any with Q and Y. On the basis of such information, but in ignorance of the actual table contents to be stored, we must choose the size m of the hash table and design the hash function h that maps the key domain X into the address space A = {0, … , m - 1}. We will then have to live with the consequences of these decisions, at least until we decide to rehash: that is, resize the table, redesign the hash function, and reinsert all the elements that we have stored so far.
Later sections present some pragmatic advice on the choice of h; for now, let us assume that an appropriate hash function is available. Regardless of how smart a hash function we have designed, collisions (more than b elements share the same home address of a bucket of capacity b) are inevitable in practice. Thus hashing requires techniques for handling collisions. We present the three major collision resolution techniques in use: separate chaining, coalesced chaining, and open addressing. The two techniques called chaining call upon list processing techniques to organize overflowing elements. Separate chaining is used when these lists live in an overflow area distinct from the hash table proper; coalesced chaining when the lists live in unused parts of the table. Open addressing uses address computation to organize overflowing elements. Each of these three techniques comes in different variations; we illustrate one typical choice.
Separate chaining
The memory allocated to the table is split into a primary and an overflow area. Any overflowing cell or bucket in the primary area is the head of a list, called the overflow chain, that holds all elements that overflow from that bucket. Exhibit 22.2 shows six elements inserted in the order x_1, x_2, … . The first arrival resides at its home address; later ones get appended to the overflow chain.
Exhibit 22.2: Separate chaining handles collisions in a separate overflow area.
Separate chaining is easy to understand: insert, delete, and search operations are simple. In contrast to other collision handling techniques, this hybrid between address computation and list processing has two major advantages: (1) deletions do not degrade the performance of the hash table, and (2) regardless of the number m of home addresses, the hash table will not overflow until the entire memory is exhausted. The size m of the table has a critical influence on the performance. If m ≪ n, overflow chains are long and we have essentially a list processing technique that does not support direct access. If m ≫ n, overflow chains are short but we waste space in the table. Even for the practical choice m ≈ n, separate chaining has some disadvantages:
Two different accessing techniques are required.
Pointers take up space; this may be a significant overhead for small elements.
Memory is partitioned into two separate areas that do not share space: If the overflow area is full, the entire table is full, even if there is still space in the array of home cells. This consideration leads to the next technique.
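The behavior of separate chaining can be sketched in Python as follows. This is a simplification under stated assumptions: each home address holds a Python list that stands in for both the primary cell and its overflow chain, so the fixed-size overflow area of the text is not modeled, and the hash function x mod m is only a placeholder. The class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: one chain of colliding elements per home address."""

    def __init__(self, m: int):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def _h(self, x: int) -> int:
        return x % self.m                  # placeholder hash function

    def find(self, x: int) -> bool:
        return x in self.chains[self._h(x)]

    def insert(self, x: int) -> None:
        chain = self.chains[self._h(x)]
        if x not in chain:
            chain.append(x)                # later arrivals go to the end

    def delete(self, x: int) -> None:
        chain = self.chains[self._h(x)]
        if x in chain:
            chain.remove(x)                # other chains are unaffected

t = ChainedHashTable(7)
for key in (3, 10, 17):                    # all three collide at address 3
    t.insert(key)
t.delete(10)
print(t.chains[3], t.find(17), t.find(10))  # [3, 17] True False
```

Deleting an element only shortens its own chain, which is why deletions do not degrade the performance of this kind of table.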
Coalesced chaining
The chains that emanate from overflowing buckets are stored in the empty space in the hash table rather than in a separate overflow area (Exhibit 22.3). This has the advantage that all available space is utilized fully (except for the overhead of the pointers). However, managing the space shared between the two accessing techniques gets complicated.
Exhibit 22.3: Coalesced chaining handles collisions by building lists that share memory with the hash table.
The next technique has similar advantages (in addition, it incurs no overhead for pointers) and disadvantages; all things considered, it is probably the best collision resolution technique.
Open addressing
Assign to each element x ∈ X a probe sequence a_0 = h(x), a_1, a_2, … of addresses that fills the entire address range A. The intention is to store x preferentially at a_0, but if T[a_0] is occupied then at a_1, and so on, until the first empty cell is encountered along the probe sequence. The occupied cells along the probe sequence are called the collision path of x; note that the collision path is a prefix of the probe sequence. If we enforce the invariant:
If x is in the table at T[a] and if i precedes a in the probe sequence for x, then T[i] is occupied.
the following fast and simple loop that travels along the collision path can be used to search for x:
  a := h(x);
  while T[a] ≠ x and T[a] ≠ empty do
    a := (next address in probe sequence);
Let us work out the details so that this loop terminates correctly and the code is as concise and fast as we can make it.
The probe sequence is defined by formulas in the program (an example of an implicit data structure) rather than by pointers in the data as is the case in coalesced chaining.
Example: linear probing
a_{i+1} = (a_i + 1) mod m is the simplest possible formula. Its only disadvantage is a phenomenon called clustering. Clustering arises when the collision paths of many elements in the table overlap to a large extent, as is likely to happen in linear probing. Once elements have collided, linear probing will store them in consecutive cells. All elements that hash into this block of contiguous occupied cells travel along the same collision path, thus lengthening this block; this in turn increases the probability that future elements will hash into this block. Once this positive feedback loop gets started, the cluster keeps growing.
Double hashing is a special type of open addressing designed to alleviate the clustering problem by letting different elements travel with steps of different size. The probe sequence is defined by the formulas
a_0 = h(x), δ = g(x) > 0, a_{i+1} = (a_i + δ) mod m, m prime
g is a second hash function that maps the key space X into [1 .. m - 1].
Two important details must be solved:
The probe sequence of each element must span the entire address range A. This is achieved if m is relatively prime to every step size δ, and the easiest way to guarantee this condition is to choose m prime.
The termination condition of the search loop above is: T[a] = x or T[a] = empty. An unsuccessful search (x not in the table) can terminate only if an address a is generated with T[a] = empty. We have already insisted that each probe sequence generates all addresses in A. In addition, we must guarantee that the table contains at least one empty cell at all times; this serves as a sentinel to terminate the search loop.
The following declarations and procedures implement double hashing. We assume that the comparison operators = and ≠ are defined on X, and that X contains a special value 'empty', which differs from all values to be stored in the table. For example, a string of blanks might denote 'empty' in a table of identifiers. We choose to identify an unsuccessful search by simply returning the address of an empty cell.
  const m = … ; { size of hash table - must be prime }
        empty = … ;
  type key = … ; addr = 0 .. m - 1; step = 1 .. m - 1;
  var T: array[addr] of key;
      n: integer; { number of elements currently stored in T }
  function h(x: key): addr; { hash function for home address }
  function g(x: key): step; { hash function for step }
  procedure init;
  var a: addr;
  begin
    n := 0;
    for a := 0 to m - 1 do T[a] := empty
  end;
  function find(x: key): addr;
  var a: addr; d: step;
  begin
    a := h(x); d := g(x);
    while (T[a] ≠ x) and (T[a] ≠ empty) do a := (a + d) mod m;
    return(a)
  end;
  function insert(x: key): addr;
  var a: addr; d: step;
  begin
    a := h(x); d := g(x);
    while T[a] ≠ empty do begin
      if T[a] = x then return(a);
      a := (a + d) mod m
    end;
    { one cell must stay empty as a sentinel for the search loop }
    if n < m - 1 then begin n := n + 1; T[a] := x end
    else err-msg('table is full');
    return(a)
  end;
Deletion of elements creates problems, as is the case in many types of hash tables. An element to be deleted cannot simply be replaced by 'empty', or else it might break the collision paths of other elements still in the table; recall the basic invariant on which the correctness of open addressing is based. The idea of rearranging elements in the table so as to refill a cell that was emptied but needs to remain full is quickly abandoned as too complicated: if deletions are numerous, the programmer ought to choose a data structure that fully supports deletions, such as balanced trees implemented as list structures. A limited number of deletions can be accommodated in an open address hash table by using the following technique.
At any time, a cell is in one of three states:
empty (was never occupied, the initial state of all cells)
occupied (currently)
deleted (used to be occupied but is currently free)
A cell in state 'empty' terminates the find loop; a cell in state 'empty' or in state 'deleted' terminates the insert loop. The state diagram shown in Exhibit 22.4 describes the transitions possible in the lifetime of a cell. Deletions degrade the performance of a hash table, because a cell, once occupied, never returns to the virgin state 'empty' which alone terminates an unsuccessful find. Even if an equal number of insertions and deletions keeps a hash table at a low load factor λ, unsuccessful finds will ultimately scan the entire table, as all cells drift into one of the states 'occupied' or 'deleted'. Before this occurs, the table ought to be rehashed; that is, the contents are inserted into a new, initially empty table.
Exhibit 22.4: This state diagram describes possible life cycles of a cell: Once occupied, a cell will never again be as useful as an empty cell.
Exercise: hash table with deletions
Modify the program above to implement double hashing with deletions.
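One possible approach to this exercise, sketched in Python rather than in the notation of the program above, uses the three cell states described earlier. The sentinel objects EMPTY and DELETED, the class name DoubleHashTable, and the placeholder hash functions x mod m and 1 + x mod (m - 1) are assumptions for illustration; m is assumed to be prime and keys to be non-negative integers.

```python
EMPTY, DELETED = object(), object()    # sentinels distinct from every key

class DoubleHashTable:
    def __init__(self, m: int):
        self.m = m                     # table size, assumed prime
        self.T = [EMPTY] * m
        self.n = 0                     # cells in state occupied or deleted

    def _h(self, x: int) -> int:
        return x % self.m              # placeholder home address

    def _g(self, x: int) -> int:
        return 1 + x % (self.m - 1)    # placeholder step in 1 .. m - 1

    def find(self, x: int):
        """Return the address of x, or None. Only EMPTY ends the search."""
        a, d = self._h(x), self._g(x)
        while self.T[a] is not EMPTY:
            if self.T[a] is not DELETED and self.T[a] == x:
                return a
            a = (a + d) % self.m
        return None

    def insert(self, x: int) -> int:
        found = self.find(x)
        if found is not None:
            return found
        a, d = self._h(x), self._g(x)
        while self.T[a] is not EMPTY and self.T[a] is not DELETED:
            a = (a + d) % self.m       # EMPTY or DELETED ends the insert loop
        if self.T[a] is EMPTY:
            if self.n >= self.m - 1:   # keep one EMPTY cell as a sentinel
                raise OverflowError("table is full; rehash")
            self.n += 1
        self.T[a] = x
        return a

    def delete(self, x: int) -> None:
        a = self.find(x)
        if a is not None:
            self.T[a] = DELETED        # the cell never returns to EMPTY
```

A short usage example: in a table of size 7, the keys 3, 10, and 17 share the home address 3. After 3 is deleted, 10 is still found because the deleted cell does not interrupt its collision path.

```python
t = DoubleHashTable(7)
for key in (3, 10, 17):
    t.insert(key)
t.delete(3)
print(t.find(10), t.find(3))   # 1 None
```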
Choice of hash function: randomization
In conventional terminology, hashing is based on the concept of randomization. The purpose of randomizing is to transform an unknown distribution over the key domain X into a uniform distribution, and to turn consecutive samples that may be dependent into independent samples. This task appears to call for magic, and indeed, there is little or no mathematics that applies to the construction of hash functions; but there are commonsense observations worth remembering. These observations are primarily "don'ts". They stem from properties that sets of elements we wish to store frequently possess, and thus are based on some knowledge about the populations to be stored. If we assumed strictly nothing about these populations, there would be little to say about hash functions: an order-preserving proportional mapping of X into A would be as good as any other function. But in practice it is not, as the following examples show.
1. A Fortran compiler might use a hash table to store the set of identifiers it encounters in a program being compiled. The rules of the language and human habits conspire to make this set a highly biased sample from the set of legal Fortran identifiers. Example: Integer variables begin with I, J, K, L, M, N; this convention is likely to generate a cluster of identifiers that begin with one of these letters. Example: Successive identifiers encountered cannot be considered independent samples: If X and Y have occurred, there is a higher chance for Z to follow than for WRKAG. Example: Frequently, we see sequences of identifiers or statement numbers whose character codes form arithmetic progressions, such as A1, A2, A3, … or 10, 20, 30, … .
2. All file systems require or encourage the use of naming conventions, so that most file names begin or end with one of just a few prefixes or suffixes, such as ···.SYS, ···.BAK, ···.OBJ. An individual user, or a user community, is likely to generate additional conventions, so that most file names might begin, for example, with the initials of the names of the people involved. The files that store this text, for example, are structured according to 'part' and 'chapter', so we are currently in file P5 C22. In some directories, file names might be sorted alphabetically, so if they are inserted into a table in order, we process a monotonic sequence.
The purpose of a hash function is to break up all regularities that might be present in the set of elements to be stored. This is most reliably achieved by "hashing" the elements, a word for which the dictionary offers the following explanations: (1) from the French hache, "battle-ax"; (2) to chop into small pieces; (3) to confuse, to muddle. Thus, to approximate the elusive goal of randomization, a hash function destroys patterns, including, unfortunately, the order < defined on X. Hashing typically proceeds in two steps.
1. Convert the element x into a number P(x). In most cases P(x) is an integer; occasionally, it is a real number 0 ≤ P(x) < 1. Whenever possible, this conversion of x into P(x) involves no action at all: The representation of x, whatever type x may be, is reinterpreted as the representation of the number P(x). When x is a variable-length item, for example a string, the representation of x is partitioned into pieces of suitable length that are "folded" on top of each other. For example, the four-letter word x = 'hash' is encoded one letter per byte using the 7-bit ASCII code and a leading 0 as 01101000 01100001 01110011 01101000. It may be folded to form a 16-bit integer by exclusive-or of the leading pair of bytes with the trailing pair of bytes:
      0110100001100001
  xor 0111001101101000
      0001101100001001   which represents P(x) = 27 · 2^8 + 9 = 6921.
Such folding, by itself, is not hashing. Patterns in the representation of elements easily survive folding. For example, the leading 0 we have used to pad the 7-bit ASCII code to an 8-bit byte remains a zero regardless of x. If we had padded with a trailing zero, all P(x) would be even. Because P(x) often has the same representation as x, or a closely related one, we drop P( ) and use x slightly ambiguously to denote both the original element and its interpretation as a number.
2. Scramble x [more precisely, P(x)] to obtain h(x). Any scrambling technique is a sensible try, as long as it avoids fairly obvious pitfalls. Rules of thumb:
Each bit of an address h(x) should depend on all bits of the key value x. In particular, don't ignore any part of x in computing h(x). Thus h(x) = x mod 2^13 is suspect, as only the least significant 13 bits of x affect h(x).
Make sure that arithmetic progressions such as Ch1, Ch2, Ch3, … get broken up rather than being mapped into arithmetic progressions. Thus h(x) = x mod k, where k is significantly smaller than the table size m, is suspect.
Avoid any function that cannot produce a uniform distribution of addresses. Thus h(x) = x^2 is suspect; if x is uniformly distributed in [0, 1], the distribution of x^2 is highly skewed.
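The folding of step 1 can be sketched in Python as follows. The function name fold16 and the zero-byte padding of strings of odd length are assumptions for illustration; the text only describes the four-letter case.

```python
def fold16(word: str) -> int:
    """XOR-fold the 8-bit ASCII bytes of word, two at a time, into 16 bits."""
    data = word.encode('ascii')
    if len(data) % 2:
        data += b'\x00'              # assumption: pad odd lengths with a zero byte
    result = 0
    for i in range(0, len(data), 2):
        pair = (data[i] << 8) | data[i + 1]   # one 16-bit piece
        result ^= pair                        # fold it on top of the others
    return result

print(fold16('hash'))                # 6921
print(format(fold16('hash'), '016b'))  # 0001101100001001
```

The leading bit of the result is 0 for every ASCII string, which illustrates why folding by itself is not hashing: the pattern in the representation survives.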
A hash function must be fast and simple. All of the desiderata above are obtained by a hash function of the type:
h(x) = x mod m
where m is the table size and a prime number, and x is the key value interpreted as an integer.
No hash function is guaranteed to avoid the worst case of hashing, namely, that all elements to be stored collide on one address (this happens here if we store only multiples of the prime m). Thus a hash function must be judged in relation to the data it is being asked to store, and usually this is possible only after one has begun using it. Hashing provides a perfect example for the injunction that the programmer must think about the data, analyze its statistical properties, and adapt the program to the data if necessary.
Performance analysis
We analyze open addressing without deletions assuming that each address a_i is chosen independently of all other addresses from a uniform distribution over A. This assumption is reasonable for double hashing and leads to the conclusion that the average cost for a search operation in a hash table is O(1) if we consider the load factor λ to be constant. We analyze the average number of probes executed as a function of λ in two cases: U(λ) for an unsuccessful search, and S(λ) for a successful search.
Let p_i denote the probability of using exactly i probes in an unsuccessful search. This event occurs if the first i - 1 probes hit occupied cells, and the i-th probe hits an empty cell: p_i = λ^(i-1) · (1 - λ). Let q_i denote the probability that at least i probes are used in an unsuccessful search; this occurs if the first i - 1 inspected cells are occupied: q_i = λ^(i-1). q_i can also be expressed as the sum of the probabilities that we probe exactly j cells, for j running from i to m. Thus we obtain
\[ U(\lambda) = \sum_{i=1}^{m} i \cdot p_i = \sum_{i=1}^{m} \sum_{j=i}^{m} p_j = \sum_{i=1}^{m} q_i = \sum_{i=1}^{m} \lambda^{i-1} \approx \frac{1}{1 - \lambda} \]
The number o% probes e)ecuted in a success%ul search %or an element ) e0uals the number o% probes in an
unsuccess%ul search %or the same element ) be%ore it is inserted into the hash table. N!ote& This holds only 5hen
elements are never relocated or deletedO. Thus the average number o% probes needed to search %or the i#th element
inserted into the hash table is *HHi U 1I Q mI+ and .HI can be computed as the average o% *HI+ %or increasing in
discrete steps %rom 0 to . $t is a reasonable appro)imation to let vary continuously in the range %rom 0 to &
2<D
")hibit 22.@ suggests that a reasonable operating range %or a hash table keeps the load %actor bet5een 0.2@ and
0.7@. $% is much smaller+ 5e 5aste space+ i% it is larger than 7@ per cent+ 5e get into a domain 5here the
per%ormance degrades rapidly. !ote& $% all searches are success%ul+ a hash table per%orms 5ell even i% loaded up to
9@ per centYunsuccess%ul searching is the killerM
Table 22.1& The average number o% probes per search gro5s rapidly as the load %actor approaches 1.
h 0.2@ 0.@ 0.7@ 0.9 0.9@ 0.99
*HI 1.3 2.0 <.0 10.0 20.0 100.0
.HI 1.2 1.< 1.D 2.> 3.2 <.7
")hibit 22.@& The average number o% probes per search gro5s rapidly as the load %actor approaches 1.
Thus the hash table designer should be able to estimate n within a factor of 2, which is not an easy task. An incorrect
guess may waste memory or cause poor performance, even table overflow followed by a crash. If the programmer
becomes aware that the load factor lies outside this range, she may rehash: change the size of the table, change the
hash function, and reinsert all elements previously stored.
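The probe counts tabulated above can be reproduced with a few lines of code. The sketch below assumes the closed-form approximations U(λ) ≈ 1 / (1 − λ) and S(λ) ≈ (1 / λ) · ln(1 / (1 − λ)), which agree with the tabulated values; the function names are chosen here for illustration only.

```python
import math


def unsuccessful_probes(load: float) -> float:
    """Approximate average probes for an unsuccessful search: 1 / (1 - load)."""
    return 1.0 / (1.0 - load)


def successful_probes(load: float) -> float:
    """Approximate average probes for a successful search: ln(1 / (1 - load)) / load."""
    return math.log(1.0 / (1.0 - load)) / load


if __name__ == "__main__":
    # Print one row per load factor used in the table.
    for load in (0.25, 0.5, 0.75, 0.9, 0.95, 0.99):
        u = unsuccessful_probes(load)
        s = successful_probes(load)
        print(f"load {load:4.2f}   U = {u:6.1f}   S = {s:4.1f}")
```

Both functions require 0 < load < 1; at a load factor of exactly 1 the approximation for U diverges, which mirrors the rapid degradation discussed in the text.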
Extendible hashing
In contrast to standard hashing methods, extendible forms of hashing allow for the dynamic extension or
shrinkage of the address range into which the hash function maps the keys. This has two major advantages: (1)
Memory is allocated only as needed (it is unnecessary to determine the size of the address range a priori), and (2)
deletion of elements does not degrade performance. As the address range changes, the hash function is changed in
such a way that only a few elements are assigned a new address and need to be stored in a new bucket. The idea that
makes this possible is to map the keys into a very large address space, of which only a portion is active at any given
time.
Various extendible hashing methods differ in the way they represent and manage a smaller active address
range of variable size that is a subrange of a larger virtual address range. In the following we describe the method
of extendible hashing that is especially well suited for storing data on secondary storage devices; in this case an
address points to a physical block of secondary storage that can contain more than one element. An address is a bit
string of maximum length k; however, at any time only a prefix of d bits is used. If all bit strings of length k are
represented by a so-called radix tree of height k, the active part of all bit strings is obtained by using only the upper
d levels of the tree (i.e. by cutting the tree at level d). Exhibit 22.6 shows an example for d = 3.
Exhibit 22.6: Address space organized as a binary radix tree.
The radix tree shown in Exhibit 22.6 (without the nodes that have been clipped) describes an active address
range with addresses {00, 010, 011, 1} that are considered as bit strings or binary numbers. To each active node
with address s there corresponds a bucket B that can store b records. If a new element has to be inserted into a full
bucket B, then B is split: Instead of B we find two twin buckets B0 and B1 which have a one bit longer address than B,
and the elements stored in B are distributed among B0 and B1 according to this bit. The new radix tree now has to
point to the two data buckets B0 and B1 instead of B; that is, the active address range must be extended locally (by
moving the broken line in Exhibit 22.6). If the block with address 00 overflows, two new twin blocks with addresses
000 and 001 will be created which are represented by the corresponding nodes in the tree. If the overflowing bucket
B has depth d, then d is incremented by 1 and the radix tree grows by one level.
In extendible hashing the clipped radix tree is represented by a directory that is implemented by an array. Let d
be the maximum number of bits that are used in one of the bit strings for forming an address; in the example above,
d = 3. Then the directory consists of $2^d$ entries. Each entry in this directory corresponds to an address and points to
a physical data bucket which contains all elements that have been assigned this address by the hash function h. The
directory for the radix tree in Exhibit 22.6 looks as shown in Exhibit 22.7.
Exhibit 22.7: The active address range of the tree in Exhibit 22.6 implemented as an array.
The bucket with address 010 corresponds to a node on level 3 of the radix tree, and there is only one entry in the
directory corresponding to this bucket. If this bucket overflows, the directory and data buckets are reorganized as
shown in Exhibit 22.8. Two twin buckets that jointly contain fewer than b elements are merged into a single bucket.
This keeps the average bucket occupancy at a high 70 per cent even in the presence of deletions, as probabilistic
analysis predicts and simulation results confirm. Bucket merging may lead to halving the directory. A formerly
large file that shrinks to a much smaller size will have its directory shrink in proportion. Thus extendible hashing,
unlike conventional hashing, suffers no permanent performance degradation under deletions.
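The interplay of directory, bucket depth, and splitting described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes keys are already k-bit integer addresses (so h is the identity), keeps the buckets in memory rather than on disk, and omits merging and directory halving. The names `Bucket`, `ExtendibleHash`, `capacity`, and `member` are chosen here for illustration.

```python
class Bucket:
    def __init__(self, depth: int):
        self.depth = depth   # number of leading address bits shared by all keys here
        self.keys = []


class ExtendibleHash:
    def __init__(self, k: int = 8, capacity: int = 2):
        self.k = k                      # maximum address length in bits
        self.capacity = capacity        # b: number of records per bucket
        self.d = 0                      # number of prefix bits currently in use
        self.directory = [Bucket(0)]    # always 2**d entries

    def _index(self, key: int) -> int:
        # The directory index is the leading d bits of the k-bit address.
        return key >> (self.k - self.d)

    def member(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.k:
            raise OverflowError("all k address bits are already in use")
        if bucket.depth == self.d:
            # The overflowing bucket is as deep as the directory: double the
            # directory so that old entry i becomes entries 2i and 2i + 1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.d += 1
        self._split(bucket)
        self.insert(key)   # retry; may trigger a further split

    def _split(self, bucket: Bucket) -> None:
        depth = bucket.depth + 1
        twins = (Bucket(depth), Bucket(depth))
        # Distribute the keys according to the newly significant address bit.
        for key in bucket.keys:
            twins[(key >> (self.k - depth)) & 1].keys.append(key)
        # Redirect every directory entry that pointed to the old bucket.
        for i, entry in enumerate(self.directory):
            if entry is bucket:
                self.directory[i] = twins[(i >> (self.d - depth)) & 1]
```

With k = 3 and capacity 2, inserting the addresses 000, 001, 010, 011, 100 in this order doubles the directory twice (d = 2), while the bucket for prefix 1 stays at depth 1 and is shared by the directory entries 10 and 11.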
Exhibit 22.8: An overflowing bucket may trigger doubling of the directory.
A virtual radix tree: order-preserving extendible hashing
Hashing, in the usual sense of the word, destroys structure and thus buys uniformity at the cost of order.
Extendible hashing, on the other hand, is practical without randomization and thus need not accept its inevitable
consequence, the destruction of order. A uniform distribution of elements is not nearly as important:
Nonuniformity causes the directory to be deeper and thus larger than it would be for a uniform distribution, but it
affects neither access time nor bucket occupancy. And the directory is only a small space overhead on top of the
space required to store the data: It typically contains only one or a few pointers, say a dozen bytes, per data bucket
of, say, 1k bytes; it adds perhaps a few percent to the total space requirement of the table, so its growth is not
critical. Thus extendible hashing remains feasible when the identity is used as the address computation function h,
in which case data is accessible and can be processed sequentially in the order ≤ defined on the domain X.
When h preserves order, the word hashing seems out of place. If the directory resides in central memory and the
data buckets on disk, what we are implementing is a virtual memory organized in the form of a radix tree of
unbounded size. In contrast to conventional virtual memory, whose address space grows only at one end, this
address space can grow anywhere: It is a virtual radix tree.
As an example, consider the domain X of character strings up to length 32, say, and assume that elements to be
stored are sampled according to the distribution of the first letter in English words. We obtain an approximate
distribution by counting pages in a dictionary (Exhibit 22.9). Encode the blank as 00000, 'a' as 00001, up to 'z' as
11010, so that 'aah', for example, has the code 00001 00001 01000 00000 … (29 quintuples of zeros pad 'aah'
to 32 letters). This address computation function h is almost an identity: It maps $\{' ', 'a', \ldots, 'z'\}^{32}$ one-to-one into
$\{0, 1\}^{160}$. Such an order-preserving address computation function supports many useful types of operations: for
example, range queries such as "list in alphabetic order all the words stored from 'unix' to 'xinu'".
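The order-preserving address computation described above can be sketched as follows. This is an illustration under stated assumptions: words consist only of blanks and lowercase letters, the blank is coded 0 and the letters 'a' to 'z' are coded 1 to 26 in 5 bits each, and the names `ALPHABET`, `address`, and `code` are chosen here.

```python
ALPHABET = " abcdefghijklmnopqrstuvwxyz"   # blank = 0, 'a' = 1, ..., 'z' = 26


def address(word: str, length: int = 32) -> int:
    """Map a word to an integer of 5 * length bits, preserving alphabetic order."""
    if len(word) > length:
        raise ValueError("word is longer than the fixed key length")
    value = 0
    for ch in word.ljust(length):          # pad with blanks on the right
        value = (value << 5) | ALPHABET.index(ch)
    return value


def code(word: str, length: int = 32) -> str:
    """Return the address as a bit string, five bits per character."""
    return format(address(word, length), f"0{5 * length}b")


if __name__ == "__main__":
    print(code("aah", length=4))            # first four quintuples of the code for 'aah'
    words = ["xinu", "unix", "aah", "at", "a"]
    print(sorted(words, key=address))       # same order as alphabetic sorting
```

Because the blank receives the smallest code and padding is applied on the right, comparing addresses as integers gives the same result as comparing the words alphabetically, which is what makes range queries on the addresses meaningful.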
")hibit 22.9& 8elative %re0uency o% 5ords beginning 5ith a given letter in :ebster;s dictionary.
If there is one page of words starting with X for 160 pages of words starting with S, this suggests that if our
active address space is partitioned into equally sized intervals, some intervals may be populated 160 times more
densely than others. This translates into a directory that may be 160 times larger than necessary for a uniform
distribution, or, since directories grow as powers of 2, may be 128 or 256 times larger. This sounds like a lot but
may well be bearable, as the following estimates show.
Assume that we store $10^5$ records on disk, with an average occupancy of 100 records per bucket, requiring about
1000 buckets. A uniform distribution generates a directory with one entry per bucket, for a total of 1k entries, say
2k or 4k bytes. The nonuniform distribution above requires the same number of buckets, about 1,000, but
generates a directory of 256k entries. If a pointer requires 2 to 4 bytes, this amounts to 0.5 to 1 Mbyte. This is less of
a memory requirement than many applications require on today's personal computers. If the application warrants
it (e.g. for an on-line reservation system) 1 Mbyte of memory is a small price to pay.
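The estimate above is simple arithmetic and can be checked directly. The sketch below assumes that "1k" means 1024 and that the directory is a flat array of pointers; `directory_bytes` is a name chosen here.

```python
def directory_bytes(entries: int, pointer_bytes: int) -> int:
    """Size of a directory stored as a flat array of pointers."""
    return entries * pointer_bytes


if __name__ == "__main__":
    records = 10**5
    records_per_bucket = 100
    buckets = records // records_per_bucket      # about 1000 buckets
    uniform_entries = 1024                       # one entry per bucket, rounded to 1k
    skewed_entries = 256 * uniform_entries       # 256 times deeper: 256k entries
    print("buckets:", buckets)
    for pointer_bytes in (2, 4):
        size = directory_bytes(skewed_entries, pointer_bytes)
        print(pointer_bytes, "byte pointers:", size / 2**20, "Mbyte")
```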
Thus we see that for large data sets, extendible hashing approximates the ideal characteristics of the special case
we discussed in this chapter's section on "the special case of small key domains". All it takes is a disk and a central
memory of a size that is standard today but was practically infeasible a decade ago, impossible two decades ago, and
unthought of three decades ago.
Exercises
1. Design a perfect hash table for the elements 1, 10, 14, 20, 25, and 26.
2. The six names AL, FL, GA, NC, SC and VA must be distinguished from all other ordered pairs of uppercase
letters. To solve this problem, these names are stored in the array T such that they can easily be found by
means of a hash function h.
type addr = 0 .. 7;
     pair = record c1, c2: 'A' .. 'Z' end;
var T: array [addr] of pair;
(a) Write a
function h (name: pair): addr;
which maps the six names onto different addresses in the range 'addr'.
(b) Write a
procedure initTable;
which initializes the entries of the hash table T.
(c) Write a
function member (name: pair): boolean;
which returns for any pair of uppercase letters whether it is stored in T.
3. Consider the hash function h(x) = x mod 9 for a table having nine entries. Collisions in this hash table are
resolved by coalesced chaining. Demonstrate the insertion of the elements 14, 19, 10, 6, 11, 42, 21, 8, and 1.
4. Consider inserting the keys 14, 1, 19, 10, 6, 11, 42, 21, 8, and 17 into a hash table of length m = 13 using open
addressing with the hash function h(x) = x mod m. Show the result of inserting these elements using
(a) Linear probing.
(b) Double hashing with the second hash function g(x) = 1 + x mod (m + 1).
5. Implement a dictionary supporting the operations 'insert', 'delete', and 'member' as a hash table with
double hashing.
6. Implement a dictionary supporting the operations 'insert', 'delete', 'member', 'succ', and 'pred' by
order-preserving extendible hashing.
23. Metric data structures
organizing the embedding space versus organizing its contents
quadtrees and octtrees. grid file. two-disk-access principle
simple geometric objects and their parameter spaces
region queries of arbitrary shape
approximation of complex objects by enclosing them in simple containers
Organizing the embedding space versus organizing its contents
Most of the data structures discussed so far organize the set of elements to be stored depending primarily, or
even exclusively, on the relative values of these elements to each other and perhaps on their order of insertion into
the data structure. Often, the only assumption made about these elements is that they are drawn from an ordered
domain, and thus these structures support only comparative search techniques: the search argument is compared
against stored elements. The shape of data structures based on comparative search varies dynamically with the set
of elements currently stored; it does not depend on the static domain from which these elements are samples. These
techniques organize the particular contents to be stored rather than the embedding space.
The data structures discussed in this chapter mirror and organize the domain from which the elements are
drawn: much of their structure is determined before the first element is ever inserted. This is typically done on the
basis of fixed points of reference which are independent of the current contents, as inch marks on a measuring scale
are independent of what is being measured. For this reason we call data structures that organize the embedding
space metric data structures. They are of increasing importance, in particular for spatial data, such as needed in
computer-aided design or geographic data processing. Typically, these domains exhibit a much richer structure
than a mere order: In two- or three-dimensional Euclidean space, for example, not only is order defined along any
line (not just the coordinate axes), but also distance between any two points. Most queries about spatial data
involve the absolute position of elements in space, not just their relative position among each other. A typical query
in graphics, for example, asks for the first object intercepted by a given ray of light. Computing the answer involves
absolute position (the location of the ray) and relative order (nearest along the ray). A data structure that supports
direct access to objects according to their position in space can clearly be more efficient than one based merely on
the relative position of elements.
The terms "organizing the embedding space" and "organizing its contents" suggest two extremes along a
spectrum of possibilities. As we have seen in previous chapters, however, many data structures are hybrids that
combine features from distinct types. This is particularly true of metric data structures: They always have aspects of
address computation needed to locate elements in space, and they often use list processing techniques for efficient
memory utilization.
Radix trees, tries
We have encountered binary radix trees, and a possible implementation, in chapter 22 in the section "Extendible
hashing". Radix trees with a branching factor, or fan-out, greater than 2 are ubiquitous. The Dewey decimal
classification used in libraries is a radix tree with a fan-out of 10. The hierarchical structure of many textbooks,
including this one, can be seen as a radix tree with a fan-out determined by how many subsections at depth d + 1
are packed into a section at depth d.
As another example, consider tries, a type of radix tree that permits the retrieval of variable-length data. As we
traverse the tree, we check whether or not the node we are visiting has any successors. Thus the trie can be very
long along certain paths. As an example, consider a trie containing words in the English language. In Exhibit 23.1
below, the four words 'a', 'at', 'ate', and 'be' are shown explicitly. The letter 'a' is a word and is the first letter of other
words. The field corresponding to 'a' contains the value 1, signaling that we have spelled a valid word, and there is a
pointer to longer words beginning with 'a'. The letter 'b' is not a word, thus is marked by a 0, but it is the beginning
of many words, all found by following its pointer. The string 'aa' is neither a word nor the beginning of a word, so its
field contains 0 and its pointer is 'nil'.
Exhibit 23.1: A radix tree over the alphabet of letters stores (prefixes of) words.
Only a few words begin with 'ate', but among these there are some long ones, such as 'atelectasis'. It would be
wasteful to introduce eight additional nodes, one for each of the characters in 'lectasis', just to record this word,
without making significant use of the fan-out of 26 provided at each node. Thus tries typically use an "overflow
technique" to handle long entries: The pointer field of the prefix 'ate' might point to a text field that contains
'(ate-)lectasis' and '(ate-)lier'.
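A trie with a word flag and a pointer per letter, as described above, can be sketched as follows. This is a simplified illustration: each node uses a dictionary instead of a fixed array of 26 fields (a missing key plays the role of a 'nil' pointer), and the overflow technique for long entries is not implemented. The names `TrieNode`, `Trie`, `member`, and `is_prefix` are chosen here.

```python
class TrieNode:
    def __init__(self):
        self.is_word = False    # plays the role of the 0/1 field in the text
        self.children = {}      # letter -> TrieNode; absent letter means 'nil'


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, s: str):
        """Follow the letters of s from the root; return None on a 'nil' pointer."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def member(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def is_prefix(self, s: str) -> bool:
        return self._find(s) is not None


if __name__ == "__main__":
    trie = Trie()
    for w in ("a", "at", "ate", "be"):
        trie.insert(w)
    print(trie.member("a"), trie.member("b"), trie.is_prefix("b"), trie.is_prefix("aa"))
```

Using a dictionary per node trades the constant-time array indexing of a fixed fan-out for lower space consumption in sparsely populated nodes.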
Quadtrees and octtrees
Consider a square recursively partitioned into quadrants. Exhibit 23.2 shows such a square partitioned to
the depth of 4. There are 4 quadrants at depth 1, separated by the thickest lines; 4 · 4 (sub-)quadrants separated by
slightly thinner lines; $4^3$ (sub-sub-)quadrants separated by yet thinner lines; and finally, $4^4 = 256$ leaf quadrants
separated by the thinnest lines. The partitioning structure described is a quadtree, a particular type of radix tree of
fan-out 4. The root corresponds to the entire square, its 4 children to the 4 quadrants at depth 1, and so on, as
shown in Exhibit 23.2.
Exhibit 23.2: A quarter circle digitized on a 16 × 16 grid, and its representation as a 4-level quadtree.
A quadtree is the obvious two-dimensional analog of the one-dimensional binary radix tree we have seen.
Accordingly, quadtrees are frequently used to represent, store, and process spatial data, such as images. The figure
shows a quarter circle, digitized on a 16 × 16 grid of pixels. This image is most easily represented by a 16 × 16 array of
bits. The quadtree provides an alternative representation that is advantageous for images digitized to a high level of
resolution. Most graphic images in practice are digitized on rectangular grids of anywhere from hundreds to
thousands of pixels on a side: for example, 512 × 512. In a quadtree, only the largest quadrants of constant color
(black or white, in our example) are represented explicitly; their subquadrants are implicit.
The quadtree in Exhibit 23.2 is interpreted as follows. Of the four children of the root, the northwest quadrant,
labeled 1, is simple: entirely white. This fact is recorded in the root. The other three children, labeled 0, 2, and 3,
contain both black and white pixels. As their description is not simple, it is contained in three quadtrees, one for
each quadrant. Pointers to these subquadtrees emanate from the corresponding fields of the root.
The southwestern quadrant labeled 2 in turn has four quadrants at depth 2. Three of these, labeled 2.0, 2.1, and
2.2, are entirely white; no pointers emanate from the corresponding fields in this node. Subquadrant 2.3 contains
both black and white pixels; thus the corresponding field contains a pointer to a sub-subquadtree.
In this discussion we have introduced a notation to identify every quadrant at any depth of the quadtree. The
root is identified by the null string; a quadrant at depth d is uniquely identified by a string of d radix-4 digits. This
string can be interpreted in various ways as a number expressed in base 4. Thus accessing and processing a
quadtree is readily reduced to arithmetic.
Breadth-first addressing
Label the root 0, its children 1, 2, 3, 4, its grandchildren 5 through 20, and so on, one generation after the other.
0
1 2 3 4
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Notice that the children of any node i are 4 · i + 1, 4 · i + 2, 4 · i + 3, 4 · i + 4. The parent of node i is (i − 1) div 4.
This is similar to the address computation used in the heap of "Implicit data structures", a binary tree where each
node i has children 2 · i and 2 · i + 1; and the parent of node i is obtained as i div 2.
Exercise
The string of radix 4 digits along a path from the root to any node is called the path address of this node.
Interpret the path address as an integer, most significant digit first. These integers label the nodes at depth d > 0
consecutively from 0 to $4^d - 1$. Devise a formula that transforms the path address into the breadth-first address.
This formula can be used to store a quadtree as a one-dimensional array.
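The parent and child computations of breadth-first addressing stated above (children 4 · i + 1 to 4 · i + 4, parent (i − 1) div 4) can be written down directly. The sketch covers only these two relations and leaves the exercise on path addresses open; the function names are chosen here.

```python
def children(i: int) -> list:
    """Breadth-first addresses of the four children of node i."""
    return [4 * i + j for j in range(1, 5)]


def parent(i: int) -> int:
    """Breadth-first address of the parent of node i (the root 0 has none)."""
    if i == 0:
        raise ValueError("the root has no parent")
    return (i - 1) // 4


if __name__ == "__main__":
    print(children(0))    # children of the root
    print(children(4))    # last four grandchildren of the root
    print(parent(20))     # parent of the last node at depth 2
```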
Data compression
The representation of an image as a quadtree is sometimes much more compact than its representation as a bit
map. Two conditions must hold for this to be true:
1. The image must be fairly large, typically hundreds of pixels on a side.
2. The image must have large areas of constant value (color).
The quadtree for the quarter circle above, for example, has only 14 nodes. A bit map of the same image requires
256 bits. Which representation requires more storage? Certainly the quadtree. If we store it as a list, each node
must be able to hold four pointers, say 4 or 8 bytes. If a pointer has value 'nil', indicating that its quadrant needs no
refinement, we need a bit to indicate the color of this quadrant (white or black), or a total of 4 bits. If we store the
quadtree breadth-first, no pointers are needed as the node relationships are expressed by address computation;
thus a node is reduced to four three-valued fields ('white', 'black', or 'refine'), conveniently stored in 8 bits, or 1 byte.
This implicit data structure will leave many unused holes in memory. Thus quadtrees do not achieve data
compression for small images.
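To make the node count concrete, the following sketch builds a quadtree from a square bit map and counts its nodes. It rests on several assumptions made here: the image side is a power of 2, quadrants are visited in the order northwest, northeast, southwest, southeast (the labeling in the exhibit may differ), a refined quadrant is represented by a nested list rather than a 'refine' field plus pointer, and only refined quadrants count as nodes, since uniform quadrants are recorded in a field of their parent.

```python
def build(image, row=0, col=0, size=None):
    """Return 'white', 'black', or a list of four subtrees for a mixed quadrant."""
    if size is None:
        size = len(image)
    first = image[row][col]
    uniform = all(image[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return "black" if first else "white"
    half = size // 2
    return [build(image, row, col, half),                 # northwest
            build(image, row, col + half, half),          # northeast
            build(image, row + half, col, half),          # southwest
            build(image, row + half, col + half, half)]   # southeast


def count_nodes(tree) -> int:
    """Count nodes holding four fields; uniform quadrants need no node."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nodes(sub) for sub in tree)


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 0]]
    tree = build(image)
    print(tree)
    print(count_nodes(tree), "nodes versus", len(image) ** 2, "bits in the bit map")
```

For this small hypothetical image the quadtree needs two nodes of four fields each, against 16 bits for the bit map, which illustrates why compression only pays off for large images with large uniform areas.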
Octtrees
Exactly the same idea for three-dimensional space as quadtrees are for two-dimensional space: A cube is
recursively partitioned into eight octants, using three orthogonal planes.
Spatial data structures: objectives and constraints
Metric data structures are used primarily for storing spatial data, such as points and simple geometric objects
embedded in a multidimensional space. The most important objectives a spatial data structure must meet include:
1. Efficient handling of large, dynamically varying data sets in interactive applications
2. Fast access to objects identified in a fully specified query
3. Efficient processing of proximity queries and region queries of arbitrary shape
4. A uniformly high memory utilization
Achieving these objectives is subject to many constraints, and results in trade-offs.
Managing disks. By "large data set" we mean one that must be stored on disk; only a small fraction of the data
can be kept in central memory at any one time. Many data structures can be used in central memory, but the choice
is much more restricted when it comes to managing disks because of the well-known "memory speed gap"
phenomenon. Central memory is organized in small physical units (a byte, a word) with access times of
approximately 1 microsecond, $10^{-6}$ second. Disks are organized in large physical blocks (512 bytes to 5 kilobytes) with
access times ranging from 10 to 100 milliseconds ($10^{-2}$ to $10^{-1}$ second). Compared to central memory, a disk delivers
data blocks typically $10^3$ times larger with a delay $10^4$ times greater. In terms of the data rate delivered to the
central processing unit:
data rate = (size of data block read) / (access time)
the disk is a storage device whose effectiveness is within an order of magnitude of that of central memory. The large
size of a physical disk block is a potential source of inefficiency that can easily reduce the useful data rate of a disk a
hundredfold or a thousandfold. Accessing a couple of bytes on disk, say a pointer needed to traverse a list, takes
about as long as accessing the entire disk block. Thus the game of managing disks is about minimizing the number
of disk accesses.
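The data-rate argument above can be retraced numerically. The figures below are the round orders of magnitude quoted in the text (1 byte per microsecond for central memory, a block of about 1000 bytes per 10 milliseconds for disk); the 2-byte pointer is an illustrative assumption, and `data_rate` is a name chosen here.

```python
def data_rate(bytes_delivered: float, access_time: float) -> float:
    """Bytes delivered per second for one access of the given duration."""
    return bytes_delivered / access_time


if __name__ == "__main__":
    memory = data_rate(1, 1e-6)         # one byte per microsecond
    disk_block = data_rate(1000, 1e-2)  # a whole block per 10 ms disk access
    disk_pointer = data_rate(2, 1e-2)   # only a 2-byte pointer is actually used
    print("memory / disk block:      ", memory / disk_block)
    print("disk block / disk pointer:", disk_block / disk_pointer)
```

The first ratio is about 10, the "order of magnitude" mentioned in the text; the second is several hundred, which is the loss incurred when a full disk access is spent on a couple of useful bytes.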
Dynamically varying data. The majority of computer applications today are interactive. That means that
insertions, deletions, and modifications of data are at least as frequent as operations that merely process fixed data.
Data structures that entail a systematic degradation of performance with continued use (such as ever-lengthening
overflow chains, or an ever-increasing number of cells marked "deleted" in a conventional hash table) are
unsuitable. Only structures that automatically adapt their shape to accommodate ever-changing contents can
provide uniform response times.
Instantaneous response. Interactive use of computers sets another major challenge for data management:
the goal of providing "instantaneous response" to a fully specified query. "Fully specified" means that every
attribute relevant for the search has been provided, and that at most one element satisfies the query. Imagine the
user clicking an icon on the screen, and the object represented by the icon appears instantaneously. In human
terms, "instantaneous" is a well-defined physiological quantity, namely, about 1/10 of a second, the limit of human time
resolution. Ideally, an interactive system retrieves any single element fully specified in a query within 0.1 second.
Two-disk-access principle. We have already stated that in today's technology, a disk access typically takes
tens of milliseconds. Thus the goal of retrieving any single element in 0.1 second translates into "retrieve any
element in at most a few disk accesses". Fortunately, it turns out that useful data structures can be designed that
access data in a two-step process: (1) access the correct portion of a directory, and (2) access the correct data
bucket. Under the assumption that both data and directory are so large that they are stored on disk, we call this the
two-disk-access principle.
Proximity queries and region queries of arbitrary shape. The simplest example of a proximity query is
the operation 'next', which we have often encountered in one-dimensional data structure traversals: Given a pointer
to an element, get the next element (the successor or the predecessor) according to the order defined on the
domain. Another simple example is an interval or range query such as "get all x between 13 and 17". This
generalizes directly to k-dimensional orthogonal range queries such as the two-dimensional query "get all (x1, x2)
with 13 ≤ x1 < 17 and 3 ≤ x2 < 4". In geometric computation, for example, many other instances of proximity queries
are important, such as the "nearest neighbor" (in any direction), or intersection queries among objects. Region
queries of arbitrary shape (not just rectangular) are able to express a variety of geometric conditions.
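The meaning of a k-dimensional orthogonal range query can be pinned down with a short sketch. It is a plain linear scan over all points, shown only to define what the query returns; a spatial data structure aims to answer the same query without inspecting every point. Half-open intervals (lower bound included, upper bound excluded) are an assumption made here, as is the name `range_query`.

```python
def range_query(points, lower, upper):
    """Return all points p with lower[i] <= p[i] < upper[i] in every dimension i."""
    return [p for p in points
            if all(lo <= x < hi for lo, x, hi in zip(lower, p, upper))]


if __name__ == "__main__":
    points = [(13, 3), (16, 3), (17, 3), (14, 4), (12, 3)]
    # Two-dimensional query: 13 <= x1 < 17 and 3 <= x2 < 4.
    print(range_query(points, lower=(13, 3), upper=(17, 4)))
```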
Uniformly high memory utilization. Any data structure that adapts its shape to dynamically changing
contents is likely to leave "unused holes" in storage space: space that is currently unused, and that cannot
conveniently be used for other purposes because it is fragmented. We have encountered this phenomenon in
multiway trees such as B-trees and in hash tables. It is practically unavoidable that dynamic data structures use
their allocated space to less than 100%, and an average space utilization of 50% is often tolerable. The danger to
avoid is a built-in bias that drives space utilization toward 0 when the file shrinks: elements get deleted but their
space is not relinquished. The grid file, to be discussed next, achieves an average memory utilization of about 70%
regardless of the mix of insertions or deletions.
The grid file
The grid file is a metric data structure designed to store points and simple geometric objects in
multidimensional space so as to achieve the objectives stated above. This section describes its architecture, access
and update algorithms, and properties. More details can be found in [NHS 84] and [Hin 85].
Scales, directory, buckets
Consider as an example a two-dimensional domain: the Cartesian product X1 × X2, where X1 = 0 .. 1999 is a
subrange of the integers, and X2 = a .. z is the ordered set of the 26 characters of the English alphabet. Pairs of the
form (x1, x2), such as (1988, w), are elements from this domain.
The bit map is a natural data structure for storing a set S of elements from X1 × X2. It may be declared as
var T: array[X1, X2] of boolean;
with the convention that
T[x1, x2] = true ⇔ (x1, x2) ∈ S.
Basic set operations are performed by direct access to the array element corresponding to an element: find(x1,
x2) is simply the boolean expression T[x1, x2]; insert(x1, x2) is equivalent to T[x1, x2] := 'true', delete(x1, x2) is
equivalent to T[x1, x2] := 'false'. The bit map for our small domain requires an affordable 52k bits. Bit maps for
realistic examples are rarely affordable, as the following reasoning shows. First, consider that x1 and x2 are just keys
of records that hold additional data. If space is reserved in the array for this additional data, an array element is not
a bit but as many bytes as are needed, and all the absent records, for elements (x1, x2) ∉ S, waste a lot of storage.
Second, most domains are much larger than the example above: the three-dimensional Euclidean space, for
example, with elements (x, y, z) taken as triples of 32-bit integers, or 64-bit floating-point numbers, requires bit
maps of about $10^{30}$ and $10^{60}$ bits, respectively. For comparison's sake: a large disk has about $10^{10}$ bits.
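The bit map sizes quoted above follow from multiplying the sizes of the component domains. The sketch below redoes this arithmetic; `bitmap_bits` is a name chosen here. The computed exponents for the three-dimensional cases come out at roughly 29 and 58, which the text rounds generously to 30 and 60.

```python
import math


def bitmap_bits(*domain_sizes: int) -> int:
    """Number of bits in a bit map over the Cartesian product of the given domains."""
    return math.prod(domain_sizes)


if __name__ == "__main__":
    print(bitmap_bits(2000, 26))                     # years 0 .. 1999 times letters a .. z
    int_space = bitmap_bits(2**32, 2**32, 2**32)     # triples of 32-bit integers
    float_space = bitmap_bits(2**64, 2**64, 2**64)   # triples of 64-bit patterns
    print(round(math.log10(int_space)), round(math.log10(float_space)))
```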
Since large bit maps are extremely sparsely populated, they are amenable to data compression. The grid file is
best understood as a practical data compression technique that stores huge, sparsely populated bit maps so as to
support direct access. Returning to our example, imagine a historical database indexed by the year of birth and the
first letter of the name of scientists: thus we find 'John von Neumann' under (1903, v). Our database is pictured as a
cloud of points in the domain shown in Exhibit 23.3; because we have more scientists (or at least, more records) in
recent years, the density increases toward the right. Storing this database implies packing the records into buckets
of fixed capacity to hold c (e.g. c = 3) records. The figure shows the domain partitioned by orthogonal hyperplanes
into box-shaped grid cells, none of which contains more than c points.
Exhibit 23.3: Cells of a grid partition adapt their size so that no cell is populated by more than c points.
A grid file for this database contains the following components:
Linear scales show how the domain is currently partitioned.
The directory is an array whose elements are in one-to-one correspondence with the grid cells; each entry
points to a data bucket that holds all the records of the corresponding grid cell.
Access to the record (1903, v) proceeds through three steps:
1. Scales transform key values to array indices: (1903, v) becomes (5, 4). Scales contain small amounts of
data, which is kept in central memory; thus this step requires no disk access.
2. The index tuple (5, 4) provides direct access to the correct element of the directory. The directory may be
large and occupy many pages on disk, but we can compute the address of the correct directory page and in
one disk access retrieve the correct directory element.
3. The directory element contains a pointer (disk address) of the correct data bucket for (1903, v), and the
second disk access retrieves the correct record: [(1903, v), John von Neumann …].
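The three access steps above can be sketched in code. Everything concrete in this sketch is hypothetical: the split points in the scales are invented so that (1903, v) maps to (5, 4), the directory and buckets are ordinary dictionaries standing in for disk pages, and a counter simulates disk accesses. The names `GridFile`, `cell`, and `find` are chosen here.

```python
from bisect import bisect_right


class GridFile:
    def __init__(self, scales, directory, buckets):
        self.scales = scales          # one sorted list of split points per dimension
        self.directory = directory    # index tuple -> bucket id (simulated disk)
        self.buckets = buckets        # bucket id -> list of (key, data) records
        self.disk_accesses = 0

    def cell(self, key):
        """Step 1: scales turn key values into array indices (no disk access)."""
        return tuple(bisect_right(scale, k) for scale, k in zip(self.scales, key))

    def find(self, key):
        index = self.cell(key)
        self.disk_accesses += 1                 # step 2: read the directory element
        bucket_id = self.directory[index]
        self.disk_accesses += 1                 # step 3: read the data bucket
        for stored_key, data in self.buckets[bucket_id]:
            if stored_key == key:
                return data
        return None


if __name__ == "__main__":
    # Hypothetical scales: 5 split points on years, 4 split points on letters.
    scales = [[1000, 1500, 1750, 1875, 1900], ["f", "k", "p", "s"]]
    directory = {(i, j): "A" for i in range(6) for j in range(5)}
    directory[(5, 4)] = "B"
    buckets = {"A": [], "B": [((1903, "v"), "John von Neumann")]}
    grid = GridFile(scales, directory, buckets)
    print(grid.cell((1903, "v")), grid.find((1903, "v")), grid.disk_accesses)
```

Binary search on each scale keeps step 1 cheap even when a dimension has many split points, and the two counted accesses correspond to the two-disk-access principle.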
Disk utilization
The grid file does not allocate a separate bucket to each grid cell, since that would lead to an unacceptably low disk
utilization. Exhibit 23.4 suggests, for example, that the two grid cells at the top right of the directory share the same
bucket. How this bucket sharing comes about, and how it is maintained through splitting of overflowing buckets,
and merging sparsely populated buckets, is shown in the following.
Exhibit 23.4: The search for a record with key values (1903, v) starts with the scales and
proceeds via the directory to the correct data bucket on disk.
The dynamics of splitting and merging
The dynamic behavior of the grid file is best explained by tracing an example: we show the effect of repeated
insertions in a two-dimensional file. Instead of showing the grid directory, whose elements are in one-to-one
correspondence with the grid blocks, we draw the bucket pointers as originating directly from the grid blocks.
Initially, a single bucket A, of capacity c = 3 in our example, is assigned to the entire domain (Exhibit 23.5).
When bucket A overflows, the domain is split, a new bucket B is made available, and those records that lie in one
half of the space are moved from the old bucket to the new one (Exhibit 23.6). If bucket A overflows again, its grid
block (i.e. the left half of the space) is split according to some splitting policy: We assume the simplest splitting
policy of alternating directions. Those records of A that lie in the lower-left grid block of Exhibit 23.7 are moved to a
new bucket C. Notice that as bucket B did not overflow, it is left alone: Its region now consists of two grid blocks.
For effective memory utilization it is essential that in the process of refining the grid partition we need not
necessarily split a bucket when its region is split.
Exhibit 23.5: A growing grid file starts with a single bucket allocated to the entire key space.
Exhibit 23.6: An overflowing bucket triggers a refinement of the space partition.
Exhibit 23.7: Bucket A has been split into A and C, but the contents of B remain unchanged.
Assuming that records keep arriving in the lo5er#le%t corner o% the space+ bucket C 5ill over%lo5. This 5ill trigger
a %urther re%inement o% the grid partition as sho5n in ")hibit 23.D+ and a splitting o% bucket C into C and (. The
history o% repeated splitting can be represented in the %orm o% a binary tree+ 5hich imposes on the set o% buckets
currently in use Hand hence on the set o% regions o% these bucketsI a twin sstem Halso called a budd sstemI& "ach
bucket and its region have a uni0ue t5in %rom 5hich it split o%%. $n ")hibit 23.D+ C and ( are t5ins+ the pair HC+ (I is
A;s t5in+ and the pair HA+ HC+ (II is B;s t5in.
")hibit 23.D& Bucket regions that span several cells ensure high disk utili6ation.
Deletions trigger merging operations. In contrast to one-dimensional storage, where it is sufficient to merge
buckets that split earlier, merging policies for multidimensional grid files need to be more general in order to
maintain a high occupancy.
Simple geometric objects and their parameter spaces
Consider a class of simple spatial objects, such as aligned rectangles in the plane (i.e. with sides parallel to the
axes). Within its class, each object is defined by a small number of parameters. For example, an aligned rectangle is
determined by its center $(c_x, c_y)$ and the half-length of each side, $d_x$ and $d_y$.
An object defined within its class by k parameters can be considered to be a point in a k-dimensional parameter
space. For example, an aligned rectangle becomes a point in four-dimensional space. All of the geometric and
topological properties of an object can be deduced from the class it belongs to and from the coordinates of its
corresponding point in parameter space.
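As a minimal sketch of this transformation, the following Python fragment maps an aligned rectangle to its point in four-dimensional parameter space and back. The type `Rect` and both function names are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Aligned rectangle given by two opposite corners (hypothetical helper type)."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float


def to_parameter_point(r: Rect) -> tuple:
    """Map a rectangle to its point (cx, cy, dx, dy) in parameter space."""
    return ((r.x_lo + r.x_hi) / 2, (r.y_lo + r.y_hi) / 2,
            (r.x_hi - r.x_lo) / 2, (r.y_hi - r.y_lo) / 2)


def from_parameter_point(p: tuple) -> Rect:
    """Recover the rectangle from its parameter point; no information is lost."""
    cx, cy, dx, dy = p
    return Rect(cx - dx, cy - dy, cx + dx, cy + dy)


point = to_parameter_point(Rect(0, 0, 4, 2))
print(point)  # (2.0, 1.0, 2.0, 1.0)
```

The inverse function illustrates the claim that all properties of the object can be deduced from its class and its parameter point.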
Different choices of the parameter space for the same class of objects are appropriate, depending on
characteristics of the data to be processed. Some considerations that may determine the choice of parameters are:
1. Distinction between location parameters and e*tension parameters' /or some classes o% simple ob3ects it
is reasonable to distinguish location parameters+ such as the center Hc)+ cyI o% an aligned rectangle+ %rom
e)tension parameters+ such as the hal%#sides d) and dy. This distinction is al5ays possible %or ob3ects that
can be described as Cartesian products o% spheres o% various dimensions. /or e)ample+ a rectangle is the
product o% t5o one#dimensional spheres+ a cylinder the product o% a one#dimensional and a t5o#
dimensional sphere. :henever this distinction can be made+ cone#shaped search regions generated by
pro)imity 0ueries as described in the ne)t section have a simple intuitive interpretation& The subspace o%
the location parameters acts as a EmirrorE that re%lects a 0uery.
2. Independence of parameters, uniform distribution. As an example, consider the class of all intervals on a
straight line. If intervals are represented by their left and right endpoints, $l_x$ and $r_x$, the constraint $l_x \le r_x$
restricts all representations of these intervals by points $(l_x, r_x)$ to the triangle above the diagonal. Any data
structure that organizes the embedding space of the data points, as opposed to the particular set of points
that must be stored, will pay some overhead for representing the unpopulated half of the embedding space.
A coordinate transformation that distributes data all over the embedding space leads to more efficient
storage. The phenomenon of nonuniform data distribution can be worse than this. In most applications, the
building blocks from which complex objects are built are much smaller than the space in which they are
embedded, as the size of a brick is small compared to the size of a house. If so, parameters such as $l_x$ and $r_x$
that locate boundaries of an object are highly dependent on each other. Exhibit 23.9 shows short intervals
on a long line clustering along the diagonal, leaving large regions of a large embedding space unpopulated;
whereas the same set of intervals represented by a location parameter $c_x$ and an extension parameter $d_x$
fills a smaller embedding space in a much more uniform way. With the assumption of bounded $d_x$, this data
distribution is easier to handle.
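The effect of the coordinate transformation in item 2 can be made concrete with a small sketch. It converts endpoint pairs to center and half-length and compares the area of the smallest box that encloses the data points in each representation. The sample intervals are made up, and the function names are hypothetical.

```python
def to_center_extension(lx, rx):
    """Map endpoints (lx, rx), lx <= rx, to (cx, dx) = (center, half-length)."""
    return ((lx + rx) / 2, (rx - lx) / 2)


def bounding_area(points):
    """Area of the smallest axis-parallel box containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Short intervals scattered along a line of length 100 (made-up sample data).
intervals = [(0, 2), (10, 11), (98, 100)]
transformed = [to_center_extension(lx, rx) for lx, rx in intervals]

print(bounding_area(intervals))    # 9604
print(bounding_area(transformed))  # 49.0
```

In the endpoint representation the three points hug the diagonal of a large square; in the center and extension representation they fill a thin strip whose height is bounded by the largest half-length.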
Exhibit 23.9: A set of intervals represented in two different parameter spaces.
Region queries of arbitrary shape
Intersection is a basic component of other proximity queries, and thus deserves special attention. CAD design
rules, for example, often require different objects to be separated by some minimal distance. This is equivalent to
requiring that objects surrounded by a rim do not intersect. Given a subset of a class of simple spatial objects with
parameter space H, we consider two types of queries:
point query: Given a query point q, find all objects A for which $q \in A$.
point set query: Given a query set Q of points, find all objects A that intersect Q.
Point query. For a query point q compute the region in H that contains all points representing objects that
overlap q.
1. Consider the class of intervals on a straight line. An interval given by its center $c_x$ and its half length $d_x$
overlaps a point q with coordinate $q_x$ if and only if $c_x - d_x \le q_x \le c_x + d_x$.
2. The class of aligned rectangles in the plane (with parameters $c_x$, $c_y$, $d_x$, $d_y$) can be treated as the Cartesian
product of two classes of intervals, one along the x-axis, the other along the y-axis (Exhibit 23.10). All
rectangles that contain a given point q are represented by points in four-dimensional space that lie in the
Cartesian product of two point-in-interval query regions. The region is shown by its projections onto the
$c_x$-$d_x$ plane and the $c_y$-$d_y$ plane.
Exhibit 23.10: A set of aligned rectangles represented as a set of points in a four-dimensional
parameter space. A point query is transformed into a cone-shaped region query.
3. Consider the class of circles in the plane. We represent a circle as a point in three-dimensional space by the
coordinates of its center $(c_x, c_y)$ and its radius r as parameters. All circles that overlap a point q are
represented in the corresponding three-dimensional space by points that lie in the cone with vertex q
shown in Exhibit 23.11. The axis of the cone is parallel to the r-axis (the extension parameter), and its
vertex q is considered a point in the $c_x$-$c_y$ plane (the subspace of the location parameters).
Exhibit 23.11: Search cone for a point query for circles in the plane.
Point set query. Given a query set Q of points, the region in H that contains all points representing objects A
that intersect Q is the union of the regions in H that result from the point queries for each point $q \in Q$. The
union of cones is a particularly simple region in H if the query set Q is a simple spatial object.
1. Consider the class of intervals on a straight line. An interval $i = (c_x, d_x)$ intersects a query interval
$Q = (c_q, d_q)$ if and only if its representing point lies in the shaded region shown in Exhibit 23.12; this region
is given by the inequalities $c_x - d_x \le c_q + d_q$ and $c_x + d_x \ge c_q - d_q$.
Exhibit 23.12: An interval query, as a union of point queries, again gets transformed into a search cone.
2. The class of aligned rectangles in the plane is again treated as the Cartesian product of two classes of
intervals, one along the x-axis, the other along the y-axis. If Q is also an aligned rectangle, all rectangles
that intersect Q are represented by points in four-dimensional space lying in the Cartesian product of two
interval intersection query regions.
3. Consider the class of circles in the plane. All circles that intersect a line segment L are represented by points
lying in the cone-shaped solid shown in Exhibit 23.13. This solid is obtained by embedding L in the $c_x$-$c_y$
plane, the subspace of the location parameters, and moving the cone with vertex at q along L.
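The point queries and point set queries above all reduce to membership tests of a parameter point in a cone-shaped region. The following sketch states these tests as Python predicates; it assumes intervals are given as (cx, dx), rectangles as (cx, cy, dx, dy), and circles as (cx, cy, r). The function names are hypothetical.

```python
import math


def interval_hits_point(cx, dx, qx):
    """cx - dx <= qx <= cx + dx: (cx, dx) lies in the cone with vertex qx."""
    return abs(qx - cx) <= dx


def interval_hits_interval(cx, dx, cq, dq):
    """cx - dx <= cq + dq and cx + dx >= cq - dq."""
    return cx - dx <= cq + dq and cx + dx >= cq - dq


def rect_hits_point(rect, q):
    """Rectangle (cx, cy, dx, dy): Cartesian product of two interval tests."""
    cx, cy, dx, dy = rect
    return interval_hits_point(cx, dx, q[0]) and interval_hits_point(cy, dy, q[1])


def rect_hits_rect(rect, query):
    """Both arguments are (cx, cy, dx, dy); again a product of interval tests."""
    cx, cy, dx, dy = rect
    qx, qy, ex, ey = query
    return (interval_hits_interval(cx, dx, qx, ex)
            and interval_hits_interval(cy, dy, qy, ey))


def circle_hits_point(circle, q):
    """Circle (cx, cy, r) lies in the cone with vertex q iff dist(center, q) <= r."""
    cx, cy, r = circle
    return math.hypot(q[0] - cx, q[1] - cy) <= r


def circle_hits_segment(circle, a, b):
    """Union of cones along segment ab: dist(center, ab) <= r."""
    cx, cy, r = circle
    vx, vy = b[0] - a[0], b[1] - a[1]
    length2 = vx * vx + vy * vy
    t = 0.0 if length2 == 0 else ((cx - a[0]) * vx + (cy - a[1]) * vy) / length2
    t = max(0.0, min(1.0, t))          # clamp the projection to the segment
    return math.hypot(cx - (a[0] + t * vx), cy - (a[1] + t * vy)) <= r
```

Each predicate inspects only the parameters of the stored object and of the query, which is what allows such tests to be evaluated against a point data structure.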
Exhibit 23.13: Search region as a union of cones.
Evaluating region queries with a grid file
We have seen that proximity queries on spatial objects lead to search regions significantly more complex than
orthogonal range queries. The grid file allows the evaluation of irregularly shaped search regions in such a way that
the complexity of the region affects CPU time but not disk accesses. It is the number of disk accesses that limits the
performance of a database implementation. A query region Q is matched against the scales and converted into a set I
of index tuples that refer to entries in the directory. Only after this preprocessing do we access disk to retrieve the
correct pages of the directory and the correct data buckets whose regions intersect Q (Exhibit 23.14).
Exhibit 23.14: The cells of a grid partition that overlap an arbitrary query region Q are determined by
merely looking up the scales.
Interaction between query processing and data access
The point of the two preceding sections was to show that in a metric data structure, intricate computations
triggered by proximity queries can be preprocessed to a remarkable extent before the objects involved are retrieved.
Query preprocessing may involve a significant amount of computation based on small amounts of auxiliary data
(the scales and the query) that are kept in central memory. The final access of data from disk is highly selective:
data retrieved has a high chance of being part of the answer.
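A minimal sketch of this preprocessing step, under simplifying assumptions: each scale is a sorted Python list of boundaries that includes both ends of the domain, and the query region is described by a bounding box plus a caller-supplied test that decides whether the region meets a given cell. All names are hypothetical; the sketch touches only the scales and the query, never the directory or the buckets.

```python
from bisect import bisect_right
from itertools import product
import math


def cell_range(scale, lo, hi):
    """Indices of the intervals of a scale that meet [lo, hi].

    Interval k of the scale is [scale[k], scale[k + 1]).
    """
    first = max(bisect_right(scale, lo) - 1, 0)
    last = min(bisect_right(scale, hi) - 1, len(scale) - 2)
    return range(first, last + 1)


def index_tuples(x_scale, y_scale, bbox, overlaps):
    """Directory indices (i, j) of all grid cells that meet the query region.

    bbox = (x_lo, y_lo, x_hi, y_hi) bounds the region; overlaps(cell) decides
    whether the region meets the cell (x_lo, y_lo, x_hi, y_hi).
    """
    x_lo, y_lo, x_hi, y_hi = bbox
    hits = []
    for i, j in product(cell_range(x_scale, x_lo, x_hi),
                        cell_range(y_scale, y_lo, y_hi)):
        cell = (x_scale[i], y_scale[j], x_scale[i + 1], y_scale[j + 1])
        if overlaps(cell):
            hits.append((i, j))
    return hits


def disk_query(cx, cy, r):
    """Bounding box and cell test for a circular query region."""
    def overlaps(cell):
        nx = min(max(cx, cell[0]), cell[2])   # point of the cell nearest
        ny = min(max(cy, cell[1]), cell[3])   # to the center of the disk
        return math.hypot(nx - cx, ny - cy) <= r
    return (cx - r, cy - r, cx + r, cy + r), overlaps


bbox, overlaps = disk_query(3, 7, 1.2)
print(index_tuples([0, 4, 8, 16], [0, 8, 16], bbox, overlaps))
# [(0, 0), (0, 1), (1, 0)]
```

The bounding box selects four candidate cells by binary search on the scales; the exact test then discards the cell that the disk misses, so one fewer directory entry has to be followed to disk.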
Contrast this to an approach where an object can be accessed only by its name (e.g. the part number) because
the geometric information about its location in space is only included in the record for this object but is not part of
the accessing mechanism. In such a database, all objects might have to be retrieved in order to determine which
ones answer the query. Given that disk access is the bottleneck in most database applications, it pays to preprocess
queries as much as possible in order to save disk accesses.
The integration of query processing and accessing mechanism developed in the preceding sections was made
possible by the assumption of simple objects, where each instance is described by a small number of parameters.
What can we do when faced with a large number of irregularly shaped objects?
Complex, irregularly shaped spatial objects can be represented or approximated by simpler ones in a variety of
ways, for example: decomposition, as in a quad tree tessellation of a figure into disjoint raster squares;
representation as a cover of overlapping simple shapes; and enclosing each object in a container chosen from a
class of simple shapes. The container technique allows efficient processing of proximity queries because it preserves
the most important properties for proximity-based access to spatial objects, in particular: It does not break up the
object into components that must be processed separately, and it eliminates many potential tests as unnecessary (if
two containers don't intersect, the objects within won't either). As an example, consider finding all polygons that
intersect a given query polygon, given that each of them is enclosed in a simple container such as a circle or an
aligned rectangle. Testing two polygons for intersection is an expensive operation compared to testing their
containers for intersection. The cheap container test excludes most of the polygons from an expensive, detailed
intersection check.
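The container technique can be sketched as a two-stage filter. The code below assumes polygons are lists of (x, y) vertices and uses aligned rectangles as containers; the detailed polygon test is not implemented here but passed in by the caller as `exact_test`, a hypothetical parameter.

```python
def bbox(polygon):
    """Smallest aligned rectangle (x_lo, y_lo, x_hi, y_hi) enclosing the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """Cheap container test for two aligned rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersecting(polygons, query, exact_test):
    """Polygons that intersect `query`.

    exact_test(p, q) is the costly detailed check; it runs only on pairs
    whose containers intersect.
    """
    query_box = bbox(query)
    return [p for p in polygons
            if boxes_intersect(bbox(p), query_box) and exact_test(p, query)]
```

In a real system the containers would be computed once and stored, for example as points in a grid file, rather than recomputed for every query as in this sketch.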
Any approximation technique limits the primitive shapes that must be stored to one or a few types: for example,
aligned rectangles or boxes. An instance of such a type is determined by a few parameters, such as coordinates of its
center and its extension, and can be considered to be a point in a (higher-dimensional) parameter space. This
transformation reduces object storage to point storage, increasing the dimensionality of the problem without loss of
information. Combined with an efficient multi-dimensional data structure for point storage it is the basis for an
effective implementation of databases for spatial objects.
Exercises
1. Draw three quadtrees, one for each of the 4 × 8 pixel rectangles A, B and C outlined in Exhibit 23.15.
Exhibit 23.15: The location of congruent objects greatly affects the complexity of a quadtree
representation.
2. Consider a grid file that stores points lying in a two-dimensional domain: the Cartesian product $X_1 \times X_2$,
where $X_1 = 0 .. 15$ and $X_2 = 0 .. 15$ are subranges of the integers. Buckets have a capacity of two points.
(a) Insert the points (2, 3), (13, 14), (3, 5), (6, 9), (10, 13), (11, 5), (14, 9), (7, 3), (15, 11), (9, 9), and (11, 10)
into the initially empty grid file and show the state of the scales, the directory, and the buckets after
each insert operation. Buckets are split such that their shapes remain as quadratic as possible.
(b) Delete the points (10, 13), (9, 9), (11, 10), and (14, 9) from the grid file obtained in (a) and show the state
of the scales, the directory, and the buckets after each delete operation. Assume that after deleting a
point in a bucket this bucket may be merged with a neighbor bucket if their joint occupancy does not
exceed two points. Further, a boundary should be removed from its scale if there is no longer a bucket
that is split with respect to this boundary.
(c) Without imposing further restrictions a deadlock situation may occur after a sequence of delete
operations: No bucket can merge with any of its neighbors, since the resulting bucket region would no
longer be rectangular. In the example shown in Exhibit 23.16 the shaded ovals represent bucket
regions. Devise a merging policy that prevents such deadlocks from occurring in a two-dimensional
grid file.
Exhibit 23.16: This example shows bucket regions that cannot be merged pairwise.
3. Consider the class of circles in the plane represented as points in three-dimensional parameter space as
proposed in chapter 23 in the section "Region queries of arbitrary shape". Describe the search regions in
the parameter space (a) for all the circles intersecting a given circle C, (b) for all the circles contained in C,
and (c) for all the circles enclosing C.
Part VI: Interaction between algorithms and data structures: case studies in geometric computation
Organizing and processing Euclidean space
In Part III we presented a varied sample of algorithms that use simple, mostly static, data structures. Part V was
dedicated to dynamic data structures, and we presented the corresponding access and update algorithms. In this
final part we illustrate the use of these dynamic data structures by presenting algorithms whose efficiency depends
crucially on them, in particular on priority queues and dictionaries. We choose these algorithms from
computational geometry, a recently developed discipline of great practical importance with applications in
computer graphics, computer-aided design, and geographic databases.
If data structures are tools for organizing sets of data and their relationships, geometric data processing poses
one of the most challenging tests. The ability to organize data embedded in the Euclidean space in such a way as to
reflect the rich relationships due to location (e.g. touching or intersecting, contained in, distance) is of utmost
importance for the efficiency of algorithms for processing spatial data. Data structures developed for traditional
commercial data processing were often based on the concept of one primary key and several subordinate secondary
keys. This asymmetry fails to support the equal role played by the Cartesian coordinate axes x, y, z, … of Euclidean
space. If one spatial axis, say x, is identified as the primary key, there is a danger that queries involving the other
axes, say y and z, become inordinately cumbersome to process, and therefore slow. For the sake of simplicity we
concentrate on two-dimensional geometric problems, and in particular on the highly successful class of plane-sweep
algorithms. Sweep algorithms do a remarkably good job at processing two-dimensional space efficiently
using two distinct one-dimensional data structures, one for organizing the x-axis, the other for the y-axis.
24. Sample problems and algorithms
The nature of geometric computation: three problems and algorithms chosen to illustrate the variety of
issues encountered:
Convex hull yields to simple and efficient algorithms, straightforward to implement and analyze.
Objects with special properties, such as convexity, are often much simpler to process than are general
objects.
Visibility problems are surprisingly complex; even if this complexity does not show in the design of an
algorithm, it sneaks into its analysis.
Geometry and geometric computation
Classical geometry, shaped by the ancient Greeks, is more axiomatic than constructive: It emphasizes axioms,
theorems, and proofs, rather than algorithms. The typical statement of Euclidean geometry is an assertion about all
geometric configurations with certain properties (e.g. the theorem of Pythagoras: "In a right-angled triangle, the
square on the hypotenuse c is equal to the sum of the squares on the two catheti a and b: $c^2 = a^2 + b^2$") or an
assertion of existence (e.g. the parallel axiom: "Given a line L and a point P not on L, there is exactly one line
parallel to L passing through P"). Constructive solutions to problems do occur, but the theorems about the
impossibility of constructive solutions steal the glory: "You cannot trisect an arbitrary angle using ruler and
compass only," and the proverbial "It is impossible to square the circle."
Computational geometry, on the other hand, starts out with problems of construction so simple that, until the
1970s, they were dismissed as trivial: "Given n line segments in the plane, are they free of intersections? If not,
compute (construct) all intersections." This problem is only trivial with respect to the existence of a constructive
solution. As we will soon see, the question is far from trivial if interpreted as: How efficiently can we obtain the
answer?
Computational geometry has some appealing features that make it ideal for learning about algorithms and data
structures: (a) The problem statements are easily understood, intuitively meaningful, and mathematically rigorous;
right away the student can try his own hand at solving them, without having to worry about hidden subtleties or a
lot of required background knowledge. (b) Problem statement, solution, and every step of the construction have
natural visual representations that support abstract thinking and help in detecting errors of reasoning. (c) These
algorithms are practical; it is easy to come up with examples where they can be applied.
Appealing as geometric computation is, writing geometric programs is a demanding task. Two traps lie hiding
behind the obvious combinatorial intricacies that must be mastered, and they are particularly dangerous when they
occur together: (a) degenerate configurations, and (b) the pitfalls of numerical computation due to discretization
and rounding errors. Degenerate configurations, such as those we discussed in "Straight lines and circles" on
intersecting line segments, are special cases that often require special code. It is not always easy to envision all the
kinds of degeneracies that may occur in a given problem. A configuration may be degenerate for a specific
algorithm, whereas it may be nondegenerate for a different algorithm solving the same problem. Rounding errors
tend to cause more obviously disastrous consequences in geometric computation than, say, in linear algebra or
differential equations. Whereas the traditional analysis of rounding errors focuses on bounding their cumulative
value, geometry is concerned primarily with a stringent all-or-nothing question: Have errors impaired the
topological consistency of the data? (Remember the pathology of the braided straight lines.)
In this Part VI we aim to introduce the reader to some of the central ideas and techniques of computational
geometry. For simplicity's sake we limit coverage to two-dimensional Euclidean geometry; most problems become
a lot more complicated when we go from two- to three-dimensional configurations. We focus on a type of algorithm
that is remarkably well suited for solving two-dimensional problems efficiently: sweep algorithms. To illustrate
their generality and effectiveness, we use plane-sweep to solve several rather distinct problems. We will see that
sweep algorithms for different problems can be assembled from the same building blocks: a skeleton sweep
program that sweeps a line across the plane based on a queue of events to be processed, and transition procedures
that update the data structures (a dictionary or table, and perhaps other structures) at each event and maintain a
geometric invariant. Sweeps show convincingly how essential the dynamic data structures of Part V are for
efficiency.
The problems and algorithms we discuss deal with very simple objects: points and line segments. Applications of
geometric computation such as CAD, on the other hand, typically deal with very complex objects made up of
thousands of polygons. The simplicity of these algorithms does not detract from their utility. Complex objects get
processed by being broken into their primitive parts, such as points, line segments, and triangles. The algorithms
we present are some of the most basic subroutines of geometric computation, which play a role analogous to that of
a square root routine for numerical computation: As they are called untold times, they must be correct and efficient.
Convex hull: a multitude of algorithms
The problem of computing the convex hull H(S) of a set S consisting of n points in the plane serves as an
example to demonstrate how the techniques of computational geometry yield the concise and elegant solution that
we presented in "Algorithm animation". The convex hull of a set S of points in the plane is the smallest convex
polygon that contains the points of S in its interior or on its boundary. Imagine a nail sticking out above each point
and a tight rubber band surrounding the set of nails.
Many different algorithms solve this simple problem. Before we present in detail the algorithm that forms the
basis of the program 'ConvexHull' of chapter 3, we briefly illustrate the main ideas behind three others. Most
convex hull algorithms have an initialization step that uses the fact that we can easily identify two points of S that
lie on the convex hull H(S): for example, two points $P_{\min}$ and $P_{\max}$ with minimal and maximal x-coordinate,
respectively. Algorithms that grow convex hulls over increasing subsets can use the segment $P_{\min}P_{\max}$ as a
(degenerate) convex hull to start with. Other algorithms use the segment $P_{\min}P_{\max}$ to partition S into an upper
and a lower subset, and compute the upper and the lower part of the hull H(S) separately.
1. Jarvis's march [Jar 73] starts at a point on H(S), say $P_{\min}$, and 'walks around' by computing, at each point
P, the next tangent PQ to S, characterized by the property that all points of S lie on the same side of PQ.
Exhibit 24.1: The "gift-wrapping" approach to building the convex hull.
2. Divide-and-conquer comes to mind: Sort the points of S according to their x-coordinate, use the median
x-coordinate to partition S into a left half $S_L$ and a right half $S_R$, apply this convex hull algorithm recursively to
each half, and merge the two solutions $H(S_L)$ and $H(S_R)$ by computing the two common exterior tangents to
$H(S_L)$ and $H(S_R)$ (Exhibit 24.2). Terminate the recursion when a set has at most three points.
Exhibit 24.2: Divide-and-conquer applies to many problems on spatial data.
3. Quickhull [Byk 78], [Edd 77], [GS 79] uses divide-and-conquer in a different way. We start with two points
on the convex hull H(S), say $P_{\min}$ and $P_{\max}$. In general, if we know ≥ 2 points on H(S), say P, Q, R in Exhibit
24.3, these define a convex polygon contained in H(S). (Draw the appropriate picture for just two points
$P_{\min}$ and $P_{\max}$ on the convex hull.) There can be no points of S in the shaded sectors that extend outward
from the vertices of the current polygon, PQR in the example. Any other points of S must lie either in the
polygon PQR or in the regions extending outward from the sides.
Exhibit 24.3: Three points known to lie on the convex hull identify regions devoid of points.
For each side, such as PQ in Exhibit 24.4, let T be a point farthest from PQ among all those in the region
extending outward from PQ, if there are any. T must lie on the convex hull, as is easily seen by considering
the parallel to PQ that passes through T. Having processed the side PQ, we extend the convex polygon to
include T, and we now must process 2 additional sides, PT and TQ. The reader will observe a formal analogy
between quicksort ("Sorting and its complexity") and quickhull, which has given the latter its name.
Exhibit 24.4: The point T farthest from PQ identifies a new region of exclusion (shaded).
4. In an incremental scan or sweep we sort the points of S according to their x-coordinates, and use the
segment $P_{\min}P_{\max}$ to partition S into an upper subset and a lower subset, as shown in Exhibit 24.5. For
simplicity of presentation, we reduce the problem of computing H(S) to the two separate problems of
computing the upper hull U(S) [i.e. the upper part of H(S)], shown in bold, and the lower hull L(S), drawn
as a thin line. Our notation and pictures are chosen to describe U(S).
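Of the approaches listed above, quickhull (item 3) is perhaps the shortest to sketch. The version below is an illustration under simplifying assumptions, not one of the cited implementations: points are (x, y) tuples, the signed area `cross` stands in for "distance from the side PQ" (it is proportional to that distance), and the names `_extend` and `quickhull` are hypothetical.

```python
def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a; proportional
    to the distance of b from that line."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _extend(p, q, pts):
    """Hull vertices strictly outside side pq, listed in order from p to q."""
    outside = [t for t in pts if cross(p, q, t) > 0]
    if not outside:
        return []
    t = max(outside, key=lambda s: cross(p, q, s))   # point farthest from pq
    # Process the two new sides pt and tq.
    return _extend(p, t, outside) + [t] + _extend(t, q, outside)


def quickhull(points):
    """Vertices of the convex hull in clockwise order, starting at P_min."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    p_min, p_max = pts[0], pts[-1]
    upper = _extend(p_min, p_max, pts)
    lower = _extend(p_max, p_min, pts)
    return [p_min] + upper + [p_max] + lower


print(quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (0, 2), (2, 2), (2, 0)]
```

As with quicksort, the work per level is linear in the number of points still under consideration, and the depth of recursion depends on how evenly the farthest point splits the remaining points.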
")hibit 2<.@& .eparate computations %or the upper hull and the lo5er hull.
Let $P_1, \ldots, P_n$ be the points of S sorted by x-coordinate, and let $U_i = U(P_1, \ldots, P_i)$ be the upper hull of the
first i points. $U_1 = P_1$ may serve as an initialization. For i = 2 to n we compute $U_i$ from $U_{i-1}$, as Exhibit 24.6
shows. Starting with the tentative tangent $P_i P_{i-1}$ shown as a thin dashed line, we retrace the upper hull $U_{i-1}$
until we reach the actual tangent: in our example, the bold dashed line $P_i P_2$. The tangent is characterized by the
fact that for $j = 1, \ldots, i-1$, it minimizes the angle $A_{i,j}$ between $P_i P_j$ and the vertical.
Exhibit 24.6: Extending the partial upper hull $U(P_1, \ldots, P_{i-1})$ to the next point $P_i$.
The program 'ConvexHull' presented in "Algorithm animation" as an example for algorithm animation is written
as an on-line algorithm: Rather than reading all the data before starting the computation, it accepts one point at a
time, which must lie to the right of all previous ones, and immediately extends the hull $U_{i-1}$ to obtain $U_i$. Thanks to
the input restriction that the points are entered in sorted order, 'ConvexHull' becomes simpler and runs in linear
time. This explains the two-line main body:
  PointZero; { sets first point and initializes all necessary variables }
  while NextRight do ComputeTangent;
There remain a few programming details that are best explained by relating Exhibit 24.6 to the declarations:
  var x, y, dx, dy: array[0 .. nmax] of integer;
      b: array[0 .. nmax] of integer; { backpointer }
      n: integer; { number of points entered so far }
      px, py: integer; { new point }
The coordinates of the points $P_i$ are stored in the arrays x and y. Rather than storing angles such as $A_{i,j}$, we store
quantities proportional to $\cos(A_{i,j})$ and $\sin(A_{i,j})$ in the arrays dx and dy. The array b holds back pointers for
retracing the upper hull back toward the left: b[i] = j implies that $P_j$ is the predecessor of $P_i$ in $U_i$. This explains
the key procedure of the program:
  procedure ComputeTangent; { from P_n = (px, py) to U_(n-1) }
  var i: integer;
  begin
    i := b[n];
    while dy[n] * dx[i] > dy[i] * dx[n] do begin { dy[n]/dx[n] > dy[i]/dx[i] }
      i := b[i];
      dx[n] := x[n] - x[i]; dy[n] := y[n] - y[i];
      MoveTo(px, py); Line(-dx[n], -dy[n]);
      b[n] := i
    end;
    MoveTo(px, py); PenSize(2, 2); Line(-dx[n], -dy[n]); PenNormal
  end; { ComputeTangent }
The algorithm implemented by 'ConvexHull' is based on Graham's scan [Gra 72], where the points are ordered
according to the angle as seen from a fixed internal point, and on [And 79].
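For readers who prefer a version without the drawing calls, the retracing step of 'ComputeTangent' can be sketched in Python as follows. This is an illustration, not a transcription: it replaces the slope comparison on dx and dy by the sign of a cross product, keeps only the back pointers, and uses hypothetical class and method names.

```python
def cross(o, a, b):
    """> 0 if o -> a -> b is a left turn, < 0 for a right turn, 0 if collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


class OnlineUpperHull:
    """Upper hull of points that arrive in order of strictly increasing x."""

    def __init__(self):
        self.points = []   # P_1, ..., P_n in arrival order (0-based indices)
        self.back = []     # back[i]: index of the predecessor of P_i in U_i, or -1

    def add(self, p):
        pts, back = self.points, self.back
        i = len(pts) - 1   # tentative tangent: from the new point to P_(n-1)
        # Retrace the hull to the left while P_i lies on or below the line
        # from its predecessor to the new point.
        while i >= 0 and back[i] >= 0 and cross(pts[back[i]], pts[i], p) >= 0:
            i = back[i]
        pts.append(p)
        back.append(i)

    def hull(self):
        """Upper hull from left to right, found by following back pointers."""
        out, i = [], len(self.points) - 1
        while i >= 0:
            out.append(self.points[i])
            i = self.back[i]
        return out[::-1]


h = OnlineUpperHull()
for p in [(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]:
    h.add(p)
print(h.hull())  # [(0, 0), (1, 2), (3, 3), (4, 0)]
```

A point that is skipped during retracing is never visited again, because later retraces start at the newest point and follow only back pointers; this is the reason the total time is linear in the number of points.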
The uses of convexity: basic operations on polygons
The convex hull of a set of points or objects (i.e. the smallest convex set that contains all objects) is a model
problem in geometric computation, with many algorithms and applications. Why? As we stated in the introductory
section, applications of geometric computation tend to deal with complex objects that often consist of thousands of
primitive parts, such as points, line segments, and triangles. It is often effective to approximate a complex
configuration by a simpler one, in particular, to package it in a container of simple shape. Many proximity queries
can be answered by processing the container only. One of the most frequent queries in computer graphics, for
example, asks what object, if any, is first struck by a given ray. If we find that the ray misses a container, we infer
that it misses all objects in it without looking at them; only if the ray hits the container do we start the costly
analysis of all the objects in it.
The convex hull is often a very effective container. Although not as simple as a rectangular box, say, convexity is
such a strong geometric property that many algorithms that take time O(n) on an arbitrary polygon of n vertices
require only time O(log n) on convex polygons. Let us list several such examples. We assume that a polygon G is
given as a (cyclic) sequence of n vertices and/or n edges that trace a closed path in the plane. Polygons may be
self-intersecting, whereas simple polygons may not. A simple polygon partitions the plane into two regions: the
interior, which is simply connected, and the exterior, which has a hole.
Point-in-polygon test
Given a simple polygon G and a query point P (not on G), determine whether P lies inside or outside the
polygon.
Two closely related algorithms that walk around the polygon solve this problem in time O(n). The first one
computes the winding number of G around P. Imagine an observer at P looking at a vertex, say V, where the walk
starts, and turning on her heels to keep watching the walker (Exhibit 24.7). The observer will make a first (positive)
turn $\alpha$, followed by a (negative) turn $\beta$, followed by …, until the walker returns to the starting vertex V. The sum
$\alpha + \beta + \ldots$ of all turning angles during one complete tour of G is: $2\pi$ if P is inside G, and 0 if P is outside G.
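A minimal sketch of the winding number test, assuming the polygon is a list of (x, y) vertices and P does not lie on its boundary. Function names are hypothetical. The sum is $2\pi$ when the polygon is traversed counterclockwise; a clockwise traversal yields $-2\pi$, so the sketch compares the absolute value.

```python
import math


def total_turning_angle(polygon, p):
    """Sum of the signed angles by which an observer at p turns while
    watching a walker make one tour of the polygon."""
    total = 0.0
    n = len(polygon)
    for k in range(n):
        ax, ay = polygon[k][0] - p[0], polygon[k][1] - p[1]
        bx, by = polygon[(k + 1) % n][0] - p[0], polygon[(k + 1) % n][1] - p[1]
        # Signed angle between the lines of sight to two consecutive vertices.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total


def inside_by_winding(polygon, p):
    """True if p is inside: the sum is about +-2*pi inside and about 0 outside."""
    return abs(total_turning_angle(polygon, p)) > math.pi
```

Comparing against $\pi$ rather than testing for equality with $2\pi$ keeps the decision robust against rounding errors in the angle sum.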
Exhibit 24.7: Point-in-polygon test by adding up all turning angles.
The second algorithm computes the crossing number of G with respect to P. Draw a semi-infinite ray R from P
in any direction (Exhibit 24.8). During the walk around the polygon G from an arbitrary starting vertex V back to V,
keep track of whether the current oriented edge intersects R, and if so, whether the edge crosses R from below (+1)
or from above (−1). The sum of all these numbers is +1 if P is inside G, and 0 if P is outside G.
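The crossing number test can be sketched as follows, with the ray R chosen to run horizontally from P toward increasing x (an arbitrary choice made here). The polygon is assumed to be a list of (x, y) vertices with P not on its boundary; the half-open comparisons on y make sure that a vertex lying exactly on the ray is counted for only one of its two edges. The sum is +1 for a counterclockwise polygon and −1 for a clockwise one.

```python
def crossing_number(polygon, p):
    """Sum of signed crossings of the horizontal ray from p toward +x."""
    px, py = p
    total = 0
    n = len(polygon)
    for k in range(n):
        (ax, ay), (bx, by) = polygon[k], polygon[(k + 1) % n]
        # side > 0 if p lies to the left of the directed edge a -> b.
        side = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        if ay <= py < by and side > 0:      # edge crosses the ray from below
            total += 1
        elif by <= py < ay and side < 0:    # edge crosses the ray from above
            total -= 1
    return total


def inside_by_crossing(polygon, p):
    """True if p is inside the simple polygon."""
    return crossing_number(polygon, p) != 0
```

Unlike the winding number sketch, this version uses only additions, multiplications, and comparisons, so it is exact on integer coordinates.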
")hibit 2<.D& 2oint#in polygon test by adding up crossing numbers.
2oint#in#conve)#polygon test
For a convex polygon Q we use binary search to perform a point-in-polygon test in time O(log n). Consider the
hierarchical decomposition of Q illustrated by the convex 12-gon shown in Exhibit 24.9. We choose three
(approximately) equidistant vertices as the vertices of an innermost core triangle, painted black. "Equidistant" here
refers not to any Euclidean distance, but rather to the number of vertices to be traversed by traveling along the
perimeter of Q. For a query point P we first ask, in time O(1), which of the seven regions defined by the extended
edges of this triangular core contains P. These seven regions shown in Exhibit 24.10 are all "triangles" (albeit six of
them extend to infinity), in the sense that each one is defined as the intersection of three half-spaces. Four of these
regions provide a definite answer to the query "Is P inside Q, or outside Q?" One region (shown hatched in Exhibit
24.10) provides the answer 'In', three the answer 'Out'. The remaining three regions, labeled 'Uncertain', lead
recursively to a new point-in-convex-polygon test, for the same query point P, but a new convex polygon Q' which is
the intersection of Q with one of the uncertain regions. As Q' has only about n / 3 vertices, the depth of recursion is
O(log n). Actually, after the first comparison against the innermost triangular core of Q, we have no longer a general
point-in-convex-polygon problem, but one with additional information that makes all but the first test steps of a
binary search.
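The sketch below shows the O(log n) idea in code, but with a different and commonly used decomposition than the triangle hierarchy described above: it binary-searches the fan of diagonals that emanate from one fixed vertex. It assumes the vertices are given in counterclockwise order, that n ≥ 3, and that no three consecutive vertices are collinear; points on the boundary count as inside. The function names are hypothetical.

```python
def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_convex(poly, p):
    """Point-in-convex-polygon test using O(log n) orientation tests."""
    n = len(poly)
    v0 = poly[0]
    # p must lie in the wedge at v0 spanned by the two edges incident to v0.
    if cross(v0, poly[1], p) < 0 or cross(v0, poly[n - 1], p) > 0:
        return False
    # Binary search for the sector between diagonals v0 -> poly[lo] and
    # v0 -> poly[hi] that contains p.
    lo, hi = 1, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cross(v0, poly[mid], p) >= 0:
            lo = mid
        else:
            hi = mid
    # Within that sector, p is inside iff it is not beyond the edge lo -> hi.
    return cross(poly[lo], poly[hi], p) >= 0
```

Each step of the loop halves the number of candidate sectors, which mirrors the remark above that, after the first test, the remaining steps are those of a binary search.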
")hibit 2<.9& Aierarchical appro)imation o% a conve) 12#gon as a 3#level tree o% triangles. The root is in
black+ its children are in dark grey+ grandchildren in light grey.
")hibit 2<.10& The plane partitioned into %our regions o% certainty and three o% uncertainty.
The latter are processed recursively.
Visibility in the plane: a simple algorithm whose analysis is not
'any computer graphics programs are dominated by visibility problems& ,iven a con%iguration o% ob3ects in
three#dimensional space+ and given a point o% vie5+ 5hat is visible? (o6ens o% algorithms %or hidden#line or hidden#
sur%ace elimination have been developed to solve this everyday problem that our visual system per%orms Eat a
glanceE. $n contrast to the problems discussed above+ visibility is surprisingly comple). :e give a hint o% this
comple)ity by describing some o% the details buried belo5 the smooth sur%ace o% a EsimpleE version& computing the
visibility o% line segments in the plane.
Problem: ,iven n line segments in the plane+ compute the se0uence o% HsubIsegments seen by an observer at
in%inity Hsay+ at y V U_I.
The complexity of this problem was unexpected until discovered in 1986 [WS 88]. Fortunately, this complexity is revealed not by requiring complicated algorithms, but in the analysis of the inherent complexity of the geometric problem. The example shown in Exhibit 24.11 illustrates the input data. The endpoints (P1, P10), (P2, P8), (P5, P12) of the three line segments labeled 1, 2, 3 are given; other points are computed by the algorithm. The required result is a list of visible segments, each segment described by its endpoints and by the identifier of the line of which it is a part:
(P1, P3, 1), (P3, P4, 2), (P5, P6, 3), (P6, P8, 2), (P7, P9, 3), (P9, P10, 1), (P11, P12, 3)
Exhibit 24.11: Example: Three line segments seen from below generate seven visible subsegments.
In search of algorithms, the reader is encouraged to work out the details of the first idea that might come to mind: For each of the n² ordered pairs (L_i, L_j) of line segments, remove from L_i the subsegment occluded by L_j. Because L_i can get cut into as many as n pieces, it must be managed as a sequence of subsegments. Finding the endpoints of L_j in this sequence will take time O(log n), leading to an overall algorithm of time complexity O(n² · log n).
After the reader has mastered the sweep algorithm for line intersection presented in "Plane-sweep: a general-purpose algorithm for two-dimensional problems illustrated using line segment intersection", he will see that its straightforward application to the line visibility problem requires time O((n + k) · log n), where k ∈ O(n²) is the number of intersections. Thus plane-sweep appears to do all the work the brute-force algorithm above does, organized in a systematic left-to-right fashion. It keeps track of all intersections, most of which may be invisible. It has the potential to work in time O(n · log n) for many realistic data configurations characterized by k ∈ O(n), but not in the worst case.
Divide-and-conquer yields a simple two-dimensional visibility algorithm with a better worst-case performance. If n = 0 or 1, the problem is trivial. If n > 1, partition the set of n line segments into two (approximate) halves, solve both subproblems, and merge the results. There is no constraint on how the set is halved, so the divide step is easy. The conquer step is taken care of by recursion. Merging amounts to computing the minimum of two piecewise (not necessarily continuous) linear functions, in time linear in the number of pieces. The example with n = 4 shown in Exhibit 24.12 illustrates the algorithm. f12 is the visible front of segments 1 and 2, f34 of segments 3 and 4, min(f12, f34) of all four segments (Exhibit 24.13).
Exhibit 24.12: The four line segments will be partitioned into subsets {1, 2} and {3, 4}.
Exhibit 24.13: The min operation merges the solutions of this divide-and-conquer algorithm.
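The divide-and-conquer scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's program: segments are tuples (x1, y1, x2, y2) with x1 < x2 (no vertical segments), the observer is at y = -∞ so the lowest segment is visible, a front is a sorted list of pieces (x_left, x_right, segment id), and all names are hypothetical. For brevity the breakpoints are sorted in each merge, which costs O(v · log v) instead of the linear time a merge of two already sorted lists would need.

```python
def y_at(seg, x):
    """Height of the non-vertical segment seg = (x1, y1, x2, y2) at abscissa x."""
    x1, y1, x2, y2 = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def emit(front, xl, xr, sid):
    """Append piece (xl, xr, sid), fusing it with the previous piece if possible."""
    if front and front[-1][2] == sid and front[-1][1] == xl:
        front[-1] = (front[-1][0], xr, sid)
    else:
        front.append((xl, xr, sid))


def merge(f, g, segs):
    """Pointwise minimum of two fronts f and g."""
    xs = sorted({x for xl, xr, _ in f + g for x in (xl, xr)})
    out, i, j = [], 0, 0
    for a, b in zip(xs, xs[1:]):            # elementary interval (a, b)
        while i < len(f) and f[i][1] <= a:
            i += 1
        while j < len(g) and g[j][1] <= a:
            j += 1
        p = f[i][2] if i < len(f) and f[i][0] <= a else None
        q = g[j][2] if j < len(g) and g[j][0] <= a else None
        if p is None and q is None:
            continue                         # nothing visible here
        if p is None or q is None:
            emit(out, a, b, q if p is None else p)
            continue
        da = y_at(segs[p], a) - y_at(segs[q], a)
        db = y_at(segs[p], b) - y_at(segs[q], b)
        if da * db < 0:                      # the two pieces cross inside (a, b)
            c = a + (b - a) * da / (da - db)
            emit(out, a, c, p if da < 0 else q)
            emit(out, c, b, q if da < 0 else p)
        else:                                # one piece is lower throughout
            emit(out, a, b, p if da + db <= 0 else q)
    return out


def visible_front(ids, segs):
    """Visible front of the segments with the given ids, by divide-and-conquer."""
    if not ids:
        return []
    if len(ids) == 1:
        x1, _, x2, _ = segs[ids[0]]
        return [(x1, x2, ids[0])]
    mid = len(ids) // 2
    return merge(visible_front(ids[:mid], segs),
                 visible_front(ids[mid:], segs), segs)
```

For example, for the two crossing segments (0, 0, 4, 4) and (0, 4, 4, 0) the call visible_front([0, 1], segs) yields two pieces that meet at the crossing point x = 2.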
The time complexity of this divide-and-conquer algorithm is obtained as follows. Given that at each level of recursion the relevant sets of line segments can be partitioned into (approximate) halves, the depth of recursion is O(log n). A merge step that processes v visible subsegments takes linear time O(v). Together, all the merge steps at a given depth process at most V subsegments, where V is the total number of visible subsegments. Thus the total time is bounded by O(V · log n). How large can V be?
Surprising theoretical results
Let V(n) be the number of visible subsegments in a given configuration of n lines, i.e. the size of the output of the visibility computation. For tiny n, the worst cases [V(2) = 4, V(3) = 8] are shown in Exhibit 24.14. An attempt to find worst-case configurations for general n leads to examples such as that shown in Figure 24.15, with V(n) = 5 · n - 8.
Exhibit 24.14: Configurations with the largest number of visible subsegments.
Figure 24.15: A family of configurations with 5 · n - 8 visible subsegments.
You will find it difficult to come up with a class of configurations for which V(n) grows faster. It is tempting to conjecture that V(n) ∈ O(n), but this conjecture is very hard to prove, for the good reason that it is false, as was discovered in [WS 88]. It turns out that V(n) ∈ Θ(n · α(n)), where α(n), the inverse of Ackermann's function (see "Computability and complexity", Exercise 2), is a monotonically increasing function that grows so slowly that for practical purposes it can be treated as a constant, call it α.
Let us present some of the steps of how this surprising result was arrived at. Occasionally, simple geometric problems can be tied to deep results in other branches of mathematics. We transform the two-dimensional visibility problem into a combinatorial string problem. By numbering the given line segments, walking along the x-axis from left to right, and writing down the number of the line segment that is currently visible, we obtain a sequence of numbers (Exhibit 24.16).
Exhibit 24.16: The Davenport-Schinzel sequence associated with a configuration of segments.
A geometric configuration gives rise to a sequence u_1, u_2, … , u_m with the following properties:
1. 1 ≤ u_i ≤ n for 1 ≤ i ≤ m (numbers identify line segments).
2. u_i ≠ u_{i+1} for 1 ≤ i ≤ m - 1 (no two consecutive numbers are equal).
3. There are no five indices 1 ≤ a < b < c < d < e ≤ m such that u_a = u_c = u_e = r and u_b = u_d = s, r ≠ s. This condition captures the geometric properties of two intersecting straight lines: If we ever see r, s, r, s (possibly separated), we will never see r again, as this would imply that r and s intersect more than once (Exhibit 24.17).
Exhibit 24.17: The subsequence r, s, r, s excludes further occurrences of r.
Example
The sequence for the example above that shows m ≥ 5 · n - 8 is
1, 2, 1, 3, 1, … , 1, n-1, 1, n-1, n-2, n-3, … , 3, 2, n, 2, n, 3, n, … , n, n-2, n, n-1, n.
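The three properties are easy to check mechanically. The following Python sketch, with hypothetical function names, tests a sequence against properties 1 to 3 and generates the example family above for a given n ≥ 3, so that its length 5 · n - 8 can be confirmed for small n. Property 3 is tested by counting, for each pair of symbols, how often the subsequence restricted to that pair switches from one symbol to the other: five alternating occurrences are exactly the forbidden pattern r, s, r, s, r.

```python
from itertools import combinations


def is_davenport_schinzel(u, n):
    """Check properties 1 to 3 for the sequence u over the symbols 1..n."""
    if any(not 1 <= x <= n for x in u):            # property 1
        return False
    if any(a == b for a, b in zip(u, u[1:])):      # property 2
        return False
    for r, s in combinations(set(u), 2):           # property 3
        alternations, last = 0, None
        for x in u:
            if x in (r, s) and x != last:
                alternations += 1
                last = x
        if alternations >= 5:                      # pattern r, s, r, s, r found
            return False
    return True


def example_family(n):
    """The sequence 1, 2, 1, 3, 1, ..., n, n-1, n from the example (n >= 3)."""
    head = [1] + [x for k in range(2, n) for x in (k, 1)]
    middle = list(range(n - 1, 1, -1))
    tail = [n] + [x for k in range(2, n) for x in (k, n)]
    return head + middle + tail
```

For n = 4 the generator produces the twelve numbers 1, 2, 1, 3, 1, 3, 2, 4, 2, 4, 3, 4.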
Sequences with the properties 1 to 3, called Davenport-Schinzel sequences, have been studied in the context of linear differential equations. The maximal length of a Davenport-Schinzel sequence is k · n · α(n), where k is a constant and α(n) is the inverse of Ackermann's function (see "Computability and complexity", Exercise 2) [HS 86]. With increasing n, α(n) approaches infinity, albeit very slowly. This dampens the hope for a linear upper bound for the visibility problem, but does not yet disprove the conjecture. For the latter, we need an inverse: For any given Davenport-Schinzel sequence there exists a corresponding geometric configuration which yields this sequence. An explicit construction is given in [WS 88]. This establishes an isomorphism between the two-dimensional visibility problem and the Davenport-Schinzel sequences, and shows that the size of the output of the two-dimensional visibility problem can be superlinear, a result that challenges our geometric intuition.
Exercises
1. Given a set of points S, prove that the pair of points farthest from each other must be vertices of the convex hull H(S).
2. Assume a model of computation in which the operations addition, multiplication, and comparison are available at unit cost. Prove that in such a model Ω(n · log n) is a lower bound for computing, in order, the vertices of the convex hull H(S) of a set S of n points. Hint: Show that every algorithm which computes the convex hull of n given points can be used to sort n numbers.
3. Complete the second algorithm for the point-in-polygon test in chapter 24 in the section "The uses of convexity: basic operations on polygons" which computes the crossing number of the polygon G around point P by addressing the special cases that arise when the semi-infinite ray R emanating from P intersects a vertex of G or overlaps an edge of G.
4. Consider an arbitrary (not necessarily simple) polygon G (Exhibit 24.18). Provide an interpretation for the winding number w(G, P) of G around an arbitrary point P not on G, and prove that w(G, P) / (2 · π) of P is always equal to the crossing number of P with respect to any ray R emanating from P.
Exhibit 24.18: Winding number and crossing number of a polygon G with respect to P.
5. Design an algorithm that computes the area of an n-vertex simple, but not necessarily convex polygon in Θ(n) time.
6. We consider the problem of computing the intersection of two convex polygons which are given by their lists of vertices in cyclic order.
(a) Show that the intersection is again a convex polygon.
(b) Design an algorithm that computes the intersection. What is the time complexity of your algorithm?
7. Intersection test for line L and [convex] polygon Q. If an (infinitely extended) line L intersects a polygon Q, it must intersect one of Q's edges. Thus a test for intersection of a given line L with a polygon can be reduced to repeated tests of L for intersection with [some of] Q's edges.
(a) Prove that, in general, a test for line-polygon intersection must check at least n - 2 of Q's edges. Hint: Use an adversary argument. If two edges remain unchecked, they could be moved so as to invalidate the answer.
(b) Design a test that works in time O(log n) for deciding whether a line L intersects a convex polygon Q.
8. Divide-and-conquer algorithms may divide the space in which the data is embedded, rather than the set of data (the set of lines). Describe an algorithm for computing the sequence of visible segments that partitions the space recursively into vertical stripes, until each stripe is "simple enough"; describe how you choose the boundaries of the stripes; state advantages and disadvantages of this algorithm as compared to the one described in chapter 24 in the section "Visibility in the plane: a simple algorithm whose analysis is not". Analyze the asymptotic time complexity of this algorithm.
25. Plane-sweep: a general-purpose algorithm for two-dimensional problems illustrated using line segment intersection
line segment intersection test
turning space dimensions into time dimensions
updating a y-table and detecting intersections
sweeping across an intersection
Plane-sweep is an algorithm schema for two-dimensional geometry of great generality and effectiveness, and algorithm designers are well advised to try it first. It works for a surprisingly large set of problems, and when it works, tends to be very efficient. Plane-sweep is easiest to understand under the assumption of nondegenerate configurations. After explaining plane-sweep under this assumption, we remark on how degenerate cases can be handled with plane-sweep.
The line segment intersection test
We present a plane-sweep algorithm [SH 76] for the line segment intersection test:
Given n line segments in the plane, determine whether any two intersect; and if so, compute a witness (i.e. a pair of segments that intersect).
Bounds on the complexity of this problem are easily obtained. The literature on computational geometry (e.g. [PS 85]) proves a lower bound Ω(n · log n). The obvious brute force approach of testing all n · (n - 1) / 2 pairs of line segments requires Θ(n²) time. This wide gap between n · log n and n² is a challenge to the algorithm designer, who strives for an optimal algorithm whose asymptotic running time O(n · log n) matches the lower bound.
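For reference, the brute-force approach can be written down in a few lines. The Python sketch below is an illustration under assumptions of our own: a segment is a pair of endpoints ((x1, y1), (x2, y2)), segments that merely touch count as intersecting, and the function names are hypothetical. The pairwise test uses the sign of a signed area (orientation test) rather than computing the intersection point.

```python
def orient(a, b, c):
    """+1 if a, b, c make a left turn, -1 for a right turn, 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, c):
    """For c collinear with a and b: does c lie between a and b?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def intersect(s, t):
    """O(1) test whether the closed segments s and t share at least one point."""
    (a, b), (c, d) = s, t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))


def brute_force_witness(segments):
    """Test all n * (n - 1) / 2 pairs; return a witness (i, j) or None."""
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if intersect(segments[i], segments[j]):
                return (i, j)
    return None
```

The double loop makes the quadratic cost explicit; the plane-sweep developed below calls the same kind of O(1) pairwise test, but only for a linear number of pairs.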
Divide-and-conquer is often the first attempt to design an algorithm, and it comes in two variants illustrated in Exhibit 25.1: (1) Divide the data, in this case the set of line segments, into two subsets of approximately equal size (i.e. n / 2 line segments), or (2) divide the embedding space, which is easily cut in exact halves.
Exhibit 25.1: Two ways of applying divide-and-conquer to a set of objects embedded in the plane.
In the first case, we hope for a separation into subsets S1 and S2 that permits an efficient test whether any line segment in S1 intersects some line segment in S2. Exhibit 25.1 shows the ideal case where S1 and S2 do not interact, but of course this cannot always be achieved in a nontrivial way; and even if S can be separated as the figure suggests, finding such a separating line looks like a more formidable problem than the original intersection problem. Thus, in general, we have to test each line segment in S1 against every line segment in S2, a test that may take Θ(n²) time.
The second approach of dividing the embedding space has the unfortunate consequence of effectively increasing our data set. Every segment that straddles the dividing line gets "cut" (i.e. processed twice, once for each half space). The two resulting subproblems will be of size n' and n", respectively, with n' + n" > n, in the worst case n' + n" = 2 · n. At recursion depth d we may have 2^d · n subsegments to process. No optimal algorithm is known that uses this technique.
The key idea in designing an optimal algorithm is the observation that those line segments that intersect a vertical line L at abscissa x are totally ordered: A segment s lies below segment t, written s <_L t, if both intersect L at the current position x and the intersection of s with L lies below the intersection of t with L. With respect to this order a line segment may have an upper and a lower neighbor, and Exhibit 25.2 shows that s and t are neighbors at x.
Exhibit 25.2: The sweep line L totally orders the segments that intersect L.
We describe the intersection test algorithm under the assumption that the configuration is nondegenerate (i.e. no three segments intersect in the same point). For simplicity's sake we also assume that no segment is vertical, so every segment has a left endpoint and a right endpoint. The latter assumption entails no loss of generality: For a vertical segment, we can arbitrarily define the lower endpoint to be the "left endpoint", thus imposing a lexicographic (x, y)-order to refine the x-order. With the important assumption of non-degeneracy, two line segments s and t can intersect at x_0 only if there exists an abscissa x < x_0 where s and t are neighbors. Thus it suffices to test all segment pairs that become neighbors at some time during a left-to-right sweep of L, a number that is usually significantly smaller than n · (n - 1) / 2.
As the sweep line L moves from left to right across the configuration, the order <_L among the line segments intersecting L changes only at endpoints of a segment or at intersections of segments. As we intend to stop the sweep as soon as we discover an intersection, we need to perform the intersection test only at the left and right endpoints of segments. A segment t is tested at its left endpoint for intersection with its lower and upper neighbors. At the right endpoint of t we test its lower and upper neighbor for intersection (Exhibit 25.3).
Exhibit 25.3: Three pairwise intersection tests charged to segment t.
The algorithm terminates as soon as we discover an intersecting pair of segments. Given n segments, each of which may generate three intersection tests as shown in Exhibit 25.3 (two at its left, one at its right endpoint), we perform the O(1) pairwise segment intersection test at most 3 · n times. This linear bound on the number of pairs tested for intersection might raise the hope of finding a linear-time algorithm, but so far we have counted only the geometric primitive: "Does a pair of segments intersect - yes or no?" Hiding in the background we find bookkeeping operations such as "Find the upper and lower neighbor of a given segment", and these turn out to be costlier than the geometric ones. We will find neighbors efficiently by maintaining the order <_L in a data structure called a y-table during the entire sweep.
The skeleton: Turning a space dimension into a time dimension
The name plane-sweep is derived from the image of sweeping the plane from left to right with a vertical line (front, or cross section), stopping at every transition point (event) of a geometric configuration to update the cross section. All processing is done at this moving front, without any backtracking, with a look-ahead of only one point. The events are stored in the x-queue, and the current cross section is maintained by the y-table. The skeleton of a plane-sweep algorithm is as follows:
initX; initY;
while not emptyX do begin e := nextX; transition(e) end
The procedures 'initX' and 'initY' initialize the x-queue and the y-table. 'nextX' returns the next event in the x-queue, 'emptyX' tells us whether the x-queue is empty. The procedure 'transition', the advancing mechanism of the sweep, embodies all the work to be done when a new event is encountered; it moves the front from the slice to the left of an event e to the slice immediately to the right of e.
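The same skeleton can be expressed in Python. The sketch below is only one possible rendering, under assumptions of our own: events are tuples whose first component is the x-coordinate (so that tuple comparison orders them from left to right), the x-queue is a binary heap, the y-table is whatever structure the transition procedure chooses to keep in a list, and the name plane_sweep is hypothetical.

```python
import heapq


def plane_sweep(events, transition):
    """Generic plane-sweep driver.

    events:     iterable of tuples (x, ...), processed in increasing order
    transition: function (event, x_queue, y_table) doing the work at one event;
                it may push further events onto x_queue with heapq.heappush
    Returns the y-table as left by the last transition.
    """
    x_queue = list(events)
    heapq.heapify(x_queue)              # initX
    y_table = []                        # initY
    while x_queue:                      # while not emptyX
        e = heapq.heappop(x_queue)      # e := nextX
        transition(e, x_queue, y_table)
    return y_table
```

A heap is chosen here because it also supports inserting events during the sweep; when all events are known in advance, a sorted array serves equally well.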
Data structures
For the line segment intersection test, the x-queue stores the left and right endpoints of the given line segments, ordered by their x-coordinate, as events to be processed when updating the vertical cross section. Each endpoint stores a reference to the corresponding line segment. We compare points by their x-coordinates when building the x-queue. For simplicity of presentation we assume that no two endpoints of line segments have equal x- or y-coordinates. The only operation to be performed on the x-queue is 'nextX': it returns the next event (i.e. the next left or right endpoint of a line segment to be processed). The cost for initializing the x-queue is O(n · log n), the cost for performing the 'nextX' operation is O(1).
The y-table contains those line segments that are currently intersected by the sweep line, ordered according to <_L. In the slice between two events, this order does not change, and the y-table needs no updating (Exhibit 25.4). The y-table is a dictionary that supports the operations 'insertY', 'deleteY', 'succY', and 'predY'. When entering the left endpoint of a line segment s we find the place where s is to be inserted in the ordering of the y-table by comparing s to other line segments t already stored in the y-table. We can determine whether s <_L t or t <_L s by determining on which side of t the left endpoint of s lies. As we have seen in chapter 14 in the section "Intersection", this tends to be more efficient than computing and comparing the intersection points of s and t with the sweep line. If we implement the dictionary as a balanced tree (e.g. an AVL tree), the operations 'insertY' and 'deleteY' are performed in O(log n) time, and 'succY' and 'predY' are performed in O(1) time if additional pointers in each node of the tree point to the successor and predecessor of the line segment stored in this node. Since there are 2 · n events in the x-queue and at most n line segments in the y-table, the space complexity of this plane-sweep algorithm is O(n).
Exhibit 25.4: The y-table records the varying state of the sweep line L.
Updating the y-table and detecting an intersection
The procedure 'transition' maintains the order <_L of the line segments intersecting the sweep line and performs intersection tests. At a left endpoint of a segment t, t is inserted into the y-table and tested for intersection with its lower and upper neighbors. At the right endpoint of t, t is deleted from the y-table and its two former neighbors are tested. The algorithm terminates when an intersection has been found or all events in the x-queue have been processed without finding an intersection:
procedure transition(e: event);
begin
  s := segment(e);
  if leftPoint(e) then begin
    { s enters the cross section: test it against its new neighbors }
    insertY(s);
    if intersect(predY(s), s) or intersect(s, succY(s)) then
      terminate('intersection found')
  end
  else { e is right endpoint of s } begin
    { s leaves: its two neighbors become adjacent and must be tested }
    if intersect(predY(s), succY(s)) then
      terminate('intersection found');
    deleteY(s)
  end
end;
With at most 2 · n events, and a call of 'transition' costing time O(log n), this plane-sweep algorithm needs O(n · log n) time to perform the line segment intersection test.
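To see the pieces working together, here is a self-contained Python sketch of the intersection test. It follows the procedure 'transition' above but makes several simplifying assumptions that are ours, not the text's: segments are pairs of points with no vertical segment among them, the y-table is a plain Python list of segment indices kept in bottom-to-top order (so insertion and deletion cost O(n) rather than the O(log n) of a balanced tree), positions in the y-table are found by evaluating each stored segment at the current abscissa, and at equal x-coordinates left endpoints are processed before right endpoints. All function names are hypothetical.

```python
def orient(a, b, c):
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, c):
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def intersect(s, t):
    (a, b), (c, d) = s, t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))


def y_at(seg, x):
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def sweep_witness(segments):
    """Return a pair (i, j) of intersecting segments, or None if there is none."""
    segs = [tuple(sorted(s)) for s in segments]          # left endpoint first
    events = sorted((p[0], kind, i)                      # kind 0 = left, 1 = right
                    for i, s in enumerate(segs) for kind, p in enumerate(s))
    table = []                                           # y-table, bottom to top
    for x, kind, i in events:
        if kind == 0:                                    # insertY, then test neighbors
            y = segs[i][0][1]
            lo, hi = 0, len(table)
            while lo < hi:                               # binary search by height at x
                mid = (lo + hi) // 2
                if y_at(segs[table[mid]], x) < y:
                    lo = mid + 1
                else:
                    hi = mid
            table.insert(lo, i)
            for j in (lo - 1, lo + 1):
                if 0 <= j < len(table) and intersect(segs[table[j]], segs[i]):
                    return tuple(sorted((table[j], i)))
        else:                                            # test former neighbors, deleteY
            pos = table.index(i)
            if 0 < pos < len(table) - 1 and intersect(segs[table[pos - 1]],
                                                      segs[table[pos + 1]]):
                return tuple(sorted((table[pos - 1], table[pos + 1])))
            del table[pos]
    return None
```

Replacing the list by a balanced search tree with successor and predecessor pointers would restore the O(n · log n) bound stated above; the logic of the two cases stays the same.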
Sweeping across intersections
The plane-sweep algorithm for the line segment intersection test is easily adapted to the following more general problem [BO 79]:
Given n line segments, report all intersections.
In addition to the left and right endpoints, the x-queue now stores intersection points as events: any intersection detected is inserted into the x-queue as an event to be processed. When the sweep line reaches an intersection event the two participating line segments are swapped in the y-table (Exhibit 25.5). The major increase in complexity as compared to the segment intersection test is that now we must process not only 2 · n events, but 2 · n + k events, where k is the number of intersections discovered as we sweep the plane. A configuration with n / 2 segments vertical and n / 2 horizontal shows that, in the worst case, k ∈ Θ(n²), which leads to an O(n² · log n) algorithm, certainly no improvement over the brute-force comparison of all pairs. In most realistic configurations, say engineering drawings, the number of intersections is much less than O(n²), and thus it is informative to introduce the parameter k in order to get an output-sensitive bound on the complexity of this algorithm (i.e. a bound that adapts to the amount of data needed to report the result of the computation).
Exhibit 25.5: Sweeping across an intersection.
Other changes are comparatively minor. The x-queue must be a priority queue that supports the operation 'insertX'; it can be implemented as a heap. The cost for initializing the x-queue remains O(n · log n). Without further analysis one might presume that the storage requirement of the x-queue is O(n + k), which implies that the cost for calling 'insertX' and 'nextX' remains O(log n), since k ∈ O(n²). A more detailed analysis [PS 91], however, shows that the size of the x-queue never exceeds O(n · (log n)²). With a slight modification of the algorithm [Bro 81] it can even be guaranteed that the size of the x-queue never exceeds O(n). The cost for exchanging two intersecting line segments in the y-table is O(log n), the costs for the other operations on the y-table remain the same. Since there are 2 · n left and right endpoints and k intersection events, the total cost for this algorithm is O((n + k) · log n). As most realistic applications are characterized by k ∈ O(n), reporting all intersections often remains an O(n · log n) algorithm in practice. A time-optimal algorithm that finds all intersecting pairs of line segments in O(n · log n + k) time using O(n + k) storage space is described in [CE 92].
Degenerate configurations, numerical errors, robustness
The discussion above is based on several assumptions of nondegeneracy, some of minor and some of major importance. Let us examine one of each type.
Whenever we access the x-queue ('nextX'), we use an implicit assumption that no two events (endpoints or intersections) have equal x-coordinates. The order of processing events of equal x-coordinate is irrelevant. Assuming that no two events coincide at the same point in the plane, lexicographic (x, y)-ordering is a convenient systematic way to define 'nextX'.
More serious forms of degeneracy arise when events coincide in the plane, such as more than two segments intersecting in the same point. This type of degeneracy is particularly difficult to handle in the presence of numerical errors, such as rounding errors. In the configuration shown in Exhibit 25.6 an endpoint of u lies exactly or nearly on segment s. We may not care whether the intersection routine answers 'yes' or 'no' to the question "Do s and u intersect?" but we certainly expect a 'yes' when asking "Do t and u intersect?" This example shows that the slightest numerical inaccuracy can cause a serious error: The algorithm may fail to report the intersection of t and u, which it would clearly see if it bothered to look - but the algorithm looks the other way and never asks the question "Do t and u intersect?"
Exhibit 25.6: A degenerate configuration may lead to inconsistent results.
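How a rounding error changes the answer of a geometric primitive can be reproduced in a few lines. The Python sketch below is an illustration of our own and not a remedy proposed by the text: it evaluates the same orientation test (on which side of the line through a and b does c lie?) once in floating-point arithmetic and once in exact rational arithmetic using the standard library class Fraction. The function names and the sample coordinates are assumptions chosen to make the effect visible.

```python
from fractions import Fraction


def orient_float(a, b, c):
    """Orientation test in floating-point arithmetic: +1 left, -1 right, 0 on the line."""
    ax, ay, bx, by, cx, cy = map(float, (*a, *b, *c))
    v = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (v > 0) - (v < 0)


def orient_exact(a, b, c):
    """The same test in exact rational arithmetic."""
    ax, ay, bx, by, cx, cy = map(Fraction, (*a, *b, *c))
    v = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (v > 0) - (v < 0)


# The point c lies just to the right of the line y = x through a and b.
a, b, c = (0, 0), (10**20, 10**20), (10**20 + 1, 10**20)
exact_answer = orient_exact(a, b, c)    # c is strictly to the right of the line
float_answer = orient_float(a, b, c)    # the offset of 1 is lost when converting to float
```

Here the exact test reports that c lies strictly on one side of the line, while the floating-point test reports that c lies on the line. If all primitives of a sweep are derived from one exactly evaluated predicate, their answers cannot contradict each other, at the price of slower arithmetic.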
The trace of the plane-sweep for reporting intersections may look as follows:
1. s is inserted into the y-table
2. t is inserted above s into the y-table, and s and t are tested for intersection: No intersection is found
3. u is inserted below s in the y-table (since the evaluation of the function s(x) may conclude that the left endpoint of u lies below s); s and u are tested for intersection, but the intersection routine may conclude that s and u do not intersect: u remains below s
4. Delete u from the y-table
5. Delete s from the y-table
6. Delete t from the y-table
Notice the calamity that struck at the critical step 3. The evaluation of a linear expression s(x) and the intersection routine for two segments both arrived at a result that, in isolation, is reasonable within the tolerance of the underlying arithmetic. The two results together are inconsistent! If the evaluation of s(x) concludes that the left endpoint of u lies below s, the intersection routine must conclude that s and u intersect! If these two geometric primitives fail to coordinate their answers, catastrophe may strike. In our example, u and t never become neighbors in the y-table, so their intersection gets lost.
Exercises
1. Show that there may be Θ(n²) intersections in a set of n line segments.
2. Design a plane-sweep algorithm that determines in O(n · log n) time whether two simple polygons with a total of n vertices intersect.
3. Design a plane-sweep algorithm that determines in O(n · log n) time whether any two disks in a set of n disks intersect.
4. Design a plane-sweep algorithm that solves the line visibility problem discussed in chapter 24 in the section "Visibility in the plane: a simple algorithm whose analysis is not" in time O((n + k) · log n), where k ∈ O(n²) is the number of intersections of the line segments.
5. Give a configuration with the smallest possible number of line segments for which the first intersection point reported by the plane-sweep algorithm in chapter 25 in the section "Sweeping across intersections" is not the leftmost intersection point.
6. Adapt the plane-sweep algorithm presented in chapter 25 in the section "Sweeping across intersections" to detect all intersections among a given set of n horizontal or vertical line segments. You may assume that the line segments do not overlap. What is the time complexity of this algorithm if the horizontal and vertical line segments intersect in k points?
7. Design a plane-sweep algorithm that finds all intersections among a given set of n rectangles all of whose sides are parallel to the coordinate axes. What is the time complexity of your algorithm?
26. The closest pair
Applying, implementing and analyzing plane sweep
Using plane sweep on three or more dimensions
Sweep algorithms solve many kinds of proximity problems efficiently. We present a simple sweep that solves the two-dimensional closest pair problem elegantly in asymptotically optimal time. We explain why sweeping generalizes easily, but not efficiently, to multidimensional closest pair problems.
The problem
We consider the two-dimensional closest pair problem: Given a set S of n points in the plane find a pair of points whose distance is smallest (Exhibit 26.1). We measure distance using the metric d_k, for any k ≥ 1, or d_∞, defined as:
d_k(p, q) = (|q_x - p_x|^k + |q_y - p_y|^k)^(1/k)
d_∞(p, q) = max(|q_x - p_x|, |q_y - p_y|)
Exhibit 26.1: Identify a closest pair among n points in the plane.
Special cases of interest include the "Manhattan metric" d_1, the "Euclidean metric" d_2, and the "maximum metric" d_∞. Exhibit 26.2 shows the "circles" of radius 1 centered at a point p for some of these metrics.
Exhibit 26.2: The results of this chapter remain valid when distances are measured in various metrics.
The closest pair problem has a lower bound Ω(n · log n) in the algebraic decision tree model of computation [PS 85]. Its solution can be obtained in asymptotically optimal time O(n · log n) as a special case of more general problems, such as 'all-nearest-neighbors' [HNS 92] (for each point, find a nearest neighbor), or constructing the Voronoi diagram [SH 75]. These general approaches call on powerful techniques that make the resulting algorithms harder to understand than one would expect for a simply stated problem such as "find a closest pair". The divide-and-conquer algorithm presented in [BS 76] solves the closest pair problem directly in optimal worst-case time complexity Θ(n · log n) using the Euclidean metric d_2. Whereas the recursive divide-and-conquer algorithm involves an intricate argument for combining the solutions of two equally sized subsets, the iterative plane-sweep algorithm [HNS 88] uses a simple incremental update: Starting with the empty set of points, keep adding a single point until the final solution for the entire set is obtained. A similar plane-sweep algorithm solves the closest pair problem for a set of convex objects [BH 92].
Plane-sweep applied to the closest pair problem
The skeleton of the general sweep algorithm presented in chapter 25 in the section "The skeleton: turning a space dimension into a time dimension", with the data structures x-queue and y-table, is adapted to the closest pair problem as shown in Exhibit 26.3. The x-queue stores the points of the set S, ordered by their x-coordinate, as events to be processed when updating the vertical cross section. Two pointers into the x-queue, 'tail' and 'current', partition S into four disjoint subsets:
1. The discarded points to the left of 'tail' are not accessed any longer
2. The active points between 'tail' (inclusive) and 'current' (exclusive) are being queried
3. The current transition point, p, is being processed
4. The future points have not yet been looked at
The y-table stores the active points only, ordered by their y-coordinate.
Exhibit 26.3: Updating the invariant as the next point p is processed.
We need to compare points by their x-coordinates when building the x-queue, and by their y-coordinates while sweeping. For simplicity of presentation we assume that no two points have equal x- or y-coordinates. Points with equal x- or y-coordinates are handled by imposing an arbitrary, but consistent, total order on the set of points. We achieve this by defining two lexicographic orders: <_x to be used for the x-queue, <_y for the y-table:
(x, y) <_x (x', y') if and only if (x < x') or ((x = x') and (y < y'))
(x, y) <_y (x', y') if and only if (y < y') or ((y = y') and (x < x'))
The program of the following section initializes the x-queue and y-table with the two leftmost points being active, with δ equal to their distance, and starts the sweep with the third point.
The distinction between discarded and active points is motivated by the following argument. When a new point p is encountered we wish to answer the question whether this point forms a closest pair with one of the points to its left. We keep a pair of closest points seen so far, along with the corresponding minimal distance δ. Therefore, all candidates that may form a new closest pair with the point p on the sweep line lie in a half circle centered at p, with radius δ.
The key question to be answered in striving for efficiency is how to retrieve quickly all the points seen so far that lie inside this half circle to the left of p, in order to compare their distance to p against the minimal distance δ seen so far. We may use any helpful data structure that organizes the points seen so far, as long as we can update this data structure efficiently across a transition. A circle (or half-circle) query is complex, at least when embedded in a plane-sweep algorithm that organizes data according to an orthogonal coordinate system. A rectangle query can be answered more efficiently. Thus we replace the half-circle query with a bounding rectangle query, accepting the fact that we might include some extraneous points, such as q.
The rectangle query in Exhibit 26.3 is implemented in two steps. First, we cut off all the points to the left at
distance ≥ δ from the sweep line. These points lie between 'tail' and 'current' in the x-queue and can be discarded
easily by advancing 'tail' and removing them from the y-table. Second, we consider only those points q in the δ-slice
whose vertical distance from p is less than δ: |q_y − p_y| < δ. These points can be found in the y-table by looking at
successors and predecessors starting at the y-coordinate of p. In other words, we maintain the following invariant
across a transition:
1. δ is the minimal distance between a pair of points seen so far (discarded or active).
2. The active points (found in the x-queue between 'tail' and 'current', and stored in the y-table ordered by
y-coordinates) are exactly those that lie in the interior of a δ-slice to the left of the sweep line.
Therefore, processing the transition point p involves three steps:
1. Delete all points q with q_x ≤ p_x − δ from the y-table. They are found by advancing 'tail' to the right.
2. Insert p into the y-table.
3. Find all points q in the y-table with |q_y − p_y| < δ by looking at the successors and predecessors of p. If such
a point q is found and its distance from p is smaller than δ, update δ and the closest pair found so far.
Implementation
In the following implementation the x-queue is realized by an array that contains all the points sorted by their
x-coordinate. 'closestLeft' and 'closestRight' describe the pair of closest points found so far, n is the number of points
under consideration, and t and c determine the positions of 'tail' and 'current':
  xQueue: array[1 .. maxN] of point;
  closestLeft, closestRight: point;
  t, c, n: 1 .. maxN;
The x-queue is initialized by
  procedure initX;
'initX' stores all the points into the x-queue, ordered by their x-coordinates.
The empty y-table is created by
  procedure initY;
A new point is inserted into the y-table by
  procedure insertY(p: point);
A point is deleted from the y-table by
  procedure deleteY(p: point);
The successor of a point in the y-table is returned by
  function succY(p: point): point;
The predecessor of a point in the y-table is returned by
  function predY(p: point): point;
The initialization part of the plane-sweep is as follows:
  initX; initY;
  closestLeft := xQueue[1]; closestRight := xQueue[2];
  delta := distance(closestLeft, closestRight);
  insertY(closestLeft); insertY(closestRight);
  t := 1; { 'tail' starts at the leftmost point; assumed here, unless 'initX' already sets it }
  c := 3;
The events are processed by the loop:
  while c <= n do begin transition; c := c + 1; { next event }
  end;
The procedure 'transition' encompasses all the work to be done for a new point:
  procedure transition;
  begin
    { step 1: remove points outside the delta-slice from the y-table }
    current := xQueue[c];
    while current.x - xQueue[t].x >= delta do begin
      deleteY(xQueue[t]); t := t + 1
    end;
    { step 2: insert the new point into the y-table }
    insertY(current);
    { note: both loops below terminate only if 'succY' and 'predY' return a point
      far enough above or below at the ends of the y-table, e.g. a sentinel;
      this is an assumption, the procedures are not shown }
    { step 3a: check the successors of the new point in the y-table }
    check := current;
    repeat
      check := succY(check);
      newDelta := distance(current, check);
      if newDelta < delta then begin
        delta := newDelta;
        closestLeft := check; closestRight := current;
      end;
    until check.y - current.y > delta;
    { step 3b: check the predecessors of the new point in the y-table }
    check := current;
    repeat
      check := predY(check);
      newDelta := distance(current, check);
      if newDelta < delta then begin
        delta := newDelta;
        closestLeft := check; closestRight := current;
      end;
    until current.y - check.y > delta;
  end; { transition }
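For readers who want to experiment, the sweep can also be sketched in Python. This is an illustration under stated assumptions, not the book's implementation: the names `closest_pair`, `x_queue`, `y_table`, and `tail` are chosen here, points are plain (x, y) tuples, and the y-table is a sorted Python list maintained with the standard `bisect` module. List insertion and deletion cost O(n), so this sketch does not reach the running time that a balanced tree gives; explicit index checks take the place of any sentinel handling.

```python
import bisect
import math


def closest_pair(points):
    """Return (delta, p, q) for a closest pair among at least two (x, y) tuples."""
    x_queue = sorted(points)                  # ordered by x, ties broken by y
    best = (x_queue[0], x_queue[1])
    delta = math.dist(*best)
    # The y-table stores the active points as (y, x) tuples, sorted by y.
    y_table = sorted((y, x) for x, y in x_queue[:2])
    tail = 0
    for current in range(2, len(x_queue)):
        px, py = x_queue[current]
        # Step 1: discard every point q with q.x <= p.x - delta.
        while tail < current and x_queue[tail][0] <= px - delta:
            qx, qy = x_queue[tail]
            y_table.pop(bisect.bisect_left(y_table, (qy, qx)))
            tail += 1
        # Step 2: insert the transition point p.
        pos = bisect.bisect_left(y_table, (py, px))
        y_table.insert(pos, (py, px))
        # Step 3: walk the successors, then the predecessors, of p
        # for as long as the vertical distance stays below delta.
        for step in (1, -1):
            i = pos + step
            while 0 <= i < len(y_table) and abs(y_table[i][0] - py) < delta:
                qy, qx = y_table[i]
                d = math.dist((px, py), (qx, qy))
                if d < delta:
                    delta, best = d, ((qx, qy), (px, py))
                i += step
    return delta, best[0], best[1]
```

Replacing the list by a balanced search tree with successor and predecessor operations would bring the sketch in line with the cost assumed for 'insertY', 'deleteY', 'succY', and 'predY'.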
Analysis
We show that the algorithm described can be implemented so as to run in worst-case time O(n · log n) and space
O(n).
If the y-table is implemented by a balanced binary tree (e.g. an AVL-tree or a 2-3-tree) the operations 'insertY',
'deleteY', 'succY', and 'predY' can be performed in time O(log n). The space required is O(n).
'initX' builds the sorted x-queue in time O(n · log n) using space O(n). The procedure 'deleteY' is called at most
once for each point and thus accumulates to O(n · log n). Every point is inserted once into the y-table, thus the calls
of 'insertY' accumulate to O(n · log n).
There remains the problem of analyzing step 3. The loop in step 3a calls 'succY' once more than the number of
points in the upper half of the bounding box. Similarly, the loop in step 3b calls 'predY' once more than the number
of points in the lower half of the bounding box. A standard counting technique shows that the bounding box is
sparsely populated: For any metric d_k, the box contains no more than a small, constant number c_k of points, and for
any k, c_k ≤ 8. Thus 'succY' and 'predY' are called no more than 10 times, and step 3 costs time O(log n).
The key to this counting is the fact that no two points in the y-table can be closer than δ, and thus not many of
them can be packed into the bounding box with sides δ and 2 · δ. We partition this box into the eight pairwise
disjoint, mutually exhaustive regions shown in Exhibit 26.4. These regions are half circles of diameter δ in the
Manhattan metric d_1, and we first argue our case only when distances are measured in this metric. None of these
half-circles can contain more than one point. If a half-circle contained two points at distance δ, they would have to
be at opposite ends of the unique diameter of this half-circle. These endpoints lie on the left or the right boundary
of the bounding box, and these two boundary lines cannot contain any points, for the following reasons:
• No active point can be located on the left boundary of the bounding box; such a point would have been
thrown out when the δ-slice was last updated.
• No active point can exist on the right boundary, as that x-coordinate is preempted by the transition point p
being processed (remember our assumption of unequal x-coordinates).
Exhibit 26.4: Only few points at pairwise distance ≥ δ can populate a box of size 2 · δ by δ.
We have shown that the bounding box can hold no more than eight points at pairwise distance ≥ δ when using
the Manhattan metric d_1. It is well known that for any points p, q, and for any k ≥ 1: d_1(p,q) ≥ d_k(p,q) ≥ d_∞(p,q). Thus
the bounding box can hold no more than eight points at pairwise distance ≥ δ when using any distance d_k or d_∞.
Therefore, the calculation of the predecessors and successors of a transition point costs time O(log n) and
accumulates to a total of O(n · log n) for all transitions. Summing up all costs results in O(n · log n) time and O(n)
space complexity for this algorithm. Since Ω(n · log n) is a lower bound for the closest pair problem, we know that
this algorithm is optimal.
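The step from the Manhattan metric to the other metrics rests on the chain of inequalities between distances: two points at distance at least δ in d_k or d_∞ are also at distance at least δ in d_1, so the bound of eight carries over. A small numeric check in Python may make this concrete; the helper names `d_k` and `d_inf` and the sample points are chosen here for illustration only.

```python
def d_k(p, q, k):
    """Distance of order k >= 1 between two points in the plane."""
    return (abs(p[0] - q[0]) ** k + abs(p[1] - q[1]) ** k) ** (1.0 / k)


def d_inf(p, q):
    """Maximum metric: the larger of the two coordinate differences."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0.0, 0.0), (3.0, 4.0)
d1, d2, d3, dmax = d_k(p, q, 1), d_k(p, q, 2), d_k(p, q, 3), d_inf(p, q)
# d1 is the Manhattan distance 3 + 4, d2 the Euclidean distance,
# and dmax the maximum of the coordinate differences.
```

For this pair the values decrease from d1 through d2 and d3 down to dmax, as the inequality predicts.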
Sweeping in three or more dimensions
To gain insight into the power and limitation of sweep algorithms, let us explore whether the algorithm
presented generalizes to higher-dimensional spaces. We illustrate our reasoning for three-dimensional space, but
the same conclusion holds for any number of dimensions d > 2. All of the following steps generalize easily.
Sort all the points according to their x-coordinate into the x-queue. Sweep space with a y-z plane, and in
processing the current transition point p, assume that we know the closest pair among all the points to the left of p,
and their distance δ. Then to determine whether p forms a new closest pair, look at all the points inside a
half-sphere of radius δ centered at p, extending to the left of p. In the hope of implementing this sphere query efficiently,
we enclose this half sphere in a bounding box of side length 2 · δ in the y- and z-dimension, and δ in the
x-dimension. Inside this box there can be at most a small, constant number c_k of points at pairwise distance ≥ δ when
using any distance d_k or d_∞.
We implement this box query in two steps: (1) by cutting off all the points farther to the left of p than δ, which is
done by advancing 'tail' in the x-queue, and (2) by performing a square query among the points currently in the
y-z-table (which all lie in the δ-slice to the left of the sweep plane), as shown in Exhibit 26.5. Now we have reached the
only place where the three-dimensional algorithm differs substantially. In the two-dimensional case, the
corresponding one-dimensional interval query can be implemented efficiently in time O(log n) using find,
predecessor, and successor operations on a balanced tree, and using the knowledge that the size of the answer set is
bounded by a constant. In the three-dimensional case, the corresponding two-dimensional orthogonal range query
cannot in general be answered in time O(log n) (per retrieved point) using any of the known data structures.
Straightforward search requires time O(n), resulting in an overall time O(n²) for the space sweep. This is not an
interesting result for a problem that admits the trivial O(n²) algorithm of comparing every pair.
Exhibit 26.5: Sweeping a plane across three-dimensional space. Ideas generalize, but efficiency does not.
Sweeping reduces the dimensionality of a geometric problem by one, by replacing one space dimension by a
"time dimension". Reducing a two-dimensional problem to a sequence of one-dimensional problems is often
efficient because the total order defined in one dimension allows logarithmic search times. In contrast, reducing a
three-dimensional problem to a sequence of two-dimensional problems rarely results in a gain in efficiency.
Exercises
1. Consider the following modification of the plane-sweep algorithm for solving the closest pair problem [BA
92]. When encountering a transition point p do not process all points q in the y-table with |q_y − p_y| < δ, but
test only whether the distance of p to its successor or predecessor in the y-table is smaller than δ. When
deleting a point q with q_x ≤ p_x − δ from the y-table test whether the successor and predecessor of q in the
y-table are closer than δ. If a pair of points with a smaller distance than the current δ is found update δ and
the closest pair found so far. Prove that this modified algorithm finds a closest pair. What is the time
complexity of this algorithm?
2. Design a divide-and-conquer algorithm which solves the closest pair problem. What is the time complexity
of your algorithm? Hint: Partition the set of n points by a vertical line into two subsets of approximately n /
2 points. Solve the closest pair problem recursively for both subsets. In the conquer step you should use the
fact that δ is the smallest distance between any pair of points both belonging to the same subset. A point
from the left subset can only have a distance smaller than δ to a point in the right subset if both points lie in
a 2 · δ-slice to the left and to the right of the partitioning line. Therefore, you only have to match points
lying in the left δ-slice against points lying in the right δ-slice.
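When experimenting with these exercises, or with sweeps in more than two dimensions, the trivial algorithm of comparing every pair is a convenient baseline to test against. A minimal Python sketch follows; the name `closest_pair_brute_force` is chosen here, and points are assumed to be tuples of equal length in any dimension.

```python
import itertools
import math


def closest_pair_brute_force(points):
    """Compare every pair of points; quadratic time, any dimension.

    Returns (distance, p, q) for a pair at minimal Euclidean distance.
    """
    return min(
        (math.dist(p, q), p, q)
        for p, q in itertools.combinations(points, 2)
    )


# Four points in three-dimensional space.
result = closest_pair_brute_force(
    [(0, 0, 0), (1, 1, 1), (1, 1, 1.5), (4, 0, 0)]
)
```

Its result on random inputs can be compared with the output of a faster algorithm to catch mistakes in the slice or tie-handling logic.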