# Efficient Algorithms

Summer Semester 2000

Prof. Dr. Sami Khuri¹

Draft, Version of February 18, 2001

¹ These are the lecture notes of a course taught at Technische Universität München.

The notes were typed by Christian Osendorfer and Johannes Altaner.

Contents
1 Maximum Flow

1.1 Flow networks
1.1.1 Multiple sources and sinks
1.2 The Ford-Fulkerson method
1.2.1 Residual networks
1.3 Max-Flow Min-Cut Theorem
1.4 The basic Ford-Fulkerson algorithm
1.5 Analysis of the Ford-Fulkerson algorithm
1.6 Variations of the Ford-Fulkerson algorithm
1.6.1 Edmonds-Karp algorithm
1.6.2 Preflow-push algorithms
1.6.3 The preflow-push procedure
1.6.4 Correctness of the preflow-push algorithm
1.6.5 Complexity of the preflow-push algorithm
1.6.6 Specific implementations of the preflow-push algorithm
1.6.7 FIFO preflow-push algorithm
1.7 Applications of the maximum flow problem
1.7.1 Scheduling on uniform parallel machines
1.7.2 Distributed computing on a two-processor computer
1.7.3 The baseball elimination problem

2 Graph Matching

2.1 Maximum Bipartite Matching
2.2 Applications
2.2.1 Bipartite Personnel Assignment
2.2.2 Non-bipartite Personnel Assignment
2.3 Transversals for families of subsets
2.4 Hall's theorem
2.5 Maximum Matchings and Minimum Vertex Covers
2.5.1 The Bottleneck Problem
2.6 Weighted matching

3 String Matching

3.1 Naive String Matching
3.2 String Matching Algorithms [CLR90]
3.3 Knuth-Morris-Pratt Algorithm [KMP]
3.3.1 Analysis of Prefix-Function
3.3.2 Analysis of KMP
3.4 Rabin-Karp Algorithm
3.4.1 Analysis of Rabin-Karp
3.5 Boyer-Moore
3.5.1 The Bad-Character Heuristic
3.5.2 The Good-Suffix Heuristic
3.5.3 Analysis of Boyer-Moore
3.6 More Applications: Computer Viruses
3.7 Longest Common Subsequence
3.8 The Longest Upsequence [by E. W. Dijkstra]

4 Approximation Algorithms

4.1 The vertex cover problem
4.2 The Traveling-Salesperson Problem
4.3 The Set-Covering Problem
4.3.1 Analysis of Greedy-Set-Cover
4.4 The Subset-Sum problem

Chapter 1
Maximum Flow

Flow in a network can be:
a) flow of oil or water through a system of pipelines
b) flow of commodities being transported across truck routes
c) flow of phone calls
d) flow of email messages
e) flow of electricity

The flow of the material at any point in the system is the rate at which the material moves. The study of network flow bridges several diverse and seemingly unrelated areas of combinatorial optimization, such as matchings used to solve scheduling and assignment problems, the edge and vertex connectivity problems, and the minimum cut problem.

1.1 Flow networks

A flow network is a directed graph G = (V, E) with two special vertices: the source s ∈ V and the sink t ∈ V [CLR90]. Directed edges in flow networks are conduits for the material, and vertices are conduit junctions. Each edge has a capacity, which is the maximum rate at which the material can flow through the conduit; material flows through all other vertices without stopping at the vertices. Each edge (u, v) ∈ E has a nonnegative capacity c(u, v) ≥ 0, and c(u, v) = 0 if (u, v) ∉ E. We assume that every vertex lies on some path from s to t (so the graph is connected).

A flow in G is a real-valued function f : V × V → R such that for all u, v ∈ V:
a) f(u, v) ≤ c(u, v)   (capacity constraint)
b) f(u, v) = -f(v, u)   (skew symmetry)
   Note that this implies f(u, u) = 0, because f(u, u) = -f(u, u).
c) Σ_{v ∈ V} f(u, v) = 0 for all u ∈ V \ {s, t}   (flow conservation)
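The three defining properties can be checked mechanically for any candidate flow. The following is a minimal sketch (the dictionary-based representation and the function name are ours, not the notes'): a network is a dictionary of capacities c[(u, v)], and missing pairs are treated as capacity 0.

```python
# Hypothetical helper, not from the notes: checks the three flow properties
# of Section 1.1 for a flow given as a dict f[(u, v)] of net flows.

def is_flow(vertices, c, f, s, t):
    for u in vertices:
        for v in vertices:
            # a) capacity constraint: f(u, v) <= c(u, v)
            if f.get((u, v), 0) > c.get((u, v), 0):
                return False
            # b) skew symmetry: f(u, v) = -f(v, u)
            if f.get((u, v), 0) != -f.get((v, u), 0):
                return False
    # c) flow conservation: sum over v of f(u, v) = 0 for u not in {s, t}
    for u in vertices:
        if u not in (s, t) and sum(f.get((u, v), 0) for v in vertices) != 0:
            return False
    return True
```

Storing f for both orientations of every edge keeps skew symmetry explicit, matching the convention used throughout the chapter.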

The quantity f(u, v) is called the net flow from vertex u to vertex v. The value of a flow f is the total net flow out of the source and is denoted by |f|. In other words:

|f| = Σ_{v ∈ V} f(s, v)

The maximum-flow problem consists in finding a flow of maximum value from s to t for a given flow network G with source s and sink t.

The positive net flow entering a vertex v ∈ V is given by Σ_{u ∈ V, f(u,v) > 0} f(u, v), and the positive net flow leaving a vertex v ∈ V is given by Σ_{u ∈ V, f(v,u) > 0} f(v, u). By the flow conservation property, the positive net flow entering a vertex v ∈ V, where v ≠ s and v ≠ t, is equal to the positive net flow leaving that vertex.

Example: The Andalucia Aceituna Company (AAC) buys black and green olives from the farmers in southern Spain and puts them in jars in Malaga. The jars that are not destined for local consumption are placed in one-ton containers, called crates, and are then sent to Barcelona to be shipped to the rest of Europe. AAC does not own trucks and therefore leases space on trucks from the Barcelona Camión Company (BCC).

The trucks of BCC travel over specified routes between major cities in Spain and have only limited capacity for AAC's crates, since BCC also does business with other manufacturers. So AAC has no control over the routes and capacities of the trucks of BCC: AAC can ship at most c(u, v) crates per day between each pair of cities u and v, i.e. only c(u, v) crates per day can go from u to v. For example, AAC can move a maximum number of 8 crates of olives from Madrid to Alicante per day. The goal of AAC is to ship the largest number p of crates per day that can reach Barcelona (that same day). In our example we have s = Malaga and t = Barcelona; the intermediate cities are Salamanca, Madrid, Zaragoza, Alicante, and Valencia. Recall that olives cannot pile up in intermediate cities: the rate at which olives enter an intermediate city is equal to the rate at which olives leave. If f(u, v) > 0, then edge (u, v) is labeled with f(u, v)/c(u, v); only positive net flows are shown in the diagram.

Example:

[Figure: the AAC flow network, with each edge labeled flow/capacity; e.g. Malaga to Salamanca carries 10/12 and Madrid to Alicante has capacity 8.]

The flow in the above example is:

|f| = Σ_{v ∈ V} f(Malaga, v) = 25

Is this the maximum possible flow for the above network?

Remark: Without loss of generality, we can say that positive flow goes either from u to v, or from v to u, but not both: if we have positive flow in both directions, we can transform the scenario into an equivalent one which has only positive flow.

Example: Suppose u sends 3/4 to v while v sends 2/5 to u. We can transform this into the following scenario by "cancelling" 2 units of flow in each direction: u sends 1/4 to v, and we do not show the edge from v to u since its flow is zero (f(v, u) = 0). Note that the net flow from u to v in both cases is one unit, and that the capacity constraint and flow conservation are still satisfied.

Notation: For X ⊆ V and Y ⊆ V we write:

f(X, Y) = Σ_{u ∈ X} Σ_{v ∈ Y} f(u, v)

Remarks: Let G = (V, E) be a flow network and f a flow in G. Then:
1. f(X, X) = 0 for all X ⊆ V
2. f(X, Y) = -f(Y, X) for all X ⊆ V and Y ⊆ V
3. For X, Y, Z ⊆ V with X ∩ Y = ∅, we have:
   (a) f(X ∪ Y, Z) = f(X, Z) + f(Y, Z)
   (b) f(Z, X ∪ Y) = f(Z, X) + f(Z, Y)

Result: |f| = f(V, t) for any flow f in a flow network.
Proof: |f| = Σ_{v ∈ V} f(s, v) = f(s, V) by definition. Hence:

|f| = f(s, V)
    = f(V, V) - f(V \ {s}, V)          by Remark 3(a)
    = -f(V \ {s}, V)                   by Remark 1
    = f(V, V \ {s})                    by Remark 2
    = f(V, t) + f(V, V \ {s} \ {t})    by Remark 3(b)
    = f(V, t)                          by flow conservation, since f(V, u) = -f(u, V) = 0 for every u ∈ V \ {s, t}
∎
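The set-to-set notation and the remarks above can be verified on a concrete flow with a few lines of code. A sketch, using the same dictionary representation as before (the names are ours, not the notes'):

```python
# Hypothetical helper, not from the notes: f(X, Y) as defined above.

def f_sets(f, X, Y):
    """f(X, Y) = sum over u in X, v in Y of f(u, v)."""
    return sum(f.get((u, v), 0) for u in X for v in Y)

# Toy flow on vertices {s, a, t}: 2 units routed s -> a -> t,
# with skew-symmetric entries stored explicitly.
f = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}
V = {"s", "a", "t"}
# Remark 1 gives f_sets(f, V, V) == 0, and the Result gives
# f_sets(f, {"s"}, V) == f_sets(f, V, {"t"}) == |f|.
```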
1.1.1 Multiple sources and sinks
A maximum flow problem may have more than one source, s1, s2, ..., sm, and more than one sink, t1, t2, ..., tn. Reduce the problem with multiple sources and sinks to one with only one source s0 and only one sink t0 by adding edges in the following way: edges (s0, si) with capacities c(s0, si) = ∞ for i = 1, 2, ..., m, and edges (ti, t0) with capacities c(ti, t0) = ∞ for i = 1, 2, ..., n.
Example: If AAC puts olives in jars in centers s1, s2, s3, s4, s5 and ships them to t1, t2, t3, t4, create a center s0 and edges with capacities c(s0, si) = ∞ for i = 1, 2, 3, 4, 5, and a center t0 with edges with capacities c(ti, t0) = ∞ for i = 1, 2, 3, 4.
[Figure: the AAC network extended with a supersource s0 (edges of capacity ∞ to s1, ..., s5) and a supersink t0 (edges of capacity ∞ from t1, ..., t4).]
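The reduction above is purely mechanical. A sketch (the vertex names s0 and t0 follow the text; the dictionary representation and the function name are assumptions of ours):

```python
import math

# Hypothetical helper, not from the notes: reduce a multi-source,
# multi-sink network to a single-source, single-sink one by adding
# a supersource s0 and a supersink t0 with infinite-capacity edges.

def add_super_terminals(c, sources, sinks):
    c = dict(c)  # do not mutate the caller's network
    for si in sources:
        c[("s0", si)] = math.inf
    for ti in sinks:
        c[(ti, "t0")] = math.inf
    return c
```

Any maximum flow from s0 to t0 in the extended network corresponds to a maximum flow of the original multi-source, multi-sink problem, since the new edges never constrain the flow.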

1.2 The Ford-Fulkerson method
Solving the maximum flow problem consists in finding paths from s to t and pushing the maximum flow over these paths.
The Ford-Fulkerson method is based on three ideas:
- residual networks
- augmenting paths
- cuts
It is called a method and not an algorithm because Ford-Fulkerson's method comprises several different algorithms with different running times.

Ford-Fulkerson-Method
    initialize flow f to 0
    while there exists an augmenting path p
        do augment flow f along p
    return f

An augmenting path is a path from s to t along which we can push more flow.

1.2.1 Residual networks

Given a flow network G = (V, E) with source s, sink t, and a flow f in G, the residual capacity of edge (u, v) is

cf(u, v) = c(u, v) - f(u, v)

It is the net flow we can push from u to v before exceeding the capacity c(u, v).

Example: Consider the edge joining Malaga to Salamanca in our previous example, and let s1 = Malaga and v1 = Salamanca, so that c(s1, v1) = 12 and f(s1, v1) = 10. Then cf(s1, v1) = 12 - 10 = 2, and cf(v1, s1) = c(v1, s1) - f(v1, s1) = 0 - (-10) = 10, since f(v1, s1) = -f(s1, v1) = -10 and c(v1, s1) = 0. Let us replace the edge (s1, v1), labeled 10/12, by the edges (s1, v1) and (v1, s1) labeled with the residual capacities 2 and 10.

Similarly for the pair of edges joining Madrid (v2) and Valencia (v5), labeled 3/8 and 8: cf(v2, v5) = c(v2, v5) - f(v2, v5) = 8 - 3 = 5, and cf(v5, v2) = c(v5, v2) - f(v5, v2) = 8 - (-3) = 11. (Note that f(v5, v2) = -f(v2, v5) = -3.) By replacing the values between Madrid and Valencia by the residual capacities, we get edges labeled 5 and 11.

As pointed out in the example, when the net flow is negative, the residual capacity can be greater than the capacity: cf(v5, v2) = 11 is greater than the capacity c(v5, v2) = 8. Are we violating the capacity from v5 to v2? No, because in reality we have 3 crates that go from v2 to v5; pushing flow from v5 to v2 first cancels out those 3 units, and thus only 11 - 3 = 8 crates actually travel from v5 to v2.

Given a flow network G = (V, E) and a flow f, the residual network of G induced by f is the graph Gf = (V, Ef), where

Ef = {(u, v) ∈ V × V : cf(u, v) > 0}

In other words, the edges of G are replaced by edges in Gf labeled with the residual capacities.

Example: Consider the following path through G. Note that this is only one path, so the conservation law does not hold, since we don't see the rest of the flow network.

[Figure: a path in G from s to t through v1, v2, v3, v4, with edges labeled flow/capacity (3/7, 4/6, 4/4, 2/3, 2/4), and below it the corresponding edges of the residual graph Gf with their residual capacities.]

Note that the path in G can be augmented by 2 units.
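Constructing Gf from G and f is a direct transcription of the definition. A sketch, using the same representation as before (the function name is ours):

```python
# Hypothetical helper, not from the notes: the residual network Gf,
# i.e. all pairs (u, v) with positive residual capacity
# cf(u, v) = c(u, v) - f(u, v).

def residual_network(vertices, c, f):
    cf = {}
    for u in vertices:
        for v in vertices:
            r = c.get((u, v), 0) - f.get((u, v), 0)
            if r > 0:                 # only residual edges are kept
                cf[(u, v)] = r
    return cf
```

On the Malaga/Salamanca edge of the example (capacity 12, flow 10), this yields exactly the two residual edges labeled 2 and 10 described above.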

Example: Let us once more consider the flow network of the AAC example, with the flow function f given by the BCC.

[Figure: the AAC network with flow f, and below it the residual network induced by f.]

Since an edge (u, v) can appear in Gf only if at least one of (u, v) and (v, u) appears in the original network G, we have:

|Ef| ≤ 2|E|

In our example, |E| = 15 and |Ef| = 22. Note that new edges may emerge in Gf, so Ef is rarely equal to E. Also note that Gf is itself a flow network, with capacities given by cf.

There exists a relationship between a flow in G and a flow in Gf.

Result: Let G = (V, E) be a flow network with flow f, and let Gf = (V, Ef) be the residual network induced by f. Let f' be a flow in Gf. Define the flow sum f + f' to be the function (f + f') : V × V → R with (f + f')(u, v) = f(u, v) + f'(u, v). Then:
a) f + f' is a flow in G
b) |f + f'| = |f| + |f'|

Proof:
a) We show that f + f' is a flow in G:
i) skew symmetry:
(f + f')(u, v) = f(u, v) + f'(u, v) = -f(v, u) - f'(v, u) = -(f(v, u) + f'(v, u)) = -(f + f')(v, u)
ii) capacity constraints: since f'(u, v) ≤ cf(u, v),
(f + f')(u, v) = f(u, v) + f'(u, v) ≤ f(u, v) + [c(u, v) - f(u, v)] = c(u, v)
iii) flow conservation: for all u ∈ V \ {s, t}:
Σ_{v ∈ V} (f + f')(u, v) = Σ_{v ∈ V} f(u, v) + Σ_{v ∈ V} f'(u, v) = 0 + 0 = 0
b) We show that |f + f'| = |f| + |f'|:
|f + f'| = Σ_{v ∈ V} (f + f')(s, v) = Σ_{v ∈ V} f(s, v) + Σ_{v ∈ V} f'(s, v) = |f| + |f'|
∎

Recall that an augmenting path is a path through which we can push more flow. Given a flow network G = (V, E) and a flow f, an augmenting path p is a simple path from s to t in the residual network Gf.

By finding a path from s to t in Gf, we know that we can push more flow through that path; we shall formally prove this fact later. Let us consider our example once more and look for a path in Gf.

Example:

[Figure: part of the residual network Gf, showing an s-t path through v1 and v2 with residual capacities 2, 3 and 8.]

One augmenting path p is the one that goes from s to v1 to v2 to t.

The residual capacity of a path p is:

cf(p) = min{cf(u, v) : (u, v) is on p}

It is the maximum amount of net flow that we can move along the edges of p. In our example, cf(p) = min{2, 3, 8} = 2. So 2 more crates of olives could be shipped from Malaga to Barcelona, via Salamanca and then Madrid.

[Figure: the flow in G that results from augmenting along path p by its residual capacity cf(p) = 2; Malaga to Salamanca now carries 12/12, Salamanca to Madrid 7/8, and Madrid to Barcelona 14/20.]

Now that we have a "new" flow f, we can repeat the process:
first: construct the new residual network
second: find an augmenting path p and compute its residual capacity
third: augment the path by its residual capacity
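The three steps above (construct Gf, find an augmenting path and its residual capacity cf(p), then augment) can be combined into a single round. A sketch; breadth-first search is our choice here, the notes allow any strategy for finding the path:

```python
from collections import deque

# Hypothetical helper, not from the notes: perform one augmentation round.
# Returns the amount of flow pushed (0 if no augmenting path exists) and
# updates f in place, preserving skew symmetry.

def augment_once(vertices, c, f, s, t):
    cf = {(u, v): c.get((u, v), 0) - f.get((u, v), 0)
          for u in vertices for v in vertices}
    # find a simple s-t path in the residual network Gf
    parent = {s: None}
    queue = deque([s])
    while queue and t not in parent:
        u = queue.popleft()
        for v in vertices:
            if v not in parent and cf[(u, v)] > 0:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return 0                       # no augmenting path
    # recover the path and its residual capacity cf(p)
    path, v = [], t
    while parent[v] is not None:
        path.append((parent[v], v))
        v = parent[v]
    bottleneck = min(cf[e] for e in path)
    # augment along p
    for (u, v) in path:
        f[(u, v)] = f.get((u, v), 0) + bottleneck
        f[(v, u)] = -f[(u, v)]
    return bottleneck
```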

we need to study uts .To prove that a ow is maximum if and only if its residual network ontains no augmenting path.

A cut (S, T) of a flow network G = (V, E) is a partition of V into two disjoint sets S ⊆ V and T ⊆ V such that S ∪ T = V, s ∈ S, and t ∈ T. The net flow across the cut (S, T) is f(S, T), and the capacity of the cut (S, T) is c(S, T). A minimum cut of a network is a cut whose capacity is minimum over all cuts of the network. Note that the net flow across a cut can contain negative net flows between vertices, whereas the capacity of a cut is made up of nonnegative-valued capacities.

Example:

[Figure: the AAC flow network with a cut (S, T) drawn through it, separating Malaga, Salamanca, Zaragoza and Madrid (in S) from Alicante, Valencia and Barcelona (in T).]

Consider the cut (S, T) shown in the figure. Summing f(u, v) over all u ∈ S and v ∈ T, its net flow is

f(S, T) = 5 + 3 + 12 + 3 + (-3) + 5 = 25

and its capacity is

c(S, T) = 10 + 10 + 20 + 8 + 8 + 10 = 66

But

|f| = Σ_{v ∈ V} f(s, v) = 10 + 10 + 5 = 25

That is no coincidence.

Result: f(S, T) = |f| for any cut (S, T).
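Both quantities of a cut can be computed directly from their definitions. A sketch, with the same representation as before (the function name is ours):

```python
# Hypothetical helper, not from the notes: the net flow f(S, T) and the
# capacity c(S, T) of a cut (S, T). S and T must partition the vertex set
# with s in S and t in T.

def cut_flow_and_capacity(c, f, S, T):
    net = sum(f.get((u, v), 0) for u in S for v in T)
    cap = sum(c.get((u, v), 0) for u in S for v in T)
    return net, cap
```

Note how the net flow is the same for every cut of a fixed flow (the result proved next), while the capacity varies from cut to cut.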

Proof: Use Remarks 1, 2 and 3 seen earlier:

f(S, T) = f(S, V) - f(S, S)          by Remark 3(b), since T = V \ S
        = f(S, V)                    by Remark 1
        = f(s, V) + f(S \ {s}, V)    by Remark 3(a)
        = f(s, V)                    by flow conservation, since no vertex of S \ {s} is s or t
        = |f|
∎

Remark: Because |f| = f(S, T) for every cut, and because

f(S, T) = Σ_{u ∈ S} Σ_{v ∈ T} f(u, v) ≤ Σ_{u ∈ S} Σ_{v ∈ T} c(u, v) = c(S, T)

we conclude that

|f| ≤ c(S, T)

for every cut (S, T): the capacity of any cut is an upper bound on the value of any flow.

1.3 Max-Flow Min-Cut Theorem

Theorem 1.3.1: If f is a flow in a flow network G = (V, E) with source s and sink t, then the following statements are equivalent:
1. f is a maximum flow
2. the residual network Gf contains no augmenting paths
3. |f| = c(S, T) for some cut (S, T) of G

Proof:
1 ⇒ 2: Proof by contradiction: suppose that f is a maximum flow in G but Gf contains an augmenting path p. The contradiction is a direct result of the following lemma.

Lemma: Define a function fp : V × V → R such that

fp(u, v) = cf(p)    if (u, v) is on p
fp(u, v) = -cf(p)   if (v, u) is on p
fp(u, v) = 0        otherwise

where p is an augmenting path in Gf. Then fp is a flow in Gf and |fp| = cf(p) > 0. Consequently f + fp is a flow in G with value |f| + cf(p) > |f|, contradicting the maximality of f.

2 ⇒ 3: Suppose that Gf contains no path from s to t. Let

S = {v ∈ V : there exists a path from s to v in Gf}

and let T = V \ S. Then (S, T) is a cut: s ∈ S, and since there is no path from s to t, t ∈ T.

For each pair (u, v) in the cut (with u ∈ S and v ∈ T), we have (u, v) ∉ Ef (because otherwise there would be a path from s to v), i.e. (u, v) has 0 residual capacity. So for every such pair we have f(u, v) = c(u, v), and therefore |f| = f(S, T) = c(S, T).

3 ⇒ 1: We know by a previous result that |f| ≤ c(S, T) for every cut (S, T), i.e. c(S, T) is an upper bound for |f|. But here |f| = c(S, T), so f is a maximum flow.
∎

1.4 The basic Ford-Fulkerson algorithm

The algorithm is labeled "basic" since after its appearance, many variations (and improvements) were designed. We are now ready to present the Ford-Fulkerson algorithm, which simply implements Ford-Fulkerson's method. It starts by initializing the flow to zero. It then constructs the residual graph Gf. The main loop of the algorithm is executed as long as we can find an augmenting path p in Gf. Recall that the residual capacity cf(u, v) is given by cf(u, v) = c(u, v) - f(u, v).

Ford-Fulkerson(G)
begin
    for each edge (u, v) ∈ E
        f(u, v) ← 0
        f(v, u) ← 0
    while there exists an augmenting path p in Gf
        cf(p) ← min{cf(u, v) : (u, v) is in p}    //determine residual capacity
        for each edge (u, v) in p
            f(u, v) ← f(u, v) + cf(p)
            f(v, u) ← -f(u, v)
        construct the residual graph Gf    //find the new residual graph after having pushed cf(p) along the path p
    return the function f
end

Example: Consider our flow network once more, with f(u, v) = 0 for all edges (u, v) ∈ E.

[Figure: the AAC network with zero flow; every edge is labeled with its capacity only.]

We now construct the residual network Gf of the above flow network. Since f = 0, Gf has the same edges as G, labeled with the capacities. Assume the algorithm picks s → v2 → v5 → t to be the augmenting path p. Then

cf(p) = min{10, 8, 8} = 8

The flow network will be exactly the same as above except for the augmenting path, which becomes s --8/10--> v2 --8/8--> v5 --8/8--> t. After adding cf(p) = 8 along the augmenting path p, the residual capacities change as follows:

v ) edge(v . s) f (v . s) : f (v . v ) 2 5 5 2 5 2 5 2 5 2 5 2 10 8 = 2 (v . v ) : f (v . v ) : f (v . v ) edge(t. v ) edge(v .= = = = = = = edge(v . t) : f (s. s) 2 2 edge(v . v ) : f (s. s) 0 ( 8) = 8 8 8=0 8 ( 8) = 16 8 8=0 0 ( 8) = 8 2 2 The new residual network Gf indu ed by the ow f in the previous .

[Figure: the residual network Gf after the first augmentation.]

One possible choice for an augmenting path p in Gf is s → v2 → v1 → v4 → t, with

cf(p) = min{2, 8, 10} = 2

From the last flow network we update f(u, v) ← f(u, v) + cf(p) and f(v, u) ← -f(u, v) along p. In other words:

f(s, v2) = 8 + 2 = 10      f(v2, s) = -10
f(v2, v1) = 0 + 2 = 2      f(v1, v2) = -2
f(v1, v4) = 0 + 2 = 2      f(v4, v1) = -2
f(v4, t) = 0 + 2 = 2       f(t, v4) = -2

[Figure: the flow network after pushing two more crates along the path p: s → v2 → v1 → v4 → t. The values and capacities that have changed since the last iteration are shown in red rectangles.]

Note that we don't show f(u, v) when its value is nonpositive (as is the convention with flow networks); nonpositive values of f(u, v) are shown in residual networks only. The next step involves constructing the residual network Gf, then choosing an augmenting path, and so on. The algorithm stops when no such path can be found in the residual network.

1.5 Analysis of the Ford-Fulkerson algorithm

Only one algorithm for the maximum-flow problem appeared before Ford-Fulkerson's algorithm: the 1951 network simplex method of Dantzig for the transportation problem, which solved the maximum flow problem as a natural special case. Since the development of the augmenting path method by Ford and Fulkerson, many more efficient algorithms were (and are being) developed by several computer scientists. The following table, from "Beyond the flow decomposition barrier" by A. Goldberg and S. Rao, gives a history of maximum flow bounds:

year  discoverer(s)          exact bound
1951  Dantzig                O(n^2 m U)
1955  Ford & Fulkerson       O(n m U)
1970  Dinitz                 O(n^2 m)
1970  Edmonds & Karp         O(n m^2)
1972  Edmonds & Karp         O(m^2 log U)
1973  Dinitz                 O(n m log U)
1974  Karzanov               O(n^3)
1977  Cherkassky             O(n^2 m^(1/2))
1980  Galil & Naamad         O(n m log^2 n)
1983  Sleator & Tarjan       O(n m log n)
1985  Gabow                  O(n m log U)
1986  Goldberg & Tarjan      O(n m log(n^2/m))
1987  Ahuja & Orlin          O(n m + n^2 log U)
1987  Ahuja et al.           O(n m log(n (log U)^(1/2)/m + 2))
1989  Cheriyan & Hagerup     O(n m + n^2 log^2 n)
1990  Cheriyan et al.        O(n^3/log n)
1990  Alon                   O(n m + n^(8/3) log n)
1992  King et al.            O(n m + n^(2+epsilon))
1993  Phillips & Westbrook   O(n m log_(m/n) n + n^2 log^(2+epsilon) n)
1994  King et al.            O(n m log_(m/(n log n)) n)
1997  Goldberg & Rao         O(m^(3/2) log(n^2/m) log U), O(n^(2/3) m log(n^2/m) log U)

In the table, n denotes the number of vertices, m the number of edges, and the capacities of the edges are integers in the range [1 ... U].

Back to Ford-Fulkerson's algorithm: its running time heavily depends on the way the augmenting path p in the head of the while-loop is chosen, and the result can be pretty bad with poor choices of the path. Let us assume that all capacities have integral values and that the augmenting path is arbitrarily chosen. Under these assumptions, the Ford-Fulkerson algorithm runs in time O(|E| |f*|), where |f*| is the value of the maximum flow found by the algorithm. The computation is straightforward: it takes Θ(|E|) to initialize the edges of G, and at most |f*| iterations of the while-loop are performed, since each augmentation increases the value of the flow by at least one unit. When the capacities are integral and the optimal flow value |f*| is small, we obtain a good running time. But if we have a network as shown in the figure below, the algorithm can take exponential time (even though the capacities have integer values).

[Figure: a network with vertices s, v1, v2, t in which the four edges incident on s and t have a large integral capacity, while the middle edge (v1, v2) has capacity 1. If the algorithm alternately augments along s → v1 → v2 → t and s → v2 → v1 → t, each augmentation pushes only one unit of flow, so the number of iterations grows with the capacities rather than with the size of the graph.]
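A compact implementation that always augments along a shortest path, found by breadth-first search, avoids the pathological behaviour just described; this is exactly the path-selection rule studied in the next section. A sketch, not the notes' own code:

```python
from collections import deque

# Hypothetical implementation, not from the notes: Ford-Fulkerson with
# breadth-first search for the augmenting path. Returns the flow function
# and its value |f|.

def max_flow(vertices, c, s, t):
    f = {}
    def cf(u, v):                       # residual capacity
        return c.get((u, v), 0) - f.get((u, v), 0)
    while True:
        # shortest augmenting path in the residual network
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in vertices:
                if v not in parent and cf(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:             # no augmenting path: f is maximum
            return f, sum(f.get((s, v), 0) for v in vertices)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cf(u, v) for (u, v) in path)
        for (u, v) in path:             # augment by the residual capacity
            f[(u, v)] = f.get((u, v), 0) + push
            f[(v, u)] = -f[(u, v)]
```

On the bad network above, breadth-first search never chooses the capacity-1 middle edge first, so only a handful of augmentations are needed regardless of the large capacities.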

1.6 Variations of the Ford-Fulkerson algorithm

1.6.1 Edmonds-Karp algorithm

A natural variant of the Ford-Fulkerson algorithm is to choose the shortest augmenting path in the residual network Gf, where by shortest augmenting path we mean the path from s to t that contains as few edges as possible. This variant was developed independently by Dinitz and by Edmonds and Karp in the early 1970's. It can be implemented by using breadth-first search, and as before it repeats the augmentation step until no further paths can be found. It can be shown that the Edmonds-Karp algorithm runs in O(|V| |E|^2) time, and hence in polynomial time. A crucial observation is that the length of the shortest path increases with each iteration of the main loop of the algorithm. This fact is not obvious, since new edges can be added to the residual graph as we augment a path; note, however, that the added edges are along the augmenting path and are directed in the opposite direction of the flow.

1.6.2 Preflow-push algorithms

The preflow-push algorithms are the fastest maximum-flow algorithms to date. Preflow-push algorithms work in a more localized fashion than the Ford-Fulkerson method. Rather than examining the entire residual network to find an augmenting path,

preflow-push algorithms concentrate on one vertex at a time, looking only at the vertex's immediate neighbors in the residual network. The flow-conservation property is not maintained throughout the execution. Instead, the algorithm maintains a preflow, which is a function f : V × V → R that satisfies:
i) skew symmetry
ii) the capacity constraints
iii) f(V, u) ≥ 0 for all u ∈ V \ {s}

The third property of a preflow is a relaxation of flow conservation: it postulates that the net flow into each vertex other than the source is nonnegative. The excess flow into u, given by e(u) = f(V, u), is the net flow into u. A vertex u ∈ V \ {s, t} is overflowing if e(u) > 0. As usual, assume that G = (V, E) is a flow network with source s and sink t, and that f is a preflow function in G.

The algorithm can be understood, intuitively, by thinking of the flow network as being a system of interconnected pipes of given capacities. The Ford-Fulkerson method iteratively adds more streams of flow until no more can be added. Vertices in the preflow-push view are pipe junctions with two additional properties. First, each vertex has an outflow pipe leading to an arbitrarily large reservoir, to accommodate excess flow. Second, each vertex, its reservoir, and all its pipe connections are on a platform whose height increases as the algorithm progresses. The height of the source is |V| and that of the sink is 0; other vertex heights start at 0 and increase with time. Flow is always pushed from a high vertex to a lower vertex. The increase of the height of a vertex is performed through the operation of lifting: the height is increased to one unit more than the height of the lowest of its neighbors to which it has an unsaturated pipe, so the lifting of a vertex will result in the pushing of flow into at least that unsaturated pipe.

We now formally present the preflow-push algorithm. A height function h : V → N is a function such that h(s) = |V|, h(t) = 0, and h(u) ≤ h(v) + 1 for every residual edge (u, v).

Note that the definition of h implies that if h(u) > h(v) + 1, then (u, v) is not an edge in the residual graph Gf.

Operation Push
Push(u, v) is an operation that is activated if u is an overflowing vertex (i.e. e(u) > 0), cf(u, v) > 0, and h(u) = h(v) + 1. The amount of flow that can be pushed from u to v is stored in df(u, v) = min{e(u), cf(u, v)}: we do not want to push a quantity of flow that would cause e(u) to become negative, or that would exceed the residual capacity of the edge.

Push(u, v)
begin
    //applicability: u is overflowing (e(u) > 0),
    //cf(u, v) > 0 and h(u) = h(v) + 1
    //action: push df(u, v) = min{e(u), cf(u, v)} units of flow from u to v
    df(u, v) ← min{e(u), cf(u, v)}    //amount to push
    //move the flow from u to v by updating f
    f(u, v) ← f(u, v) + df(u, v)
    f(v, u) ← -f(u, v)
    //and by updating e
    e(u) ← e(u) - df(u, v)
    e(v) ← e(v) + df(u, v)
end    //of Push(u, v)

Note that Push(u, v) is activated only when h(u) = h(v) + 1. If after a Push(u, v) the edge (u, v) becomes a saturated edge (i.e. cf(u, v) = 0 after the push), then the push is called a saturating push; otherwise it is called a nonsaturating push. Recall that saturated edges do not show up in the residual network.

Operation Lift
Lift(u) is activated if u is overflowing (e(u) > 0) and cf(u, v) > 0 implies h(u) ≤ h(v) for all vertices v. In other words, flow cannot be pushed from u to any v because no such v is downhill from u. After the execution of Lift(u), we say that u is lifted. Recall that neither s nor t can be lifted.

Lift(u)
begin
    //applicability: u is overflowing (e(u) > 0) and
    //for all v ∈ V: (u, v) edge in Gf ⇒ h(u) ≤ h(v)
    //action: increase the height of u
    h(u) ← 1 + min{h(v) : (u, v) is an edge in Gf}
end    //of Lift(u)

We note that {h(v) : (u, v) is an edge in Gf} is not an empty set, since u is an overflowing vertex: some flow has entered u, so there is at least one residual edge leaving u. After the lift, u is one unit higher than its lowest residual neighbor, and so some units of flow will be allowed to be pushed. At any given time, the excess flow stored at vertex u is maintained as e(u), and its current height as h(u).

1.6.3 The preflow-push procedure

The preflow-push procedure basically consists of two parts: an initialization and a push or lift loop. Initially, each edge leaving s is filled to capacity, while all other edges carry no flow at all. In other words:

f(u, v) = c(u, v)    if u = s
f(u, v) = -c(v, u)   if v = s
f(u, v) = 0          otherwise

As for the height function, it initially assigns the following values:

h(u) = |V|   if u = s
h(u) = 0     otherwise

And for each vertex v adjacent to s we have e(v) = c(s, v), while e(u) = 0 for all other vertices.

Recall the definition of a height function: h : V → N such that h(s) = |V|, h(t) = 0, and h(u) ≤ h(v) + 1 for every residual edge (u, v). In our initialization we have h(s) = |V| and h(t) = 0, and the only edges (u, v) that could violate h(u) ≤ h(v) + 1 (that is, for which h(u) > h(v) + 1) are the ones for which u = s; those are saturated (f(u, v) = c(u, v)) and so are not present in the residual graph Gf. In other words, these edges are not residual edges, and so the initial function is indeed a height function.

Note that as long as an overflowing vertex exists, at least one of the two operations applies. This is true since for any residual edge (u, v) we have h(u) ≤ h(v) + 1 (because h is a height function). Now if Push(u, v) is not applicable for any residual edge (u, v) leaving u, then we must have h(u) < h(v) + 1, i.e. h(u) ≤ h(v), for all residual edges (u, v), and thus a Lift operation can be applied to u.

After the initialization, the procedure repeatedly performs the Push or Lift operations seen in the previous section:

Preflow-Push(G)
begin
    //Initialization phase
    for each vertex u ∈ V
        h(u) ← 0
        e(u) ← 0
    for each edge (u, v) ∈ E
        f(u, v) ← 0
        f(v, u) ← 0
    h(s) ← |V|    //height of s is the only nonzero valued height in the beginning
    //saturate edges leaving the source
    for each vertex u ∈ Adj[s]
        f(s, u) ← c(s, u)
        f(u, s) ← -c(s, u)
        e(u) ← c(s, u)
    //Push or Lift phase
    while there exists an applicable Push or Lift operation
        do select an applicable Push or Lift operation and perform it
    //f is the maximum flow of G
    return f
end

Remarks:
1. Only when the algorithm terminates does the preflow become a flow, and then it is a maximum flow; for that reason it is labeled a preflow during the execution. Karzanov was the first to use the idea of preflow.

2. The while-loop of the preflow-push procedure repeatedly performs the following steps, in any order, until there are no active vertices, where a vertex u is active if it is overflowing (i.e. e(u) > 0). Note that the condition of "overflowing" is present for both operations, push and lift.

push: Select any active vertex u. Select any residual edge (u, v) with h(u) = h(v) + 1. Send df(u, v) = min{e(u), cf(u, v)} units of flow from u to v. The push has the effect of increasing f(u, v) and e(v) by df(u, v), and of decreasing f(v, u) and e(u) by df(u, v). The push is saturating if df(u, v) = cf(u, v) and nonsaturating otherwise. After performing a push, if e(u) is still positive, then of course u remains an active vertex.

lift: Select any active vertex u for which (u, v) ∈ Ef implies h(u) ≤ h(v). Replace h(u) by 1 + min{h(v) : (u, v) ∈ Ef}.

Thus, for the implementation of the preflow-push algorithm, we could maintain a list (or set) of active vertices.

Example: Consider the AAC example once more.

The residual network G_f induced by the initial preflow in G:

[Figure: the network after the initialization phase, showing the excess flow e(u) of each node and the height function; h(s) = 7 and h(u) = 0 for every other vertex u. Excess flows: 12 at v1 and 10 at each of v2 and v3.]

Notice the nodes that have an excess flow: the active nodes are v1, v2 and v3, since e(u) > 0 for u ∈ {v1, v2, v3}. At this point we have only one possible operation: lift. To be able to push, we would need a residual edge (u, v) ∈ E_f with v ≠ s and h(u) = h(v) + 1; but h(u) = h(v) = 0 for all u, v ∈ V \ {s}, so no pushes are possible. Suppose that preflow-push chooses v2 to lift. Lift(v2) will increase the height of v2 to 1 + min{h(v) | (v2, v) ∈ E_f} = 1 + 0 = 1. So the new height of v2 will be 1, and the network after performing Lift(v2) has h(v2) = 1 (instead of h(v2) = 0).

[Figure: G after the lifting of v2; only h(v2) has changed, to 1.]

At this point we still have the same three active nodes: v1, v2 and v3. Preflow-push can choose between three options: a) Lift(v1), b) Lift(v3), or c) push the excess flow found in vertex v2. Assume that preflow-push chooses option c) and suppose that the excess flow is pushed to v5: Push(v2, v5) will send d_f(v2, v5) = min{e(v2), c_f(v2, v5)} = 8 units of flow to v5. So f(v2, v5) = 8, e(v5) = 0 + 8 = 8 and e(v2) = 10 − 8 = 2. Note that v2 remains an active vertex even after Push(v2, v5), since e(v2) = 2. Note also that any other neighbor v of v2 with h(v2) = h(v) + 1 and c_f(v2, v) > 0 would have been a possible recipient of the excess flow of v2.

[Figure: the network after performing Push(v2, v5).]

So now the set of active vertices comprises v1, v2, v3 and v5. At this point preflow-push can choose between pushing the excess flow of one of these vertices or performing a lift operation on one of them. And so preflow-push goes on pushing or lifting until there are no active vertices (i.e., until no vertex is overflowing).

1.6.4 Correctness of the preflow-push algorithm

We want to show that the preflow-push algorithm terminates and that, when it does, the flow it produces is the maximum possible flow. We start by showing that vertex heights never decrease. Heights are modified only by the lift operation, which "raises" the value of the height of the vertex. For vertex u to be lifted, it must be such that h(u) ≤ h(v) for all (u, v) ∈ E_f. In other words, h(u) < 1 + min{h(v) | (u, v) ∈ E_f} before the lift, and after the lift of u we have h(u) = 1 + min{h(v) | (u, v) ∈ E_f}.

So vertex heights
1. never decrease,
2. increase by at least 1 with each lift.

Another important observation about the height function is that it remains a height function (recall the conditions h(s) = |V|, h(t) = 0, and h(u) ≤ h(v) + 1 for every (u, v) ∈ E_f) throughout the execution of the preflow-push algorithm. We already saw that the initialization phase does indeed define a height function on the vertices of the network. Let us show that the operations of lift and push do indeed leave h a height function.

Operation lift: After the lift of u we have h(u) = 1 + min{h(v) | (u, v) ∈ E_f}, so h(u) ≤ h(v) + 1 for every (u, v) ∈ E_f. As for a residual edge (w, u) entering u, we have h(w) ≤ h(u) + 1 before and h(w) < h(u) + 1 after the execution of Lift(u), so this constraint still holds. In other words, h is still a height function after the execution of Lift(u).

Operation push: Consider Push(u, v). We have 2 possibilities:
a) It may add (v, u) to E_f. In this case we have h(v) = h(u) − 1, since Push(u, v) requires h(u) = h(v) + 1, and so h remains a height function. (If, in our example, we perform a Push(v2, t) instead of Push(v2, v5), then we add (t, v2) to E_f with f(v2, t) = 10; here h(t) = h(v2) − 1.)
b) It may remove (u, v) from the residual network, namely when (u, v) becomes saturated. (That is indeed the case after the execution of Push(v2, v5) in our example: (v2, v5) is no longer in the residual graph.) But the removal of (u, v) from G_f also removes the corresponding constraint, and so h will still be a height function after the execution of Push(u, v).

We prove one last property before showing the correctness of the preflow-push algorithm.

Theorem 1.6.1 Let G = (V, E) be a flow network with source s and sink t, let f be a preflow in G and let h be a height function on V. Then there is no path from the source s to the sink t in the residual network G_f.

Proof: The proof is by contradiction.

Suppose there is a path p = ⟨v0, v1, ..., vk⟩ from s = v0 to t = vk in the residual graph G_f. Without loss of generality, p is a simple path (it does not include cycles), so that k < |V|. For i = 0, 1, ..., k − 1 we have the edge (v_i, v_{i+1}) in the residual graph G_f, and since h is a height function we must have h(v_i) ≤ h(v_{i+1}) + 1 for i = 0, 1, ..., k − 1. Thus we have:

    h(s) ≤ h(v1) + 1 ≤ (h(v2) + 1) + 1 = h(v2) + 2 ≤ h(v3) + 3 ≤ ... ≤ h(t) + k

But h(t) = 0 since h is a height function. So h(s) ≤ k and k < |V|, i.e. h(s) < |V|, which contradicts the fact that h(s) = |V| (since h is a height function). So no such path exists from s to t in G_f. □

Theorem 1.6.2 When the preflow-push algorithm terminates when run on a flow network G = (V, E) with source s and sink t, the preflow f it computes is indeed a maximum flow for G.

Proof: When the preflow-push algorithm terminates, the condition in the while-loop is not satisfied and thus there are no active vertices. In other words, no vertex in V \ {s, t} has an excess flow (i.e., there are no overflowing vertices). Therefore f is a flow for the flow network G. It is a maximum flow because we know that f is a maximum flow if and only if the residual network G_f contains no path from s to t (max-flow min-cut theorem), and from the previous theorem we know that, since h is a height function, there is no path from s to t in the residual graph. So when the preflow-push algorithm terminates, it delivers a maximum flow from s to t. □

1.6.5 Complexity of the preflow-push algorithm

We shall need the following theorem to find the running time function of the preflow-push algorithm.

Theorem 1.6.3 Let G = (V, E) be a flow network with source s and sink t, and let f be a preflow in G. For any overflowing vertex u in G, there exists a simple path from u to s in the residual network G_f.

Proof: The proof is by contradiction. Suppose that u ∈ V is overflowing (i.e. e(u) > 0). Let

    U = {v | there exists a simple path from u to v in G_f}

and let Ū = V − U. If we can show that s ∈ U, then we are done. So suppose that s ∉ U.

Claim: f(w, v) ≤ 0 for any w ∈ Ū and any v ∈ U.

Proof of Claim (by contradiction): If f(w, v) > 0, then f(v, w) = −f(w, v) < 0 (by definition), and so f(v, w) < c(v, w), which means that there exists an edge (v, w) in the residual graph G_f. So in the residual graph G_f we have:
- a path p from u to v (since v ∈ U), and
- an edge from v to w.
Therefore we have a path from u to w, which contradicts the fact that w ∈ Ū. So f(w, v) ≤ 0 for every w ∈ Ū and every v ∈ U.

Back to the main theorem. Since f(w, v) ≤ 0 for every w ∈ Ū and every v ∈ U, we have f(Ū, U) ≤ 0. Thus

    e(U) = f(V, U)              (by definition of excess flow)
         = f(U ∪ Ū, U)
         = f(U, U) + f(Ū, U)    (by a property already studied)
         = f(Ū, U)              (since f(U, U) = 0)
         ≤ 0

so e(U) ≤ 0. By definition, the excess flow of any vertex in V \ {s} is nonnegative. In particular, U ⊆ V \ {s}, and therefore we have e(v) = 0 for all vertices v ∈ U (since e(U) ≤ 0). So e(u) = 0, which contradicts the fact that u is an overflowing vertex. So for any overflowing vertex u in G, there exists a simple path from u to s in the residual network G_f. □

We next find an upper bound on the heights of vertices.

Theorem 1.6.4 At any time during the execution of the preflow-push algorithm, we have h(u) ≤ 2|V| − 1 for all u ∈ V.

Proof: Vertices s and t are never overflowing (by definition) and so the values of their heights never change: h(s) = |V| and h(t) = 0. Vertices have their heights altered (increased) only when they are overflowing. For any overflowing vertex u ∈ V \ {s, t}, there is a simple path p = ⟨v0, v1, ..., vk⟩ with u = v0 and s = vk in the residual network G_f (by the previous theorem). Note that k ≤ |V| − 1 since p is simple. Moreover, (v_i, v_{i+1}) ∈ E_f implies h(v_i) ≤ h(v_{i+1}) + 1, so

    h(u) ≤ h(v1) + 1 ≤ (h(v2) + 1) + 1 = h(v2) + 2 ≤ ... ≤ h(s) + k

i.e. h(u) ≤ h(s) + k. Since h(s) = |V| and k ≤ |V| − 1, we get h(u) ≤ |V| + (|V| − 1) = 2|V| − 1. □

As an immediate result of the previous theorem, we can show that the number of lift operations is at most 2|V| − 1 per vertex and at most (2|V| − 1)(|V| − 2) < 2|V|² overall. Indeed, only a vertex in V \ {s, t} can be lifted, and when it is, its height grows from 0 (initially h(u) = 0) to at most 2|V| − 1 (by the previous theorem), increasing by at least 1 with each lift. So each vertex is lifted at most 2|V| − 1 times. That was the bound on lift operations. To bound the push operations, we consider the saturating pushes and the nonsaturating pushes separately.

Theorem 1.6.5 During the execution of the preflow-push algorithm, the number of saturating pushes is at most 2|V||E|.

Proof: Recall that Push(u, v) is a saturating push if edge (u, v) becomes saturated (c_f(u, v) = 0 after the push), and that saturated edges do not appear in the residual network: (u, v) ∉ E_f. For (u, v) to reappear in the residual network, we must send flow in (v, u), and when we do, h(v) = h(u) + 1; since h(u) = h(v) + 1 held at the saturating push, h(v) must increase by at least 2 between saturating pushes from u to v. Likewise, h(u) must increase by at least 2 between saturating pushes from v to u. Consider the sequence A of integers given by h(u) + h(v) for each saturating push that occurs between vertices u and v (i.e., Push(u, v) or Push(v, u)). We want to upper bound the length of sequence A. Note that after the first push we will have h(u) + h(v) ≥ 1. When the last push occurs, we will have

    h(u) + h(v) ≤ (2|V| − 1) + (2|V| − 2)    (recall that h(u) ≤ 2|V| − 1)
               = 4|V| − 3

So the last integer in sequence A is at most 4|V| − 3.

Since heights must increase by at least 2 between saturating pushes, the number of integers in sequence A is at most

    ((4|V| − 3) − 1)/2 + 1 = 2|V| − 1

Note that 1 is added on the left hand side of the equality to make sure that the first and last numbers of sequence A are both taken into consideration. So the total number of saturating pushes between vertices u and v is at most 2|V| − 1, and the total number of saturating pushes overall is at most (2|V| − 1)|E| < 2|V||E|. □

We now would like to bound the nonsaturating pushes. We use a potential function to achieve this goal.

Theorem 1.6.6 During the execution of the preflow-push algorithm, the number of nonsaturating pushes is at most 4|V|²(|V| + |E|).

Proof: Define a potential function Φ = Σ_{v∈X} h(v), where X ⊆ V is the set of overflowing vertices. Initially, Φ = 0, since we have no overflowing vertices. Recall that at each step of the algorithm, we perform one of the following operations: a) a lift, b) a saturating push, c) a nonsaturating push. Let Φ(k) denote the potential value at step k of the algorithm.

a) If at the kth step the algorithm performs a lift, then Φ(k) − Φ(k−1) ≤ 2|V|, since only u ∈ X is lifted (heights of other elements in X do not change) and u cannot be lifted by more than its maximum possible height, which is less than 2|V|.

b) If it performs a saturating push Push(u, v), then Φ(k) − Φ(k−1) ≤ 2|V|, since no heights change and only vertex v, whose height is at most 2|V|, can possibly come to have a positive excess flow; v then joins X (even if it wasn't an overflowing vertex at step k − 1).

c) If it performs a nonsaturating push Push(u, v), then Φ(k) − Φ(k−1) ≤ −1, since u is not an overflowing vertex after the push, v may join X, and h(v) − h(u) = −1.

If K denotes the last step of the algorithm, then

    Φ(K) − Φ(0) = Σ_{k=1}^{K} (Φ(k) − Φ(k−1))

Since Φ ≥ 0, the total decrease (due to Φ(k) − Φ(k−1) ≤ −1 when a nonsaturating push is performed), and therefore the total number of nonsaturating pushes, is at most the total increase, which is at most

    (max number of lifts)(2|V|) + (max number of saturating pushes)(2|V|)
    = (2|V|²)(2|V|) + (2|V||E|)(2|V|)    (from the previous theorems)
    = 4|V|²(|V| + |E|)

So the total number of nonsaturating pushes is at most 4|V|²(|V| + |E|). □

Since |E| ≥ |V| may be assumed, we have really shown the following theorem:

Theorem 1.6.7 During the execution of the preflow-push algorithm on any flow network G = (V, E), the number of basic operations is O(|V|²|E|).

Complexity of the preflow-push algorithm – Summary

1. Existence of a path from an overflowing node to s in G_f: if u is overflowing, then there exists a simple path from u to s in G_f.

[Figure: an overflowing vertex u (e(u) > 0) with a residual path to s.]

2. Upper bound on the height of any vertex: h(u) ≤ 2|V| − 1 for all u ∈ V.

3. Number of lift operations: < 2|V| − 1 per vertex, < 2|V|² overall.

4. Number of saturating pushes: < 2|V||E|.

[Figure: an example of a saturating push Push(u, v) in G and its effect on the residual graph G_f.]

To bound the saturating pushes between a fixed pair of vertices u and v, we considered the sequence A = {a1, a2, ..., aq}, where a_i is the value of h(u) + h(v) at the ith saturating push, and we found a bound on q.

Example: Consider the sequence 1, 3, 5, 7, 9, 11, 13, 15, 17, 19. To count the number of integers in this sequence: (last − first)/2 + 1 ⇒ (19 − 1)/2 + 1 = 10. Similarly, for sequence A we know that a1 ≥ 1 (because after the 1st push h(u) + h(v) ≥ 1), that a_{i+1} − a_i ≥ 2, and furthermore that aq ≤ 4|V| − 3; hence q ≤ ((4|V| − 3) − 1)/2 + 1 = 2|V| − 1.

5. Number of nonsaturating pushes: < 4|V|²(|V| + |E|). Use Φ = Σ_{v∈X} h(v), where X is the set of overflowing vertices, and denote by Φ(k) the potential value at step k of the algorithm. Then Φ(k) − Φ(k−1) ≤ 2|V|, and thus Φ(K) − Φ(0) ≤ 4|V|²(|V| + |E|), where K denotes the last step of the algorithm.

1.6.6 Specific implementations of the preflow-push algorithm

The preflow-push algorithm is flexible: by specifying different rules for selecting active nodes for the push and lift operations, one can derive many different algorithms, each with a different worst-case complexity than the preflow-push algorithm we have just studied. The bottleneck operation in the preflow-push algorithm is the number of nonsaturating pushes, and different rules for examining active nodes can produce meaningful reductions in the number of nonsaturating pushes.

1.6.7 FIFO preflow-push algorithm

As the title suggests, the algorithm examines the active nodes in first-in, first-out (FIFO) order [AMB93]. In the previous section we saw that the preflow-push algorithm selects an active node u and performs
- a saturating push Push(u, v), or
- a nonsaturating push Push(u, v), or
- a lift operation Lift(u).

After a saturating push, it is possible that u still remains active, and the algorithm might select u again or might select another active node instead. In the FIFO preflow-push algorithm we want to force the selection of u again and keep pushing flow from u until either e(u) = 0 or the algorithm performs a Lift(u) operation. Consequently, the algorithm might perform several saturating pushes followed either by a nonsaturating push or by a lift operation. This sequence of operations is known as a node examination; we also say that an overflowing vertex u is discharged by pushing all of its excess flow through admissible edges (u, v) (an edge (u, v) with c_f(u, v) > 0 and h(u) = h(v) + 1) to neighboring vertices, lifting u as necessary to cause edges leaving u to become admissible.

The FIFO preflow-push algorithm examines active nodes in FIFO order by maintaining the set of active nodes as a queue. It selects a node u from the front of the queue, performs pushes from this node, and adds newly active nodes to the rear of the queue. The algorithm repeatedly considers (i.e., examines) u until either it becomes inactive or it is lifted (in which case it is added to the end of the queue). As before, the algorithm terminates when the queue of active nodes is empty. It can be shown that this algorithm runs in O(|V|³) time.

We illustrate the FIFO preflow-push algorithm using the AAC example. As is the case with the preflow-push algorithm of the previous section, the algorithm first saturates the edges leaving the source, and the source's excess flow is initialized to e(s) = −Σ_{v∈V} c(s, v). To cut down on the number of lifts that the algorithm has to perform in the early stages of the run, this time we do not initialize the height values of the remaining vertices to zero. Instead, we perform a backward breadth-first search starting at the sink t: the sink has height 0, vertices that share an edge with t have height one, vertices two hops from t have 2 as their initial height, and so on. As for the source, we keep its initial height value at |V|.

Example: The initialization phase creates a network with heights h(t) = 0, h(v2) = h(v4) = h(v5) = 1, h(v1) = h(v3) = 2 and h(s) = |V| = 7. Thus we'll have an excess flow of 12 crates at v1 and an excess of 10 crates of olives at each of the vertices v2 and v3. Suppose that the queue of active nodes at this stage is List = {v2, v3, v1}. The algorithm removes vertex v2 from the front of the queue and "examines" it.
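The discharge loop of the FIFO variant can be sketched as follows (Python; the naming is ours, and for brevity the sketch starts from all-zero heights rather than the backward breadth-first-search heights, which affects only the number of operations performed, not correctness):

```python
from collections import deque

def fifo_preflow_push(n, cap, s, t):
    """FIFO preflow-push: repeatedly discharge the active node at the queue front."""
    f = [[0] * n for _ in range(n)]   # flow, skew-symmetric
    h, e = [0] * n, [0] * n           # heights and excesses
    h[s] = n
    queue = deque()
    for v in range(n):                # saturate edges leaving the source
        if cap[s][v] > 0:
            f[s][v], f[v][s] = cap[s][v], -cap[s][v]
            e[v] = cap[s][v]
            if v != t:
                queue.append(v)
    while queue:
        u = queue.popleft()           # examine (discharge) node u
        while e[u] > 0:
            lifted = False
            for v in range(n):
                if cap[u][v] - f[u][v] > 0 and h[u] == h[v] + 1:
                    d = min(e[u], cap[u][v] - f[u][v])   # push d units u -> v
                    f[u][v] += d; f[v][u] -= d
                    e[u] -= d; e[v] += d
                    if v not in (s, t) and e[v] == d:    # v just became active
                        queue.append(v)
                    break
            else:                     # no admissible edge: Lift(u)
                h[u] = 1 + min(h[v] for v in range(n)
                               if cap[u][v] - f[u][v] > 0)
                lifted = True
            if lifted:                # a lifted node goes to the rear of the queue
                queue.append(u)
                break
    return sum(f[s][v] for v in range(n))
```

Note how the inner loop stays with u until it is either inactive or lifted, which is exactly the node-examination rule described above.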

[Figure: snapshot of the network after saturating the edges leaving s, showing the excess flows, the height function (h(s) = 7, h(v1) = h(v3) = 2, h(v2) = h(v4) = h(v5) = 1, h(t) = 0) and the queue List = {v2, v3, v1}.]

At this stage, examining v2, the algorithm could perform a saturating push of 4 units or a push Push(v2, t) of 10 units. Suppose it does Push(v2, t).

[Figure: the edge (v2, t) now carries 10 of its 20 units of capacity.]

Note that v2 is not an active node after performing Push(v2, t): the push is nonsaturating and v2 gets rid of all of its excess flow. So the algorithm next examines the node now at the front of List.

So now the recipient of that push has an excess flow of 4 units and is therefore put in the queue. The algorithm stays with the node it is currently examining as long as that node is still active (for instance, it performs a nonsaturating push of 6 units while the examined node still has e = 6 > 0); once the node is no longer active, the algorithm considers the vertex now at the front of the queue. And so on.

Theorem 1.6.8 The FIFO preflow-push algorithm runs in O(|V|³) time.

Proof: To analyze the worst-case complexity of the FIFO preflow-push algorithm, we partition the total number of node examinations into several phases.

Phase one: node examination of the nodes that become active because of the saturating of the edges that leave s (i.e., in the last step of the algorithm before the while-loop). In our example, phase one consists in examining the nodes v2, v3 and v1.

Phase two: node examination of all the nodes that are in the queue after the algorithm has examined the nodes of phase one.

Phase three: node examination of all the nodes that are in the queue after the algorithm has examined the nodes of the second phase. And so on.

Notice that the algorithm examines a node at most once during a phase, and each node examination performs at most one nonsaturating push. Recall that the number of nonsaturating pushes is the bottleneck of the preflow-push algorithm (and is also the bottleneck of the FIFO preflow-push algorithm). So if we can prove that we have at most 2|V|² + |V| phases, then we'll have at most |V|(2|V|² + |V|) nonsaturating pushes, and this would imply that the FIFO preflow-push algorithm runs in O(|V|³) time.

Claim: FIFO preflow-push performs at most 2|V|² + |V| phases.

Proof of Claim: We consider the total change in the potential function Φ = max{h(u) | u is active} over an entire phase; that is, we would like to compute the difference between the initial and final values of the potential function during a phase. Note that this potential function is slightly different from the one we used in the previous section. We consider two cases:

Case one: FIFO preflow-push performs at least one lift operation during the phase. Since Φ cannot increase by more than the maximum increase produced by a lift operation, and since we already saw in the previous section that h(u) ≤ 2|V| − 1 for any u ∈ V and that the total number of lift operations is at most 2|V|², the total increase in Φ over all the phases is at most 2|V|².

Case two: FIFO preflow-push performs no lift operation during the phase. Then Φ decreases by at least 1 unit, since the excess of every node that was active at the beginning of the phase moves to nodes with smaller heights.

By combining cases one and two, and by realizing that the initial value of Φ could be at most |V|, we have that the total number of phases is at most 2|V|² + |V|. This proves the claim that FIFO preflow-push performs at most 2|V|² + |V| phases, and by the remarks preceding the claim we have proven that the FIFO preflow-push algorithm runs in O(|V|³) time. □

Other possible implementations of the preflow-push algorithm include the highest-label preflow-push algorithm, which always pushes flow from an active node with the highest height (distance label), and the excess scaling algorithm.

The highest-label preflow-push algorithm runs in O(|V|²√|E|) time and is considered to be the most efficient maximum flow algorithm in practice.

1.7 Applications of the maximum flow problem

1.7.1 Scheduling on uniform parallel machines

We are interested in scheduling a set J of jobs on M uniform parallel machines [AMB93]. Each job j ∈ J has:
    p_j : processing time, the number of machine days required to complete the job,
    r_j : release date, the beginning of the day when job j becomes available for processing,
    d_j : due date, the beginning of the day by which the job must be completed.
Note that r_j + p_j ≤ d_j. We make the following assumptions:
1. A machine can work on only one job at a time.
2. Each job can be processed by at most one machine at a time.
3. Preemption is allowed. In other words, we can interrupt a job and process it on different machines on different days.

The scheduling problem consists in determining a feasible schedule that completes all jobs before their due dates, or in showing that no such schedule exists. Note that scheduling problems of this kind arise in batch processing systems involving batches with a large number of units. The feasible scheduling problem as defined here can be used by more general scheduling problems, such as
- the maximum lateness (tardiness) problem,
- the (weighted) minimum completion time problem,
- the (weighted) maximum utilization problem.

The feasible scheduling problem can be formulated as a maximum flow problem. We show the transformation of this problem into a maximum flow problem by considering the following example.

Example: Suppose we have M = 3 machines and 4 jobs as given in the table:

    Job j                1      2      3      4
    Processing time p_j  1.5    1.25   2.1    3.6
    Release time r_j     3      1      3      5
    Due time d_j         5      4      7      9

Let us rank all the release and due dates r_j and d_j for all j in ascending order and determine mutually disjoint intervals of dates between consecutive "milestones".
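The milestone and interval computation just described can be sketched for this example as follows (Python; the dictionary layout is our own):

```python
# jobs: j -> (processing time p_j, release date r_j, due date d_j)
jobs = {1: (1.5, 3, 5), 2: (1.25, 1, 4), 3: (2.1, 3, 7), 4: (3.6, 5, 9)}

# milestones: all release and due dates, ranked in ascending order
milestones = sorted({r for _, r, _ in jobs.values()} |
                    {d for _, _, d in jobs.values()})

# consecutive milestones bound the mutually disjoint intervals of dates;
# the interval [a, b] corresponds to the interval vertex T_{a, b-1}
intervals = [(a, b - 1) for a, b in zip(milestones, milestones[1:])]

def available(k, l):
    """Jobs that may run inside T_{k,l}: released by day k, not due before day l+1."""
    return sorted(j for j, (p, r, d) in jobs.items() if r <= k and d >= l + 1)
```

For the table above this yields the milestones 1, 3, 4, 5, 7, 9, the interval vertices T_{1,2}, T_{3,3}, T_{4,4}, T_{5,6}, T_{7,8}, and, for example, available(3, 3) = [1, 2, 3], matching the discussion below.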

By ordering the r_j and d_j in ascending order, we get the milestones 1, 3, 4, 5, 7, 9, and so we get 5 mutually disjoint intervals of dates (or we can allow the intervals to share end points): [1, 3], [3, 4], [4, 5], [5, 7], [7, 9].

[Figure: a timeline from day 1 to day 9 showing, for each job j, its release date r_j and its due date d_j.]

Note that within each interval, the set of jobs that are available (i.e., jobs that have been released but are not due yet) does not change. Note also that the maximum number of disjoint intervals we can have is 2|J| − 1 (we have 2 entities for each job: release and due dates), where |J| is the number of jobs.

Let T_{k,l} denote the interval that starts at the beginning of day k and ends at the beginning of day l + 1. In our example, the five intervals are T_{1,2}, T_{3,3}, T_{4,4}, T_{5,6} and T_{7,8}. Notice that within each interval T_{k,l} we can process all jobs j with a) r_j ≤ k and b) d_j ≥ l + 1. For instance, in the interval from 3 to 4, i.e. T_{3,3}, we can process the following jobs:
- job 2, since r_2 ≤ 3 and d_2 = 4 ≥ 3 + 1,
- job 1, since r_1 ≤ 3 and d_1 = 5 ≥ 3 + 1, and
- job 3, since r_3 ≤ 3 and d_3 = 7 ≥ 3 + 1.

To cast this problem as a maximum flow problem, we create 4 levels of vertices:
Level one: the source node s.
Level two: nodes corresponding to the |J| jobs (one vertex per job).
Level three: interval vertices, one vertex per interval.
Level four: the sink node t.

As for the edges and capacities:

- Connect the source node to every job vertex j in level two with an edge with capacity p_j. This means that we need to assign p_j days of machine time to job j.

- Connect each job vertex j to every interval vertex T_{k,l} with r_j ≤ k and d_j ≥ l + 1 by an edge with capacity l − k + 1, which represents the maximum number of machine days we can allot to job j on the days from k to l. For instance:
[Figure: job vertex 3 joined to the interval vertices T_{3,3} (interval [3, 4]) and T_{4,4} ([4, 5]) by edges of capacity 1 and to T_{5,6} ([5, 7]) by an edge of capacity 2; each capacity is the size of the corresponding interval. The edge from s to job vertex 3 has capacity 2.1.]

Job 3 is ready for processing at r_3 = 3 and has a due date of d_3 = 7. Its p_3 = 2.1 days of processing time could be scheduled during the intervals T_{3,3}, T_{4,4} and/or T_{5,6}.
- Connect each interval node T_{k,l} to t by an edge with capacity (l − k + 1)M, representing the total number of machine days available on the days from k to l. For instance:

[Figure: interval vertex T_{5,6} joined to the sink t by an edge of capacity 6, collecting the edges arriving from job vertices 3 and 4.]

The size of the interval T_{5,6} is 6 − 5 + 1 = 2. But this represents the number of days per machine, so we have 6 machine days (3 · 2) to service the processing demands of jobs 3 and 4 that arrive at node T_{5,6}.
So now we solve the problem and find a maximum flow.

    The scheduling problem has a feasible solution  ⟺  the maximum flow is equal to Σ_{j∈J} p_j
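The whole network for the example can be generated mechanically; feeding `edges` to any maximum-flow routine and comparing the flow value with Σ_j p_j = 8.45 then decides feasibility (a Python sketch; the tuple layout and names are our own):

```python
M = 3                                            # number of machines
jobs = {1: (1.5, 3, 5), 2: (1.25, 1, 4), 3: (2.1, 3, 7), 4: (3.6, 5, 9)}
T = [(1, 2), (3, 3), (4, 4), (5, 6), (7, 8)]     # interval vertices T_{k,l}

edges = []                                       # (tail, head, capacity)
for j, (p, r, d) in jobs.items():
    edges.append(("s", ("job", j), p))           # assign p_j machine days to job j
    for k, l in T:
        if r <= k and d >= l + 1:                # job j is available throughout T_{k,l}
            edges.append((("job", j), ("T", k, l), l - k + 1))
for k, l in T:
    edges.append((("T", k, l), "t", (l - k + 1) * M))  # machine days in the interval
```

For instance, the construction reproduces the edges discussed above: job 3 is joined to T_{3,3} and T_{4,4} with capacity 1 and to T_{5,6} with capacity 2, and T_{5,6} is joined to t with capacity 6.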

[Figure: the complete flow network for the example. The source s is joined to the job vertices 1, 2, 3, 4 with capacities 1.5, 1.25, 2.1 and 3.6; each job vertex is joined to the interval vertices T_{1,2}, T_{3,3}, T_{4,4}, T_{5,6}, T_{7,8} it fits into, with capacity equal to the interval size; each interval vertex T_{k,l} is joined to t with capacity (l − k + 1) · 3.]
1.7.2 Distributed computing on a two-processor computer
A computer system has two (not necessarily identical) processors. We wish to execute a very large program that consists of several modules (subroutines) that will need to interact with each other during the execution of the program [AMB93]. The cost of executing a module on a processor is known in advance. Let

    α_i : cost of computation of module i on processor 1
    β_i : cost of computation of module i on processor 2

In general α_i ≠ β_i, because the memory, control, speed and arithmetic capabilities of the two processors are not the same. Moreover, let

    c_ij : interprocess communication cost when modules i and j are assigned to different processors

Note that c_ij is in general quite high, and that if instead modules i and j are assigned to the same processor, we do not have any interprocess communication cost. We wish to allocate the modules of the program to the two processors so that we minimize the total cost of processing and interprocess communication. This problem can be formulated as a minimum cut problem on an undirected network. We show the formulation by considering an example.

Example: Suppose that a program can be broken into 4 modules, that the costs of computation of the different modules on processors one and two are as given in table 1.1, and that the interprocess communication costs c_ij of assigning modules i and j to different processors are given in the symmetric table 1.2 below; so, for instance, c_23 = 6.

    Modules         1     2     3     4
    Processor one   6     5    10     4
    Processor two   4    10     3     8

Table 1.1: "In-house" costs

         1    2    3    4
    1    0    5    0    0
    2    5    0    6    2
    3    0    6    0    1
    4    0    2    1    0

Table 1.2: Interprocess costs

To cast this problem as a minimum cut problem on an undirected network, we create three levels of vertices:
Level one: a source vertex s, which represents processor one.
Level two: vertices corresponding to the program modules, one vertex per module.
Level three: a sink vertex t, which represents processor two.

As for the edges and capacities:
a) connect the source node to every module vertex i in level two with an edge of capacity β_i,
b) connect two module vertices i and j with an edge (i, j) of capacity c_ij if modules i and j interact during the execution of the program,
c) connect every module vertex i with t to form the arc (i, t) and give it a capacity value of α_i.

Notice that there is a one-to-one correspondence between s–t cuts in the network and assignments of the modules to the two processors; indeed, we shall always have the capacity of a cut equal to the cost of the corresponding assignment. Consider the extreme case of scheduling all four modules on processor two. Here there is no interprocess communication happening and the cost is just

    Σ_{i=1}^{4} β_i = 4 + 10 + 3 + 8 = 25

Now consider the cut ({s}, V − {s}) of the network: its capacity is also 4 + 10 + 3 + 8 = 25. In general, if A1 is the set of modules assigned to processor one and A2 is the set of modules assigned to processor two, then the cost of this assignment is

    Σ_{i∈A1} α_i + Σ_{i∈A2} β_i + Σ_{i∈A1, j∈A2} c_ij

and the s–t cut corresponding to this assignment is ({s} ∪ A1, {t} ∪ A2).

Example: With A1 = {1, 2} and A2 = {3, 4} we get the following flow network:
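The cut-cost correspondence is easy to check by brute force over all 2⁴ assignments (Python, using the tables above; the function name is ours):

```python
from itertools import combinations

alpha = {1: 6, 2: 5, 3: 10, 4: 4}    # cost on processor one (table 1.1)
beta = {1: 4, 2: 10, 3: 3, 4: 8}     # cost on processor two (table 1.1)
c = {(1, 2): 5, (2, 3): 6, (2, 4): 2, (3, 4): 1}   # nonzero entries of table 1.2

def assignment_cost(A1):
    """Cost when the modules in A1 run on processor one and the rest on two.

    By the correspondence above this equals the capacity of the cut
    ({s} + A1, {t} + A2)."""
    A2 = set(alpha) - set(A1)
    crossing = sum(cij for (i, j), cij in c.items() if (i in A1) != (j in A1))
    return sum(alpha[i] for i in A1) + sum(beta[i] for i in A2) + crossing

# the minimum cut value is the cheapest of the 16 possible assignments
best = min(assignment_cost(set(A1))
           for r in range(5) for A1 in combinations(alpha, r))
```

The two cuts computed in the text come out as expected: assignment_cost(set()) = 25 and assignment_cost({1, 2}) = 30; for these tables the brute-force optimum puts module 4 alone on processor one, at total cost 24.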

[Figure: the flow network with s ("Processor 1") on the left, t ("Processor 2") on the right, and the program module vertices 1, 2, 3, 4 in between; A1 = {1, 2}, A2 = {3, 4}.]

As shown in the above figure, the cut ({s} ∪ A1, {t} ∪ A2) contains:
1. edge (1, t) of capacity α_1 and edge (2, t) of capacity α_2, which implies that modules 1 and 2 are assigned to processor 1,
2. edge (s, 3) of capacity β_3 and edge (s, 4) of capacity β_4, which represents the fact that modules 3 and 4 are assigned to processor 2,
3. edge (2, 3) of capacity c_23 and edge (2, 4) of capacity c_24, which represents the interprocess cost for the assignment of modules 1 and 2 to processor one and modules 3 and 4 to processor two.

So in our example of A1 = {1, 2} and A2 = {3, 4} we get cap({s} ∪ A1, {t} ∪ A2) = 8 + 3 + 6 + 2 + 5 + 6 = 30. We showed that the cost of the assignment A1, A2 equals the capacity of the cut ({s} ∪ A1, {t} ∪ A2), and this is true for any assignment A1, A2. Consequently, the minimum s–t cut in the network gives the minimum cost assignment of the modules to the two processors.

1.7.3 The baseball elimination problem

This application has nothing to do with baseball (especially the way it is played now). There are n teams: team 1, team 2, ..., team n. Team n is our favorite team. We are in mid-season and we are interested in knowing whether team n still has a chance of ending the season in first place [AMB93] [Wes96]. Suppose that team i has won w(i) of the games it has already played, and that g(i, j) is the number of games that teams i and j have yet to play with each other. We assume that no ties are possible.

We say that team t is eliminated if it has no chance of finishing first; in other words, if for every possible outcome of the unplayed games, at least one team will have more wins than team t.

Team n currently has w(n) wins, and there are Σ_{j=1}^{n−1} g(j, n) games remaining on its schedule. The best scenario for team n is if it wins all of its remaining games. Let

    W = w(n) + Σ_{j=1}^{n−1} g(j, n)

Team n cannot be eliminated if the rest of the games between the other teams can be played in such a way that each other team has at most W wins at the end of the season.

Let x(i, j) denote the number of games that team i wins over team j (out of the g(i, j) games they still have to play). Note that x(i, j) + x(j, i) = g(i, j) = g(j, i) for each pair i, j. Let W(i) = w(i) + Σ_j x(i, j) denote the number of wins that team i has at the end of the season. Then team n finishes the season in first place if there is a way of completing the season such that W ≥ W(i) for i = 1, 2, ..., n − 1. In other words, team n is not eliminated if there is some assignment of integer values x(i, j) ≥ 0 with x(i, j) + x(j, i) = g(i, j), for i = 1, ..., n − 1 and j = 1, ..., n − 1, such that

    W = w(n) + Σ_{j=1}^{n−1} g(j, n)  ≥  w(i) + Σ_{j=1}^{n−1} x(i, j)    for i = 1, 2, ..., n − 1

Our problem is to decide whether team n is eliminated or not. While in some problem instances it is easy to answer this question, in others it is a challenging question.

Example: There are 4 teams; each team plays each of the other 3 teams for 3 games each, so each team plays a 9-game season. Suppose that team 1 has already played and beaten team 3 twice and team 4 twice; in other words, g(1, 3) = 3 − 2 = 1 and g(1, 4) = 1, and w(1) = 4. Similarly, suppose that g(2, 3) = g(2, 4) = 1 and w(2) = 4, and that these are the only games played so far. So w(3) = 0 and w(4) = 0, as the following tables show.

    Team i    wins w(i)
    1         4
    2         4
    3         0
    4         0

Table 1.3: Wins so far in the season

And there are quite a few remaining games:

    i\j    1    2    3    4
    1      |    3    1    1
    2      3    |    1    1
    3      1    1    |    3
    4      1    1    3    |

Table 1.4: The remaining games

Is team 4 eliminated? We have

    W = w(4) + Σ_{j=1}^{3} g(j, 4) = 0 + 5 = 5

None of the teams has more than 4 wins so far in the season. But teams 1 and 2 still have to play 3 games against each other. At the end of these games, either x(1, 2) ≥ 2 (i.e., team 1 wins at least two of the 3 games) or x(2, 1) ≥ 2 (that is, team 2 wins at least two of the 3 games). Thus either

    W(1) = w(1) + Σ_{j=2}^{4} x(1, j) ≥ 4 + 2 = 6    or    W(2) = w(2) + Σ_{j≠2} x(2, j) ≥ 4 + 2 = 6

So either W(1) > W or W(2) > W, and therefore our favorite team (team 4) is eliminated. □

The following are 2 equivalent ways of thinking of the problem. Can the games between all of the teams 1, 2, ..., n − 1 be played such that
1. by the end of the season, each team's total does not exceed W?
2. at the end of the season, team i has won at most W − w(i) of its remaining games, for i = 1, 2, ..., n − 1?

The formulation of this problem as a maximum flow problem is based on the second interpretation, where the commodity being shipped from the source s to the sink t is "remaining games between team i and team j, for i = 1, ..., n − 1 and j = 1, ..., n − 1".

We transform the baseball elimination problem into a maximum flow problem as follows. Create a source s, pair nodes (a.k.a. game nodes) <i j>, one for each pair of teams from {1, ..., n − 1}, team nodes, one for each team i = 1, 2, ..., n − 1, and a sink t. Note that there are (n − 1)(n − 2)/2 pair nodes. We create an edge leaving s to each <i j> game node and give it capacity g(i, j). Create an edge leaving game node <i j> to team node i and another to team node j; the capacity of each of these edges is G, where

    G = Σ_{i=1}^{n−1} Σ_{j>i} g(i, j)

is the total number of games remaining among the teams {1, 2, ..., n − 1}. Create an edge from each team node i to t with capacity W − w(i); in other words, by the end of the season, team i may win at most W − w(i) of its remaining games. Note that we are assuming that W ≥ w(i). If that is not the case, then the current number of wins w(i) of some team i is greater than W, and so team n is eliminated (and there is no need to construct the network).

[Figure: the flow network for the baseball elimination problem; s feeds each game node <i j> with capacity g(i, j), each game node feeds its two team nodes with capacity G, and each team node i is joined to t with capacity W − w(i).]

Example: Let us go back to our example.

[Figure: the flow network for the 4-team example, with source s, game nodes <1 2>, <1 3>, <2 3>, team nodes 1, 2, 3, and sink t. The edges leaving s have capacities g(1,2) = 3, g(1,3) = 1, g(2,3) = 1; the game-to-team edges have capacity G = 5; the edges into t have capacities W - w(1) = 1, W - w(2) = 1, W - w(3) = 5.]

The capacity of the edge from the source to game node <i j> is g(i,j), and so we can never ship more than the g(i,j) remaining games between team i and team j. A unit of flow from s to game node <i j> represents a game between team i and team j. The unit of flow then goes either to team node i or to team node j, depending on whether team i or team j won, and from the winning node, team i say, it continues to the sink t. Similarly, the capacity W - w(i) on the edge between team node i and t ensures that at most W - w(i) games are won by team i (among its remaining games).

Also note that the cut ({s}, V - {s}), where V is the set of all vertices, is such that its capacity is

    c({s}, V - {s}) = Σ_{<i j> pair node} g(i,j) = G.

Recall that the flow here is "remaining games". Next we show that:

    Team n is not eliminated  ⇔  Maximum flow = G.

Proof:
(⇒): We want to show that: Team n is not eliminated ⇒ Maximum flow = G.

Team n is not eliminated; hence there is some assignment of integer values x(i,j), for i = 1, 2, ..., n-1 and j = 1, 2, ..., n-1, such that

a) x(i,j) ≥ 0 and x(i,j) + x(j,i) = g(i,j) = g(j,i), and
b) W = w(n) + Σ_{j=1}^{n-1} g(j,n) ≥ w(i) + Σ_{j=1}^{n-1} x(i,j).

We define a flow function f in the following way. For each edge leaving s, set

    f(s, <i j>) = g(i,j).

For each edge (<i j>, i), let f(<i j>, i) = x(i,j), and let f(<i j>, j) = x(j,i) for the edge (<i j>, j).

This assignment satisfies the capacity constraint of the edges (s, <i j>). Since

    f(<i j>, i) + f(<i j>, j) = x(i,j) + x(j,i) = g(i,j)   (by a))

the flow conservation constraint at each game vertex <i j> is satisfied; in other words, the flow out of vertex <i j> equals the flow into vertex <i j>.

The total flow assigned to enter team node i is Σ_{j=1}^{n-1} x(i,j). So we assign the following flow to each edge (i, t):

    f(i, t) = Σ_{j=1}^{n-1} x(i,j).

Here too, the flow conservation constraint is satisfied at each team node i. Note that the capacity constraint is also satisfied: the capacity of edge (i, t) is W - w(i), and by b) we know that

    W ≥ w(i) + Σ_{j=1}^{n-1} x(i,j),

so Σ_{j=1}^{n-1} x(i,j) ≤ W - w(i).

Let us compute the value of f:

    |f| = Σ_{<i j> pair node} f(s, <i j>)   (by definition of flow value)
        = Σ_{<i j> pair node} g(i,j)        (by our flow assignment)
        = G.

But there exists a cut of capacity G: recall that the cut ({s}, V - {s}) has capacity G. Therefore f as defined here is a maximum flow, and |f| = G (by the Max-Flow Min-Cut theorem).

(⇐): We want to show: Maximum flow = G ⇒ Team n is not eliminated.

Suppose that we have a maximum flow function f with |f| = G. Assume that f assigns integer values to each edge; this is a reasonable assumption, since all capacities are integer values (apply the integrality property).

To show that team n is not eliminated, we have to show that there is an assignment of integer values x(i,j), for i = 1, 2, ..., n-1 and j = 1, 2, ..., n-1, such that:

a) x(i,j) ≥ 0 and x(i,j) + x(j,i) = g(i,j) = g(j,i), and
b) W = w(n) + Σ_{j=1}^{n-1} g(j,n) ≥ w(i) + Σ_{j=1}^{n-1} x(i,j).

As seen previously, ({s}, V - {s}) is a cut of capacity G, and |f| = G, so f saturates every edge leaving s:

    f(s, <i j>) = g(i,j)   for each game node <i j>.

Set x(i,j) = f(<i j>, i) and x(j,i) = f(<i j>, j) for each game node <i j>. Since g(i,j) units of flow go into game node <i j> and there are only two edges leaving that node, we know by the conservation constraint that

    g(i,j) = f(s, <i j>) = f(<i j>, i) + f(<i j>, j) = x(i,j) + x(j,i).

So property a) is satisfied by our assignment of x. Moreover, the values x(i,j) are all non-negative integers, since we assumed that the flow f takes only integer values.

Let us show that property b) is also met. Since f is a flow, Σ_{j=1}^{n-1} x(i,j) units of flow enter team node i, and since there is only one edge leaving team node i (going to the sink), we have

    f(i, t) = Σ_{j=1}^{n-1} x(i,j).

But since f(i,t) ≤ W - w(i) for i = 1, 2, ..., n-1, we have

    Σ_{j=1}^{n-1} x(i,j) ≤ W - w(i)   ⇒   W ≥ w(i) + Σ_{j=1}^{n-1} x(i,j)   for i = 1, 2, ..., n-1.

Therefore x satisfies property b), and so team n is not eliminated. We have shown that Maximum flow = G ⇒ Team n is not eliminated. ∎

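The equivalence just proved turns the elimination question into a mechanical computation. The sketch below (the function and node names are our own; the max-flow routine is a plain Edmonds-Karp, i.e., Ford-Fulkerson with BFS augmenting paths, one of the variations discussed in Section 1.6) builds the network for teams 1, ..., n-1 and compares the maximum flow with G:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts of residual capacities."""
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value
        path, v = [], t                         # recover the path s -> t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)     # bottleneck residual capacity
        for u, v in path:
            cap[u][v] -= b
            cap[v][u] = cap[v].get(u, 0) + b    # residual (backward) edge
        value += b

def is_eliminated(n, w, g):
    """w[i] = wins of team i+1 so far, g[i][j] = games left between i+1 and j+1.
    Decides whether team n (index n-1) is eliminated."""
    W = w[n - 1] + sum(g[n - 1][j] for j in range(n - 1))
    if any(w[i] > W for i in range(n - 1)):
        return True                             # some team already exceeds W
    G = sum(g[i][j] for i in range(n - 1) for j in range(i + 1, n - 1))
    cap = {'s': {}, 't': {}}
    for i in range(n - 1):
        cap[('team', i)] = {'t': W - w[i]}      # team i may win at most W - w(i) more
    for i in range(n - 1):
        for j in range(i + 1, n - 1):
            game = ('game', i, j)
            cap['s'][game] = g[i][j]            # ship the remaining i-j games
            cap[game] = {('team', i): G, ('team', j): G}
    return max_flow(cap, 's', 't') < G          # eliminated iff not all games fit
```

For the 4-team example of Table 1.4, `is_eliminated(4, [4, 4, 0, 0], g)` returns True: the maximum flow is 4, while G = 5.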
Example:

[Figure: the flow network of our 4-team example with a maximum flow drawn in red; each edge is labelled flow/capacity, e.g. 2/3 on edge (s, <1 2>) and 1/1 on the edges into t from team nodes 1 and 2.]

By computing a maximum flow for our problem, we get the flow shown in the above figure, with the flow values in red. So |f| = 4. According to the theorem, team 4 is eliminated, since the total number G of games remaining among teams 1, 2, 3 is 5 > 4. ∎

Chapter 2

Graph Matching

A matching in a graph G = (V, E) is a set M of edges, M ⊆ E, such that no two of the edges of M share a vertex. A vertex v ∈ V is matched by the matching M if some edge in M is incident on v; otherwise, v is unmatched. We are interested in finding a maximum matching M, in other words, a matching of maximum cardinality (|M| ≥ |M'| for any other matching M').

Example:

[Figure: a graph on the ten vertices v1, ..., v10.]

In the above graph, M = {(v1, v2), (v3, v4), (v5, v6), (v7, v8)} is a matching. Is M a maximum matching? No: the matching M' = {(v1, v2), (v3, v5), (v4, v7), (v6, v8), (v9, v10)} has more edges. As a matter of fact, M' is a maximum matching, since no matching of G can ever have more than |V|/2 = 5 edges. ∎

When the cardinality of a matching M is such that |M| = ⌊|V|/2⌋, the largest possible for a graph with |V| vertices, then M is called a perfect matching. Note that in a perfect matching every vertex v of G (except maybe one, when |V| is odd) is matched by M.

The edges of a matching M are called matched edges; the other edges are free. If (u, v) is a matched edge, then u is the mate of v. Vertices that are not incident upon any matched edge are called exposed; the remaining vertices are matched.

A path p = <u1, u2, ..., uk> is called an alternating path (with respect to M) if the edges (u1, u2), (u3, u4), ... are free, while (u2, u3), (u4, u5), ... are matched. Vertices that lie on an alternating path starting with an exposed vertex and that have odd rank on this path are called outer; those with even rank are called inner.

An alternating path p = <u1, u2, ..., u2k> is called augmenting if both of its endpoints u1 and u2k are exposed.

Example: Consider the previous graph with the matching M1 = {(v2, v3), (v4, v5), (v6, v7), (v8, v9)}; the vertices v1 and v10 are exposed.

[Figure: the graph with the matched edges of M1 drawn bold.]

The path P = <v1, v2, v3, v4, v5> is an alternating path with respect to M1: it starts with the exposed vertex v1, the edges (v1, v2) and (v3, v4) are free, and (v2, v3) and (v4, v5) are matched. The vertices v1, v3, v5 (odd rank) are outer, and v2, v4 (even rank) are inner. Similarly, P1 = <v1, v2, v3, v4, v5, v6, v7, v8, v9, v10> is an alternating path; since both of its endpoints v1 and v10 are exposed, P1 is an augmenting path with respect to M1. ∎

Recall that if A and B are sets, then A △ B = (A - B) ∪ (B - A) is the symmetric difference of A and B.

Theorem 2.0.1 Let P be the set of edges on an augmenting path p = <u1, u2, ..., u2k> in a graph G with respect to a matching M. Then M' = M △ P is a matching of cardinality |M| + 1.

Example: For the augmenting path P1 above, with P the set of its edges,

    M1 △ P = (M1 - P) ∪ (P - M1) = {(v1, v2), (v3, v4), (v5, v6), (v7, v8), (v9, v10)},

a matching of cardinality |M1| + 1 = 4 + 1 = 5. Note that 5 = ⌊|V|/2⌋, so this new matching is perfect, and it makes no sense to look for a further augmenting path with respect to it. ∎

Now let us prove the theorem. We must show two things:

1. M' = M △ P is a matching;
2. |M'| = |M| + 1.

Proof:
1. We want to show that no two edges in M △ P share the same vertex in G. The proof is by contradiction. Suppose that e ∈ M △ P and e' ∈ M △ P are incident upon the same vertex. Since M △ P = (M - P) ∪ (P - M), there are 3 cases.

Case 1: e ∈ M - P and e' ∈ M - P. This is impossible, since M is a matching: two edges of M cannot share a vertex.

Case 2: e ∈ P - M and e' ∈ P - M. Both edges are free edges of p, i.e., both are of the form (u_{2j+1}, u_{2j+2}) (for different values of j). The edges of p alternate between free and matched, so two free edges of p are never consecutive on p, and since the vertices of p are distinct, two non-consecutive edges of p cannot share a vertex. So this case is impossible.

Case 3: e ∈ M - P and e' ∈ P - M. Say e' = (u_{2j+1}, u_{2j+2}) is a free edge of p, sharing the vertex u_{2j+2} with e (the argument for u_{2j+1} is symmetric). If u_{2j+2} is an interior vertex of p, then it is also an endpoint of the matched edge e'' = (u_{2j+2}, u_{2j+3}) ∈ M; but then e and e'' are two edges of M incident upon the same vertex, which is impossible since M is a matching. If u_{2j+2} is an endpoint of p, then it is exposed with respect to M, so no edge of M, in particular e, can be incident upon it. So this case is impossible as well.

So we have just shown that M' is a matching.

2. Show that |M'| = |M| + 1. Note that there are 2k - 1 edges in P: k of the edges of P are free and k - 1 are matched (an augmenting path begins and ends with a free edge). Since M' = (M - P) ∪ (P - M) and (M - P) ∩ (P - M) = ∅, we get

    |M'| = |M - P| + |P - M| = (|M| - (k - 1)) + k = |M| + 1

(M ∩ P has one edge less than P - M). ∎

Note that in the previous theorem we increased the size of the matching M by one; that is, we obtained the matching M' by inverting the edges of the augmenting path P: the matched edges of M on P become free edges with respect to M', and the free edges of M on P become matched edges in M'.

Theorem 2.0.2 A matching M in a graph G is maximum ⇔ there is no augmenting path in G with respect to M.

Proof:
(⇒) This is a direct consequence of the previous theorem: if there is an augmenting path P with respect to M, then M' = M △ P is a new matching with |M'| = |M| + 1, which contradicts the fact that M is a maximum matching.

(⇐) We have to show: if there is no augmenting path in G with respect to M, then M is a maximum matching. The proof is by contradiction. Suppose there is no augmenting path in G with respect to M, and that M is not a maximum matching. Then there exists some matching M' of G such that |M'| > |M|. Consider

    M △ M' = (M - M') ∪ (M' - M).

The edges in M △ M' form a subgraph G' = (V, M △ M') (possibly disconnected) of G with the following properties:

1. Every vertex has degree 2 or less. Since two edges of a matching cannot be incident upon the same vertex, if the degree of a vertex is 2, then one of its edges is in M and the other in M'.
2. G' is a collection of cycles and paths.
3. The cycles are of even length, and each cycle contains the same number of edges from M as from M'.

But since |M'| > |M|, it must be the case that in one of the paths we have more edges from M' than from M; in other words, that path P begins and ends with edges of M', and its first and last vertices are exposed with respect to M. Along P the edges alternate between M' and M, so P is an augmenting path in G with respect to M. But this is a contradiction, since we assumed that G has no augmenting path with respect to M. Therefore M is a maximum matching. ∎

Note the big similarity between this theorem and what we have seen with the construction of maximum flows in flow networks. Edmonds was the first to show how to find an augmenting path with respect to a matching M, in any given graph, in polynomial time. Edmonds' algorithm starts with an empty matching M, searches for an augmenting path P, increases M by one, and repeats the process until no augmenting path can be found.

Next, we restrict our attention to finding maximum matchings in bipartite graphs.

2.1 Maximum Bipartite Matching

A bipartite graph is an undirected graph G = (V, E) in which V is partitioned into two sets L ⊆ V and R ⊆ V such that

    (u, v) ∈ E  ⇒  (u ∈ L and v ∈ R) or (u ∈ R and v ∈ L).

In other words, all the edges of E go between L and R.
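For bipartite graphs, the repeated-augmentation scheme of Theorems 2.0.1 and 2.0.2 is especially simple to implement, because an augmenting path from an exposed vertex of L can be found by a plain depth-first search (general graphs need Edmonds' blossom machinery). A minimal sketch, with our own naming:

```python
def max_bipartite_matching(adj):
    """adj[u] = iterable of right vertices adjacent to left vertex u.
    Returns a dict mapping each matched right vertex to its left mate."""
    mate = {}                                  # right vertex -> left vertex

    def augment(u, seen):
        # DFS for an augmenting path starting at the left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Either v is exposed, or its current mate can be re-matched
            # elsewhere; inverting the alternating path found this way
            # enlarges the matching by one (Theorem 2.0.1).
            if v not in mate or augment(mate[v], seen):
                mate[v] = u
                return True
        return False

    for u in adj:                              # one augmentation try per left vertex
        augment(u, set())
    return mate
```

For example, with `adj = {'u1': ['v1', 'v2'], 'u2': ['v1'], 'u3': ['v2', 'v3']}` all three left vertices end up matched.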
The problem of finding a maximum matching in a bipartite graph has many applications. For instance, let L be a set of machines and R a set of tasks to be performed simultaneously; an edge (u, v) ∈ E means that a particular machine u ∈ L is capable of performing a particular task v ∈ R. A maximum matching then provides work for as many machines as possible.

One way of solving the matching problem for bipartite graphs is to relate it (or reduce it) to the maximum flow problem for flow networks: flows in the network will correspond to matchings in the maximum bipartite matching problem. Thus, any algorithm that solves the maximum flow problem can be used to solve the bipartite matching problem.

Given a bipartite graph G = (V, E) with vertex partition V = L ∪ R, we define the corresponding flow network G' = (V', E') as follows:

a) Add two new vertices s ∈ V' and t ∈ V', and let them be the source and the sink of G'. So V' = V ∪ {s, t}.
b) Add new directed edges from s to the vertices of L, and new directed edges from the vertices of R to t.
c) Give each edge in the original graph G a direction so that all edges go from L to R. So

    E' = {(s, u) | u ∈ L} ∪ {(u, v) | u ∈ L, v ∈ R and (u, v) ∈ E} ∪ {(v, t) | v ∈ R}.

d) Assign unit capacity to each edge in E'.

Example: Given the bipartite graph G = (V, E) with L = {u1, u2, u3, u4} and R = {v1, v2, v3, v4} and the edges shown in the figure, the corresponding flow network G' is obtained by adding s and t and assigning unit capacities.

[Figure: the bipartite graph G (left) and the corresponding flow network G' (right). The figure also shows a maximum matching M in G, |M| = 4, and the corresponding flow in G': each edge on the four paths s → ui → vj → t is labelled 1/1.]

If f(u, v) is an integer for all (u, v) ∈ V × V, then we say that the flow f is integer-valued.

Theorem 2.1.1 Let G = (V, E) be a bipartite graph with vertex partition V = L ∪ R, and let G' = (V', E') be the corresponding flow network. Then:

a) if M is a matching in G, then there is an integer-valued flow f in G' with |f| = |M|;
b) if f is an integer-valued flow in G', then there is a matching M in G with |M| = |f|.

Proof: a) Suppose that M is a matching in G.
Define the following flow function f in the corresponding flow network G': for each (u, v) ∈ M, set

    f(s, u) = f(u, v) = f(v, t) = 1   and   f(u, s) = f(v, u) = f(t, v) = -1.

For all other edges (u, v) ∈ E', we define f(u, v) = 0. In other words, each edge in M corresponds to one unit of flow in G' that travels along the path s → u → v → t. It is easy to show that f thus defined satisfies the three properties of skew symmetry, capacity constraint and flow conservation.

Note that (L ∪ {s}, R ∪ {t}) is a cut in the flow network G', and that the net flow across it is f(L ∪ {s}, R ∪ {t}) = |M|. By a previous theorem we know that f(S, T) = |f| for any cut (S, T). So the value of the flow we defined is |f| = |M|.

b) Let f be an integer-valued flow in G'. We need to define a matching M such that |M| = |f|. Define

    M = {(u, v) | u ∈ L, v ∈ R and f(u, v) > 0}.

First, we show that M is indeed a matching in G. From the construction of G' we have that each u ∈ L has exactly one entering edge, namely (s, u), and its capacity is 1. So vertex u has at most one unit of positive net flow entering it. Since the flow function f is integer-valued, for each u ∈ L with one unit of positive net flow entering it there exists a unique v ∈ R such that f(u, v) = 1; thus at most one edge leaving each u ∈ L carries positive net flow. Similarly, at most one edge entering each vertex v ∈ R carries positive net flow. So M is indeed a matching in G.

Second, we show that |M| = |f|. Using properties studied in the section on maximum flow:

    f(L, R) = f(L, V') - f(L, s) - f(L, L) - f(L, t).

Now f(L, V') = 0 (flow conservation holds at every vertex of L), f(L, L) = 0 (skew symmetry), and f(L, t) = 0 (there are no edges between L and t). Hence

    f(L, R) = -f(L, s) = f(s, L) = f(s, V') = |f|.

Since each edge of M carries exactly one unit of flow from L to R and every other edge from L to R carries no positive flow, |M| = f(L, R) = |f|. ∎

So to compute a maximum matching in a bipartite graph G, we run the Ford-Fulkerson algorithm on G'. The integrality theorem (if all capacities are integers, then the maximum flow f produced by the Ford-Fulkerson algorithm is such that |f| is an integer and all values f(u, v) are integers) assures the success of finding a matching in G. Moreover, since |f| = |M|, it is not difficult to show that the cardinality of a maximum matching in a bipartite graph G is the value of a maximum flow in its corresponding flow network G'.

As for the running time of finding a maximum matching in a bipartite graph: any matching in a bipartite graph has cardinality at most min(|L|, |R|) = O(|V|), so the value of a maximum flow in G' is O(|V|). Each augmenting iteration of Ford-Fulkerson takes O(|E|) time, so the complexity is O(|V| |E|).

Note that the resulting network flow problem has a special structure. Bipartite matching problems are easy to solve, since they can be modeled as network flow problems and thus can be solved by any of the many algorithms we have studied; by exploiting the special structure, one can refine these algorithms to obtain better worst-case complexity for bipartite matching problems than for general network problems. Nonbipartite matching problems are more difficult to solve, because they cannot be transformed into standard network flow problems; they require specialized combinatorial algorithms.

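The construction a)-d) and the extraction of a matching from an integer-valued maximum flow can be sketched end to end. The vertex names and the function name below are our own, and the flow routine is the BFS-based Ford-Fulkerson used throughout these notes:

```python
from collections import deque

def matching_by_flow(L, R, edges):
    """Steps a)-d): build G' with unit capacities, push unit augmenting
    paths (Ford-Fulkerson with BFS), and read the matching off the
    saturated L-to-R edges. edges is a set of (u, v) pairs, u in L, v in R."""
    cap = {'s': {u: 1 for u in L}, 't': {}}            # a), b), d)
    for u in L:
        cap[u] = {v: 1 for v in R if (u, v) in edges}  # c): direct L -> R
    for v in R:
        cap[v] = {'t': 1}                              # b), d)
    value = 0
    while True:
        parent, queue = {'s': None}, deque(['s'])
        while queue and 't' not in parent:             # BFS for an augmenting path
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if 't' not in parent:
            break
        y = 't'
        while parent[y] is not None:                   # bottleneck is always 1
            x = parent[y]
            cap[x][y] -= 1
            cap[y][x] = cap[y].get(x, 0) + 1
            y = x
        value += 1
    # An original L-to-R edge carries one unit of flow iff it is saturated.
    M = {(u, v) for (u, v) in edges if cap[u][v] == 0}
    return M, value
```

On a small example the returned matching always has exactly `value` edges, illustrating |M| = |f|.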
In the next sections we cover some applications that can be found in [AMB93], [Wes96] and [CLR90].

2.2 Applications

2.2.1 Bipartite Personnel Assignment

In many problems we wish to assign people to objects: e.g., jobs, machines, rooms or even each other.

a) The personnel-assignment problem: A company requires a number of different jobs, and each worker is suited for some of these jobs but not others. Assuming that each person can perform at most one job at a time, how should the jobs be assigned so that the maximum number of jobs can be performed simultaneously?

Example:

[Figure: a bipartite graph with jobs j1, ..., j5 on one side and workers w1, ..., w7 on the other; the edges of a maximum matching are drawn in red.]

There are 5 jobs and 7 workers. Each edge represents a suitable job assignment. The maximum matching, whose edges appear in red, shows that all five jobs can be assigned to workers.

b) A swimming coach must select from her best 8 swimmers a medley relay team of four, each of whom will then swim one of the 4 strokes: backstroke, breaststroke, butterfly and freestyle. The coach knows the time of each swimmer in each stroke. The problem is to identify the team of the 4 best swimmers out of the eight that are available. The sum of times obtained by optimally matching four swimmers to the four strokes gives the minimum feasible relay time, and the corresponding team is the best team.

c) Assigning medical school graduates to hospitals: Thousands of doctors graduate from medical school in the USA. They are then eligible for residencies at the different hospitals around the country. The American Medical Association (AMA) performs a matching process in which the graduates rank the hospitals according to their preferences and the hospitals rank the graduates according to their preferences. The AMA assigns the graduates to hospitals so that the matching is "stable". An assignment is unstable if some graduate i is not assigned to hospital j, but the graduate prefers hospital j over his or her current assignment and, at the same time, hospital j prefers graduate i over one of the graduates assigned to it. Such an assignment is unstable because both the graduate i and the hospital j have an incentive to change their current assignments. An assignment that is not unstable is said to be stable. The objective of the AMA is to identify a stable assignment.

2.2.2 Non-bipartite Personnel Assignment

Pairing volunteers for a rescue mission: Several first-aid workers from around the world have volunteered for a rescue mission in some third-world country that has been struck by a disaster. The volunteers are to be divided into two-person teams, but the members of each team must speak the same language. What is the maximum number of pairs that can be sent out on the rescue mission? Use a graph where each vertex represents a volunteer, and an edge between volunteers i and j means that both speak the same language. A maximum matching in this graph corresponds to a maximum set of two-person teams. ∎

2.3 Transversals for families of subsets

Suppose that A is a nonempty finite set, and that F = {S1, S2, ..., Sr} is a family of (not necessarily distinct) nonempty subsets of A. A transversal (or system of distinct representatives) of the family F is a sequence T = <a1, a2, ..., ar> of r distinct elements of A such that ai ∈ Si for each i = 1, 2, ..., r.

Example: A = {a, b, c, d, e} and F = {S1, S2, S3, S4, S5}, where

    S1 = {a, b},  S2 = {b, c, d},  S3 = {c, d, e},  S4 = {d, e},  S5 = {e, a, b}.

Then T = <b, d, c, e, a> is a transversal. ∎

Note that solving the "assigning medical school graduates to hospitals" application is in essence reduced to finding a transversal for the family F = {S1, S2, ..., Sr}, where Si is the set of all hospitals that find applicant i acceptable, and we have r graduates who applied for residencies. In this application we assume that we have no ranking of hospitals by the graduates and no ranking of graduates by the hospitals. If we do take the rankings into consideration, then one has to check the transversal to see if the assignment is stable. ∎

Suppose that G = (V, E) is a bipartite graph with vertex partition V = L ∪ R, and that M is a matching in G. The matching M is said to be L-saturating if each vertex in L is an endpoint of an edge in M. Similarly, M is R-saturating if each vertex in R is an endpoint of an edge in M. An L-saturating matching is also known as a complete matching from L into R. Note that every L-saturating matching and every R-saturating matching must be a maximum matching, since in either case either all vertices of L are matched (L-saturating) or all vertices of R are matched (R-saturating).

Given a set A = {a1, ..., an} and a family F = {S1, ..., Sr} of (not necessarily distinct) subsets of A, we construct a bipartite graph G = (V, E) with vertex partition V = L ∪ R in the following fashion: L = {l1, ..., lr}, where each li corresponds to the subset Si in F; R = {a1, ..., an} (i.e., R = A); and an edge exists between li and aj if and only if aj ∈ Si. Then a transversal for the family F corresponds to an L-saturating matching of the graph G.

Consider our previous example:

    S1 = {a, b}       l1        a
    S2 = {b, c, d}    l2        b
    S3 = {c, d, e}    l3        c
    S4 = {d, e}       l4        d
    S5 = {e, a, b}    l5        e
    sets              (L)       elements of A (R)

An L-saturating matching is {(l1, b), (l2, d), (l3, c), (l4, e), (l5, a)}, which corresponds to the transversal T = <b, d, c, e, a>. ∎

2.4 Hall's theorem

Hall's theorem gives a necessary and sufficient condition for a bipartite graph to have an L-saturating matching. The theorem makes use of the set of neighbors of a subset of vertices: let W ⊆ L; then N(W) denotes the subset of R consisting of all vertices in R that are adjacent to at least one vertex in W. N(W) is the set of neighbors of W.

The following is the 1935 theorem for bipartite graphs by Hall.
Theorem 2.4.1 Let G = (V, E) be a bipartite graph with vertex partition V = L ∪ R. G has an L-saturating matching ⇔ |W| ≤ |N(W)| for every subset W of L.

Proof:
(⇒) Suppose that we have an L-saturating matching M in G, and let W be any subset of L. Since M is L-saturating, every vertex w ∈ W is matched to a vertex y ∈ R by an edge in M. Each such y is in N(W), and distinct vertices of W have distinct mates. So |W| ≤ |N(W)|.

(⇐) Suppose that Hall's condition holds: |W| ≤ |N(W)| for each subset W of L. Construct the corresponding flow network G' = (V', E'). Note that an L-saturating matching in G would be a maximum matching in G, and would correspond to a maximum flow in G' with value |L|. By the max-flow min-cut theorem, it suffices to show that a minimum cut in G' has capacity |L|.

We know, by the construction of G' itself, that the cut ({s}, V' - {s}) has capacity |L|, since all edges leaving s go to L and have capacity one. So the only thing we have to prove is that any other cut (X, Y) of G' satisfies capacity(X, Y) ≥ |L|.

Let (X, Y) be any cut in G' (with s ∈ X and t ∈ Y), and let W = X ∩ L. The cut (X, Y) can be expressed as the disjoint union of three arc sets:

    C1: the set of edges from s to L - W;
    C2: the set of edges from W to Y ∩ N(W);
    C3: the set of edges from X ∩ R to t.

[Figure: the flow network G' with the cut (X, Y); the sets W, L - W, Y ∩ N(W) and X ∩ R are indicated, and the cut edges C1, C2, C3 are drawn in red.]

Since cut(X, Y) = C1 ∪ C2 ∪ C3 with the three sets pairwise disjoint, and since by the construction of G' all capacities equal 1,

    capacity(X, Y) = |C1| + |C2| + |C3| = |L - W| + |C2| + |X ∩ R|.

Every vertex of Y ∩ N(W) is adjacent to at least one vertex of W (by the definition of neighbors), so |C2| ≥ |Y ∩ N(W)|. Since N(W) = [X ∩ N(W)] ∪ [Y ∩ N(W)], we have |Y ∩ N(W)| = |N(W)| - |X ∩ N(W)|, and therefore

    capacity(X, Y) ≥ |L - W| + |N(W)| - |X ∩ N(W)| + |X ∩ R|.

Since N(W) ⊆ R, we have |X ∩ N(W)| ≤ |X ∩ R|, and so

    capacity(X, Y) ≥ |L - W| + |N(W)|.

But we assumed that Hall's condition holds, so |N(W)| ≥ |W|. Therefore

    capacity(X, Y) ≥ |L - W| + |W| = |L|,

and hence the minimum cut in G' has capacity |L|. ∎

Hall's theorem can be restated in terms of a family of subsets, giving the necessary and sufficient condition for the family to have a transversal. Obviously, the proof is exactly the same as the one we just saw.

Hall's theorem for transversals: Let A be a nonempty finite set, and let F = {S1, S2, ..., Sr} be a family of nonempty subsets of A. The family F has a transversal ⇔ for 1 ≤ k ≤ r, the union of any k of the subsets Si contains at least k elements of A.

One of the first applications of Hall's theorem is the marriage problem: given a set of women, each of whom knows a subset of men, under what conditions can each of the women marry a man whom she knows? Note that this problem has many variations. Hall's theorem is known as the "marriage theorem"; the name arises from the scenario of a symmetric compatibility relation between a set of n men and a set of n women. For example, if every man is compatible with exactly k women and every woman is compatible with exactly k men, then there must be a complete matching using compatible pairs.

We know that a complete matching is a maximum matching. But what if G does not have a complete matching? How can we then show that a certain matching is maximum? We know that a matching M in a graph G is maximum ⇔ G has no augmenting path with respect to M. But this is not very practical: exploring all alternating paths with respect to M would take a very long time. We would rather find an explicit structure in G that forbids a matching larger than M.

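On small instances, Hall's condition can be checked by brute force over all subsets W of L (this is exponential in |L|, so it is only for tiny examples). A sketch, with our own naming; the concrete family below reuses the subsets from the transversal example of Section 2.3:

```python
from itertools import combinations

def satisfies_hall(adj):
    """adj[l] is the neighbor set N({l}) of left vertex l.
    Check |W| <= |N(W)| for every nonempty subset W of the left side."""
    left = list(adj)
    for k in range(1, len(left) + 1):
        for W in combinations(left, k):
            neighborhood = set().union(*(adj[l] for l in W))   # N(W)
            if len(neighborhood) < k:
                return False    # this W violates Hall's condition
    return True

family = {'S1': {'a', 'b'}, 'S2': {'b', 'c', 'd'}, 'S3': {'c', 'd', 'e'},
          'S4': {'d', 'e'}, 'S5': {'e', 'a', 'b'}}
```

Here `satisfies_hall(family)` returns True, consistent with the transversal <b, d, c, e, a>, while shrinking both S3 and S4 to the single element d would make the pair {S3, S4} violate the condition.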
S  V . If the graph G represents a road network in some village that ontains no traÆ lights. Examples: 1. E ) is a set S of verti es.5 Maximum Mat hings and Minimum Vertex Covers A vertex over of a graph F = (V. We say that the verti es in S over the edges of G. then the problem of assigning the minimum number of poli e person in order to observe and dire t traÆ in the entire road network is in essen e solvable by . su h that S ontains at least one endpoint of every edge of G.2. A minimum vertex over is a vertex over with the least number of verti es. and su h that the roads are straight (and ontains no isolated verti es).

v .nding the minimum vertex over of the network. v . v . if one an . v g is a minimum vertex over of G. v ). v . (u . Consider the following bipartite graph G = (V. (u . v )g is a maximum mat hing. u1 v1 u2 v2 u3 v3 u4 v4 u5 v5 L R Mat hing M = f(u . v ). jM j = 5. (u . jS j = 5. 2. Consequently. S = fv . (u . then jM j  jS j. IF M is a mat hing in a graph G and S is a vertex over of G. This is due to the fa t that no two edges e and e0 of M an be overed by a single vertex v. E ) with vertex partition V = L [ R. v ). and so the size of S is at least the size of M .

then we know that M is a maximum mat hing and that S is a minimum vertex over. We an always .nd a mat hing M in a graph G and a vertex over S su h that jM j = jS j.

nd a maximum mat hing M and a minimum vertex over S (with 1 1 2 3 4 2 2 3 3 5 58 5 4 4 5 1 .

For bipartite graphs the answer is yes, but equality does not hold for general graphs.

Example: In the graph G shown in the figure, a cycle on the five vertices a, b, c, d, e, the matching M = {(b, c), (d, e)} is a maximum matching, S = {a, c, d} is a minimum vertex cover, and |M| < |S|.

Next we formally prove that for bipartite graphs the sizes of maximum matchings and minimum vertex covers are identical. The theorem is due to König; it is a min-max theorem, since it states equality between the answers to a minimization problem and a maximization problem.

Theorem 2.5.1 (König) Let G = (V, E) be a bipartite graph with vertex partition V = L ∪ R. The number of edges in a maximum matching in G is equal to the number of vertices in a minimum vertex cover of G.

Proof: Let S be a minimum vertex cover of G, and write S = (S ∩ L) ∪ (S ∩ R). Let S_L = S ∩ L and S_R = S ∩ R. Consider the bipartite subgraph G1 = (V1, E1) of G induced on the vertex bipartition V1 = S_L ∪ (R − S_R).

Let W be any subset of S_L, and let N_G1(W) denote the neighbor set of W in G1; note that N_G1(W) ⊆ R − S_R. We are going to show that |W| ≤ |N_G1(W)|. Suppose instead that |W| > |N_G1(W)|. Every edge of G incident with a vertex of W has its other endpoint either in S_R or in N_G1(W), by the definition of neighbors. Hence (S − W) ∪ N_G1(W) is a vertex cover of G of size |S| − |W| + |N_G1(W)| < |S|, contradicting the fact that S is a minimum vertex cover. So |W| ≤ |N_G1(W)|.

In other words, G1 satisfies Hall's condition. By Hall's theorem, G1 has an S_L-saturating matching M1, with |M1| = |S_L|.

Now let G2 = (V2, E2) be the bipartite subgraph of G induced on V2 = (L − S_L) ∪ S_R. By an argument similar to the one above, G2 has an S_R-saturating matching M2, with |M2| = |S_R|.

Note that M = M1 ∪ M2 is a matching in G (the two matchings share no vertices) and |M| = |M1| + |M2| = |S_L| + |S_R| = |S|. Since in general |M| ≤ |S|, and here we exhibited a matching and a vertex cover of equal size, M is a maximum matching and S is a minimum vertex cover. □

The König-Egerváry theorem is just a different interpretation of König's theorem, one which involves 0/1 matrices.

Theorem 2.5.2 (König-Egerváry) Let A be a 0/1 matrix. The maximum number of ones in A, no two of which lie in the same row or column, is equal to the minimum number of rows and columns that together contain all the ones in A.
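König's equality, and its failure on an odd cycle, can be verified by brute force on small graphs. The sketch below enumerates edge subsets and vertex subsets; the two sample graphs are illustrative.

```python
from itertools import combinations

def max_matching_size(edges):
    """Largest set of pairwise vertex-disjoint edges (brute force)."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            ends = [v for e in subset for v in e]
            if len(ends) == len(set(ends)):       # no shared endpoint
                return k
    return 0

def min_vertex_cover_size(vertices, edges):
    """Smallest S that touches every edge (brute force)."""
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k

# A small bipartite graph: the two optima coincide, as Konig's theorem says.
E = [('u1', 'v1'), ('u1', 'v2'), ('u2', 'v2'), ('u2', 'v3')]
print(max_matching_size(E),
      min_vertex_cover_size(['u1', 'u2', 'v1', 'v2', 'v3'], E))   # 2 2

# The 5-cycle a-b-c-d-e is not bipartite, and equality fails: 2 < 3.
C5 = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'a')]
print(max_matching_size(C5), min_vertex_cover_size(list('abcde'), C5))  # 2 3
```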

Example:

        [ 1 0 1 0 ]
    A = [ 0 0 1 1 ]
        [ 1 1 0 0 ]

The ones marked in red (for instance a11, a23 and a32) are such that no two are in the same row or column; there are three of them. All the ones of A lie in its 3 rows (and also in 4 columns), so min{3, 4} = 3 rows and columns contain all the ones, which is the same as the number of independent ones.

As for the proof of the theorem, it consists of two steps.

Step one: Given A, construct a bipartite graph G = (V, E) with vertex partition V = L ∪ R, where L is the set of vertices corresponding to the rows of matrix A and R is the vertex set corresponding to the columns, using A as the adjacency matrix of G.

Step two: Apply König's theorem. A set of ones no two of which lie in the same row or column corresponds to a matching in G, and a set of rows and columns that together contain all the ones corresponds to a vertex cover of G.

In the above example, the matching marked in red in G corresponds to the three ones marked in red in A, and the three vertices of L form a minimum vertex cover. □

The König-Egerváry theorem has a few applications, such as in scheduling.

2.5.1 The Bottleneck Problem

Suppose that a manufacturing process consists of five operations that are to be performed simultaneously on five machines.

The time in minutes it takes each machine to perform each of the five operations is given in the table:

                       machines
    operations   M1   M2   M3   M4   M5
       op1        4    5    3    6    4
       op2        5    6    2    3    5
       op3        3    4    5    2    4
       op4        4    8    3    2    7
       op5        2    6    6    4    5

We would like to determine if it is possible to schedule all five operations so that the whole process is completed within 4 minutes. Construct a 5 x 5 matrix A whose entries are:

    aij = 1   if op_i takes at most 4 minutes on machine Mj
    aij = 0   otherwise

        [ 1 0 1 0 1 ]
        [ 0 0 1 1 0 ]
    A = [ 1 1 0 1 1 ]
        [ 1 0 1 1 0 ]
        [ 1 0 0 1 0 ]

It is possible to find 5 ones no two of which are in the same row or column; they are marked in red. Therefore, it is possible to complete the whole process within 4 minutes. As a matter of fact, the five ones give the assignments:

                                     duration
    op1 is assigned to M5 :         4 minutes
    op2 is assigned to M3 :         2 minutes
    op3 is assigned to M2 :         4 minutes
    op4 is assigned to M1 :         4 minutes
    op5 is assigned to M4 :         4 minutes

Since the operations are performed in parallel, all five will finish within 4 minutes. □
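The whole bottleneck computation, building the 0/1 matrix and then finding a maximum set of independent ones as a bipartite matching, can be sketched in Python. The matching routine below uses augmenting paths (the idea behind the Hungarian method), not the exact procedure of the notes:

```python
def bipartite_matching(adj, n_left, n_right):
    """Maximum bipartite matching via augmenting paths.
    adj[i] lists the right-vertices compatible with left-vertex i."""
    match_right = [-1] * n_right          # match_right[j] = left vertex on j, or -1

    def augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_right[j] == -1 or augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    size = sum(augment(i, set()) for i in range(n_left))
    return size, match_right

# Times from the table above (operations x machines); deadline 4 minutes.
times = [[4, 5, 3, 6, 4],
         [5, 6, 2, 3, 5],
         [3, 4, 5, 2, 4],
         [4, 8, 3, 2, 7],
         [2, 6, 6, 4, 5]]
adj = [[j for j, t in enumerate(row) if t <= 4] for row in times]
size, match_right = bipartite_matching(adj, 5, 5)
print(size)   # 5: all five operations can be scheduled within 4 minutes
```

A matching of size 5 is a perfect assignment, so the process finishes within the deadline.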

2.6 Weighted matching

We are given a graph G = (V, E) in which each edge e ∈ E has a weight w(e) = wij, where e = (vi, vj) and wij ≥ 0. The problem consists in finding a matching of G with the largest possible sum of weights.

Note that the previous (unweighted) graph matching problem is a special case of the weighted matching problem: just let wij = 1 for all (vi, vj) ∈ E.

We can assume that we have an even number of nodes; if that is not the case, just add a node with edges of weight zero incident to it. One can also assume that instead of G we have a complete graph, by letting the weights of the edges that are missing in G be equal to zero. With these conventions, a solution with the largest sum of weights can always be taken to be a complete matching. Similarly, a bipartite weighted graph can be assumed to be a complete bipartite graph with vertex partition V = L ∪ R, where |L| = |R|.

It is sometimes more convenient to think of this problem as a minimization problem, by considering costs cij, where cij = W − wij with W = max{wij} + 1.

Once again, tackling the weighted bipartite problem is easier than dealing with general weighted graphs. The Hungarian method solves the weighted matching problem for a complete bipartite graph with 2|V| nodes in O(|V|³) arithmetic operations. The bipartite weighted matching problem is also known as the assignment problem, since it solves the problem of finding the best assignment of tasks to workers, where the value of wij represents the "production" of the i-th worker when given task j to perform. The assignment problem can also be described by a linear program.
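The objective of the assignment problem can be made concrete with a brute-force sketch that tries all n! assignments; the weight matrix here is an illustrative example. (The Hungarian method achieves the same optimum in O(n³) rather than factorial time.)

```python
from itertools import permutations

def assignment_brute_force(w):
    """Maximum-weight perfect matching in a complete bipartite graph,
    by trying all n! assignments (for illustration only)."""
    n = len(w)
    best_perm, best_val = None, float('-inf')
    for perm in permutations(range(n)):        # perm[i] = task given to worker i
        val = sum(w[i][perm[i]] for i in range(n))
        if val > best_val:
            best_perm, best_val = perm, val
    return best_val, best_perm

w = [[3, 1, 2],
     [2, 4, 6],
     [5, 3, 1]]
print(assignment_brute_force(w))   # (12, (0, 2, 1))
```

The minimization variant is obtained by replacing each wij with cij = W − wij, which leaves the set of optimal assignments unchanged.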

Chapter 3

String Matching

Given a string Pm called a pattern and a longer string Tn called the text, the (exact) string matching problem is to find all occurrences, if any, of the pattern Pm in the text Tn.

Example: Pm = aba, Tn = bbabaxababay. Pm occurs in Tn starting at locations 3, 7 and 9. Note that two occurrences of Pm may overlap.

Some of the more common applications are in:

a) word processors
b) utilities such as grep on UNIX
c) textual information retrieval programs such as MEDLINE, LEXIS or NEXIS
d) library catalog searching programs that have replaced physical card catalogs in most large libraries
e) Internet browsers and crawlers that sift through massive amounts of text available on the Internet for material containing specific keywords
f) Internet news readers that can search the articles for topics of interest
g) telephone directory assistance
h) on-line dictionaries and thesauri
i) numerous specialized databases, such as molecular biology databases holding raw DNA, RNA and amino acid strings, or processed patterns called motifs derived from the raw string data (Genbank, for example, is the major U.S. DNA database)

3.1 Naive String Matching

The naive method aligns the left end of Pm with the left end of Tn and then compares the characters of Pm and Tn left to right until either two unequal characters are found or until Pm is exhausted, in which case an occurrence of Pm is reported. In either case, Pm is then shifted one place to the right, and the comparisons are restarted from the left end of Pm. The process is repeated until the right end of Pm shifts past the right end of Tn.

    Naive-String
      for (i = 0; T[i] != '\0'; i++)
          for (j = 0; (T[i+j] != '\0') && (P[j] != '\0') && (T[i+j] == P[j]); j++)
              ;
          if (P[j] == '\0') then output "found a match"

Naive-String has two nested loops: the outer loop runs O(n) times and the inner loop runs in O(m) time. So the running time of Naive-String is O(mn).

The naive string matching algorithm can be visualized as sliding a template containing the pattern Pm over the text Tn, from left to right, and marking down the shifts for which all the characters on the template equal the corresponding characters in the text Tn. For example, with Tn = AGRICULTURE and Pm = INA, after the template has been shifted s = 3 units, the I of INA lines up with the I of AGRICULTURE, yet the template still does not match the text.
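The sliding-template method can be transcribed directly into Python (a sketch; it reports 1-based starting locations, matching the convention of the notes):

```python
def naive_string_match(T, P):
    """Slide P one place at a time over T; report every 1-based position
    where all m characters line up. Running time O(mn)."""
    n, m = len(T), len(P)
    hits = []
    for s in range(n - m + 1):            # shift of the template
        if T[s:s + m] == P:               # compares up to m characters
            hits.append(s + 1)            # 1-based, as in the notes
    return hits

print(naive_string_match("bbabaxababay", "aba"))   # [3, 7, 9]
```

Note that the overlapping occurrences at 7 and 9 are both reported.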

Naive-String is not optimal for the string matching problem. In the next section, we introduce notions and terminology that we use to investigate string matching algorithms that are more efficient than the naive algorithm.

3.2 String Matching Algorithms [CLR90]

Notation:

    P[1..m]   array of text of length m (also denoted by Pm)
    ω ⊏ x     ω is a prefix of x, i.e., x = ωy for some string y ∈ Σ*
    ω ⊐ x     ω is a suffix of x, i.e., x = yω for some string y ∈ Σ*
    Pk        the k-character prefix P[1..k] of P[1..m]

Note: P0 = ε and Pm = P = P[1..m].

The auxiliary function σ (also known as the suffix function corresponding to P) is defined by:

    σ : Σ* → {0, 1, ..., m},    σ(x) = max{k | Pk ⊐ x}

i.e., σ(x) is the length of the longest prefix of P that is a suffix of x.

Example: Let Σ = {a, b, c} and P = abba. Then σ(ε) = 0, σ(caba) = 1 and σ(bbcababb) = 3.

3.3 Knuth-Morris-Pratt Algorithm [KMP]

KMP is a linear-time string-matching algorithm. It uses an auxiliary function π[1..m] which is precomputed from the pattern. The speed-up in KMP is based on the following observation:

[Figure: after a shift s, the first q = 7 characters of P match T[s+1 .. s+7] and then a mismatch occurs; two rows show P placed at shift s and at shift s′ = s + 4, where a prefix of P still lines up with the tail of the already-matched text.]

The first 7 characters of P match the characters of T after the shift s, and then a mismatch makes shift s invalid. From those 7 matched characters (7 is the value of q) we already know that we can get no match by performing an s + 1 shift, no match by performing an s + 2 shift, and no match by performing an s + 3 shift, but that there is a potential match at an s + 4 shift, since there an a of P is aligned with an a of T. KMP will not perform the s + 1, s + 2 and s + 3 shifts; it will proceed directly to s + 4. How? The information it needs is contained in the function π that KMP computes from P before even starting the matching process.

Back to the strings T and P. Given that the pattern characters P[1..q] match the text characters T[s+1 .. s+q], what is the least shift s′ > s such that P[1..k] = T[s′+1 .. s′+k], where s′ + k = s + q? Such a shift s′ (s′ = s + 4 in our example) is the first shift greater than s that can yield a potential match. Note that the condition says precisely that Pk (a prefix of P) is a suffix of Pq. So an equivalent question is: what is the largest value of k, with k < q, such that Pk ⊐ Pq? Once we compute k, we know that the next potential match can occur at the shift s′ = s + (q − k).

The precomputed information that KMP uses is the prefix function for the pattern P:

    π : {1, 2, ..., m} → {0, 1, ..., m − 1},    q ↦ π[q]

where

    π[q] = max{k | (k < q) ∧ (Pk ⊐ Pq)}

i.e., π[q] is the length of the longest prefix of P that is also a proper suffix of Pq.

Examples:

1. Let P = abcbabcb. Then:

    P1 = a          π[1] = 0
    P2 = ab         π[2] = 0
    P3 = abc        π[3] = 0
    P4 = abcb       π[4] = 0
    P5 = abcba      π[5] = 1
    P6 = abcbab     π[6] = 2
    P7 = abcbabc    π[7] = 3
    P8 = abcbabcb   π[8] = 4

2. Let P = ababababca. Then:

    q      1  2  3  4  5  6  7  8  9  10
    P[q]   a  b  a  b  a  b  a  b  c  a
    π[q]   0  0  1  2  3  4  5  6  0  1

For example, for q = 7 we have

    π[7] = max{k | (k < 7) ∧ (Pk ⊐ P7)} = max{1, 3, 5} = 5,

since P7 = abababa has among its proper suffixes the prefixes a = P1, aba = P3 and ababa = P5 of P. Similarly, for q = 8 we have π[8] = 6. But for q = 9 we have P9 = ababababc, and no nonempty prefix of P is a suffix of P9, so π[9] = 0; finally, P10 = ababababca ends with a = P1, so π[10] = 1. In summary: π[1] = π[2] = π[9] = 0, π[3] = 1, π[4] = 2, π[5] = 3, π[6] = 4, π[7] = 5, π[8] = 6 and π[10] = 1.

Remark: π : {1, 2, ..., m} → {0, 1, ..., m − 1}, q ↦ π[q], where π[q] = max{k | (k < q) ∧ (Pk ⊐ Pq)}, i.e., π[q] is the length of the longest prefix of P that is a proper suffix of Pq.

The auxiliary function π indicates how much of the last comparison can be reused if it fails. It shows how much of the beginning of the string matches up to the portion immediately preceding a failed comparison.

it is sometimes alled the failure fun tion. The portion immediately pre eding q = 9 is: [8℄ = 6. If the omparison fails at q = 9 for instan e. The omparison fails at q = 9. Prefix-Fun tion(G) m length of pattern P [1℄ = 0 k 0 for q 2 to m while 1 (k>0) 2 k  [k ℄ //this is a spe ial ase ^P [k + 1℄ 6= P [q℄) //the mat h ? //we have a mat h so //in rement k if P [k + 1℄ = P [q℄ then k  [q ℄ k return  //is this the end of k+1 1 Note that the whole loop is a tivated only when k > 0 (and not the very .the portion immediately pre eding a failed omparison. so ababab in positions 3 to 8 mat h ababab in positions 1 to 6. For t hat reason. then we know that a and b in positions 3 and 4 are identi al to a and b in positions [3℄ = 1 and [4℄ = 2.

2 P [q℄ 6= P [k + 1℄ and so now we go ba k in P to . when q = 2). First we must have a mat h P [k + 1℄ = P [q℄ to in rement k (to k + 1).rst time.

e. Note that all positions q 1. q 2. KMP(T. q 4.q 1℄: 68 . ::: are he ked. T [i℄ 6= P [q + 1℄ : T [6℄ 6= P [3 + 1℄ ) q [3℄ [i. P ) n length of T m length of pattern P [1℄ Prefix-Fun tion(P ) q 0 for i 1 to n while (q > 0) ^ P [q + 1℄ 6= T [i℄) //pattern mat h has failed q  [q ℄ //move pattern ba k if P [q + 1℄ = T [i℄ //potential pattern mat h then q q + 1 if q = m //su ess of pattern mat h then print \Pattern o urs with shift i m" q  [q ℄ end Example: T = BANANORANANANANO P = NANA 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 B A N A N O R A N A N A N A N O N A N A N A no omparison is performed here. q 3.nd the 1 hara ter that mat hes P [q℄.

i = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 jj text partially mat hed jj T : a a b a b a a . q℄ we have a mismat h at i: T [i℄ 6= P [q + i℄.after T [6℄ 6= P [3 + 1℄ we ompare T [6℄ and P [1 + 1℄ be ause we know that T [5℄ = P [1℄. jj jj a b a b P:  [q ℄ ! ! jj position to whi h the pattern is shifted q ! Suppose T [i q..e.. At this point we know that: P [1::[q℄℄ is the longest proper pre. P ) assumes that we have omputed the auxiliary fun tion. q 3) Compare P [[q℄ + 1℄ to T [12℄ whi h has the e e t of having shifted [9℄ + 1 hara ters (where q is the last position in P where we have a hara ter mat h). The pro edure outputs: Pattern o urs with shift 8 Pattern o urs with shift 10 KMP(T. :::. Prefix-Fun tion(P ) gives: P N A N A 2  0 0 1 2 i : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 jj jj T a a b a b a a . i 1℄ mat hes P [1. :::. q: 1 2 3 4 5 6 7 8 a b a b P jj jj a b a b P In the while loop: P [7 + 1℄ 6= T [12℄ q [7℄ (i...
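The two procedures can be transcribed into Python as a sketch (0-based indexing here, unlike the 1-based pseudocode):

```python
def prefix_function(P):
    """pi[q] = length of the longest proper prefix of P[1..q+1]
    that is also a suffix of it (stored 0-based)."""
    m = len(P)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]                 # fall back, as in the while loop above
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi

def kmp(T, P):
    pi = prefix_function(P)
    q, shifts = 0, []
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = pi[q - 1]                 # pattern match has failed: move back
        if P[q] == c:
            q += 1
        if q == len(P):                   # success of pattern match
            shifts.append(i - len(P) + 1) # 0-based shift
            q = pi[q - 1]
    return shifts

print(prefix_function("ababababca"))       # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
print(kmp("BANANORANANANANO", "NANA"))     # [8, 10]
```

The outputs reproduce the π table of Example 2 and the two shifts found in the BANANO example.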

Procedure KMP(T, P) assumes that the auxiliary function π has been computed; π answers the question of how far back to move the pattern in order to continue the search for another potential pattern match. In every iteration through the for loop, KMP compares P[q+1] and T[i], and one of three things happens:

1. P[q+1] = T[i]: we have a character match, and the current potential pattern match is still possible. At the next iteration of the for loop we compare P[q+2] with T[i+1].

2. q > 0 and P[q+1] ≠ T[i]: this signals the end of a character match, and therefore the current potential pattern match has failed. We move the pattern back (q ← π[q]) and resume the search for the next possible pattern match.

3. q = m: this signals the success of a pattern match (all m characters of the pattern were matched). The procedure announces it and, as in case 2, moves the pattern back to resume the search for the next possible pattern match.

3.3.1 Analysis of Prefix-Function

    Prefix-Function(P)
      m ← length of pattern P
      π[1] ← 0                               // this is a special case       (1)
      k ← 0
      for q ← 2 to m
          while (k > 0) ∧ (P[k+1] ≠ P[q])    // is this the end of the match?
              k ← π[k]                                                       (2)
          if P[k+1] = P[q]                   // we have a match, so increment k
              then k ← k + 1                                                 (3)
          π[q] ← k
      return π

Claim: The running time of Prefix-Function is O(m).

Perform an amortized analysis where the potential is the variable k of the algorithm. The potential has 0 as its initial value (see (1)). Since π[k] < k, k is decreased at (2), but it never becomes negative, since π[k] ≥ 0 for all k. The only other place where k changes value is (3), where it is incremented by at most one per iteration of the for loop; since q is incremented by exactly one at each iteration while k is either incremented by one or decremented, we still have k < q at all times. The final potential is at least as great as the initial potential, so the amortized cost of the for loop body is O(1). The for loop iterates m − 1 times, so the total actual worst-case running time of Prefix-Function is O(m). □

3.3.2 Analysis of KMP

    KMP(T, P)
      n ← length of T
      m ← length of pattern P
      π ← Prefix-Function(P)                                                 (5)
      q ← 0                                                                  (1)
      for i ← 1 to n
          while (q > 0) ∧ (P[q+1] ≠ T[i])    // pattern match has failed
              q ← π[q]                        // move pattern back            (2)
          if P[q+1] = T[i]                    // potential pattern match
              then q ← q + 1                                                  (4)
          if q = m                            // success of pattern match
              then print "Pattern occurs with shift i − m"
                   q ← π[q]                                                   (3)

The running time of the for loop of KMP is O(n). Perform an amortized analysis where the potential is the value of q. The potential has an initial value of 0 (see (1)). Since π[q] < q, q is decreased at (2) and (3), but it never becomes negative, since π[q] ≥ 0. The potential is increased by at most one per iteration (see (4)). The final potential is at least as great as the initial potential, so the amortized cost of the for loop body is O(1); the for loop iterates n times, so the total actual worst-case running time of the for loop is O(n). But KMP also makes use of Prefix-Function (at (5)), which is O(m). So the time complexity of KMP is O(m + n). □

KMP solves the pattern matching problem in O(m + n), instead of the O(mn) of the naive algorithm.

3.4 Rabin-Karp Algorithm

The Rabin-Karp algorithm interprets the symbols of the alphabet Σ as numbers and considers the strings T and P as numbers to the base d, where d = |Σ|. So, for instance, the string 31415 corresponds to the decimal number 31,415 when we let Σ = {0, 1, ..., 9}.

Let p be the number corresponding to P[1..m], and let ts be the number corresponding to T[s+1 .. s+m]. Then ts = p if and only if T[s+1 .. s+m] = P[1..m]; in other words, s is a valid shift if and only if ts = p. Comparison of the pattern with part of the text thus amounts to comparing two numbers, which takes time O(1) (once the numbers are computed).

Use Horner's rule to compute p and t0:

    p = P[m] + d(P[m−1] + d(P[m−2] + ... + d(P[1]) ... ))

where P[1] is the highest-order digit and P[m] is the lowest-order digit. The value t0 can similarly be computed from T[1..m] in time O(m). The remaining numbers are obtained in O(1) each:

    ts+1 = d(ts − d^(m−1) T[s+1]) + T[s+m+1]

where multiplying by d shifts the number left, subtracting d^(m−1) T[s+1] removes the high-order digit, and adding T[s+m+1] appends the new low-order digit.

Example: Suppose d = 10, m = 5 and ts = 31415, i.e., T[s+1 .. s+m] = 31415, so T[s+1] = 3. Also suppose that T[s+5+1] = 2, i.e., the text has a 2 right after the 5 of 31415. Then:

    ts+1 = 10(31415 − 10000 · 3) + 2 = 10(1415) + 2 = 14152

    3 | 1 4 1 5 2
        └─ ts+1 ─┘

The time to compute p is O(m), assuming each arithmetic operation takes O(1) time, and the same holds for t0. Since ts+1 can be computed from ts in O(1) time, t1, ..., t_{n−m} can all be computed in O(n + m).
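Horner's rule and the rolling update can be checked on the example above; a minimal sketch:

```python
def horner(digits, d=10):
    """Evaluate a digit string as a base-d number by Horner's rule."""
    v = 0
    for c in digits:
        v = d * v + int(c)
    return v

def roll(t_s, old, new, d=10, m=5):
    """t_{s+1} = d*(t_s - d^(m-1)*old) + new: remove the high-order
    digit, shift left, append the new low-order digit."""
    return d * (t_s - d ** (m - 1) * int(old)) + int(new)

t = horner("31415")
print(t, roll(t, "3", "2"))   # 31415 14152
```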

We can thus find all occurrences of the pattern P = P[1..m] in the text T = T[1..n] in time O(n + m). However, if m and d are large, this running time is unrealistic: the numbers p and ts no longer fit in a machine word, and each arithmetic operation is no longer O(1).

The remedy is to perform all the arithmetic mod q, for some prime number q such that dq fits within one computer word. This allows all of the necessary computations to be performed with single-precision arithmetic. The recurrence equation with mod q becomes:

    ts+1 = (d(ts − T[s+1] h) + T[s+m+1]) mod q,    where h ≡ d^(m−1) (mod q)

In this case, however, ts ≡ p (mod q) no longer guarantees ts = p. Whenever ts ≡ p (mod q), we must explicitly compare P[1..m] and T[s+1 .. s+m], which takes O(m) time, to determine whether we have a valid pattern match or just a spurious hit.

Example: T = 553122731, Σ = {0, 1, ..., 9}, P = 12 and q = 3. Note that p mod q = 0.

    5 5 3 1 2 2 7 3 1
    [5 5]                 55 mod 3 = 1
      [5 3]               53 mod 3 = 2
        [3 1]             31 mod 3 = 1
          [1 2]           12 mod 3 = 0    match at position 4
            [2 2]         22 mod 3 = 1
              [2 7]       27 mod 3 = 0    spurious hit at position 6
                [7 3]     73 mod 3 = 1
                  [3 1]   31 mod 3 = 1

Note that t5 = T[6..7] = 27 and 27 ≡ p (mod 3), but 27 ≠ p. So we may have ts ≡ p (mod q) even though T[s+1 .. s+m] ≠ P[1..m]. This is called a spurious hit. □
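The fingerprinting scheme just described can be sketched in Python (0-based shifts; the function also reports spurious hits, to make them visible):

```python
def rabin_karp(T, P, d=10, q=13):
    """Fingerprint search over digit strings: compare p = P mod q against
    a rolling t_s; on a hit, verify the characters to rule out spurious hits."""
    n, m = len(T), len(P)
    h = pow(d, m - 1, q)                      # d^(m-1) mod q
    p = t = 0
    for i in range(m):                        # Horner's rule, mod q
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    shifts, spurious = [], []
    for s in range(n - m + 1):
        if t == p:
            if T[s:s + m] == P:
                shifts.append(s)
            else:
                spurious.append(s)            # equal fingerprints, unequal strings
        if s < n - m:                         # roll: drop high digit, add low digit
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return shifts, spurious

print(rabin_karp("553122731", "12", d=10, q=3))   # ([3], [5])
```

The 0-based shifts 3 and 5 correspond to the 1-based positions 4 (match) and 6 (spurious hit) of the example above.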

    Rabin-Karp(T, P, d, q)
      n ← length of T
      m ← length of pattern P
      h ← d^(m−1) mod q
      p ← 0
      t0 ← 0
      for i ← 1 to m                         // this loop computes:
          p ← (d p + P[i]) mod q             //   p ← P[1..m] mod q
          t0 ← (d t0 + T[i]) mod q           //   t0 ← T[1..m] mod q
      for s ← 0 to n − m                     // iterate through all possible shifts
          if p = ts                          // is p = ts, where ts ≡ T[s+1..s+m] (mod q)?  (*)
              then if P[1..m] = T[s+1 .. s+m]     // rule out a spurious hit
                  then print "Pattern occurs with shift s"   // success
          if s < n − m
              then ts+1 ← (d(ts − T[s+1] h) + T[s+m+1]) mod q

To make sure that ts ≡ T[s+1 .. s+m] (mod q) whenever we check p = ts at (*), we need the last if statement in the for loop (if s < n − m): obviously ts should not be updated once we reach the end of T.

Example: P = 31415, q = 13, d = 10, and T = 235902314152...

    h ← d^(m−1) mod q:    h = 10^4 mod 13 = 3
    p:                    31415 ≡ 7 (mod 13)
    t0:                   t0 = T[1..5] = 23590 ≡ 8 (mod 13)

The first time through the second for loop we compare p = 7 with t0 ≡ 8 (mod 13): no hit. Then:

    t1 = 10(23590 − 2 · 10000) + T[6] = 10(3590) + 2 = 35902

so the next time through the loop we check p = 7 against t1 ≡ 9 (mod 13), since 35902 ≡ 9 (mod 13): again no hit.

3.4.1 Analysis of Rabin-Karp

A) Worst-case analysis:

Example (of the worst case): P = a^m and T = a^n. Here p = ts always holds; since each one of the n − m + 1 shifts is valid, the verification will take Θ((n − m + 1) m) time (m characters per iteration). The running time of Rabin-Karp is Θ((n − m + 1) m) in the worst case, since we verify every valid shift.

B) Average-case analysis:

Let us assume that reducing values modulo q behaves like a random mapping from Σ* to Zq (Zq = {0, 1, ..., q − 1}). Then the expected number of spurious hits is O(n/q), since the probability that any given ts satisfies ts ≡ p (mod q) is 1/q. So Rabin-Karp(T, P, d, q) has an expected running time of

    O(n) + O(m(v + n/q))

where v is the number of valid shifts (pattern matches): O(n) for the scan without spurious hits, plus O(m) verification work for each valid shift and each spurious hit. Note that if q ≥ m and the expected number of valid shifts is small, perhaps as low as O(1), then Rabin-Karp runs in expected time O(n + m).

Spurious hits are also known as signature collisions. The prime number q should be as large as possible, consistent with (|Σ| + 1) q not causing arithmetic overflow. Sedgewick uses the value q = 33554393 for |Σ| = 32 [see Algorithms, Addison-Wesley, 1983]; Pirklbauer uses the value q = 8355967 for |Σ| = 256 [see "A study of pattern-matching algorithms" in Structured Programming, Volume 13, 1992]. A more recent implementation of the algorithm reduces the probability of a long run of collisions by randomly reselecting the prime q after a collision, reinitializing, and then continuing with the search. Gonnet and Baeza-Yates present an implementation of the algorithm that does away with the mod operations by virtue of the implicit modular arithmetic of the target hardware.

3.5 Boyer-Moore

As is the case with KMP, the efficiency of Boyer-Moore depends on determining the location of the next matching attempt, given that the previous attempt was unsuccessful: following a mismatch at the q + 1 position of the pattern, the preceding q symbols of the pattern and their structure give insight as to where the next matching attempt should begin.

In both algorithms, the pattern is preprocessed to determine the shift for a failure at the q-th symbol of P, for each q = 1, 2, ..., m; knowing how far to shift the pattern is the most important factor.

As with KMP, we slide P along T from left to right, checking corresponding characters. But with the Boyer-Moore algorithm (BM), the characters of P are checked right to left after each movement of the pattern. The information gained at the end of the pattern (i.e., on the right) often allows the algorithm to proceed in large jumps through the text being searched. Thus BM has the unusual property that, on the average, not all of the first q characters of T are inspected; the number of characters actually inspected (on the average) decreases as a function of the length of the pattern P. In most cases, Boyer-Moore is more efficient than the Knuth-Morris-Pratt algorithm, especially when the pattern P is long and the alphabet Σ is large.

BM uses two heuristics that allow it to avoid much of the work that KMP performs: the bad-character heuristic and the good-suffix heuristic. When a mismatch occurs, each heuristic proposes an amount by which the pattern can safely be shifted without missing a match; BM chooses the larger amount of the two and shifts the pattern by that amount. The two heuristics can be viewed as operating independently, in parallel.

3.5.1 The Bad-Character Heuristic

Suppose we have just found a mismatch: P[j] ≠ T[s+j] for some j, 1 ≤ j ≤ m. The text character T[s+j] is called the "bad character". Let k be the largest index, 1 ≤ k ≤ m, such that T[s+j] = P[k], if any such k exists; if the pattern does not contain the character T[s+j] (no k with 1 ≤ k ≤ m satisfies T[s+j] = P[k]), then let k = 0.

Claim: we can increase s by j − k.

Example:

                      bad character   good suffix
                            ↓            ↓↓
    T = ... W R I T T E N   N O T I C E   T H A T ...
            R E M I N I S C E N C E
                                └── no match ──┘└match┘

Shift s is invalid: the "good suffix" CE of P matched in T, but then we hit a "bad character" in T (the I of NOTICE), which did not match the pattern character N.

We have three cases to consider:

    Case one:   k = 0
    Case two:   k < j
    Case three: k > j

Case one: k = 0. Here the bad character T[s+j] is not found in P at all, so we slide P completely past the bad character (i.e., until the left end of P is just to the right of the bad character). We increase s by j − k = j.

Case two: k < j. The rightmost occurrence of the bad character in the pattern is to the left of position j. We slide P to the right by j − k characters, to align T[s+j] with P[k]. Since k is the largest such index, we can safely increase s by j − k without missing any valid shifts.

Case three: k > j. Here j − k < 0, and so the bad-character heuristic is proposing to decrease s. BM will ignore this recommendation and will choose the recommendation suggested by the good-suffix heuristic (which is always positive).

Example: T = THIS IS A DELICATE TOPIC and P = CAT:

    T H I S   I S   A   D E L I C A T E   T O P I C
    C A T
          C A T
                C A T
                  C A T
                        C A T
                                    C A T
                                C A T            ← and here we find a match: CAT

Note that we have made only 9 comparisons up to this point (the 6 mismatches shown plus 3 for the match), even though 17 characters of the text have been passed over.

When the bad-character heuristic proposes a useless shift, BM relies on the recommendation given by the good-suffix heuristic (which is always positive). Recall the three cases for k = λ[T[s+j]], the rightmost position at which the bad character T[s+j] occurs in P:

Case 1 (k = 0): T[s+j] does not occur in P at all; the bad-character heuristic recommends increasing s by j.

Case 2 (1 ≤ k < j): the rightmost occurrence of T[s+j] in P is to the left of position j; we increase s by j − k, aligning P[k] with T[s+j].

Case 3 (j < k ≤ m): the bad character T[s+j] occurs to the right of position j, so j − k < 0 and the bad-character recommendation is useless; in this case BM relies on the good-suffix heuristic.

In other words, λ[a] is the index of the rightmost position at which a occurs in P, for each a ∈ Σ (with λ[a] = 0 if a does not occur in P). The results are saved in λ. Compute-Last-Occurrence(P, m, Σ) computes the rightmost position in the pattern P (of length m) at which each character occurs:

Compute-Last-Occurrence(P, m, Σ)
  for each character a ∈ Σ
    do λ[a] ← 0
  for j ← 1 to m
    do λ[P[j]] ← j
  return λ

The running time of Compute-Last-Occurrence is O(|Σ| + m).

Example: P = reminiscence

  a      r  e   m  i  n   s  c
  λ[a]   1  12  3  6  10  7  11

(e occurs at positions 2, 9, 12; i at 4, 6; n at 5, 10; c at 8, 11; the rightmost position wins.)

Recall that BM is interested in λ in case two and case three.

3.5.2 The Good-Suffix Heuristic

  T = ... I T T E N N O T I C E T H A T ...
  P =         R E M I N I S C E N C E

Here T[s+j] = I ≠ N = P[j] with j = 10, and the good suffix is T[s+j+1 .. s+m] = CE. The good-suffix heuristic proposes to move the pattern to the right by the least amount that guarantees that any pattern characters that align with the good suffix previously found in the text will match those suffix characters. Writing Pk for the prefix P[1..k]: CE is a suffix of REMINISCE, so Pq = P9 = REMINISCE and k = 9, and we shift P by m − k = 12 − 9 = 3 characters. In our example, moving P three characters to the right aligns the CE of T with the new CE of P.

The shifts are computed by procedure Compute-Good-Suffix and are saved in γ. If P[j] ≠ T[s+j], with j < m, then γ[j] is the least amount we can advance s and not cause any characters in the good suffix T[s+j+1 .. s+m] to be mismatched against the new alignment of the pattern. The good-suffix heuristic assures that we can safely advance s by γ[j], where (writing w ⊐ x for "w is a suffix of x"):

  γ[j] = m − max{k | 0 ≤ k < m and (Pk ⊐ P[j+1..m] or P[j+1..m] ⊐ Pk)}

We just showed that γ[10] = 3, since γ[10] = 12 − max{9}.

Note that γ[11] = 12 − max{k | E ⊐ Pk or Pk ⊐ E}. E ⊐ RE and E ⊐ REMINISCE, i.e. E ⊐ P2 and E ⊐ P9, so γ[11] = 12 − max{2, 9} = 3.

It can be shown that the definition of γ above is equivalent to:

  γ[j] = min({m − π[m]} ∪ {l − π'[l] | 1 ≤ l ≤ m and j = m − π'[l]})

where π is the prefix function of P and π' is the prefix function of P', the reverse of P. Procedure Compute-Good-Suffix is a straightforward implementation of this last equation:

Compute-Good-Suffix(P, m)
  π ← Prefix-Function(P)
  P' ← reverse(P)
  π' ← Prefix-Function(P')
  for j ← 0 to m
    do γ[j] ← m − π[m]
  for l ← 1 to m
    do j ← m − π'[l]
       if γ[j] > l − π'[l]
         then γ[j] ← l − π'[l]
  return γ
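Both preprocessing procedures, together with a search loop driven by the two tables, can be transcribed into Python. This is a sketch following the 1-based conventions above; the function names are ours, not part of any library.

```python
def prefix_function(p):
    # pi[q] = length of the longest proper prefix of p[1..q]
    # that is also a suffix of p[1..q]  (q is 1-based)
    pi = [0] * (len(p) + 1)
    k = 0
    for q in range(2, len(p) + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi

def compute_last_occurrence(p):
    # lambda[a] = rightmost 1-based position of a in p
    # (characters absent from p are treated as position 0)
    return {ch: j for j, ch in enumerate(p, start=1)}

def compute_good_suffix(p):
    m = len(p)
    pi = prefix_function(p)
    pi_rev = prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    return gamma

def boyer_moore(t, p):
    # returns all valid shifts s (0-based) at which p occurs in t
    n, m = len(t), len(p)
    lam = compute_last_occurrence(p)
    gamma = compute_good_suffix(p)
    shifts = []
    s = 0
    while s <= n - m:
        j = m
        while j > 0 and p[j - 1] == t[s + j - 1]:
            j -= 1
        if j == 0:
            shifts.append(s)
            s += gamma[0]
        else:
            s += max(gamma[j], j - lam.get(t[s + j - 1], 0))
    return shifts
```

On P = reminiscence this reproduces the values computed by hand: γ[10] = γ[11] = 3 and γ[12] = 1.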

Example: P = REMINISCENCE

  P     R  E  M  I  N  I  S  C  E  N  C  E
  π     0  0  0  0  0  0  0  0  0  0  0  0
  P'    E  C  N  E  C  S  I  N  I  M  E  R
  π'    0  0  0  1  2  0  0  0  0  0  1  0

  j      0 ... 9   10  11  12
  γ[j]   12        3   3   1

The running time of Compute-Good-Suffix is O(m).

3.5.3 Analysis of Boyer-Moore

Boyer-Moore(T, P, Σ)
  n ← length[T]
  m ← length[P]
  λ ← Compute-Last-Occurrence(P, m, Σ)        O(|Σ| + m)
  γ ← Compute-Good-Suffix(P, m)               O(m)
  s ← 0
  while s ≤ n − m                             O(n − m + 1) iterations
    do j ← m
       while j > 0 and P[j] = T[s + j]        O(m) per iteration
         do j ← j − 1
       if j = 0
         then output "Pattern occurs at shift" s
              s ← s + γ[0]
         else s ← s + max(γ[j], j − λ[T[s + j]])

The worst-case running time of the Boyer-Moore algorithm is O((n − m + 1)m + |Σ|). In practice, BM performs better than other string matching algorithms.

3.5.4 Longest Common Subsequence

Suppose we have the same problem as before: a text T of length n and a pattern P of length m. We would like to know if the letters of P appear in order (but possibly separated) in T. If they do, we say that P is a subsequence of T.

Example:

  T = polytechnic institutes
  P = tennis

Is P a subsequence of T? Yes.

Note that here the text and the pattern have similar roles (or symmetric roles), so we don't need to call them "text" and "pattern" anymore. Instead, call them X = <x1, x2, ..., xm> and Y = <y1, y2, ..., yn>.

What if the pattern does not occur in T? It makes sense to ask for the longest subsequence that occurs both in the pattern as well as in the text. This is the common subsequence problem.

There are several areas where we would want to solve the longest common subsequence problem:

1) molecular biology: DNA sequences (genes) can be represented as sequences of 4 letters: A, C, G, T, corresponding to the 4 submolecules forming the DNA. When biologists find a new sequence, they typically want to know what other sequences it is most similar to. One way of computing how similar two sequences are is to find the length of their longest common subsequence.

2) file comparison: The UNIX program "diff" compares 2 different versions of the same file to determine what changes have been made to the file. It works by finding a longest common subsequence of the lines of the 2 files. Here it is convenient to think of each line of a file as being a single character in a string. Note: any line in the subsequence has not been changed; in other words, what is displayed is the remaining set of lines (those that have changed).

3) screen redisplay: Many text editors, like "Emacs", display part of a file on the screen, updating the screen image as the file is changed. For slow terminals, these programs want to send the terminal as few characters as possible to cause it to update its display correctly. Intuitively, the common subsequence is the part of the display that is already correct and does not need to be changed.

To solve the longest common subsequence problem, we use dynamic programming (recall the string alignment problem and the edit problem). Let c[i, j] be the length of an LCS of the prefixes Xi and Yj. If either i = 0 or j = 0, one of the sequences has length 0, so the LCS has length 0. In general, we have:

  c[i, j] = 0                              if i = 0 or j = 0
  c[i, j] = c[i−1, j−1] + 1                if i, j > 0 and xi = yj
  c[i, j] = max(c[i, j−1], c[i−1, j])      if i, j > 0 and xi ≠ yj

Procedure LCS(X, Y) takes 2 sequences X = <x1, x2, ..., xm> and Y = <y1, y2, ..., yn> as input. It saves the c[i, j] values in a table c[0..m, 0..n] whose entries are computed in row-major order; c[m, n] contains the length of an LCS of X and Y. LCS(X, Y) also maintains a table b[1..m, 1..n] to simplify the construction of an optimal solution: b[i, j] points to the table entry corresponding to the optimal subproblem solution chosen when computing c[i, j]. The procedure returns the b and c tables. Note that LCS(X, Y) could have been designed without the explicit use of table b.
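A direct Python transcription of this recurrence, computing only the length of an LCS (no b table; the function name is ours):

```python
def lcs_length(x, y):
    # c[i][j] = length of an LCS of x[:i] and y[:j],
    # filled in row-major order exactly as in the recurrence
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]
```

For example, since "tennis" is a subsequence of "polytechnicinstitutes", the LCS of the two strings is "tennis" itself, of length 6.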

LCS(X, Y)
  m ← length[X]
  n ← length[Y]
  for i ← 0 to m
    do c[i, 0] ← 0
  for j ← 0 to n
    do c[0, j] ← 0
  for i ← 1 to m
    for j ← 1 to n
      if xi = yj
        then c[i, j] ← c[i−1, j−1] + 1
             b[i, j] ← "↖"
        else if c[i−1, j] ≥ c[i, j−1]
               then c[i, j] ← c[i−1, j]
                    b[i, j] ← "↑"
               else c[i, j] ← c[i, j−1]
                    b[i, j] ← "←"
  return b and c

The running time of LCS(X, Y) is O(mn), since we are assuming that each table entry takes O(1) time to compute. The entries for which we have "↖" are the characters that belong to an LCS.

Example: (the original figure shows the c table for two short strings, filled in row-major order; its bottom-right entry, here 3, is the length of their LCS.)

3.5.5 More Applications: Computer Viruses

1) Computer Viruses

From: http://www.mactech.com/articles/mactech/Vol.08/08.02/VirusDetection/

- Computer viruses have attracted the attention of the public, the media, and the scientific community. Recently, there has been an explosion in the development rate of computer virus code.
- There are 3 categories of anti-viral products:
  - Infection Prevention Products: halt the virus replication and prevent the initial infection from occurring. Example: cdev (Control Panel Device) Vaccine on Macintosh.
  - Infection Detection Products: detect infection right after it has occurred and mark the specific components of system segments that have become infected. It is done by periodically inspecting executable files, and may or may not attempt repair of an infected file. Examples: INIT package GateKeeper and GateKeeper Aid on Macintosh.
  - Infection Identification Products: identify specific viral strains on systems that are already infected and may remove the virus, returning the system to its state prior to infection. This kind of antivirus is the most common, and its effectiveness depends on the frequency of the user invocation. Examples: McAfee's MS-DOS program SCAN and the Macintosh Disinfectant series.
- Detection of the virus is simply considered a pattern string to be searched for in a larger text (a possibly infected program).

Remarks
- Their anti-virus package uses a modification of the Boyer-Moore algorithm. More specifically, they use a randomized system of string selection.
- Modern algorithms make use of parallel hardware and improved data structures, such as suffix trees (which may be respectively applied to randomized matching and dynamic programming).
- Work in string matching, like work in virus detection, is by no means complete.

2) Detecting Software Plagiarism Systems

A) From: http://wwwipd.ira.uka.de/jplag/
- JPlag can detect software plagiarism.
- JPlag is a system that finds similarities among multiple sets of source code files.

- JPlag currently supports Java, C, C++, and Scheme.

B) Moss (Measure Of Software Similarity) is an automatic system for determining the similarity of C, C++, Java, Pascal, Ada, ML, Lisp, or Scheme programs.

1) Common Subsequences

From: "The Complexity of Some Problems on Subsequences and Supersequences", by D. Maier.

- S = s1 s2 ... sm is a sequence with m elements, |S| = m. S' is a subsequence of S, written S' < S, if it is a sequence which consists of S with a number of terms between 0 and m deleted.
- LCS(R), the longest common subsequence of R = {S1, S2, ..., Sp}, a set of sequences with alphabet A (the set of values the different si's can take), is the longest sequence S such that S < Si for i = 1, 2, ..., p.
- Similarly, SCS(R), the shortest common supersequence of R, is the shortest sequence S' such that Si < S' for i = 1, 2, ..., p.
- Example: If R = {abcabe, abce, abdce}, then (a) A = {a, b, c, d, e} and |A| = 5, (b) LCS(R) = abce, and (c) SCS(R) = abceabdce.

3.5.6 The Longest Upsequence [by E. W. Dijkstra]

Define an upsequence of length k in an array L = L1 L2 ... Ln of numbers to be a sequence of indices i1, ..., ik with 1 ≤ i1 < i2 < ... < ik ≤ n for which L_{i1} ≤ L_{i2} ≤ ... ≤ L_{ik}, i.e., a not necessarily contiguous subsequence of L which is non-decreasing from left to right. The problem, given an array L, is to find the greatest length of any upsequence which occurs in L. For example, if L is sorted in decreasing order, the answer is 1.

For any prefix B = L1 ... Lm of L (0 ≤ m ≤ n), and for any length k of upsequence which occurs at all in B, define the record economical top element (RETE) with respect to k and B to be the smallest number which is the last (i.e., largest) member of an upsequence of length k in B. Proceed from left to right, letting B grow from the empty sequence to all of L, and at all times keep track of all the RETEs which exist. At the end, the biggest k for which a RETE exists will be the answer. As we consider each of the numbers L1, ..., Ln in turn, we will find that it is a new RETE in exactly one way.

For example, suppose that L1 = 3, L2 = 5, L3 = 2, and L4 = 7. The RETEs w.r.t. B and k:

  B = 3      1-upsequence: 3  (3)
  B = 35     1-upsequence: 3  (3)
             2-upsequence: 5  (3,5)
  B = 352    1-upsequence: 2  (2)   [since 2 is smaller than 3]
             2-upsequence: 5  (3,5)
  B = 3527   1-upsequence: 2  (2)
             2-upsequence: 5  (3,5)
             3-upsequence: 7  (3,5,7)

Note that 2 and 5 and 7 are each the RETE for one upsequence length. L[5] will also be a new RETE in exactly one way.

CC g  (n) 8n instan es then (n) is a ratio bound. Note: 1. Fa t: (n)  (n) 1 Our goal is to . then if maxf CC . If C  denotes the optimal solution of an optimization problem. for minimization problems : 0 < C   C . and C the solution obtained by an approximation algorithm for the same problem. CC g  1 2.  An approximation algorithm has a relative error bound of (n) if: jC C C j  (n). The relative error of the approximation algorithm is jCC C j . We are assuming that all C 0 s are positive. ii) (n) = 1 .Chapter 4 Approximation Algorithms An algorithm that returns near-optimal solutions is alled an approximation algorithm. i) (n)  1 sin e maxf CC . For maximization problems : 0 < C  C  . C = C  .

nd approximation algorithms that yield very small ration bounds. For some problems. approximation algorithms have been developed that have .

1 The vertex over problem Given a graph G = (V. E ) . 4.xed ratio bounds. independent of n.

vg 6 remove from E 0 every edge in ident on either u or v 7 return C The algorithm has running time O(E ).1 Approx-Vertex-Cover has a ratio bound of 2.nd a subset V 0  V su h that if (u. v) 2 E .1. then either u 2 V 0 or v 2 V 0 . v) be an arbitrary edge of E' 5 C C [ fu. 84 . of smallest size. Approx-Vertex-Cover(G) 1 C 0 fC ontains vertex over being onstru tedg 2 E 0 E [G℄ 3 while E 0 6= 0 4 do let (u. Theorem 4.
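A Python sketch of Approx-Vertex-Cover on an edge-list representation (names are ours):

```python
def approx_vertex_cover(edges):
    # edges: list of (u, v) pairs; returns a cover whose size is
    # at most twice the size of an optimal vertex cover.
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]        # an arbitrary edge of E'
        cover.update((u, v))       # add both endpoints to C
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover
```

On a triangle it picks one edge and returns its two endpoints, which here happens to be optimal.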

Proof: We show that |C| / |C*| ≤ 2. First note that the algorithm returns a vertex cover C, since it loops till every edge in E[G] has been covered by some vertex in C. Let A be the set of edges that are chosen in step 4 above. No two edges in A share an endpoint (no edges of A are adjacent), since right after picking an edge we remove from E' all other edges that are incident on its endpoints (see step 6). So each execution of step 5 adds two new vertices to C, and |C| = 2|A|. On the other hand, any cover contains at least one vertex for each edge in A, so |C*| ≥ |A| [since C* is a cover]. Hence

  (|C| = 2|A|) ∧ (|C*| ≥ |A|)  ⟹  |C| ≤ 2|C*|,  i.e.  |C| / |C*| ≤ 2.   □

4.2 The Traveling-Salesperson Problem

A complete graph G = (V, E) is given that has nonnegative integer costs c(u, v) for the edges (u, v) ∈ E. We want to find a Hamiltonian cycle (or tour) of G with minimum cost. Let c(A) = Σ_{(u,v) ∈ A} c(u, v), where A ⊆ E.

A) The TSP with triangular inequality

Cost c satisfies the triangular inequality if c(u, w) ≤ c(u, v) + c(v, w) for all vertices u, v, w ∈ V.

Approx-TSP-Tour(G, c)
1  Select a vertex r ∈ V[G] to be the root
2  Use Prim's algorithm to construct a MST T using r as root
3  Traverse the tree in preorder to form the list of visited vertices L
4  Return the Hamiltonian cycle H that visits the vertices as specified in L

The running time of Approx-TSP-Tour is Θ(E) = Θ(V²), since the input is a complete graph.

Theorem 4.2.1 Approx-TSP-Tour is an approximation algorithm with a ratio bound of 2 for the TSP with triangular inequality. That is, the algorithm returns a tour whose cost is not more than twice the cost of an optimal tour.

Proof: H* denotes an optimal tour; H is the tour obtained by the approximation algorithm. We want to show that c(H) / c(H*) ≤ 2. Note: c(T) ≤ c(H*), since a spanning tree is obtained by deleting an edge from a tour and T is a minimum spanning tree. A full walk of T, denoted by W, lists the vertices of T when they are first visited and also when they are returned to after visiting a subtree. Thus W traverses every edge of T exactly twice, so c(W) = 2 c(T), and therefore c(W) ≤ 2 c(H*). W, however, is generally not a tour. By the triangular inequality, vertices can be deleted from W without having the cost increased.

If v is deleted from W between visits to u and w, the result is having to go directly from u to w. We can thus systematically remove from W every vertex that has already been visited; we end up with a preorder walk of T. Let H be the cycle associated with that walk. It is a Hamiltonian cycle (every vertex is visited once), and it is the cycle obtained by the approximation algorithm. So c(H) ≤ c(W) [H is obtained from W by deletions that never increase the cost]. Recall c(W) ≤ 2 c(H*); therefore c(H) / c(H*) ≤ 2.   □

B) The general TSP

Theorem 4.2.2 If P ≠ NP and the ratio bound ρ ≥ 1, then there exists no polynomial time approximation algorithm with ratio bound ρ for the general TSP.

Proof: Proof by contradiction. Suppose that for some ρ ≥ 1 there exists a polynomial time approximation algorithm A with ratio bound ρ. Without loss of generality assume ρ is an integer. We show that we can use A to solve Ham-Cycle in polynomial time.

Let G = (V, E) be an instance of Ham-Cycle. Let G' = (V, E') be the complete graph on V, i.e. E' = {(u, v) | u, v ∈ V and u ≠ v}. Convert G into an instance of the TSP by assigning integer costs to the edges of E':

  c(u, v) = 1           if (u, v) ∈ E
  c(u, v) = ρ|V| + 1    otherwise

The conversion from G to G' is in polynomial time in |V| and |E|.

Consider the TSP (G', c). If G has a Hamiltonian cycle H, then (G', c) contains a tour of cost |V| consisting of the edges from H; any other tour costs more than |V|. So if G contains a Hamiltonian cycle, A returns a tour of cost no more than ρ times the cost of an optimal tour, that is, of cost at most ρ|V|.

If G does not contain a Hamiltonian cycle, then any tour H' of G' must use some edge not in E, so

  c(H') ≥ (ρ|V| + 1) + (|V| − 1) = ρ|V| + |V| > ρ|V|.

So if G has no Hamiltonian cycle, A returns a tour of cost more than ρ|V|. Hence A can be used to solve Ham-Cycle in polynomial time: accept iff the tour returned by A costs at most ρ|V|. So our assumption is incorrect, and such an algorithm A cannot exist.   □
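Returning to the triangular-inequality case, Approx-TSP-Tour can be sketched in Python: Prim's algorithm followed by a preorder walk of the MST. The cost matrix below is a hypothetical input and all names are ours.

```python
import math

def approx_tsp_tour(n, cost):
    # cost: n x n symmetric matrix assumed to obey the
    # triangle inequality (hypothetical input).
    # Step 2: Prim's algorithm rooted at vertex 0.
    parent = [0] * n
    key = [math.inf] * n
    key[0] = 0
    in_tree = [False] * n
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: key[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and cost[u][v] < key[v]:
                key[v] = cost[u][v]
                parent[v] = u
    # Step 3: preorder walk of the MST.
    children = {v: [] for v in range(n)}
    for v in range(1, n):
        children[parent[v]].append(v)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    # Step 4: close the cycle at the root.
    return tour + [0]
```

The returned list visits every vertex exactly once and comes back to the root, and by the argument above its cost is at most twice the optimum.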

Example: (figure: a 5-vertex instance with vertices a, b, c, d, e; the edges added for the non-edges of G, such as (a, c) and (e, c), get cost ρ|V| + 1 = ρ·5 + 1 = 5ρ + 1.)

A Hamiltonian cycle of an undirected graph G = (V, E) is a simple cycle that contains each vertex in V. A graph that contains a Hamiltonian cycle is a Hamiltonian graph.

4.3 The Set-Covering Problem

An instance (X, F) of the set-covering problem consists of a finite set X and a family F of subsets of X such that

  X = ∪_{S ∈ F} S,

i.e., every element of X belongs to at least one subset in F; S ∈ F covers its elements. We are interested in finding a minimum-size subset C ⊆ F whose members cover all of X:

  X = ∪_{S ∈ C} S.

C then covers the elements of X.

Greedy-Set-Cover(X, F)
  U ← X                                     {U: set of uncovered elements}
  C ← ∅                                     {C: contains the cover being constructed}
  while U ≠ ∅
    do select S ∈ F that maximizes |S ∩ U|  {the subset that contains the maximum number of uncovered elements}
       U ← U − S
       C ← C ∪ {S}
  return C

The greedy algorithm picks, at each iteration, the subset that covers as many uncovered elements as possible. The algorithm can be implemented to run in polynomial time in |X| and |F|: the number of iterations of the loop is at most min(|X|, |F|), and the loop body can be implemented to run in O(|X| |F|) time, so the algorithm can run in O(|X| |F| min(|X|, |F|)).
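A Python sketch of the greedy procedure (names are ours):

```python
def greedy_set_cover(X, F):
    # X: set of elements; F: list of subsets (sets) whose union is X.
    uncovered = set(X)
    cover = []
    while uncovered:
        # select S in F that maximizes |S intersect U|
        best = max(F, key=lambda s: len(s & uncovered))
        cover.append(best)
        uncovered -= best
    return cover
```

With X = {1, ..., 5} and F = [{1,2,3}, {2,4}, {3,4}, {4,5}], the greedy choice takes {1,2,3} first (3 new elements) and then {4,5}, producing a cover of size 2.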

S g.e5 e5 e9 e2 e6 e10 e3 e7 e11 e4 e8 e12 An instan e (X. Proof: Let C  denote an optimal set over and Si the ith subset pi ked by the algorithm and C the set over onstru ted by the algorithm. where X onsists of 12 bla k points and F = fS . F ) of the set- overing problem. Spread the ost of sele ting Si evenly among the elements overed for the . S . S . S and S in that order. The greedy algorithm however produ es a over of size 4 by sele ting the sets S .1 Analysis of Greedy-Set-Cover Theorem 4. S g. S .3. S .1 Greedy-Set-Cover has a ratio bound H (maxfjS j : S where 2 F g) d X 1 H (d) = i=1 i is the dth harmoni number. 1 2 3 4 5 6 3 4 5 1 4 5 3 4.3. S . A minimum-size set over is C = fS . S .

Ea h element x 2 X is assigned a ost Cx when it is overed for the .rst time by Si .

rst time: 1 Cx = jS (S [ S [ ::: [ S ) [x is overed by Si ℄ In the Example: i 1 i 1 2 = e1 = e2 = e5 = e6 = e9 = e1 0 = jS11 j e3 = e7 = e8 = jS4 1 S1 j = 13 e4 = e12 = jS5 (S11 [S4 )j = 12 e4 = jS3 (jS11[S4 [S5 )j = 1 If C is returned by the algorithm. then 2 jC j = is the number of sets in C 1 6 X 2 x x X ) jC j  XX 2 2 S C x S 88 x .

In the Example: T = S . Let u = jT j and k e the least index su h that uk = 0. egj = 2 u = jT (T [ T )jj = jS (S [ S )j = jfe gj = 1 u = jT (T [ T [ T )j = (S [ S [ S )j = jfe gj = 1 u = jT (T [ T [ T [ T )j = jS (S [ S [ S [ S )j = 0 So k = 4. If T = S then u = jS j = 2: u = jT T j = jS S = jfe . Let ui = jT (T [ T ::: [ Ti )j for any T 2 F and i = 1.The proof is based on: whi h we prove later. :::. Note that: u u = 0 elements of S are overed for the . T = S and T = S [order in whi h sets are hosen℄. ui is the number of elements in T remaining un overed after T T :::Ti have been sele ted by the algorithm. T = S . So X 2 x  H (jS j) x S XX jC j  x 2 X  H (jS j) S 2C   jC  jH (maxfjS j : S 2 F g) jC j jC j  H (maxfjS j : S 2 F g) ) 2 S C x S 2 We still have to show that P x  H (jT j) 8T 2 F . 2. so that every element in T is overed by at least one of the sets T T :::Tk . jC j.

rst time by S u u = 1 element of S is overed for the .

rst time by S u u = 0 elements of S are overed for the .

rst time by S u u = 1 element of S is overed for the .

rst time by S So. ui  ui and ui ui elements of T are overed for the .

So X X 1 ui ) x = k(ui jT (T [ T [ ::: [ T )j 1 1 1 2 0 2 1 1 6 1 1 6 1 1 2 3 1 2 3 6 4 1 2 3 4 1 1 2 2 3 3 4 2 0 4 6 1 4 1 5 4 3 4 4 6 5 1 4 4 5 3 6 1 6 4 6 5 6 1 3 6 4 2 0 2 3 1 1 2 i i=1 x T 1 i 1 2 Note that: jTi (T [ T [ ::: [ Ti )  jT (T [ T [ ::: [ Ti )j be ause the algorithm is greedy and hooses Ti over T . for i = 1.rst time by Ti . :::. b with a < b: x T 1 1 1 2 i 1 i=1 = a +1 1 + a +1 2 + ::: + 1b b X 1 H (b) H (a) = i H (b) H (a) i=a+1 ) H (b) H (a)  (b a) 1b  1b + ::: + 1b | {z } b aterms 89 1 . So k X X 1 ui ) x  (ui u 1 2 1 1 2 1 2 Note: For integers a. 2. But ui = T (T [ T [ ::: [ Ti )j. k.

an we .4 The Subset-Sum problem Given(S. xi 2 Z and t 2 Z . t) where S = fx . :::.Ba k to X 2 k X (H (ui 1 ) H (ui ))  x i=1 x T = H (uo ) H (u ) + H (u ) H (u ) ::: + H (uk = H (u ) H (uk ) But u = jT j and uk = 0 X ) x  H (jT j) 1 1 2 1 H (uk ) 0 0 2 x T Fa t: Sin e 2 n X 1 k=1 jC j  ln n + 1 we have   ln jxj + 1 k jC j 2 4. xn g.

whose elements add up to the target value t? In the optimization problem.nd a subset S 0 of S . we want to .

L + x =< l + x. 16g It an be shown that: 1. :::. lm > is a list of numbers. if S = fxi jxi 2 Z . 9g P1 = f0. Pi = Pi [ (Pi + xi ). 2. Similarly. xi g and adding its members. l + x. Exa t-Subset-Sum(S. 1  i  ng then S + x = fxi + xjxi 2 S ^ x 2 Z g. Algorithm Exa t-Subset-Sum(S. Re all that given 2 sorted lists L and L0 . 5. 14. we an merge them (via MergeLists(L. 7g P3 = f0. t) n jS j L <0> for i 1 to n do Li Merge-Lists(Li . L0 ) for example) in O(jLj + jL0 j) time. lm + x >. x . 5. :::. l . 7. 5. Li + xi ) remove from Li every element > t return largest element in Ln Let Pi be the set of all values that an be obtained by hoosing a subset of fx .nd a subset S 0 whose sum is as large as possible but not larger than t. + 1 1 2 + 1 + + 0 1 1 2 Example: S = f2. An exponential time algorithm If L =< l . The subset an be . and 1 1 90 1 2 . t) returns a subset S 0 of S whose sum is as large as possible but not larger than t. 2g P2 = f0. :::. 9. 2.

It can be shown that:
1. Li in Exact-Subset-Sum(S, t) is a sorted list containing every element of Pi whose value is at most t;
2. Exact-Subset-Sum(S, t) returns a subset sum of S that is as large as possible but not larger than t;
3. Exact-Subset-Sum(S, t) is an exponential time algorithm in general, since |Li| can double at every iteration.

An approximation scheme for an optimization problem is an algorithm A which takes as input both
- the problem instance I, and
- an error bound ε,
and has the performance guarantee

  opt(I) / A(I, ε) ≤ 1 + ε.

A polynomial approximation scheme is an approximation scheme {A_ε} where each algorithm A_ε runs in time polynomial in the length of the input instance I. A fully polynomial approximation scheme is an approximation scheme {A_ε} where each algorithm A_ε runs in time polynomial in the length of the input instance I and in 1/ε.

Remark: Having established that a certain problem is NP-COMPLETE, the best possible algorithmic result for that problem would be the existence of a fully polynomial approximation scheme. This is a very desirable result; very few NP-COMPLETE problems have fully polynomial approximation schemes.

We achieve a fully polynomial time approximation scheme for the subset-sum problem by trimming (removing elements from) each list Li after it is created. To trim a list L by δ, 0 < δ < 1, means to remove as many elements from L as possible in such a way that L', the resulting list, consists of elements z that represent the removed elements y ∈ L. More precisely, every y ∈ L that is removed is represented by a z ∈ L' such that (y − z)/y ≤ δ (or equivalently: y − z ≤ δy, and therefore (1 − δ)y ≤ z ≤ y). Each removed y ∈ L is thus represented by a z ∈ L' whose relative error with respect to y is at most δ.

Example: Suppose L = <10, 11, 12, 15, 20, 21, 22, 23, 24, 29> and δ = 0.1.

  10 ∈ L' (the smallest element is always in L')
  (11 − 10)/11 ≤ 0.1, so drop 11 (it is represented by 10)
  (12 − 10)/12 > 0.1, so include 12 in L' (it cannot be represented by 10)
  (15 − 12)/15 > 0.1, so 15 ∈ L'
  (20 − 15)/20 > 0.1, so 20 ∈ L'
  (21 − 20)/21 ≤ 0.1, so 21 ∉ L'
  (22 − 20)/22 ≤ 0.1, so 22 ∉ L'
  (23 − 20)/23 > 0.1, so 23 ∈ L'
  (24 − 23)/24 ≤ 0.1, so 24 ∉ L'
  and finally (29 − 23)/29 > 0.1, so 29 ∈ L'

⟹ L' = <10, 12, 15, 20, 23, 29>.

Trim(L, δ)
  m ← |L|
  L' ← <y1>
  last ← y1                        {most recent element to join L'}
  for i ← 2 to m
    do if last < (1 − δ) yi        {last cannot represent yi}
         then append yi onto L'
              last ← yi
  return L'

Trim(L, δ) trims L = <y1, y2, ..., ym> in time Θ(m). It assumes that L is sorted in nondecreasing order and returns L', a trimmed and sorted list. We make use of Trim(L, δ) to construct the approximation scheme:

Approx-Subset-Sum(S, t, ε)
  n ← |S|
  L0 ← <0>
  for i ← 1 to n
    do Li ← Merge-Lists(L_{i−1}, L_{i−1} + xi)
       Li ← Trim(Li, ε/n)
       remove from Li every element > t
  let z be the largest element of Ln
  return z

Example: S = <104, 102, 201, 101>, t = 308 and ε = 0.20; the trimming parameter is δ = ε/n = 0.05. Note L0 = <0>.

  i   Merge-Lists                               after Trim                after removing elements > t
  1   <0, 104>                                  <0, 104>                  <0, 104>
  2   <0, 102, 104, 206>                        <0, 102, 206>             <0, 102, 206>
      (104 is represented by 102, since 102 ≥ (1 − 0.05)·104)
  3   <0, 102, 201, 206, 303, 407>              <0, 102, 201, 303, 407>   <0, 102, 201, 303>
      (206 is represented by 201; drop 407 since 407 > 308)
  4   <0, 101, 102, 201, 203, 302, 303, 404>    <0, 101, 201, 302, 404>   <0, 101, 201, 302>
      (102 is repr. by 101, 203 by 201 and 303 by 302; remove 404 since 404 > 308)

Hence z = 302. Note that the optimal answer is {104, 102, 101}, with sum 101 + 102 + 104 = 307; Approx-Subset-Sum yields 302, which is (307 − 302)/307 ≈ 1.6% from the optimum — well within ε = 20%.

Theorem 4.4.1 Approx-Subset-Sum is a fully polynomial time approximation scheme for the subset-sum problem.
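Trim and Approx-Subset-Sum can be sketched in Python as follows (names are ours; the trimming test mirrors the last < (1 − δ)·yi condition above):

```python
def trim(L, delta):
    # L sorted nondecreasing; keep y only when the last kept
    # element cannot represent it, i.e. last < (1 - delta) * y.
    kept = [L[0]]
    last = L[0]
    for y in L[1:]:
        if last < (1 - delta) * y:
            kept.append(y)
            last = y
    return kept

def approx_subset_sum(S, t, eps):
    n = len(S)
    L = [0]
    for x in S:
        L = sorted(set(L) | {y + x for y in L})  # Merge-Lists
        L = trim(L, eps / n)                     # delta = eps / n
        L = [y for y in L if y <= t]             # drop elements > t
    return max(L)
```

On the example above, approx_subset_sum([104, 102, 201, 101], 308, 0.20) returns 302, matching the hand computation.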

Proof: We have to show that:
1. Approx-Subset-Sum is an approximation scheme, and
2. Approx-Subset-Sum runs in fully polynomial time.

1. Trimming Li, and removing from it every element greater than t, are done in such a way that each element of Li is the sum of the elements of some subset of S. In particular z, returned at the end of Approx-Subset-Sum, is the sum of some subset of S. We have to show that the scheme has relative error bound ε. It can be shown, by induction on i, that for every element y ∈ Pi with y ≤ t there exists a z ∈ Li such that

  (1 − ε/n)^i · y ≤ z ≤ y

(note that if z represents a removed element y in one trimming step, then (y − z)/y ≤ δ = ε/n). In particular, if y* ∈ Pn denotes an optimal solution to the subset-sum problem, then there exists a z ∈ Ln such that

  (1 − ε/n)^n · y* ≤ z ≤ y*,

and the largest possible z is returned by Approx-Subset-Sum. We still have to show that the returned value C = z satisfies C ≥ (1 − ε) y*, i.e. (y* − C)/y* ≤ ε. Consider q(n) = (1 − ε/n)^n. Differentiating with respect to n shows that d/dn (1 − ε/n)^n > 0, so q(n) increases with n, and hence

  q(n) ≥ q(1) = 1 − ε   for all n ≥ 1.

So (1 − ε) y* ≤ (1 − ε/n)^n · y* ≤ C, in other words (y* − C)/y* ≤ ε; therefore Approx-Subset-Sum is an approximation scheme.

2. Approx-Subset-Sum is fully polynomial time: it can be shown that trimming keeps the lengths of the Li's polynomial in the input length and in 1/ε, so the whole procedure is polynomial in both.   □

HITTINGSTRING

HITTINGSTRING = {<S> | S = {s1, s2, s3, ..., sk} is a set of equal-length strings, |s1| = |s2| = ... = |sk| = n, over Σ = {0, 1, *}, and there exists a string y ∈ {0, 1}^n such that for each st ∈ S there is some i, 1 ≤ i ≤ n, for which the ith character of st and the ith character of y are identical}.

Example:
- y = 010 is a hitting string for S = {0*1, 100, 110, **0}:

    st     y
    0*1    010    (hit in position 1)
    100    010    (hit in position 3)
    110    010    (hit in position 2)
    **0    010    (hit in position 3)

  Therefore <S> ∈ HITTINGSTRING.
- S = {00, 01, 10, 11} does not allow any hitting string: every y ∈ {0, 1}^2 is the bitwise complement of one of the four strings and agrees with it in no position. So <S> ∉ HITTINGSTRING.

Claim: HITTINGSTRING is NP-COMPLETE.

Proof:
- HITTINGSTRING ∈ NP: y is a certificate, and it takes polynomial time to check and verify it (compare y against each of the k strings in O(kn) time).
- HITTINGSTRING is NP-HARD: SAT ≤p HITTINGSTRING. Given Φ = φ1 ∧ φ2 ∧ ... ∧ φk with n different variables, we construct a set S of k strings of length n over Σ = {0, 1, *}. To each clause φp we associate an n-long string sp according to the following rule:

    ith symbol of sp = 1   if xi ∈ φp
    ith symbol of sp = 0   if ¬xi ∈ φp
    ith symbol of sp = *   if neither xi nor ¬xi occurs in φp

Example: n = 4, k = 3, Φ = (x1 ∨ x2 ∨ ¬x3) ∧ ¬x2 ∧ (x1 ∨ x2 ∨ x3 ∨ ¬x4):

  s1 = 110*
  s2 = *0**
  s3 = 1110

So S = {110*, *0**, 1110}. Note that Φ is satisfiable; a satisfying assignment is x1 = x3 = 1 and x2 = x4 = 0, and y = 1010 is a hitting string for S.

The conversion technique (basically a table lookup) is done in polynomial time (the construction of all si's takes O(kn) time). More formally, the following holds: Φ ∈ SAT ⟺ <S> ∈ HITTINGSTRING.

(⟹): Φ ∈ SAT ⟹ there exists a satisfying assignment A. Construct a hitting string y according to the following rule: put 1 in the ith position of y if xi is true in A, and put 0 in the ith position of y if xi is false in A. Since each string in S corresponds to a clause φp, and each clause φp has at least one literal that is true under A, that literal corresponds to a "hit": the position of that literal in sp carries the same symbol as y. So y is a hitting string for S, and <S> ∈ HITTINGSTRING.

(⟸): Given <S> ∈ HITTINGSTRING, there exists y ∈ {0, 1}^n such that y is a hitting string for S with |y| = n. Read y as an assignment A (xi true iff the ith character of y is 1). For each clause φp, y agrees with sp in some position i, and the corresponding literal of φp is true under A. So every clause contains at least one true literal, which means A satisfies Φ, hence Φ ∈ SAT.

The HALF-CLIQUE Problem

HALF-CLIQUE = {<G> | G is a graph with a clique of size ⌈|V|/2⌉}.

Claim: HALF-CLIQUE is NP-COMPLETE.

Proof:
- HALF-CLIQUE ∈ NP: use the set of ⌈|V|/2⌉ vertices in the clique as a certificate of G. Checking whether it is a clique is accomplished in polynomial time by checking whether, for every pair u, v in the set, the edge (u, v) is in E.
- HALF-CLIQUE is NP-HARD: we show that CLIQUE ≤p HALF-CLIQUE. The reduction takes a graph G = (V, E) and a bound k, and produces a graph G' with

  <G, k> ∈ CLIQUE ⟺ <G'> ∈ HALF-CLIQUE.

The reduction depends on k:

- If k = ⌈|V|/2⌉, then let G' = G.
- If k > ⌈|V|/2⌉, add m vertices such that k = ⌈(|V| + m)/2⌉: let G' = G ∪ {q1, q2, ..., qm}, where the newly added nodes q1, ..., qm are of degree 0 (they are not adjacent to any nodes). Then ⌈|V'|/2⌉ = k, the isolated vertices create no new cliques, and G has a clique of size k ⟺ G' has a clique of size ⌈|V'|/2⌉.

  Example: G = (V, E) with |V| = 6, k = 4, ⌈|V|/2⌉ = 3, where {A, B, C, D} is a clique of size 4. Adding one isolated vertex q1 gives G' = (V', E') with V' = V ∪ {q1}, so |V'| = 7 and ⌈|V'|/2⌉ = 4 = k.

- If k < ⌈|V|/2⌉, add m vertices such that k + m = ⌈(|V| + m)/2⌉: let G' = G ∪ {q1, ..., qm}, where the newly added vertices q1, ..., qm are adjacent to all nodes of G and every qi is adjacent to qj, i ≠ j. Then G has a clique of size k ⟺ G' has a clique of size k + m. Note that k + m = ⌈|V'|/2⌉ with |V'| = |V| + m: in the even case 2(k + m) = |V'| = |V| + m, hence |V| = 2k + m, i.e. m = |V| − 2k.

  Example: (figure: a graph with a clique of size 3 becomes, after adding one vertex adjacent to everything, a graph with a clique of size 4.)

In all cases the required clique size ⌈|V'|/2⌉ is achieved, and the reduction is in polynomial time (adding nodes and edges is achieved in polynomial time). So CLIQUE ≤p HALF-CLIQUE ⟹ HALF-CLIQUE is NP-COMPLETE.   □

A hamiltonian path in a graph G = (V, E) is a path that traverses each vertex exactly once. More precisely:

HAM-PATH = {<G, u, v> | there is a hamiltonian path from u to v in the graph G}.

Claim: HAM-PATH is NP-COMPLETE.

Proof:
- HAM-PATH ∈ NP, since a path can serve as a certificate and be checked in polynomial time to be hamiltonian.

- HAM-PATH is NP-HARD. Claim: HAM-CYCLE ≤p HAM-PATH. Transform any graph G = (V, E) into G' = (V', E'), where for some chosen vertex u ∈ V:

  a) V' = V ∪ {u', v', w}, where v' is a new copy of u, and u', w are new vertices of degree 1;
  b) E' = E ∪ {(v', v) | (u, v) ∈ E} ∪ {(u', u), (w, v')}.

Adding these nodes and edges is achieved in polynomial time.

(⟹) If G has a hamiltonian cycle C = (u, v1, ..., v_{n−1}, u), then G' has the hamiltonian path P = [u', u, v1, ..., v_{n−1}, v', w]: the closing edge (v_{n−1}, u) of C is replaced by (v_{n−1}, v') ∈ E'.

(⟸) Suppose G' has a hamiltonian path P. Since u' and w have degree 1, the extreme edges of P have to be (u', u) and (v', w). The rest of P is a path from u to v' traversing each node in V ∪ {v'} exactly once. Every edge (v', v) ∈ E' corresponds to the edge (u, v) ∈ E, so replacing v' by u turns this path into a hamiltonian cycle in G.

So G has a hamiltonian cycle ⟺ G' has a hamiltonian path.

SPATH = {<G, a, b, k> | G is a graph containing a simple path of length at most k from a to b}.

Claim: SPATH ∈ P.

Proof: The following is a breadth-first search algorithm which runs in polynomial time:

SPATH(G, a, b, k)
begin
  mark a
  repeat k times or until b is marked (whichever takes fewer iterations)
    for each marked node n not yet expanded, expand it
  accept if b is marked and reject otherwise
end

To expand a node means to mark all nodes adjacent to it. (Recall: u and v are adjacent nodes if (u, v) ∈ E.) After i rounds exactly the nodes at distance at most i from a are marked, and shortest paths are simple, so the algorithm accepts iff there is a simple path of length at most k from a to b.   □

LPATH = {<G, a, b, k> | G is a graph containing a simple path of length at least k from a to b}.

Claim: LPATH is NP-COMPLETE.

Proof:
- LPATH ∈ NP: given a sequence of vertices, we can check in polynomial time whether it is a simple path of length at least k from a to b.
- To show that LPATH is NP-HARD, we reduce HAM-PATH to it in polynomial time: (G, a, b) ∈ HAM-PATH ⟺ (G, a, b, n − 1) ∈ LPATH, where n = |V|. Since there are only n vertices in G, a simple path (which cannot repeat vertices) of length exactly n − 1 is a hamiltonian path and vice versa. That is, (G, a, b) can be polynomial-time reduced to (G, a, b, n − 1).   □

Bibliography

[AMB93] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows. Prentice-Hall, 1993.

[CLR90] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, 1990.

[Wes96] Douglas B. West. Introduction to Graph Theory. Prentice Hall, 1996.