Cornell University

Boston San Francisco New York London Toronto Sydney Tokyo Singapore Madrid Mexico City Munich Paris Cape Town Hong Kong Montreal

Acquisitions Editor: Matt Goldstein
Project Editor: Maite Suarez-Rivas
Production Supervisor: Marilyn Lloyd
Marketing Manager: Michelle Brown
Marketing Coordinator: Jake Zavracky
Project Management: Windfall Software
Composition: Windfall Software, using ZzTeX
Copyeditor: Carol Leyba
Technical Illustration: Dartmouth Publishing
Proofreader: Jennifer McClain
Indexer: Ted Laux
Cover Design: Joyce Cosentino Wells
Cover Photo: © 2005 Tim Laman / National Geographic. A pair of weaverbirds work together on their nest in Africa.
Prepress and Manufacturing: Caroline Fell
Printer: Courier Westford

Access the latest information about Addison-Wesley titles from our World Wide Web site: http://www.aw-bc.com/computing

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps.

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

Library of Congress Cataloging-in-Publication Data

Kleinberg, Jon.
Algorithm design / Jon Kleinberg, Éva Tardos.--1st ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-321-29535-8 (alk. paper)
1. Computer algorithms. 2. Data structures (Computer science) I. Tardos, Éva. II. Title.
QA76.9.A43K54 2005
005.1--dc22    2005000401

Copyright © 2006 by Pearson Education, Inc.

For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contract Department, 75 Arlington Street, Suite 300, Boston, MA 02116 or fax your request to (617) 848-7047.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or any other media embodiments now known or hereafter to become known, without the prior written permission of the publisher. Printed in the United States of America.

ISBN 0-321-29535-8
2 3 4 5 6 7 8 9 10-CRW-08 07 06 05

About the Authors

Jon Kleinberg is a professor of Computer Science at Cornell University. He received his Ph.D. from M.I.T. in 1996. He is the recipient of an NSF Career Award, an ONR Young Investigator Award, an IBM Outstanding Innovation Award, the National Academy of Sciences Award for Initiatives in Research, research fellowships from the Packard and Sloan Foundations, and teaching awards from the Cornell Engineering College and Computer Science Department. Kleinberg's research is centered around algorithms, particularly those concerned with the structure of networks and information, and with applications to information science, optimization, data mining, and computational biology. His work on network analysis using hubs and authorities helped form the foundation for the current generation of Internet search engines.

Éva Tardos is a professor of Computer Science at Cornell University. She received her Ph.D. from Eötvös University in Budapest, Hungary in 1984. She is a member of the American Academy of Arts and Sciences, and an ACM Fellow; she is the recipient of an NSF Presidential Young Investigator Award, the Fulkerson Prize, research fellowships from the Guggenheim, Packard, and Sloan Foundations, and teaching awards from the Cornell Engineering College and Computer Science Department. Tardos's research interests are focused on the design and analysis of algorithms for problems on graphs or networks. She is most known for her work on network-flow algorithms and approximation algorithms for network problems. Her recent work focuses on algorithmic game theory, an emerging area concerned with designing systems and algorithms for selfish users.

Contents

About the Authors v
Preface xiii

1 Introduction: Some Representative Problems 1
1.1 A First Problem: Stable Matching 1
1.2 Five Representative Problems 12
Solved Exercises 19
Exercises 22
Notes and Further Reading 28

2 Basics of Algorithm Analysis 29
2.1 Computational Tractability 29
2.2 Asymptotic Order of Growth 35
2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays 42
2.4 A Survey of Common Running Times 47
2.5 A More Complex Data Structure: Priority Queues 57
Solved Exercises 65
Exercises 67
Notes and Further Reading 70

3 Graphs 73
3.1 Basic Definitions and Applications 73
3.2 Graph Connectivity and Graph Traversal 78
3.3 Implementing Graph Traversal Using Queues and Stacks 87
3.4 Testing Bipartiteness: An Application of Breadth-First Search 94
3.5 Connectivity in Directed Graphs 97
3.6 Directed Acyclic Graphs and Topological Ordering 99
Solved Exercises 104
Exercises 107
Notes and Further Reading 112

4 Greedy Algorithms 115
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead 116
4.2 Scheduling to Minimize Lateness: An Exchange Argument 125
4.3 Optimal Caching: A More Complex Exchange Argument 131
4.4 Shortest Paths in a Graph 137
4.5 The Minimum Spanning Tree Problem 142
4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure 151
4.7 Clustering 157
4.8 Huffman Codes and Data Compression 161
* 4.9 Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm 177
Solved Exercises 183
Exercises 188
Notes and Further Reading 205

5 Divide and Conquer 209
5.1 A First Recurrence: The Mergesort Algorithm 210
5.2 Further Recurrence Relations 214
5.3 Counting Inversions 221
5.4 Finding the Closest Pair of Points 225
5.5 Integer Multiplication 231
5.6 Convolutions and the Fast Fourier Transform 234
Solved Exercises 242
Exercises 246
Notes and Further Reading 249

6 Dynamic Programming 251
6.1 Weighted Interval Scheduling: A Recursive Procedure 252
6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems 258
6.3 Segmented Least Squares: Multi-way Choices 261
6.4 Subset Sums and Knapsacks: Adding a Variable 266
6.5 RNA Secondary Structure: Dynamic Programming over Intervals 272
6.6 Sequence Alignment 278
6.7 Sequence Alignment in Linear Space via Divide and Conquer 284
6.8 Shortest Paths in a Graph 290
6.9 Shortest Paths and Distance Vector Protocols 297
* 6.10 Negative Cycles in a Graph 301
Solved Exercises 307
Exercises 312
Notes and Further Reading 335

7 Network Flow 337
7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm 338
7.2 Maximum Flows and Minimum Cuts in a Network 346
7.3 Choosing Good Augmenting Paths 352
* 7.4 The Preflow-Push Maximum-Flow Algorithm 357
7.5 A First Application: The Bipartite Matching Problem 367
7.6 Disjoint Paths in Directed and Undirected Graphs 373
7.7 Extensions to the Maximum-Flow Problem 378
7.8 Survey Design 384
7.9 Airline Scheduling 387
7.10 Image Segmentation 391
7.11 Project Selection 396
7.12 Baseball Elimination 400
* 7.13 A Further Direction: Adding Costs to the Matching Problem 404
Solved Exercises 411
Exercises 415
Notes and Further Reading 448

8 NP and Computational Intractability 451
8.1 Polynomial-Time Reductions 452
8.2 Reductions via "Gadgets": The Satisfiability Problem 459
8.3 Efficient Certification and the Definition of NP 463
8.4 NP-Complete Problems 466
8.5 Sequencing Problems 473
8.6 Partitioning Problems 481
8.7 Graph Coloring 485
8.8 Numerical Problems 490
8.9 Co-NP and the Asymmetry of NP 495
8.10 A Partial Taxonomy of Hard Problems 497
Solved Exercises 500
Exercises 505
Notes and Further Reading 529

9 PSPACE: A Class of Problems beyond NP 531
9.1 PSPACE 531
9.2 Some Hard Problems in PSPACE 533
9.3 Solving Quantified Problems and Games in Polynomial Space 536
9.4 Solving the Planning Problem in Polynomial Space 538
9.5 Proving Problems PSPACE-Complete 543
Solved Exercises 547
Exercises 550
Notes and Further Reading 551

10 Extending the Limits of Tractability 553
10.1 Finding Small Vertex Covers 554
10.2 Solving NP-Hard Problems on Trees 558
10.3 Coloring a Set of Circular Arcs 563
* 10.4 Tree Decompositions of Graphs 572
* 10.5 Constructing a Tree Decomposition 584
Solved Exercises 591
Exercises 594
Notes and Further Reading 598

11 Approximation Algorithms 599
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem 600
11.2 The Center Selection Problem 606
11.3 Set Cover: A General Greedy Heuristic 612
11.4 The Pricing Method: Vertex Cover 618
11.5 Maximization via the Pricing Method: The Disjoint Paths Problem 624
11.6 Linear Programming and Rounding: An Application to Vertex Cover 630
* 11.7 Load Balancing Revisited: A More Advanced LP Application 637
11.8 Arbitrarily Good Approximations: The Knapsack Problem 644
Solved Exercises 649
Exercises 651
Notes and Further Reading 659

12 Local Search 661
12.1 The Landscape of an Optimization Problem 662
12.2 The Metropolis Algorithm and Simulated Annealing 666
12.3 An Application of Local Search to Hopfield Neural Networks 671
12.4 Maximum-Cut Approximation via Local Search 676
12.5 Choosing a Neighbor Relation 679
12.6 Classification via Local Search 681
12.7 Best-Response Dynamics and Nash Equilibria 690
Solved Exercises 700
Exercises 702
Notes and Further Reading 705

13 Randomized Algorithms 707
13.1 A First Application: Contention Resolution 708
13.2 Finding the Global Minimum Cut 714
13.3 Random Variables and Their Expectations 719
13.4 A Randomized Approximation Algorithm for MAX 3-SAT 724
13.5 Randomized Divide and Conquer: Median-Finding and Quicksort 727
13.6 Hashing: A Randomized Implementation of Dictionaries 734
13.7 Finding the Closest Pair of Points: A Randomized Approach 741
13.8 Randomized Caching 750
13.9 Chernoff Bounds 758
13.10 Load Balancing 760
13.11 Packet Routing 762
13.12 Background: Some Basic Probability Definitions 769
Solved Exercises 776
Exercises 782
Notes and Further Reading 793

Epilogue: Algorithms That Run Forever 795
References 805
Index 815

* The star indicates an optional section. (See the Preface for more information about the relationships among the chapters and sections.)

Preface

Algorithmic ideas are pervasive, and their reach is apparent in examples both within computer science and beyond. Some of the major shifts in Internet routing standards can be viewed as debates over the deficiencies of one shortest-path algorithm and the relative advantages of another. The basic notions used by biologists to express similarities among genes and genomes have algorithmic definitions. The concerns voiced by economists over the feasibility of combinatorial auctions in practice are rooted partly in the fact that these auctions contain computationally intractable search problems as special cases. And algorithmic notions aren't just restricted to well-known and longstanding problems; one sees the reflections of these ideas on a regular basis, in novel issues arising across a wide range of areas. The scientist from Yahoo! who told us over lunch one day about their system for serving ads to users was describing a set of issues that, deep down, could be modeled as a network flow problem. So was the former student, now a management consultant working on staffing protocols for large hospitals, whom we happened to meet on a trip to New York City.

The point is not simply that algorithms have many applications. The deeper issue is that the subject of algorithms is a powerful lens through which to view the field of computer science in general. Algorithmic problems form the heart of computer science, but they rarely arrive as cleanly packaged, mathematically precise questions. Rather, they tend to come bundled together with lots of messy, application-specific detail, some of it essential, some of it extraneous. As a result, the algorithmic enterprise consists of two fundamental components: the task of getting to the mathematically clean core of a problem, and then the task of identifying the appropriate algorithm design techniques, based on the structure of the problem. These two components interact: the more comfortable one is with the full array of possible design techniques, the more one starts to recognize the clean formulations that lie within messy

problems out in the world. At their most effective, then, algorithmic ideas do not just provide solutions to well-posed problems; they form the language that lets you cleanly express the underlying questions.

The goal of our book is to convey this approach to algorithms, as a design process that begins with problems arising across the full range of computing applications, builds on an understanding of algorithm design techniques, and results in the development of efficient solutions to these problems. We seek to explore the role of algorithmic ideas in computer science generally, and relate these ideas to the range of precisely formulated problems for which we can design and analyze algorithms. In other words, what are the underlying issues that motivate these problems, and how did we choose these particular ways of formulating them? How did we recognize which design principles were appropriate in different situations?

In keeping with this, our goal is to offer advice on how to identify clean algorithmic problem formulations in complex issues from different areas of computing and, from this, how to design efficient algorithms for the resulting problems. Sophisticated algorithms are often best understood by reconstructing the sequence of ideas--including false starts and dead ends--that led from simpler initial approaches to the eventual solution. The result is a style of exposition that does not take the most direct route from problem statement to algorithm, but we feel it better reflects the way that we and our colleagues genuinely think about these questions.

Overview

The book is intended for students who have completed a programming-based two-semester introductory computer science sequence (the standard "CS1/CS2" courses) in which they have written programs that implement basic algorithms, manipulate discrete structures such as trees and graphs, and apply basic data structures such as arrays, lists, queues, and stacks. Since the interface between CS1/CS2 and a first algorithms course is not entirely standard, we begin the book with self-contained coverage of topics that at some institutions are familiar to students from CS1/CS2, but which at other institutions are included in the syllabi of the first algorithms course. This material can thus be treated either as a review or as new material; by including it, we hope the book can be used in a broader array of courses, and with more flexibility in the prerequisite knowledge that is assumed.

In keeping with the approach outlined above, we develop the basic algorithm design techniques by drawing on problems from across many areas of computer science and related fields. To mention a few representative examples here, we include fairly detailed discussions of applications from systems and networks (caching, switching, interdomain routing on the Internet), artificial intelligence (planning, game playing, Hopfield networks), computer vision (image segmentation), data mining (change-point detection, clustering), operations research (airline scheduling), and computational biology (sequence alignment, RNA secondary structure).

The notion of computational intractability, and NP-completeness in particular, plays a large role in the book. This is consistent with how we think about the overall process of algorithm design. Some of the time, an interesting problem arising in an application area will be amenable to an efficient solution, and some of the time it will be provably NP-complete; in order to fully address a new algorithmic problem, one should be able to explore both of these options with equal familiarity. Since so many natural problems in computer science are NP-complete, the development of methods to deal with intractable problems has become a crucial issue in the study of algorithms, and our book heavily reflects this theme. The discovery that a problem is NP-complete should not be taken as the end of the story, but as an invitation to begin looking for approximation algorithms, heuristic local search techniques, or tractable special cases. We include extensive coverage of each of these three approaches.

Problems and Solved Exercises
An important feature of the book is the collection of problems. Across all chapters, the book includes over 200 problems, almost all of them developed and class-tested in homework or exams as part of our teaching of the course at Cornell. We view the problems as a crucial component of the book, and they are structured in keeping with our overall approach to the material. Most of them consist of extended verbal descriptions of a problem arising in an application area in computer science or elsewhere out in the world, and part of the problem is to practice what we discuss in the text: setting up the necessary notation and formalization, designing an algorithm, and then analyzing it and proving it correct. (We view a complete answer to one of these problems as consisting of all these components: a fully explained algorithm, an analysis of the running time, and a proof of correctness.) The ideas for these problems come in large part from discussions we have had over the years with people working in different areas, and in some cases they serve the dual purpose of recording an interesting (though manageable) application of algorithms that we haven't seen written down anywhere else.

To help with the process of working on these problems, we include in each chapter a section entitled "Solved Exercises," where we take one or more problems and describe how to go about formulating a solution. The discussion devoted to each solved exercise is therefore significantly longer than what would be needed simply to write a complete, correct solution (in other words, significantly longer than what it would take to receive full credit if these were being assigned as homework problems). Rather, as with the rest of the text, the discussions in these sections should be viewed as trying to give a sense of the larger process by which one might think about problems of this type, culminating in the specification of a precise solution.

It is worth mentioning two points concerning the use of these problems as homework in a course. First, the problems are sequenced roughly in order of increasing difficulty, but this is only an approximate guide and we advise against placing too much weight on it: since the bulk of the problems were designed as homework for our undergraduate class, large subsets of the problems in each chapter are really closely comparable in terms of difficulty. Second, aside from the lowest-numbered ones, the problems are designed to involve some investment of time, both to relate the problem description to the algorithmic techniques in the chapter, and then to actually design the necessary algorithm. In our undergraduate class, we have tended to assign roughly three of these problems per week.
Pedagogical Features and Supplements

In addition to the problems and solved exercises, the book has a number of further pedagogical features, as well as additional supplements to facilitate its use for teaching.

As noted earlier, a large number of the sections in the book are devoted to the formulation of an algorithmic problem--including its background and underlying motivation--and the design and analysis of an algorithm for this problem. To reflect this style, these sections are consistently structured around a sequence of subsections: "The Problem," where the problem is described and a precise formulation is worked out; "Designing the Algorithm," where the appropriate design technique is employed to develop an algorithm; and "Analyzing the Algorithm," which proves properties of the algorithm and analyzes its efficiency. These subsections are highlighted in the text with an icon depicting a feather. In cases where extensions to the problem or further analysis of the algorithm is pursued, there are additional subsections devoted to these issues. The goal of this structure is to offer a relatively uniform style of presentation that moves from the initial discussion of a problem arising in a computing application through to the detailed analysis of a method to solve it.

A number of supplements are available in support of the book itself. An instructor's manual works through all the problems, providing full solutions to each. A set of lecture slides, developed by Kevin Wayne of Princeton University, is also available; these slides follow the order of the book's sections and can thus be used as the foundation for lectures in a course based on the book. These files are available at www.aw.com. For instructions on obtaining a professor login and password, search the site for either "Kleinberg" or "Tardos" or contact your local Addison-Wesley representative.

Finally, we would appreciate receiving feedback on the book. In particular, as in any book of this length, there are undoubtedly errors that have remained in the final version. Comments and reports of errors can be sent to us by e-mail, at the address algbook@cs.cornell.edu; please include the word "feedback" in the subject line of the message.

Chapter-by-Chapter Synopsis

Chapter 1 starts by introducing some representative algorithmic problems. We begin immediately with the Stable Matching Problem, since we feel it sets up the basic issues in algorithm design more concretely and more elegantly than any abstract discussion could: stable matching is motivated by a natural though complex real-world issue, from which one can abstract an interesting problem statement and a surprisingly effective algorithm to solve this problem. The remainder of Chapter 1 discusses a list of five "representative problems" that foreshadow topics from the remainder of the course. These five problems are interrelated in the sense that they are all variations and/or special cases of the Independent Set Problem; but one is solvable by a greedy algorithm, one by dynamic programming, one by network flow, one (the Independent Set Problem itself) is NP-complete, and one is PSPACE-complete. The fact that closely related problems can vary greatly in complexity is an important theme of the book, and these five problems serve as milestones that reappear as the book progresses.

Chapters 2 and 3 cover the interface to the CS1/CS2 course sequence mentioned earlier. Chapter 2 introduces the key mathematical definitions and notations used for analyzing algorithms, as well as the motivating principles behind them. It begins with an informal overview of what it means for a problem to be computationally tractable, together with the concept of polynomial time as a formal notion of efficiency. It then discusses growth rates of functions and asymptotic analysis more formally, and offers a guide to commonly occurring functions in algorithm analysis, together with standard applications in which they arise. Chapter 3 covers the basic definitions and algorithmic primitives needed for working with graphs, which are central to so many of the problems in the book. A number of basic graph algorithms are often implemented by students late in the CS1/CS2 course sequence, but it is valuable to present the material here in a broader algorithm design context. In particular, we discuss basic graph definitions, graph traversal techniques such as breadth-first search and depth-first search, and directed graph concepts including strong connectivity and topological ordering.

Chapters 2 and 3 also present many of the basic data structures that will be used for implementing algorithms throughout the book; more advanced data structures are presented in subsequent chapters. Our approach to data structures is to introduce them as they are needed for the implementation of the algorithms being developed in the book. Thus, although many of the data structures covered here will be familiar to students from the CS1/CS2 sequence, our focus is on these data structures in the broader context of algorithm design and analysis.

Chapters 4 through 7 cover four major algorithm design techniques: greedy algorithms, divide and conquer, dynamic programming, and network flow. With greedy algorithms, the challenge is to recognize when they work and when they don't; our coverage of this topic is centered around a way of classifying the kinds of arguments used to prove greedy algorithms correct. This chapter concludes with some of the main applications of greedy algorithms, for shortest paths, undirected and directed spanning trees, clustering, and compression. For divide and conquer, we begin with a discussion of strategies for solving recurrence relations as bounds on running times; we then show how familiarity with these recurrences can guide the design of algorithms that improve over straightforward approaches to a number of basic problems, including the comparison of rankings, the computation of closest pairs of points in the plane, and the Fast Fourier Transform. Next we develop dynamic programming by starting with the recursive intuition behind it, and subsequently building up more and more expressive recurrence formulations through applications in which they naturally arise. This chapter concludes with extended discussions of the dynamic programming approach to two fundamental problems: sequence alignment, with applications in computational biology; and shortest paths in graphs, with connections to Internet routing protocols. Finally, we cover algorithms for network flow problems, devoting much of our focus in this chapter to discussing a large array of different flow applications. To the extent that network flow is covered in algorithms courses, students are often left without an appreciation for the wide range of problems to which it can be applied; we try to do justice to its versatility by presenting applications to load balancing, scheduling, image segmentation, and a number of other problems.

Chapters 8 and 9 cover computational intractability. We devote most of our attention to NP-completeness, organizing the basic NP-complete problems thematically to help students recognize candidates for reductions when they encounter new problems. We build up to some fairly complex proofs of NP-completeness, with guidance on how one goes about constructing a difficult reduction. We also consider types of computational hardness beyond NP-completeness, particularly through the topic of PSPACE-completeness. We find this is a valuable way to emphasize that intractability doesn't end at NP-completeness, and PSPACE-completeness also forms the underpinning for some central notions from artificial intelligence--planning and game playing--that would otherwise not find a place in the algorithmic landscape we are surveying.

Chapters 10 through 12 cover three major techniques for dealing with computationally intractable problems: identification of structured special cases, approximation algorithms, and local search heuristics. Our chapter on tractable special cases emphasizes that instances of NP-complete problems arising in practice may not be nearly as hard as worst-case instances, because they often contain some structure that can be exploited in the design of an efficient algorithm. We illustrate how NP-complete problems are often efficiently solvable when restricted to tree-structured inputs, and we conclude with an extended discussion of tree decompositions of graphs. While this topic is more suitable for a graduate course than for an undergraduate one, it is a technique with considerable practical utility for which it is hard to find an existing accessible reference for students. Our chapter on approximation algorithms discusses both the process of designing effective algorithms and the task of understanding the optimal solution well enough to obtain good bounds on it. As design techniques for approximation algorithms, we focus on greedy algorithms, linear programming, and a third method we refer to as "pricing," which incorporates ideas from each of the first two. Finally, we discuss local search heuristics, including the Metropolis algorithm and simulated annealing. This topic is often missing from undergraduate algorithms courses, because very little is known in the way of provable guarantees for these algorithms; however, given their widespread use in practice, we feel it is valuable for students to know something about them, and we also include some cases in which guarantees can be proved.

Chapter 13 covers the use of randomization in the design of algorithms. This is a topic on which several nice graduate-level books have been written. Our goal here is to provide a more compact introduction to some of the ways in which students can apply randomized techniques using the kind of background in probability one typically gains from an undergraduate discrete math course.

Use of the Book

The book is primarily designed for use in a first undergraduate course on algorithms, but it can also be used as the basis for an introductory graduate course. When we use the book at the undergraduate level, we spend roughly one lecture per numbered section; in cases where there is more than one lecture's worth of material in a section (for example, when a section provides further applications as additional examples), we treat this extra material as a supplement that students can read about outside of lecture. We skip the starred sections; while these sections contain important topics, they are less central to the development of the subject, and in some cases they are harder as well. We also tend to skip one or two other sections per chapter in the first half of the book. We cover roughly half of each of Chapters 11-13.

This last point is worth emphasizing: rather than viewing the later chapters as "advanced," and hence off-limits to undergraduate algorithms courses, we have designed them with the goal that the first few sections of each should be accessible to an undergraduate audience. Our own undergraduate course involves material from all these chapters, as we feel that all of these topics have an important place at the undergraduate level. The resulting syllabus looks roughly as follows: Chapter 1; Chapters 4-8 (excluding the skipped sections mentioned above); Chapter 9 (briefly); Chapter 10, Sections 10.1 and 10.2; Chapter 11, Sections 11.1, 11.2, 11.6, and 11.8; Chapter 12, Sections 12.1-12.3; and Chapter 13, Sections 13.1-13.5. We treat Chapters 2 and 3 primarily as a review of material from earlier courses; the use of these two chapters depends heavily on the relationship of each specific course to its prerequisites.

The book also naturally supports an introductory graduate course on algorithms. Our view of such a course is that it should introduce students destined for research in all different areas to the important current themes in algorithm design. Here we find the emphasis on formulating problems to be useful as well, since students will soon be trying to define their own research problems in many different subfields. For this type of course, we cover the later topics in Chapters 4 and 6 (Sections 4.5-4.9 and 6.5-6.10), cover all of Chapter 7 (moving more rapidly through the early sections), quickly cover NP-completeness in Chapter 8 (since many beginning graduate students will have seen this topic as undergraduates), and then spend the remainder of the time on Chapters 10-13. Although our focus in an introductory graduate course is on the more advanced sections, we find it useful for the students to have the full book to consult for reviewing or filling in background knowledge, given the range of different undergraduate backgrounds among the students in such a course.

Finally, the book can be used to support self-study by graduate students, researchers, or computer professionals who want to get a sense for how they might be able to use particular algorithm design techniques in the context of their own work. A number of graduate students and colleagues have used portions of the book in this way.

Acknowledgments

This book grew out of the sequence of algorithms courses that we have taught at Cornell. These courses have grown, as the field has grown, over a number of years, and they reflect the influence of the Cornell faculty who helped to shape them during this time, including Juris Hartmanis, Monika Henzinger, John Hopcroft, Dexter Kozen, Ronitt Rubinfeld, and Sam Toueg. More generally, we would like to thank all our colleagues at Cornell for countless discussions both on the material here and on broader issues about the nature of the field.

The course staffs we've had in teaching the subject have been tremendously helpful in the formulation of this material. We thank our undergraduate and graduate teaching assistants, including Siddharth Alexander, Rie Ando, Elliot Anshelevich, Lars Backstrom, Steve Baker, Ralph Benzinger, John Bicket, Doug Burdick, Mike Connor, Vladimir Dizhoor, Shaddin Doghmi, Alexander Druyan, Bowei Du, Sasha Evfimievski, Ariful Gani, Vadim Grinshpun, Ara Hayrapetyan, Chris Jeuell, Igor Kats, Omar Khan, Mikhail Kobyakov, Alexei Kopylov, Brian Kulis, Amit Kumar, Yeongwee Lee, Henry Lin, Ashwin Machanavajjhala, Ayan Mandal, Bill McCloskey, Leonid Meyerguz, Evan Moran, Niranjan Nagarajan, Tina Nolte, Travis Ortogero, Martin Pál, Jon Peress, Matt Piotrowski, Joe Polastre, Mike Priscott, Xin Qi, Venu Ramasubramanian, Aditya Rao, David Richardson, Brian Sabino, Rachit Siamwalla, Sebastian Silgardo, Alex Slivkins, Chaitanya Swamy, Perry Tam, Nadya Travinin, Sergei Vassilvitskii, Matthew Wachs, Tom Wexler, Shan-Leung Maverick Woo, Justin Yang, and Misha Zatsman. Many of them have provided valuable insights, suggestions, and comments on the text. We also thank all the students in these classes who have provided comments and feedback on early drafts of the book over the years.

For the past several years, the development of the book has benefited greatly from the feedback and advice of colleagues who have used prepublication drafts for teaching. Anna Karlin fearlessly adopted a draft as her course textbook at the University of Washington when it was still in an early stage of development; she was followed by a number of people who have used it either as a course textbook or as a resource for teaching: Paul Beame, Allan Borodin, Devdatt Dubhashi, David Kempe, Gene Kleinberg, Dexter Kozen, Amit Kumar, Mike Molloy, Yuval Rabani, Tim Roughgarden, Alexa Sharp, Shanghua Teng, Aravind Srinivasan, Dieter van Melkebeek, Kevin Wayne, Tom Wexler, and Sue Whitesides.

We deeply appreciate their input and advice, which has informed many of our revisions to the content. We would like to additionally thank Kevin Wayne for producing supplementary material associated with the book, which promises to greatly extend its utility to future instructors.

In a number of other cases, our approach to particular topics in the book reflects the influence of specific colleagues. Many of these contributions have undoubtedly escaped our notice, but we especially thank Yuri Boykov, Ron Elber, Dan Huttenlocher, Bobby Kleinberg, Evie Kleinberg, Lillian Lee, David McAllester, Mark Newman, Prabhakar Raghavan, Bart Selman, David Shmoys, Steve Strogatz, Olga Veksler, Duncan Watts, and Ramin Zabih.

It has been a pleasure working with Addison Wesley over the past year. First and foremost, we thank Matt Goldstein for all his advice and guidance in this process, and for helping us to synthesize a vast amount of review material into a concrete plan that improved the book. Our early conversations about the book with Susan Hartman were extremely valuable as well. We thank Matt and Susan, together with Michelle Brown, Marilyn Lloyd, Patty Mahtani, and Maite Suarez-Rivas at Addison Wesley, and Paul Anagnostopoulos and Jacqui Scarlott at Windfall Software, for all their work on the editing, production, and management of the project. We further thank Paul and Jacqui for their expert composition of the book. We thank Joyce Wells for the cover design, Nancy Murphy of Dartmouth Publishing for her work on the figures, Ted Laux for the indexing, and Carol Leyba and Jennifer McClain for the copyediting and proofreading.

We thank Anselm Blumer (Tufts University), Richard Chang (University of Maryland, Baltimore County), Kevin Compton (University of Michigan), Diane Cook (University of Texas, Arlington), Sariel Har-Peled (University of Illinois, Urbana-Champaign), Sanjeev Khanna (University of Pennsylvania), Philip Klein (Brown University), David Matthias (Ohio State University), Adam Meyerson (UCLA), Michael Mitzenmacher (Harvard University), Stephan Olariu (Old Dominion University), Mohan Paturi (UC San Diego), Edgar Ramos (University of Illinois, Urbana-Champaign), Sanjay Ranka (University of Florida, Gainesville), Leon Reznik (Rochester Institute of Technology), Subhash Suri (UC Santa Barbara), Dieter van Melkebeek (University of Wisconsin, Madison), and Bulent Yener (Rensselaer Polytechnic Institute), who generously contributed their time to provide detailed and thoughtful reviews of the manuscript; their comments led to numerous improvements, both large and small, in the final version of the text.

Finally, we thank our families--Lillian and Alice, and David, Rebecca, and Amy. We appreciate their support, patience, and many other contributions more than we can express in any acknowledgments here.

This book was begun amid the irrational exuberance of the late nineties, when the arc of computing technology seemed, to many of us, briefly to pass through a place traditionally occupied by celebrities and other inhabitants of the pop-cultural firmament. (It was probably just in our imaginations.) Now, several years after the hype and stock prices have come back to earth, one can appreciate that in some ways computer science was forever changed by this period, and in other ways it has remained the same: the driving excitement that has characterized the field since its early days is as strong and enticing as ever, the public's fascination with information technology is still vibrant, and the reach of computing continues to extend into new disciplines. And so to all students of the subject, drawn to it for so many different reasons, we hope you find this book an enjoyable and useful guide wherever your computational pursuits may take you.

Jon Kleinberg
Éva Tardos
Ithaca, 2005

1.1 A First Problem: Stable Matching

As an opening topic, we look at an algorithmic problem that nicely illustrates many of the themes we will be emphasizing. It is motivated by some very natural and practical concerns, and from these we formulate a clean and simple statement of a problem. The algorithm to solve the problem is very clean as well, and most of our work will be spent in proving that it is correct and giving an acceptable bound on the amount of time it takes to terminate with an answer. The problem itself--the Stable Matching Problem--has several origins.

The Problem

The Stable Matching Problem originated, in part, in 1962, when David Gale and Lloyd Shapley, two mathematical economists, asked the question: Could one design a college admissions process, or a job recruiting process, that was self-enforcing? What did they mean by this?

To set up the question, let's first think informally about the kind of situation that might arise as a group of friends, all juniors in college majoring in computer science, begin applying to companies for summer internships. The crux of the application process is the interplay between two different types of parties: companies (the employers) and students (the applicants). Each applicant has a preference ordering on companies, and each company--once the applications come in--forms a preference ordering on its applicants. Based on these preferences, companies extend offers to some of their applicants, applicants choose which of their offers to accept, and people begin heading off to their summer internships.

Gale and Shapley considered the sorts of things that could start going wrong with this process, in the absence of any mechanism to enforce the status quo. Suppose, for example, that your friend Raj has just accepted a summer job at the large telecommunications company CluNet. A few days later, the small start-up company WebExodus, which had been dragging its feet on making a few final decisions, calls up Raj and offers him a summer job as well. Now, Raj actually prefers WebExodus to CluNet--won over perhaps by the laid-back, anything-can-happen atmosphere--and so this new development may well cause him to retract his acceptance of the CluNet offer and go to WebExodus instead. Suddenly down one summer intern, CluNet offers a job to one of its wait-listed applicants, who promptly retracts his previous acceptance of an offer from the software giant Babelsoft, and the situation begins to spiral out of control.

Things look just as bad, if not worse, from the other direction. Suppose that Raj's friend Chelsea, destined to go to Babelsoft but having just heard Raj's story, calls up the people at WebExodus and says, "You know, I'd really rather spend the summer with you guys than at Babelsoft." They find this very easy to believe; and furthermore, on looking at Chelsea's application, they realize that they would have rather hired her than some other student who actually is scheduled to spend the summer at WebExodus. In this case, if WebExodus were a slightly less scrupulous company, it might well find some way to retract its offer to this other student and hire Chelsea instead.

Situations like this can rapidly generate a lot of chaos, and many people--both applicants and employers--can end up unhappy with the process as well as the outcome. What has gone wrong? One basic problem is that the process is not self-enforcing--if people are allowed to act in their self-interest, then it risks breaking down.

We might well prefer the following, more stable situation, in which self-interest itself prevents offers from being retracted and redirected. Consider another student, who has arranged to spend the summer at CluNet but calls up WebExodus and reveals that he, too, would rather work for them. But in this case, based on the offers already accepted, they are able to reply, "No, it turns out that we prefer each of the students we've accepted to you, so we're afraid there's nothing we can do." Or consider an employer, earnestly following up with its top applicants who went elsewhere, being told by each of them, "No, I'm happy where I am." In such a case, all the outcomes are stable--there are no further outside deals that can be made.

So this is the question Gale and Shapley asked: Given a set of preferences among employers and applicants, can we assign applicants to employers so that for every employer E, and every applicant A who is not scheduled to work for E, at least one of the following two things is the case?

(i) E prefers every one of its accepted applicants to A; or
(ii) A prefers her current situation over working for employer E.

If this holds, the outcome is stable: individual self-interest will prevent any applicant/employer deal from being made behind the scenes.

Gale and Shapley proceeded to develop a striking algorithmic solution to this problem, which we will discuss presently. Before doing this, let's note that this is not the only origin of the Stable Matching Problem. It turns out that for a decade before the work of Gale and Shapley, unbeknownst to them, the National Resident Matching Program had been using a very similar procedure, with the same underlying motivation, to match residents to hospitals. Indeed, this system, with relatively little change, is still in use today. This is one testament to the problem's fundamental appeal. And from the point of view of this book, it provides us with a nice first domain in which to reason about some basic combinatorial definitions and the algorithms that build on them.

Formulating the Problem To get at the essence of this concept, it helps to make the problem as clean as possible. The world of companies and applicants contains some distracting asymmetries. Each applicant is looking for a single company, but each company is looking for many applicants; moreover, there may be more (or, as is sometimes the case, fewer) applicants than there are available slots for summer jobs. Finally, each applicant does not typically apply to every company. It is useful, at least initially, to eliminate these complications and arrive at a more "bare-bones" version of the problem: each of n applicants applies to each of n companies, and each company wants to accept a single applicant. We will see that doing this preserves the fundamental issues inherent in the problem; in particular, our solution to this simplified version will extend directly to the more general case as well.

Following Gale and Shapley, we observe that this special case can be viewed as the problem of devising a system by which each of n men and n women can end up getting married: our problem naturally has the analogue of two "genders"--the applicants and the companies--and in the case we are considering, everyone is seeking to be paired with exactly one individual of the opposite gender.1

1 Gale and Shapley considered the same-sex Stable Matching Problem as well, where there is only a single gender. This is motivated by related applications, but it turns out to be fairly different at a technical level; following Gale and Shapley, we'll be focusing on the version with two genders.

So consider a set M = {m_1, ..., m_n} of n men, and a set W = {w_1, ..., w_n} of n women. Let M x W denote the set of all possible ordered pairs of the form (m, w), where m ∈ M and w ∈ W. A matching S is a set of ordered pairs, each from M x W, with the property that each member of M and each member of W appears in at most one pair in S. A perfect matching S' is a matching with the property that each member of M and each member of W appears in exactly one pair in S'.

Matchings and perfect matchings are objects that will recur frequently throughout the book; they arise naturally in modeling a wide range of algorithmic problems. In the present situation, a perfect matching corresponds simply to a way of pairing off the men with the women, in such a way that everyone ends up married to somebody, and nobody is married to more than one person--there is neither singlehood nor polygamy.

Now we can add the notion of preferences to this setting. Each man m ∈ M ranks all the women; we will say that m prefers w to w' if m ranks w higher than w'. We will refer to the ordered ranking of m as his preference list. We will not allow ties in the ranking. Each woman, analogously, ranks all the men.

Given a perfect matching S, what can go wrong? Guided by our initial motivation in terms of employers and applicants, we should be worried about the following situation: There are two pairs (m, w) and (m', w') in S (as depicted in Figure 1.1) with the property that m prefers w' to w, and w' prefers m to m'. In this case, there's nothing to stop m and w' from abandoning their current partners and heading off together; the set of marriages is not self-enforcing. We'll say that such a pair (m, w') is an instability with respect to S: (m, w') does not belong to S, but each of m and w' prefers the other to their partner in S.

[Figure 1.1 Perfect matching S with instability (m, w'): m and w' each prefer the other to their current partners.]

Our goal, then, is a set of marriages with no instabilities. We'll say that a matching S is stable if (i) it is perfect, and (ii) there is no instability with respect to S. Two questions spring immediately to mind:

- Does there exist a stable matching for every set of preference lists?
- Given a set of preference lists, can we efficiently construct a stable matching if there is one?

Some Examples To illustrate these definitions, consider the following two very simple instances of the Stable Matching Problem.

First, suppose we have a set of two men, {m, m'}, and a set of two women, {w, w'}. The preference lists are as follows:

m prefers w to w'.
m' prefers w to w'.
w prefers m to m'.
w' prefers m to m'.

If we think about this set of preference lists intuitively, it represents complete agreement: the men agree on the order of the women, and the women agree on the order of the men. There is a unique stable matching here, consisting of the pairs (m, w) and (m', w'). The other perfect matching, consisting of the pairs (m', w) and (m, w'), would not be a stable matching, because the pair (m, w) would form an instability with respect to this matching. (Both m and w would want to leave their respective partners and pair up.)

Next, here's an example where things are a bit more intricate. Suppose the preferences are

m prefers w to w'.
m' prefers w' to w.
w prefers m' to m.
w' prefers m to m'.

What's going on in this case? The two men's preferences mesh perfectly with each other (they rank different women first), and the two women's preferences likewise mesh perfectly with each other. But the men's preferences clash completely with the women's preferences. In this second example, there are two different stable matchings. The matching consisting of the pairs (m, w) and (m', w') is stable, because both men are as happy as possible, so neither would leave their matched partner. But the matching consisting of the pairs (m', w) and (m, w') is also stable, for the complementary reason that both women are as happy as possible. This is an important point to remember as we go forward--it's possible for an instance to have more than one stable matching.

Designing the Algorithm

We now show that there exists a stable matching for every set of preference lists among the men and women. Moreover, our means of showing this will also answer the second question that we asked above: we will give an efficient algorithm that takes the preference lists and constructs a stable matching.

Let us consider some of the basic ideas that motivate the algorithm.

- Initially, everyone is unmarried. Suppose an unmarried man m chooses the woman w who ranks highest on his preference list and proposes to her. Can we declare immediately that (m, w) will be one of the pairs in our final stable matching? Not necessarily: at some point in the future, a man m' whom w prefers may propose to her. On the other hand, it would be
dangerous for w to reject m right away; she may never receive a proposal from someone she ranks as highly as m. So a natural idea would be to have the pair (m, w) enter an intermediate state--engagement.

- Suppose we are now at a state in which some men and women are free--not engaged--and some are engaged. The next step could look like this. An arbitrary free man m chooses the highest-ranked woman w to whom he has not yet proposed, and he proposes to her. If w is also free, then m and w become engaged. Otherwise, w is already engaged to some other man m'. In this case, she determines which of m or m' ranks higher on her preference list; this man becomes engaged to w and the other becomes free.

- Finally, the algorithm will terminate when no one is free; at this moment, all engagements are declared final, and the resulting perfect matching is returned.

Here is a concrete description of the Gale-Shapley algorithm, with Figure 1.2 depicting a state of the algorithm.

[Figure 1.2 An intermediate state of the G-S algorithm when a free man m is proposing to a woman w: w will become engaged to m if she prefers him to m'.]

    Initially all m ∈ M and w ∈ W are free
    While there is a man m who is free and hasn't proposed to every woman
      Choose such a man m
      Let w be the highest-ranked woman in m's preference list
        to whom m has not yet proposed
      If w is free then
        (m, w) become engaged
      Else w is currently engaged to m'
        If w prefers m' to m then
          m remains free
        Else w prefers m to m'
          (m, w) become engaged
          m' becomes free
        Endif
      Endif
    Endwhile
    Return the set S of engaged pairs

An intriguing thing is that, although the G-S algorithm is quite simple to state, it is not immediately obvious that it returns a stable matching, or even a perfect matching. We proceed to prove this now, through a sequence of intermediate facts.
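Before the proofs, it may help to see the pseudocode above in runnable form. The following Python function is an illustrative sketch added here, not code from the book: it represents the n men and n women as integers 0 through n-1, stores each preference list most-preferred-first, and precomputes a table of ranks so that a woman can compare two proposers in constant time (the kind of implementation concern that Section 2.3 takes up in detail). The names gale_shapley, men_prefs, and women_prefs are our own.

    def gale_shapley(men_prefs, women_prefs):
        """Return the stable matching found by the G-S algorithm,
        as a dict mapping each man to his partner.

        men_prefs[m] is man m's preference list, most preferred
        first; women_prefs[w] is woman w's list. Illustrative
        sketch only -- not the book's own code.
        """
        n = len(men_prefs)

        # rank[w][m] = position of man m on woman w's list, so a
        # woman can compare two proposers in O(1) time.
        rank = [[0] * n for _ in range(n)]
        for w in range(n):
            for pos, m in enumerate(women_prefs[w]):
                rank[w][m] = pos

        next_choice = [0] * n      # index of each man's next proposal
        fiance = [None] * n        # fiance[w]: man w is engaged to, or None
        free_men = list(range(n))  # the algorithm may pick any free man

        while free_men:
            m = free_men.pop()
            w = men_prefs[m][next_choice[m]]  # highest-ranked woman not yet proposed to
            next_choice[m] += 1
            if fiance[w] is None:
                fiance[w] = m                 # w was free: (m, w) become engaged
            elif rank[w][m] < rank[w][fiance[w]]:
                free_men.append(fiance[w])    # w prefers m: her old fiance is freed
                fiance[w] = m
            else:
                free_men.append(m)            # w rejects m; he remains free

        return {m: w for w, m in enumerate(fiance)}

By the termination bound proved below, the While loop runs at most n^2 times, and the precomputed rank table keeps each iteration to constant time, so this sketch runs in O(n^2) time overall.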

Analyzing the Algorithm

First consider the view of a woman w during the execution of the algorithm. For a while, no one has proposed to her, and she is free. Then a man m may propose to her, and she becomes engaged. As time goes on, she may receive additional proposals, accepting those that increase the rank of her partner. So we discover the following.

(1.1) w remains engaged from the point at which she receives her first proposal; and the sequence of partners to which she is engaged gets better and better (in terms of her preference list).

The view of a man m during the execution of the algorithm is rather different. He is free until he proposes to the highest-ranked woman on his list; at this point he may or may not become engaged. As time goes on, he may alternate between being free and being engaged; however, the following property does hold.

(1.2) The sequence of women to whom m proposes gets worse and worse (in terms of his preference list).

Now we show that the algorithm terminates, and give a bound on the maximum number of iterations needed for termination.

(1.3) The G-S algorithm terminates after at most n^2 iterations of the While loop.

Proof. A useful strategy for upper-bounding the running time of an algorithm, as we are trying to do here, is to find a measure of progress. Namely, we seek some precise way of saying that each step taken by the algorithm brings it closer to termination. In the case of the present algorithm, each iteration consists of some man proposing (for the only time) to a woman he has never proposed to before. So if we let P(t) denote the set of pairs (m, w) such that m has proposed to w by the end of iteration t, we see that for all t, the size of P(t + 1) is strictly greater than the size of P(t). But there are only n^2 possible pairs of men and women in total, so the value of P(.) can increase at most n^2 times over the course of the algorithm. It follows that there can be at most n^2 iterations. ∎

Two points are worth noting about the previous fact and its proof. First, there are executions of the algorithm (with certain preference lists) that can involve close to n^2 iterations, so this analysis is not far from the best possible. Second, there are many quantities that would not have worked well as a progress measure for the algorithm, since they need not strictly increase in each iteration. For example, the number of free individuals could remain constant from one iteration to the next, as could the number of engaged pairs. Thus, these quantities could not be used directly in giving an upper bound on the maximum possible number of iterations.

Let us now establish that the set S returned at the termination of the algorithm is in fact a perfect matching. Why is this not immediately obvious? Essentially, we have to show that no man can "fall off" the end of his preference list; the only way for the While loop to exit is for there to be no free man. In this case, the set of engaged couples would indeed be a perfect matching. So the main thing we need to show is the following.

(1.4) If m is free at some point in the execution of the algorithm, then there is a woman to whom he has not yet proposed.

Proof. Suppose there comes a point when m is free but has already proposed to every woman. Then by (1.1), each of the n women is engaged at this point in time. Since the set of engaged pairs forms a matching, there must also be n engaged men at this point in time. But there are only n men total, and m is not engaged, so this is a contradiction. ∎

(1.5) The set S returned at termination is a perfect matching.

Proof. The set of engaged pairs always forms a matching. Let us suppose that the algorithm terminates with a free man m. At termination, it must be the case that m had already proposed to every woman, for otherwise the While loop would not have exited. But this contradicts (1.4), which says that there cannot be a free man who has proposed to every woman. ∎

Finally, we prove the main property of the algorithm--namely, that it results in a stable matching.

(1.6) Consider an execution of the G-S algorithm that returns a set of pairs S. The set S is a stable matching.

Proof. We have already seen, in (1.5), that S is a perfect matching. Thus, to prove S is a stable matching, we will assume that there is an instability with respect to S and obtain a contradiction. As defined earlier, such an instability would involve two pairs, (m, w) and (m', w'), in S with the properties that

- m prefers w' to w, and
- w' prefers m to m'.

In the execution of the algorithm that produced S, m's last proposal was, by definition, to w. Now we ask: Did m propose to w' at some earlier point in this execution? If he didn't, then w must occur higher on m's preference list than w', contradicting our assumption that m prefers w' to w. If he did, then he was rejected by w' in favor of some other man m'', whom w' prefers to m. m' is the final partner of w', so either m'' = m' or, by (1.1), w' prefers her final partner m' to m''; either way this contradicts our assumption that w' prefers m to m'.

It follows that S is a stable matching. ∎

To begin with, our example reinforces the point that the G-S algorithm is actually underspecified: as long as there is a free man, we are allowed to choose any free man to make the next proposal. Different choices specify different executions of the algorithm; this is why, to be careful, we stated (1.6) as "Consider an execution of the G-S algorithm that returns a set of pairs S," instead of "Consider the set S returned by the G-S algorithm."

Thus, we encounter another very natural question: Do all executions of the G-S algorithm yield the same matching? This is a genre of question that arises in many settings in computer science: we have an algorithm that runs asynchronously, with different independent components performing actions that can be interleaved in complex ways, and we want to know how much variability this asynchrony causes in the final outcome. To consider a very different kind of example, the independent components may not be men and women but electronic components activating parts of an airplane wing; the effect of asynchrony in their behavior can be a big deal.

In the present context, we will see that the answer to our question is surprisingly clean: all executions of the G-S algorithm yield the same matching.

All Executions Yield the Same Matching

There are a number of possible ways to prove a statement such as this, many of which would result in quite complicated arguments. It turns out that the easiest and most informative approach for us will be to uniquely characterize the matching that is obtained and then show that all executions result in the matching with this characterization.

What is the characterization? We'll show that each man ends up with the "best possible partner" in a concrete sense. (Recall that this is true if all men prefer different women.) First, we will say that a woman w is a valid partner of a man m if there is a stable matching that contains the pair (m, w). We will say that w is the best valid partner of m if w is a valid partner of m, and no woman whom m ranks higher than w is a valid partner of his. We will use best(m) to denote the best valid partner of m.

Now, let S* denote the set of pairs {(m, best(m)) : m ∈ M}. We will prove the following fact.

(1.7) Every execution of the G-S algorithm results in the set S*.

This statement is surprising at a number of levels. First of all, as defined, there is no reason to believe that S* is a matching at all, let alone a stable matching. After all, why couldn't it happen that two men have the same best valid partner? Second, the result shows that the G-S algorithm gives the best possible outcome for every man simultaneously; there is no stable matching in which any of the men could have hoped to do better. And finally, it answers our question above by showing that the order of proposals in the G-S algorithm has absolutely no effect on the final outcome.

Despite all this, the proof is not so difficult.

Proof. Let us suppose, by way of contradiction, that some execution E of the G-S algorithm results in a matching S in which some man is paired with a woman who is not his best valid partner. Since men propose in decreasing order of preference, this means that some man is rejected by a valid partner during the execution E of the algorithm. So consider the first moment during the execution E in which some man, say m, is rejected by a valid partner w. Again, since men propose in decreasing order of preference, and since this is the first time such a rejection has occurred, it must be that w is m's best valid partner best(m).

The rejection of m by w may have happened either because m proposed and was turned down in favor of w's existing engagement, or because w broke her engagement to m in favor of a better proposal. But either way, at this moment w forms or continues an engagement with a man m' whom she prefers to m.

Since w is a valid partner of m, there exists a stable matching S' containing the pair (m, w). Now we ask: Who is m' paired with in this matching? Suppose it is a woman w' ≠ w. Since the rejection of m by w was the first rejection of a man by a valid partner in the execution E, it must be that m' had not been rejected by any valid partner at the point in E when he became engaged to w. Since he proposed in decreasing order of preference, and since w' is clearly a valid partner of m', it must be that m' prefers w to w'. But we have already seen that w prefers m' to m, for in execution E she rejected m in favor of m'. Since (m', w) ∉ S', it follows that (m', w) is an instability in S'. This contradicts our claim that S' is stable and hence contradicts our initial assumption. □

So for the men, the G-S algorithm is ideal. Unfortunately, the same cannot be said for the women. For a woman w, we say that m is a valid partner if there is a stable matching that contains the pair (m, w). We say that m is the worst valid partner of w if m is a valid partner of w, and no man whom w ranks lower than m is a valid partner of hers.

(1.8) In the stable matching S*, each woman is paired with her worst valid partner.

Proof. Suppose there were a pair (m, w) in S* such that m is not the worst valid partner of w. Then there is a stable matching S' in which w is paired
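Statement (1.7) invites a small experiment: implement the algorithm so that the next proposer is chosen at random, and check that the returned matching never varies. The sketch below is ours, not the text's; the instance at the end is an arbitrary small example.

    # Run G-S with a randomly chosen free man at each step (illustrative code).
    import random

    def gs_random_order(man_pref, woman_pref, seed):
        rng = random.Random(seed)
        rank = {w: {m: i for i, m in enumerate(p)}
                for w, p in woman_pref.items()}
        nxt = {m: 0 for m in man_pref}         # next index on each man's list
        fiance = {}                            # fiance[w] = w's current partner
        free_men = list(man_pref)
        while free_men:
            m = free_men.pop(rng.randrange(len(free_men)))  # arbitrary free man
            w = man_pref[m][nxt[m]]
            nxt[m] += 1
            if w not in fiance:
                fiance[w] = m
            elif rank[w][m] < rank[w][fiance[w]]:
                free_men.append(fiance[w])
                fiance[w] = m
            else:
                free_men.append(m)
        return frozenset(fiance.items())

    men = {"m1": ["w1", "w2", "w3"], "m2": ["w1", "w3", "w2"],
           "m3": ["w2", "w1", "w3"]}
    women = {"w1": ["m2", "m1", "m3"], "w2": ["m3", "m1", "m2"],
             "w3": ["m1", "m2", "m3"]}
    # By (1.7), every execution yields the same matching S*:
    assert len({gs_random_order(men, women, s) for s in range(100)}) == 1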

with a man m' whom she likes less than m. In S', m is paired with a woman w' ≠ w. Since w is the best valid partner of m, and w' is a valid partner of m, we see that m prefers w to w'. But from this it follows that (m, w) is an instability in S', contradicting the claim that S' is stable and hence contradicting our initial assumption. □

Thus, we find that our simple example above, in which the men's preferences clashed with the women's, hinted at a very general phenomenon: for any input, the side that does the proposing in the G-S algorithm ends up with the best possible stable matching (from their perspective), while the side that does not do the proposing correspondingly ends up with the worst possible stable matching.

1.2 Five Representative Problems

The Stable Matching Problem provides us with a rich example of the process of algorithm design. For many problems, this process involves a few significant steps: formulating the problem with enough mathematical precision that we can ask a concrete question and start thinking about algorithms to solve it; designing an algorithm for the problem; and analyzing the algorithm by proving it is correct and giving a bound on the running time so as to establish the algorithm's efficiency.

This high-level strategy is carried out in practice with the help of a few fundamental design techniques, which are very useful in assessing the inherent complexity of a problem and in formulating an algorithm to solve it. As in any area, becoming familiar with these design techniques is a gradual process; but with experience one can start recognizing problems as belonging to identifiable genres and appreciating how subtle changes in the statement of a problem can have an enormous effect on its computational difficulty.

To get this discussion started, then, it helps to pick out a few representative milestones that we'll be encountering in our study of algorithms: cleanly formulated problems, all resembling one another at a general level, but differing greatly in their difficulty and in the kinds of approaches that one brings to bear on them. The first three can be solved efficiently by a sequence of increasingly subtle algorithmic techniques; the fourth marks a major turning point in our discussion, serving as an example of a problem believed to be unsolvable by any efficient algorithm; and the fifth hints at a class of problems believed to be harder still.

The problems are self-contained and are all motivated by computing applications. To talk about some of them, though, it will help to use the terminology of graphs. While graphs are a common topic in earlier computer science courses, we'll be introducing them in a fair amount of depth in Chapter 3; due to their enormous expressive power, we'll also be using them extensively throughout the book. For the discussion here, it's enough to think of a graph G as simply a way of encoding pairwise relationships among a set of objects. Thus, G consists of a pair of sets (V, E): a collection V of nodes and a collection E of edges, each of which "joins" two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = (u, v) for some u, v ∈ V, where we call u and v the ends of e. We typically draw graphs as in Figure 1.3, with each node as a small circle and each edge as a line segment joining its two ends.

Figure 1.3 Each of (a) and (b) depicts a graph on four nodes.

Let's now turn to a discussion of the five representative problems.

Interval Scheduling

Consider the following very simple scheduling problem. You have a resource (it may be a lecture room, a supercomputer, or an electron microscope) and many people request to use the resource for periods of time. A request takes the form: Can I reserve the resource starting at time s, until time f? We will assume that the resource can be used by at most one person at a time. A scheduler wants to accept a subset of these requests, rejecting all others, so that the accepted requests do not overlap in time. The goal is to maximize the number of requests accepted.

More formally, there will be n requests labeled 1, ..., n, with each request i specifying a start time si and a finish time fi. Naturally, we have si < fi for all i. Two requests i and j are compatible if the requested intervals do not overlap: that is, either request i is for an earlier time interval than request j (fi ≤ sj), or request i is for a later time than request j (fj ≤ si). We'll say more generally that a subset A of requests is compatible if all pairs of requests i, j ∈ A, i ≠ j are compatible. The goal is to select a compatible subset of requests of maximum possible size.

We illustrate an instance of this Interval Scheduling Problem in Figure 1.4.

Figure 1.4 An instance of the Interval Scheduling Problem.
Note that there is a single compatible set of size 4, and this is the largest compatible set.

We will see shortly that this problem can be solved by a very natural algorithm that orders the set of requests according to a certain heuristic and then "greedily" processes them in one pass, selecting as large a compatible subset as it can. This will be typical of a class of greedy algorithms that we will consider for various problems--myopic rules that process the input one piece at a time with no apparent look-ahead. When a greedy algorithm can be shown to find an optimal solution for all instances of a problem, it's often fairly surprising. We typically learn something about the structure of the underlying problem from the fact that such a simple approach can be optimal.
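The particular ordering heuristic is revealed in Chapter 4. For concreteness, here is a sketch (ours, not the text's) of one such one-pass rule, processing requests in increasing order of finish time, which does turn out to find an optimal solution:

    # A one-pass greedy rule for Interval Scheduling (illustrative code):
    # sort the requests by finish time, then keep each request that is
    # compatible with everything selected so far.
    def greedy_schedule(requests):
        """requests: list of (start, finish) pairs."""
        selected = []
        last_finish = float("-inf")
        for s, f in sorted(requests, key=lambda r: r[1]):
            if s >= last_finish:     # compatible: previous one ends by time s
                selected.append((s, f))
                last_finish = f
        return selected

For example, greedy_schedule([(0, 3), (2, 5), (4, 7)]) returns [(0, 3), (4, 7)].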

Weighted Interval Scheduling

In the Interval Scheduling Problem, we sought to maximize the number of requests that could be accommodated simultaneously. Now, suppose more generally that each request interval i has an associated value, or weight, vi > 0; we could picture this as the amount of money we will make from the ith individual if we schedule his or her request. Our goal will be to find a compatible subset of intervals of maximum total value.

The case in which vi = 1 for each i is simply the basic Interval Scheduling Problem; but the appearance of arbitrary values changes the nature of the maximization problem quite a bit. Consider, for example, that if v1 exceeds the sum of all other vi, then the optimal solution must include interval 1 regardless of the configuration of the full set of intervals. So any algorithm for this problem must be very sensitive to the values, and yet degenerate to a method for solving (unweighted) interval scheduling when all the values are equal to 1.

There appears to be no simple greedy rule that walks through the intervals one at a time, making the correct decision in the presence of arbitrary values. Instead, we employ a technique, dynamic programming, that builds up the optimal value over all possible solutions in a compact, tabular way that leads to a very efficient algorithm.
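Dynamic programming is the subject of Chapter 6. As a preview (our sketch, not the text's), the tabular computation can be organized around a single recurrence: sort the requests by finish time, let p(j) be the number of requests that finish by the time request j starts, and let OPT(j) be the optimal total value over the first j requests; then OPT(j) = max(OPT(j - 1), vj + OPT(p(j))).

    # Weighted Interval Scheduling by dynamic programming (illustrative code).
    import bisect

    def max_total_value(requests):
        """requests: list of (start, finish, value) triples."""
        reqs = sorted(requests, key=lambda r: r[1])   # order by finish time
        finishes = [f for _, f, _ in reqs]
        opt = [0] * (len(reqs) + 1)                   # opt[j]: best over first j
        for j, (s, f, v) in enumerate(reqs, start=1):
            p = bisect.bisect_right(finishes, s, 0, j - 1)
            # either skip request j, or take it plus the best solution
            # among the p requests that finish by its start time
            opt[j] = max(opt[j - 1], v + opt[p])
        return opt[-1]

With unit values this computes the same answer as the greedy rule above; with arbitrary values it correctly favors, say, a single heavy interval over many light ones.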
Bipartite Matching

When we considered the Stable Matching Problem, we defined a matching to be a set of ordered pairs of men and women with the property that each man and each woman belong to at most one of the ordered pairs. We then defined a perfect matching to be a matching in which every man and every woman belong to some pair.

We can express these concepts more generally in terms of graphs, and in order to do this it is useful to define the notion of a bipartite graph. We say that a graph G = (V, E) is bipartite if its node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. A bipartite graph is pictured in Figure 1.5; often, when we want to emphasize a graph's "bipartiteness," we will draw it this way, with the nodes in X and Y in two parallel columns. But notice, for example, that the two graphs in Figure 1.3 are also bipartite.

Figure 1.5 A bipartite graph.

Now, in the problem of finding a stable matching, matchings were built from pairs of men and women. In the case of bipartite graphs, the edges are pairs of nodes, so we say that a matching in a graph G = (V, E) is a set of edges M ⊆ E with the property that each node appears in at most one edge of M. M is a perfect matching if every node appears in exactly one edge of M.

To see that this does capture the same notion we encountered in the Stable Matching Problem, consider a bipartite graph G' with a set X of n men, a set Y of n women, and an edge from every node in X to every node in Y. Then the matchings and perfect matchings in G' are precisely the matchings and perfect matchings among the set of men and women.

In the Stable Matching Problem, we added preferences to this picture. Here, we do not consider preferences; but the nature of the problem in arbitrary bipartite graphs adds a different source of complexity: there is not necessarily an edge from every x ∈ X to every y ∈ Y, so the set of possible matchings has quite a complicated structure. In other words, it is as though only certain pairs of men and women are willing to be paired off, and we want to figure out how to pair off many people in a way that is consistent with this. Consider, for example, the bipartite graph G in Figure 1.5: there are many matchings in G, but there is only one perfect matching. (Do you see it?)

Matchings in bipartite graphs can model situations in which objects are being assigned to other objects. Thus, the nodes in X can represent jobs, the nodes in Y can represent machines, and an edge (xi, yj) can indicate that machine yj is capable of processing job xi. A perfect matching is then a way of assigning each job to a machine that can process it, with the property that each machine is assigned exactly one job. In the spring, computer science departments across the country are often seen pondering a bipartite graph in which X is the set of professors in the department, Y is the set of offered courses, and an edge (xi, yj) indicates that professor xi is capable of teaching course yj. A perfect matching in this graph consists of an assignment of each professor to a course that he or she can teach, in such a way that every course is covered.

Thus the Bipartite Matching Problem is the following: Given an arbitrary bipartite graph G, find a matching of maximum size. If |X| = |Y| = n, then there is a perfect matching if and only if the maximum matching has size n. We will find that the algorithmic techniques discussed earlier do not seem adequate for providing an efficient algorithm for this problem. There is, however, a very elegant and efficient algorithm to find a maximum matching; it inductively builds up larger and larger matchings, selectively backtracking along the way. This process is called augmentation, and it forms the central component in a large class of efficiently solvable problems called network flow problems.
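The details of augmentation appear in Chapter 7; as a rough illustration of how one can inductively grow a matching while selectively backtracking, here is a compact augmenting-path sketch (ours, not the text's). It assumes the bipartite graph is given as adj, mapping each node of X to a list of its neighbors in Y.

    # Maximum bipartite matching via augmenting paths (illustrative code).
    def max_matching(adj):
        match_y = {}                       # match_y[y] = node of X matched to y

        def augment(x, seen):
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    # match x to y if y is free, or if y's current partner
                    # can be re-matched to some other neighbor
                    if y not in match_y or augment(match_y[y], seen):
                        match_y[y] = x
                        return True
            return False

        return sum(augment(x, set()) for x in adj)

For instance, max_matching({"x1": ["y1"], "x2": ["y1", "y2"]}) returns 2: x2's claim on y1 is backtracked, freeing it to take y2 instead.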

Independent Set

Now let's talk about an extremely general problem, which includes most of these earlier problems as special cases. Given a graph G = (V, E), we say a set of nodes S ⊆ V is independent if no two nodes in S are joined by an edge. The Independent Set Problem is, then, the following: Given G, find an independent set that is as large as possible. For example, the maximum size of an independent set in the graph in Figure 1.6 is four, achieved by the four-node independent set {1, 4, 5, 6}.

Figure 1.6 A graph whose largest independent set has size 4.

The Independent Set Problem encodes any situation in which you are trying to choose from among a collection of objects and there are pairwise conflicts among some of the objects. Say you have n friends, and some pairs of them don't get along. How large a group of your friends can you invite to dinner if you don't want any interpersonal tensions? This is simply the largest independent set in the graph whose nodes are your friends, with an edge between each conflicting pair.

Interval Scheduling and Bipartite Matching can both be encoded as special cases of the Independent Set Problem. For Interval Scheduling, define a graph G = (V, E) in which the nodes are the intervals and there is an edge between each pair of them that overlap; the independent sets in G are then just the compatible subsets of intervals. Encoding Bipartite Matching as a special case of Independent Set is a little trickier to see. Given a bipartite graph G' = (V', E'), the objects being chosen are edges, and the conflicts arise between two edges that share an end. (These, indeed, are the pairs of edges that cannot belong to a common matching.) So we define a graph G = (V, E) in which the node set V is equal to the edge set E' of G', and we define an edge between each pair of elements in V that correspond to edges of G' with a common end. We can now check that the independent sets of G are precisely the matchings of G'. While it is not complicated to check this, it takes a little concentration to deal with this type of "edges-to-nodes, nodes-to-edges" transformation.(2)

Given the generality of the Independent Set Problem, an efficient algorithm to solve it would be quite impressive. It would have to implicitly contain algorithms for Interval Scheduling, Bipartite Matching, and a host of other natural optimization problems.

The current status of Independent Set is this: no efficient algorithm is known for the problem, and it is conjectured that no such algorithm exists. The obvious brute-force algorithm would try all subsets of the nodes, checking each to see if it is independent, and then recording the largest one encountered. It is possible that this is close to the best we can do on this problem. We will see later in the book that Independent Set is one of a large class of problems that are termed NP-complete. No efficient algorithm is known for any of them; but they are all equivalent in the sense that a solution to any one of them would imply, in a precise sense, a solution to all of them.

Here's a natural question: Is there anything good we can say about the complexity of the Independent Set Problem? One positive thing is the following: If we have a graph G on 1,000 nodes, and we want to convince you that it contains an independent set S of size 100, then it's quite easy. We simply show you the graph G, circle the nodes of S in red, and let you check that no two of them are joined by an edge. So there really seems to be a great difference in difficulty between checking that something is a large independent set and actually finding a large independent set. This may look like a very basic observation--and it is--but it turns out to be crucial in understanding this class of problems. Furthermore, as we'll see next, it's possible for a problem to be so hard that there isn't even an easy way to "check" solutions in this sense.

(2) For those who are curious, we note that not every instance of the Independent Set Problem can arise in this way from Interval Scheduling or from Bipartite Matching; the full Independent Set Problem really is more general. The graph in Figure 1.3(a) cannot arise as the "conflict graph" in an instance of Interval Scheduling, and the graph in Figure 1.3(b) cannot arise as the "conflict graph" in an instance of Bipartite Matching.
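The brute-force algorithm just described is easy to write down; the following sketch (ours, not the text's) tries subsets from largest to smallest, which is equivalent to enumerating all of them and recording the largest independent one. Its running time grows exponentially with the number of nodes, in line with the discussion above.

    # Brute-force search for a maximum independent set (illustrative code).
    from itertools import combinations

    def largest_independent_set(nodes, edges):
        edge_set = {frozenset(e) for e in edges}
        for size in range(len(nodes), 0, -1):          # largest sizes first
            for subset in combinations(nodes, size):
                if all(frozenset(pair) not in edge_set
                       for pair in combinations(subset, 2)):
                    return set(subset)                 # no edge inside: independent
        return set()

Note how cheap the inner check is compared with the outer enumeration: verifying that a proposed set is independent is easy; finding a large one is what appears to be hard.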
Competitive Facility Location

Finally, we come to our fifth problem, which is based on the following two-player game. Consider two large companies that operate cafe franchises across the country; let's call them JavaPlanet and Queequeg's Coffee. They are currently competing for market share in a geographic area. First JavaPlanet opens a franchise; then Queequeg's Coffee opens a franchise; then JavaPlanet; then Queequeg's; and so on. Suppose they must deal with zoning regulations that require no two franchises be located too close together, and each is trying to make its locations as convenient as possible. Who will win?

Let's make the rules of this "game" more concrete. The geographic region in question is divided into n zones, labeled 1, 2, ..., n. Each zone i has a value bi, which is the revenue obtained by either of the companies if it opens a franchise there. However, certain pairs of zones (i, j) are adjacent, and local zoning laws prevent two adjacent zones from each containing a franchise, regardless of which company owns them. (They also prevent two franchises from being opened in the same zone.) We model these conflicts via a graph G = (V, E), where V is the set of zones, and (i, j) is an edge in E if the zones i and j are adjacent. The zoning requirement then says that the full set of franchises opened must form an independent set in G.

Thus our game consists of two players, P1 and P2, alternately selecting nodes in G, with P1 moving first. At all times, the set of all selected nodes must form an independent set in G. Suppose that player P2 has a target bound B, and we want to know: is there a strategy for P2 so that no matter how P1 plays, P2 will be able to select a set of nodes with a total value of at least B? We will call this an instance of the Competitive Facility Location Problem.

Consider, for example, the instance pictured in Figure 1.7, and suppose that P2's target bound is B = 20. Then P2 does have a winning strategy. On the other hand, if B = 25, then P2 does not.

Figure 1.7 An instance of the Competitive Facility Location Problem.

One can work this out by looking at the figure for a while, but it requires some amount of case-checking of the form, "If P1 goes here, then P2 will go there; but if P1 goes over there, then P2 will go here..." And this appears to be intrinsic to the problem: not only is it computationally difficult to determine whether P2 has a winning strategy; on a reasonably sized graph, it would even be hard for us to convince you that P2 has a winning strategy. There does not seem to be a short proof we could present; rather, we'd have to lead you on a lengthy case-by-case analysis of the set of possible moves.

This is in contrast to the Independent Set Problem, where we believe that finding a large solution is hard but checking a proposed large solution is easy. This contrast can be formalized in the class of PSPACE-complete problems, of which Competitive Facility Location is an example. PSPACE-complete problems are believed to be strictly harder than NP-complete problems, and this conjectured lack of short "proofs" for their solutions is one indication of this greater hardness. The notion of PSPACE-completeness turns out to capture a large collection of problems involving game-playing and planning; many of these are fundamental issues in the area of artificial intelligence.
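To see what answering such a question involves, here is a brute-force game-tree search (our sketch, not the text's) that decides whether P2 has a winning strategy for a given instance and bound B. It explores every interleaving of moves, and its exponential behavior matches the case-by-case reasoning described above.

    # Exhaustive search for a P2 winning strategy in Competitive Facility
    # Location (illustrative code). adj[v]: set of neighbors; value[v]: revenue.
    def p2_can_guarantee(adj, value, B):

        def moves(chosen):
            # zones that are unchosen and not adjacent to any chosen zone
            return [v for v in adj if v not in chosen and not (adj[v] & chosen)]

        def play(chosen, p2_total, p1_to_move):
            if p2_total >= B:
                return True                  # P2 has already met its bound
            options = moves(chosen)
            if not options:
                return False                 # game over; bound never reached
            if p1_to_move:                   # P2 must survive every P1 move
                return all(play(chosen | {v}, p2_total, False) for v in options)
            return any(play(chosen | {v}, p2_total + value[v], True)
                       for v in options)     # P2 needs one good move

        return play(frozenset(), 0, True)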

Solved Exercises

Solved Exercise 1

Consider a town with n men and n women seeking to get married to one another. Each man has a preference list that ranks all the women, and each woman has a preference list that ranks all the men.

The set of all 2n people is divided into two categories: good people and bad people. Suppose that for some number k, 1 ≤ k < n, there are k good men and k good women; thus there are n - k bad men and n - k bad women.

Everyone would rather marry any good person than any bad person. Formally, each preference list has the property that it ranks each good person of the opposite gender higher than each bad person of the opposite gender: its first k entries are the good people (of the opposite gender) in some order, and its next n - k entries are the bad people (of the opposite gender) in some order.

Show that in every stable matching, every good man is married to a good woman.

Solution A natural way to get started thinking about this problem is to assume the claim is false and try to work toward obtaining a contradiction. What would it mean for the claim to be false? There would exist some stable matching M in which a good man m was married to a bad woman w.

Now, let's consider what the other pairs in M look like. There are k good men and k good women. Could it be the case that every good woman is married to a good man in this matching M? No: one of the good men (namely, m) is already married to a bad woman, and that leaves only k - 1 other good men. So even if all of them were married to good women, that would still leave some good woman who is married to a bad man.

Let w' be such a good woman, who is married to a bad man. It is now easy to identify an instability in M: consider the pair (m, w'). Each is good, but is married to a bad partner. Thus, each of m and w' prefers the other to their current partner, and hence (m, w') is an instability. This contradicts our assumption that M is stable, and hence concludes the proof.
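When checking claims like this on small examples, it helps to have a mechanical test for stability. The utility below is ours, not the text's; it takes a perfect matching as a dictionary from men to women, together with the two sets of preference lists.

    # Search a perfect matching for an instability (illustrative code).
    def find_instability(matching, man_pref, woman_pref):
        partner = {w: m for m, w in matching.items()}      # woman -> her man
        wrank = {w: {m: i for i, m in enumerate(p)}
                 for w, p in woman_pref.items()}
        for m, w in matching.items():
            my_rank = man_pref[m].index(w)
            for w2 in man_pref[m][:my_rank]:               # women m prefers to w
                if wrank[w2][m] < wrank[w2][partner[w2]]:  # ...who also prefer m
                    return (m, w2)
        return None                                        # matching is stable

Applied to the matching M in the argument above, it would return a pair such as (m, w') that witnesses the instability.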
Solved Exercise 2

We can think about a generalization of the Stable Matching Problem in which certain man-woman pairs are explicitly forbidden. In the case of employers and applicants, we could imagine that certain applicants simply lack the necessary qualifications or certifications, and so they cannot be employed at certain companies, however desirable they may seem. Using the analogy to marriage between men and women, we have a set M of n men, a set W of n women, and a set F ⊆ M × W of pairs who are simply not allowed to get married. Each man m ranks all the women w for which (m, w) ∉ F, and each woman w' ranks all the men m' for which (m', w') ∉ F.

In this more general setting, we say that a matching S is stable if it does not exhibit any of the following types of instability.

(i) There are two pairs (m, w) and (m', w') in S with the property that (m, w') ∉ F, m prefers w' to w, and w' prefers m to m'. (The usual kind of instability.)

(ii) There is a pair (m, w) ∈ S, and a man m', so that m' is not part of any pair in the matching, (m', w) ∉ F, and w prefers m' to m. (A single man is more desirable and not forbidden.)

(iii) There is a pair (m, w) ∈ S, and a woman w', so that w' is not part of any pair in the matching, (m, w') ∉ F, and m prefers w' to w. (A single woman is more desirable and not forbidden.)

(iv) There is a man m and a woman w, neither of whom is part of any pair in the matching, so that (m, w) ∉ F. (There are two single people with nothing preventing them from getting married to each other.)

Note that under these more general definitions, a stable matching need not be a perfect matching.

Now we can ask: For every set of preference lists and every set of forbidden pairs, is there always a stable matching? Resolve this question by doing one of the following two things: (a) give an algorithm that, for any set of preference lists and forbidden pairs, produces a stable matching; or (b) give an example of a set of preference lists and forbidden pairs for which there is no stable matching.

Solution The Gale-Shapley algorithm is remarkably robust to variations on the Stable Matching Problem. So, if you're faced with a new variation of the problem and can't find a counterexample to stability, it's often a good idea to check whether a direct adaptation of the G-S algorithm will in fact produce stable matchings. That turns out to be the case here.

To begin with, let's consider why the original G-S algorithm can't be used directly. The difficulty, of course, is that the G-S algorithm doesn't know anything about forbidden pairs, and so the condition in the While loop, "While there is a man m who is free and hasn't proposed to every woman," won't work: we don't want m to propose to a woman w for which the pair (m, w) is forbidden.

Thus, let's consider a variation of the G-S algorithm in which we make only one change: we modify the While loop to say, "While there is a man m who is free and hasn't proposed to every woman w for which (m, w) ∉ F." Here is the algorithm in full.

Initially all m ∈ M and w ∈ W are free
While there is a man m who is free and hasn't proposed to
  every woman w for which (m, w) ∉ F
    Choose such a man m
    Let w be the highest-ranked woman in m's preference list
      to whom m has not yet proposed
    If w is free then
        (m, w) become engaged
    Else w is currently engaged to m'
        If w prefers m' to m then
            m remains free
        Else w prefers m to m'
            (m, w) become engaged
            m' becomes free
        Endif
    Endif
Endwhile
Return the set S of engaged pairs

We now prove that this yields a stable matching, under our new definition of stability. To begin with, facts (1.1), (1.2), and (1.3) from the text remain true (in particular, the algorithm will terminate in at most n^2 iterations). Also, we don't have to worry about establishing that the resulting matching S is perfect (indeed, it may not be). We also notice an additional pair of facts: if m is a man who is not part of a pair in S, then m must have proposed to every nonforbidden woman; and if w is a woman who is not part of a pair in S, then it must be that no man ever proposed to w.
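In code, the one change amounts to filtering each man's list before running the proposal loop. The sketch below (ours, not the text's) follows the same conventions as the earlier implementation and may leave some people unmatched.

    # Gale-Shapley with forbidden pairs (illustrative code).
    def gs_with_forbidden(man_pref, woman_pref, forbidden):
        """forbidden: a set of (man, woman) pairs, the set F."""
        lists = {m: [w for w in ws if (m, w) not in forbidden]
                 for m, ws in man_pref.items()}
        rank = {w: {m: i for i, m in enumerate(p)}
                for w, p in woman_pref.items()}
        nxt = {m: 0 for m in lists}        # next position on each man's list
        fiance = {}                        # fiance[w] = w's current partner
        free = [m for m in lists if lists[m]]
        while free:
            m = free.pop()
            w = lists[m][nxt[m]]
            nxt[m] += 1
            loser = None
            if w not in fiance:
                fiance[w] = m
            elif rank[w][m] < rank[w][fiance[w]]:
                loser, fiance[w] = fiance[w], m
            else:
                loser = m
            if loser is not None and nxt[loser] < len(lists[loser]):
                free.append(loser)         # still has nonforbidden women to try
        return {(m, w) for w, m in fiance.items()}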

Finally, we need only show the following.

(1.9) There is no instability with respect to the returned matching S.

Proof. Our general definition of instability has four parts; this means that we have to make sure that none of the four bad things happens.

First, suppose there is an instability of type (i), consisting of pairs (m, w) and (m', w') in S with the properties that (m, w') ∉ F, m prefers w' to w, and w' prefers m to m'. Since m's final partner is w, it follows that m must have proposed to w'; so w' rejected m, and thus she prefers her final partner to m--a contradiction.

Next, suppose there is an instability of type (ii), consisting of a pair (m, w) ∈ S, and a man m', so that m' is not part of any pair in the matching, (m', w) ∉ F, and w prefers m' to m. Then m' must have proposed to w and been rejected; again, it follows that w prefers her final partner to m'--a contradiction.

Third, suppose there is an instability of type (iii), consisting of a pair (m, w) ∈ S, and a woman w', so that w' is not part of any pair in the matching, (m, w') ∉ F, and m prefers w' to w. Then no man proposed to w' at all; in particular, m never proposed to w', and so he must prefer w to w'--a contradiction.

Finally, suppose there is an instability of type (iv), consisting of a man m and a woman w, neither of whom is part of any pair in the matching, so that (m, w) ∉ F. But for m to be single, he must have proposed to every nonforbidden woman; in particular, he must have proposed to w, which means she would no longer be single--a contradiction. □

Exercises

1. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample.

   True or false? In every instance of the Stable Matching Problem, there is a stable matching containing a pair (m, w) such that m is ranked first on the preference list of w and w is ranked first on the preference list of m.

2. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample.

   True or false? Consider an instance of the Stable Matching Problem in which there exists a man m and a woman w such that m is ranked first on the preference list of w and w is ranked first on the preference list of m. Then in every stable matching S for this instance, the pair (m, w) belongs to S.

3. There are many other settings in which we can ask questions related to some type of "stability" principle. Here's one, involving competition between two enterprises.

   Suppose we have two television networks, whom we'll call A and B. There are n prime-time programming slots, and each network has n TV shows. Each network wants to devise a schedule--an assignment of each show to a distinct slot--so as to attract as much market share as possible.

   Here is the way we determine how well the two networks perform relative to each other, given their schedules. Each show has a fixed rating, which is based on the number of people who watched it last year; we'll assume that no two shows have exactly the same rating. A network wins a given time slot if the show that it schedules for the time slot has a larger rating than the show the other network schedules for that time slot.

   The goal of each network is to win as many time slots as possible. Suppose in the opening week of the fall season, Network A reveals a schedule S and Network B reveals a schedule T. On the basis of this pair of schedules, each network wins certain time slots, according to the rule above.

   We'll say that the pair of schedules (S, T) is stable if neither network can unilaterally change its own schedule and win more time slots. That is, there is no schedule S' such that Network A wins more slots with the pair (S', T) than it did with the pair (S, T); and symmetrically, there is no schedule T' such that Network B wins more slots with the pair (S, T') than it did with the pair (S, T).

   The analogue of Gale and Shapley's question for this kind of stability is the following: For every set of TV shows and ratings, is there always a stable pair of schedules? Resolve this question by doing one of the following two things: (a) give an algorithm that, for any set of TV shows and associated ratings, produces a stable pair of schedules; or (b) give an example of a set of TV shows and associated ratings for which there is no stable pair of schedules.

4. Gale and Shapley published their paper on the Stable Matching Problem in 1962; but a version of their algorithm had already been in use for ten years by the National Resident Matching Program, for the problem of assigning medical residents to hospitals.

   Basically, the situation was the following. There were m hospitals, each with a certain number of available positions for hiring residents. There were n medical students graduating in a given year, each interested in joining one of the hospitals. Each hospital had a ranking of the students in order of preference, and each student had a ranking of the hospitals in order of preference. We will assume that there were more students graduating than there were slots available in the m hospitals.

   The interest, naturally, was in finding a way of assigning each student to at most one hospital, in such a way that all available positions in all hospitals were filled. (Since we are assuming a surplus of students, there would be some students who do not get assigned to any hospital.)

   We say that an assignment of students to hospitals is stable if neither of the following situations arises.

   First type of instability: There are students s and s', and a hospital h, so that
   - s is assigned to h, and
   - s' is assigned to no hospital, and
   - h prefers s' to s.

   Second type of instability: There are students s and s', and hospitals h and h', so that
   - s is assigned to h, and
   - s' is assigned to h', and
   - h prefers s' to s, and
   - s' prefers h to h'.

   So we basically have the Stable Matching Problem, except that (i) hospitals generally want more than one resident, and (ii) there is a surplus of medical students. Show that there is always a stable assignment of students to hospitals, and give an algorithm to find one.

5. The Stable Matching Problem, as discussed in the text, assumes that all men and women have a fully ordered list of preferences. In this problem we will consider a version of the problem in which men and women can be indifferent between certain options. As before we have a set M of n men and a set W of n women. Assume each man and each woman ranks the members of the opposite gender, but now we allow ties in the ranking. For example (with n = 4), a woman could say that m1 is ranked in first place; second place is a tie between m2 and m3 (she has no preference between them); and m4 is in last place. We will say that w prefers m to m' if m is ranked higher than m' on her preference list (they are not tied).

   With indifferences in the rankings, there could be two natural notions for stability. And for each, we can ask about the existence of stable matchings, as follows.

   (a) A strong instability in a perfect matching S consists of a man m and a woman w, such that each of m and w prefers the other to their partner in S. Does there always exist a perfect matching with no strong instability? Either give an example of a set of men and women with preference lists for which every perfect matching has a strong instability; or give an algorithm that is guaranteed to find a perfect matching with no strong instability.

   (b) A weak instability in a perfect matching S consists of a man m and a woman w, such that their partners in S are w' and m', respectively, and one of the following holds:
   - m prefers w to w', and w either prefers m to m' or is indifferent between these two choices; or
   - w prefers m to m', and m either prefers w to w' or is indifferent between these two choices.

   In other words, the pairing between m and w is either preferred by both, or preferred by one while the other is indifferent. Does there always exist a perfect matching with no weak instability? Either give an example of a set of men and women with preference lists for which every perfect matching has a weak instability; or give an algorithm that is guaranteed to find a perfect matching with no weak instability.

6. Peripatetic Shipping Lines, Inc., is a shipping company that owns n ships and provides service to n ports. Each of its ships has a schedule that says, for each day of the month, which of the ports it's currently visiting, or whether it's out at sea. (You can assume the "month" here has m days, for some m > n.) Each ship visits each port for exactly one day during the month. For safety reasons, PSL Inc. has the following strict requirement:

   (†) No two ships can be in the same port on the same day.

   The company wants to perform maintenance on all the ships this month, via the following scheme. They want to truncate each ship's schedule: for each ship Si, there will be some day when it arrives in its scheduled port and simply remains there for the rest of the month (for maintenance). This means that Si will not visit the remaining ports on its schedule (if any) that month, but this is okay. So the truncation of Si's schedule will simply consist of its original schedule up to a certain specified day on which it is in a port P; the remainder of the truncated schedule simply has it remain in port P.

   Now the company's question to you is the following: Given the schedule for each ship, find a truncation of each so that condition (†) continues to hold: no two ships are ever in the same port on the same day.

   Show that such a set of truncations can always be found, and give an algorithm to find them.

   Example. Suppose we have two ships and two ports, and the "month" has four days. Suppose the first ship's schedule is

      port P1; at sea; port P2; at sea

   and the second ship's schedule is

      at sea; port P1; at sea; port P2.

   Then the (only) way to choose truncations would be to have the first ship remain in port P2 starting on day 3, and have the second ship remain in port P1 starting on day 2.

7. Some of your friends are working for CluNet, a builder of large communication networks, and they are looking at algorithms for switching in a particular type of input/output crossbar.

   Here is the setup. There are n input wires and n output wires, each directed from a source to a terminus. Each input wire meets each output wire in exactly one distinct point, at a special piece of hardware called a junction box. Points on the wire are naturally ordered in the direction from source to terminus; for two distinct points x and y on the same wire, we say that x is upstream from y if x is closer to the source than y, and otherwise we say x is downstream from y. The order in which one input wire meets the output wires is not necessarily the same as the order in which another input wire meets the output wires. (And similarly for the orders in which output wires meet the input wires.) Figure 1.8 gives an example of such a collection of input and output wires.

   Now, here's the switching component of this situation. Each input wire is carrying a distinct data stream, and this data stream must be switched onto one of the output wires. If the stream of Input i is switched onto Output j, at junction box B, then this stream passes through all junction boxes upstream from B on Input i, then through B, then through all junction boxes downstream from B on Output j. It does not matter which input data stream gets switched onto which output wire, but each input data stream must be switched onto a different output wire. Furthermore--and this is the tricky constraint--no two data streams can pass through the same junction box following the switching operation.

   Finally, here's the problem. Show that for any specified pattern in which the input wires and output wires meet each other (each pair meeting exactly once), a valid switching of the data streams can always be found--one in which each input data stream is switched onto a different output, and no two of the resulting streams pass through the same junction box. Additionally, give an algorithm to find such a valid switching.

   Figure 1.8 An example with two input wires and two output wires. Input 1 meets Output 2 before Output 1; Input 2 meets Output 1 before Output 2. A valid solution is to switch the data stream of Input 1 onto Output 2, and the data stream of Input 2 onto Output 1. On the other hand, if the stream of Input 1 were switched onto Output 1, and the stream of Input 2 were switched onto Output 2, then both streams would pass through the junction box at the meeting of Input 1 and Output 2--and this is not allowed.

8. For this problem, we will explore the issue of truthfulness in the Stable Matching Problem and specifically in the Gale-Shapley algorithm. The basic question is: Can a man or a woman end up better off by lying about his or her preferences? More concretely, we suppose each participant has a true preference order. Now consider a woman w. Suppose w prefers man m to m', but both m and m' are low on her list of preferences. Can it be the case that by switching the order of m and m' on her list of preferences (i.e., by falsely claiming that she prefers m' to m) and running the algorithm with this false preference list, w will end up with a man m'' that she truly prefers to both m and m'? (We can ask the same question for men, but will focus on the case of women for purposes of this question.)

   Resolve this question by doing one of the following two things: (a) Give a proof that, for any set of preference lists, switching the order of a pair on the list cannot improve a woman's partner in the Gale-Shapley algorithm; or

   (b) Give an example of a set of preference lists for which there is a switch that would improve the partner of a woman who switched preferences.

Notes and Further Reading

The Stable Matching Problem was first defined and analyzed by Gale and Shapley (1962); according to David Gale, their motivation for the problem came from a story they had recently read in the New Yorker about the intricacies of the college admissions process (Gale, 2001). Stable matching has grown into an area of study in its own right, covered in books by Gusfield and Irving (1989) and Knuth (1997c). Gusfield and Irving also provide a nice survey of the "parallel" history of the Stable Matching Problem as a technique invented for matching applicants with employers in medicine and other professions.

As discussed in the chapter, our five representative problems will be central to the book's discussions, respectively, of greedy algorithms, dynamic programming, network flow, NP-completeness, and PSPACE-completeness. We will discuss the problems in these contexts later in the book.

2 Basics of Algorithm Analysis

Analyzing algorithms involves thinking about how their resource requirements (the amount of time and space they use) will scale with increasing input size. We begin this chapter by talking about how to put this notion on a concrete footing, as making it concrete opens the door to a rich understanding of computational tractability. Having done this, we develop the mathematical machinery needed to talk about the way in which different functions scale with increasing input size, making precise what it means for one function to grow faster than another.

We then develop running-time bounds for some basic algorithms, beginning with an implementation of the Gale-Shapley algorithm from Chapter 1 and continuing to a survey of many different running times and certain characteristic types of algorithms that achieve these running times. In some cases, obtaining a good running-time bound relies on the use of more sophisticated data structures, and we conclude this chapter with a very useful example of such a data structure: priority queues and their implementation using heaps.

2.1 Computational Tractability

A major focus of this book is to find efficient algorithms for computational problems. At this level of generality, our topic seems to encompass the whole of computer science; so what is specific to our approach here?

First, we will try to identify broad themes and design principles in the development of algorithms. We will look for paradigmatic problems and approaches that illustrate, with a minimum of irrelevant detail, the basic approaches to designing efficient algorithms. At the same time, it would be pointless to pursue these design principles in a vacuum, so the problems and approaches we consider are drawn from fundamental issues that arise throughout computer science; a general study of algorithms turns out to serve as a nice survey of computational ideas that arise in many areas.

Another property shared by many of the problems we study is their fundamentally discrete nature: like the Stable Matching Problem, they will involve an implicit search over a large set of combinatorial possibilities, and the goal will be to efficiently find a solution that satisfies certain clearly delineated conditions.

As we seek to understand the general notion of computational efficiency, we will focus primarily on efficiency in running time: we want algorithms that run quickly. But it is important that algorithms be efficient in their use of other resources as well. In particular, the amount of space (or memory) used by an algorithm is an issue that will also arise at a number of points in the book, and we will see techniques for reducing the amount of space needed to perform a computation.

Some Initial Attempts at Defining Efficiency

The first major question we need to answer is the following: How should we turn the fuzzy notion of an "efficient" algorithm into something more concrete? A first attempt at a working definition of efficiency is the following.

Proposed Definition of Efficiency (1): An algorithm is efficient if, when implemented, it runs quickly on real input instances.

Let's spend a little time considering this definition. At a certain level, it's hard to argue with: one of the goals at the bedrock of our study of algorithms is solving real problems quickly. And indeed, there is a significant area of research devoted to the careful implementation and profiling of different algorithms for discrete computational problems.

But there are some crucial things missing from this definition, even if our main goal is to solve real problem instances quickly on real computers. The first is the omission of where, and how well, we implement an algorithm. Even bad algorithms can run quickly when applied to small test cases on extremely fast processors; even good algorithms can run slowly when they are coded sloppily. Also, what is a "real" input instance? We don't know the full range of input instances that will be encountered in practice, and some input instances can be much harder than others. Finally, this proposed definition does not consider how well, or badly, an algorithm may scale as problem sizes grow to unexpected levels. A common situation is that two very different algorithms will perform comparably on inputs of size 100; multiply the input size tenfold, and one will still run quickly while the other consumes a huge amount of time.

So what we could ask for is a concrete definition of efficiency that is platform-independent, instance-independent, and of predictive value with respect to increasing input sizes. Before focusing on any specific consequences of this demand, we can at least explore its implicit, high-level suggestion: that we need to take a more mathematical view of the situation.

We can use the Stable Matching Problem as an example to guide us. The input has a natural "size" parameter N; we could take this to be the total size of the representation of all preference lists, since this is what any algorithm for the problem will receive as input. N is closely related to the other natural parameter in this problem: n, the number of men and the number of women. Since there are 2n preference lists, each of length n, we can view N = 2n^2, suppressing more fine-grained details of how the data is represented. In considering the problem, we will seek to describe an algorithm at a high level, and then analyze its running time mathematically as a function of this input size N.

Worst-Case Running Times and Brute-Force Search

To begin with, we will focus on analyzing the worst-case running time: we will look for a bound on the largest possible running time the algorithm could have over all inputs of a given size N, and see how this scales with N.

The focus on worst-case performance initially seems quite draconian: what if an algorithm performs well on most instances and just has a few pathological inputs on which it is very slow? This certainly is an issue in some cases, but in general the worst-case analysis of an algorithm has been found to do a reasonable job of capturing its efficiency in practice. Moreover, once we have decided to go the route of mathematical analysis, it is hard to find an effective alternative to worst-case analysis. Average-case analysis (the obvious appealing alternative, in which one studies the performance of an algorithm averaged over "random" instances) can sometimes provide considerable insight, but very often it can also become a quagmire. As we observed earlier, it's very hard to express the full range of input instances that arise in practice, and so attempts to study an algorithm's performance on "random" input instances can quickly devolve into debates over how a random input should be generated: the same algorithm can perform very well on one class of random inputs and very poorly on another. After all, real inputs to an algorithm are generally not being produced from a random distribution, and so average-case analysis risks telling us more about the means by which the random inputs were generated than about the algorithm itself.

So in general we will think about the worst-case analysis of an algorithm's running time. But what is a reasonable analytical benchmark that can tell us whether a running-time bound is impressive or weak? A first simple guide is by comparison with brute-force search over the search space of possible solutions.

But what is a reasonable analytical benchmark that can tell us whether a running-time bound is impressive or weak? A first simple guide is by comparison with brute-force search over the search space of possible solutions.

Let's return to the example of the Stable Matching Problem. Even when the size of a Stable Matching input instance is relatively small, the search space it defines is enormous: there are n! possible perfect matchings between n men and n women, and we need to find a matching that is stable. The natural "brute-force" algorithm for this problem would plow through all perfect matchings by enumeration, checking each to see if it is stable. This will be a common theme in most of the problems we study: a compact representation, implicitly specifying a giant search space. For most of these problems, there will be an obvious brute-force solution: try all possibilities and see if any one of them works. Not only is this approach almost always too slow to be useful, it is an intellectual cop-out; it provides us with absolutely no insight into the structure of the problem we are studying.

The surprising punchline, in a sense, to our solution of the Stable Matching Problem is that we needed to spend time proportional only to N in finding a stable matching from among this stupendously large space of possibilities. This was a conclusion we reached at an analytical level. We did not implement the algorithm and try it out on sample preference lists; we reasoned about it mathematically. Yet, at the same time, our analysis indicated how the algorithm could be implemented in practice and gave fairly conclusive evidence that it would be a big improvement over exhaustive enumeration. And so if there is a common thread in the algorithms we emphasize in this book, it would be the following alternative definition of efficiency.

Proposed Definition of Efficiency (2): An algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.

This will turn out to be a very useful working definition for us. Algorithms that improve substantially on brute-force search nearly always contain a valuable heuristic idea that makes them work, and they tell us something about the intrinsic structure, and computational tractability, of the underlying problem itself. But if there is a problem with our second working definition, it is vagueness. What do we mean by "qualitatively better performance"? This suggests that we consider the actual running time of algorithms more carefully, and try to quantify what a reasonable running time would be.

Polynomial Time as a Definition of Efficiency

When people first began analyzing discrete algorithms mathematically--a thread of research that began gathering momentum through the 1960s--a consensus began to emerge on how to quantify the notion of a "reasonable" running time. Search spaces for natural combinatorial problems tend to grow exponentially in the size N of the input: if the input size increases by one, the number of possibilities increases multiplicatively. We'd like a good algorithm for such a problem to have a better scaling property: when the input size increases by a constant factor--say, a factor of 2--the algorithm should only slow down by some constant factor C.

Arithmetically, we can formulate this scaling behavior as follows. Suppose an algorithm has the following property: There are absolute constants c > 0 and d > 0 so that on every input instance of size N, its running time is bounded by c*N^d primitive computational steps. (In other words, its running time is at most proportional to N^d.) For now, we will remain deliberately vague on what we mean by the notion of a "primitive computational step," but it can be easily formalized in a model where each step corresponds to a single assembly-language instruction on a standard processor, or one line of a standard programming language such as C or Java. In any case, if this running-time bound holds, for some c and d, then we say that the algorithm has a polynomial running time, or that it is a polynomial-time algorithm.

Note that any polynomial-time bound has the scaling property we're looking for. If the input size increases from N to 2N, the bound on the running time increases from c*N^d to c*(2N)^d = c*2^d*N^d, which is a slow-down by a factor of 2^d. Since d is a constant, so is 2^d; of course, as one might expect, lower-degree polynomials exhibit better scaling behavior than higher-degree polynomials.
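To see this scaling property in action, here is a minimal Python sketch (ours, for illustration; the helper name poly_bound is invented) that evaluates a bound of the form c*N^d at N and at 2N:

    # Doubling the input size under a running-time bound c * N^d slows
    # the bound down by the constant factor 2^d, independent of N.
    def poly_bound(c, d, n):
        return c * n ** d

    for d in [1, 2, 3]:
        slowdown = poly_bound(1, d, 2000) / poly_bound(1, d, 1000)
        print(f"d = {d}: doubling N multiplies the bound by {slowdown:.0f}")

Doubling the input multiplies the bound by exactly 2^d regardless of N, which is precisely the constant slow-down factor C described above.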

From this notion, and the intuition expressed above, emerges our third attempt at a working definition of efficiency.

Proposed Definition of Efficiency (3): An algorithm is efficient if it has a polynomial running time.

Where our previous definition seemed overly vague, this one seems much too prescriptive. Wouldn't an algorithm with running time proportional to n^100--and hence polynomial--be hopelessly inefficient? Wouldn't we be relatively pleased with a nonpolynomial running time of n^(1+.02(log n))? The answers are, of course, "yes" and "yes." And indeed, however much one may try to abstractly motivate the definition of efficiency in terms of polynomial time, a primary justification for it is this: It really works. Problems for which polynomial-time algorithms exist almost invariably turn out to have algorithms with running times proportional to very moderately growing polynomials like n, n log n, n^2, or n^3. Conversely, problems for which no polynomial-time algorithm is known tend to be very difficult in practice. There are certainly exceptions to this principle in both directions: there are cases, for example, in which an algorithm with exponential worst-case behavior generally runs well on the kinds of instances that arise in practice; and there are also cases where the best polynomial-time algorithm for a problem is completely impractical due to large constants or a high exponent on the polynomial bound. All this serves to reinforce the point that our emphasis on worst-case, polynomial-time bounds is only an abstraction of practical situations. But overwhelmingly, the concrete mathematical definition of polynomial time has turned out to correspond surprisingly well in practice to what we observe about the efficiency of algorithms, and the tractability of problems, in real life.

One further reason why the mathematical formalism and the empirical evidence seem to line up well in the case of polynomial-time solvability is that the gulf between the growth rates of polynomial and exponential functions is enormous. Suppose, for example, that we have a processor that executes a million high-level instructions per second, and we have algorithms with running-time bounds of n, n log2 n, n^2, n^3, 1.5^n, 2^n, and n!. In Table 2.1, we show the running times of these algorithms (in seconds, minutes, days, or years) for inputs of size n = 10, 30, 50, 100, 1,000, 10,000, 100,000, and 1,000,000. In cases where the running time exceeds 10^25 years, we simply record the algorithm as taking a very long time.

Table 2.1 The running times (rounded up) of different algorithms on inputs of increasing size, for a processor performing a million high-level instructions per second.

             n=10     n=30         n=50       n=100         n=1,000    n=10,000   n=100,000  n=1,000,000
  n          < 1 sec  < 1 sec      < 1 sec    < 1 sec       < 1 sec    < 1 sec    < 1 sec    1 sec
  n log2 n   < 1 sec  < 1 sec      < 1 sec    < 1 sec       < 1 sec    < 1 sec    2 sec      20 sec
  n^2        < 1 sec  < 1 sec      < 1 sec    < 1 sec       1 sec      2 min      3 hours    12 days
  n^3        < 1 sec  < 1 sec      < 1 sec    1 sec         18 min     12 days    32 years   31,710 years
  1.5^n      < 1 sec  < 1 sec      11 min     12,892 years  very long  very long  very long  very long
  2^n        < 1 sec  18 min       36 years   10^17 years   very long  very long  very long  very long
  n!         4 sec    10^25 years  very long  very long     very long  very long  very long  very long
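The entries of Table 2.1 are easy to recompute. The following small Python sketch (illustrative; the helper seconds is ours) converts step counts into seconds for the million-instructions-per-second processor assumed above:

    OPS_PER_SEC = 1_000_000   # the processor speed assumed in Table 2.1

    def seconds(steps):
        return steps / OPS_PER_SEC

    for n in [100, 1_000, 10_000, 1_000_000]:
        print(f"n = {n:>9,}: n^2 takes {seconds(n**2):,.2f} s, "
              f"n^3 takes {seconds(n**3):,.0f} s")
    for n in [30, 50, 100]:
        print(f"n = {n:>9,}: 2^n takes {seconds(2**n):.3g} s")

For example, n^2 at n = 1,000,000 comes to 10^6 seconds, which is the 12 days reported in the table.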

There is a final, fundamental benefit to making our definition of efficiency so specific: it becomes negatable. It becomes possible to express the notion that there is no efficient algorithm for a particular problem. In a sense, being able to do this is a prerequisite for turning our study of algorithms into good science, for it allows us to ask about the existence or nonexistence of efficient algorithms as a well-defined question. In contrast, both of our previous definitions were completely subjective, and hence limited the extent to which we could discuss certain issues in concrete terms. In particular, the first of our definitions, which was tied to the specific implementation of an algorithm, turned efficiency into a moving target: as processor speeds increase, more and more algorithms fall under this notion of efficiency. Our definition in terms of polynomial time is much more an absolute notion; it is closely connected with the idea that each problem has an intrinsic level of computational tractability: some admit efficient solutions, and others do not.

2.2 Asymptotic Order of Growth

Our discussion of computational tractability has turned out to be intrinsically based on our ability to express the notion that an algorithm's worst-case running time on inputs of size n grows at a rate that is at most proportional to some function f(n). The function f(n) then becomes a bound on the running time of the algorithm. We now discuss a framework for talking about this concept.

When we seek to say something about the running time of an algorithm on inputs of size n, one thing we could aim for would be a very concrete statement such as, "On any input of size n, the algorithm runs for at most 1.62n^2 + 3.5n + 8 steps." This may be an interesting statement in some contexts, but as a general goal there are several things wrong with it. First, getting such a precise bound may be an exhausting activity, and more detail than we wanted anyway. Second, because our ultimate goal is to identify broad classes of algorithms that have similar behavior, we'd actually like to classify running times at a coarser level of granularity so that similarities among different algorithms, and among different problems, show up more clearly. And finally, extremely detailed statements about the number of steps an algorithm executes are often--in a strong sense--meaningless. To see why, recall that we will mainly express algorithms in the pseudo-code style that we used for the Gale-Shapley algorithm; at times we will need to become more formal, but this style of specifying algorithms will be completely adequate for most purposes. When we provide a bound on the running time of an algorithm, we will generally be counting the number of such pseudo-code steps that are executed; in this context, one step will consist of assigning a value to a variable, looking up an entry in an array, following a pointer, or performing an arithmetic operation on a fixed-size integer. Each one of these steps will typically unfold into some fixed number of primitive steps when the program is compiled into an intermediate representation, and then into some further number of steps depending on the particular architecture being used to do the computing. So the most we can safely say is that, as we look at different levels of computational abstraction, the notion of a "step" may grow or shrink by a constant factor. For example, if it takes 25 low-level machine instructions to perform one operation in our high-level language, then our algorithm that took at most 1.62n^2 + 3.5n + 8 steps can also be viewed as taking 40.5n^2 + 87.5n + 200 steps when we analyze it at a level that is closer to the actual hardware.

For all these reasons, we want to express the growth rate of running times and other functions in a way that is insensitive to constant factors and low-order terms. In other words, we'd like to be able to take a running time like the one we discussed above, 1.62n^2 + 3.5n + 8, and say that it grows like n^2, up to constant factors. We now discuss a precise way to do this.

Asymptotic Upper Bounds

Let T(n) be a function--say, the worst-case running time of a certain algorithm on an input of size n. (We will assume that all the functions we talk about here take nonnegative values.) Given another function f(n), we say that T(n) is O(f(n)) (read as "T(n) is order f(n)") if, for sufficiently large n, the function T(n) is bounded above by a constant multiple of f(n). We will also sometimes write this as T(n) = O(f(n)). More precisely, T(n) is O(f(n)) if there exist constants c > 0 and n0 >= 0 so that for all n >= n0, we have T(n) <= c*f(n). In this case, we will say that T is asymptotically upper-bounded by f. It is important to note that this definition requires a constant c to exist that works for all n; in particular, c cannot depend on n.

As an example of how this definition lets us express upper bounds on running times, consider an algorithm whose running time (as in the earlier discussion) has the form T(n) = pn^2 + qn + r for positive constants p, q, and r. We'd like to claim that any such function is O(n^2). To see why, we notice that for all n >= 1, we have qn <= qn^2, and r <= rn^2. So we can write

  T(n) = pn^2 + qn + r <= pn^2 + qn^2 + rn^2 = (p + q + r)n^2

for all n >= 1. This inequality is exactly what the definition of O(-) requires: T(n) <= c*n^2, where c = p + q + r.

Note that O(-) expresses only an upper bound, not the exact growth rate of the function. For example, just as we claimed that the function T(n) = pn^2 + qn + r is O(n^2), it's also correct to say that it's O(n^3): we just argued that T(n) <= (p + q + r)n^2, and since we also have n^2 <= n^3, we can conclude that T(n) <= (p + q + r)n^3, as the definition of O(n^3) requires. The fact that a function can have many upper bounds is not just a trick of the notation; it shows up in the analysis of running times as well. There are cases where an algorithm has been proved to have running time O(n^3); some years pass, people analyze the same algorithm more carefully, and they show that in fact its running time is O(n^2). There was nothing wrong with the first result; it was a correct upper bound. It's simply that it wasn't the "tightest" possible running time.

Asymptotic Lower Bounds

There is a complementary notation for lower bounds. Often when we analyze an algorithm--say we have just proven that its worst-case running time T(n) is O(n^2)--we want to show that this upper bound is the best one possible. To do this, we want to express the notion that for arbitrarily large input sizes n, the function T(n) is at least a constant multiple of some specific function f(n). (In these examples, f(n) happens to be n^2.) Thus, we say that T(n) is Ω(f(n)) (also written T(n) = Ω(f(n))) if there exist constants ε > 0 and n0 >= 0 so that for all n >= n0, we have T(n) >= ε*f(n). By analogy with O(-) notation, we will refer to T in this case as being asymptotically lower-bounded by f. Again, note that the constant ε must be fixed, independent of n.

This definition works just like O(-), except that we are bounding the function T(n) from below, rather than from above. For example, returning to the function T(n) = pn^2 + qn + r, where p, q, and r are positive constants, let's claim that T(n) = Ω(n^2). Whereas establishing the upper bound involved "inflating" the terms in T(n) until it looked like a constant times n^2, now we need to do the opposite: we need to reduce the size of T(n) until it looks like a constant times n^2. It is not hard to do this: for all n >= 0, we have T(n) = pn^2 + qn + r >= pn^2, which meets what is required by the definition of Ω(-) with ε = p > 0.

Just as we discussed the notion of "tighter" and "weaker" upper bounds, the same issue arises for lower bounds. For example, it is correct to say that our function T(n) = pn^2 + qn + r is Ω(n), since T(n) >= pn^2 >= pn.

Asymptotically Tight Bounds

If we can show that a running time T(n) is both O(f(n)) and also Ω(f(n)), then in a natural sense we've found the "right" bound: T(n) grows exactly like f(n) to within a constant factor. This, for example, is the conclusion we can draw from the fact that T(n) = pn^2 + qn + r is both O(n^2) and Ω(n^2).

There is a notation to express this: if a function T(n) is both O(f(n)) and Ω(f(n)), we say that T(n) is Θ(f(n)). In this case, we say that f(n) is an asymptotically tight bound for T(n). So, for example, our analysis above shows that T(n) = pn^2 + qn + r is Θ(n^2).

Asymptotically tight bounds on worst-case running times are nice things to find, since they characterize the worst-case performance of an algorithm precisely up to constant factors.
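The sandwich bounds established for T(n) = pn^2 + qn + r above are easy to check numerically. A quick Python sketch (the constants are those of the running example; a check over a finite range is only an illustration, not a proof):

    p, q, r = 1.62, 3.5, 8.0          # positive constants, as in the text

    def T(n):
        return p * n**2 + q * n + r

    # For n >= 1, T(n) is sandwiched between p*n^2 and (p+q+r)*n^2,
    # witnessing both T(n) = O(n^2) and T(n) = Omega(n^2).
    assert all(p * n**2 <= T(n) <= (p + q + r) * n**2
               for n in range(1, 10_000))
    print("T(n) = Theta(n^2): the sandwich holds on the sampled range")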

As the definition of Θ(·) shows, one can obtain such tight bounds by closing the gap between an upper bound and a lower bound. For example, sometimes you will read a (slightly informally phrased) sentence such as "An upper bound of O(n^3) has been shown on the worst-case running time of the algorithm, but there is no example known on which the algorithm runs for more than Ω(n^2) steps." This is implicitly an invitation to search for an asymptotically tight bound on the algorithm's worst-case running time.

Sometimes one can also obtain an asymptotically tight bound directly by computing a limit as n goes to infinity. Essentially, if the ratio of functions f(n) and g(n) converges to a positive constant as n goes to infinity, then f(n) = Θ(g(n)).

(2.1) Let f and g be two functions such that the limit of f(n)/g(n) as n goes to infinity exists and is equal to some number c > 0. Then f(n) = Θ(g(n)).

Proof. We will use the fact that the limit exists and is positive to show that f(n) = O(g(n)) and f(n) = Ω(g(n)), as required by the definition of Θ(·). Since the limit of f(n)/g(n) is c > 0, it follows from the definition of a limit that there is some n0 beyond which the ratio is always between (1/2)c and 2c. Thus, f(n) <= 2c*g(n) for all n >= n0, which implies that f(n) = O(g(n)); and f(n) >= (1/2)c*g(n) for all n >= n0, which implies that f(n) = Ω(g(n)).

Properties of Asymptotic Growth Rates

Having seen the definitions of O, Ω, and Θ, it is useful to explore some of their basic properties.

Transitivity. A first property is transitivity: if a function f is asymptotically upper-bounded by a function g, and if g in turn is asymptotically upper-bounded by a function h, then f is asymptotically upper-bounded by h. A similar property holds for lower bounds. We write this more precisely as follows.

(2.2) (a) If f = O(g) and g = O(h), then f = O(h).
      (b) If f = Ω(g) and g = Ω(h), then f = Ω(h).

Proof. We'll prove part (a) of this claim; the proof of part (b) is very similar. For (a), we're given that for some constants c and n0, we have f(n) <= c*g(n) for all n >= n0. Also, for some (potentially different) constants c' and n0', we have g(n) <= c'*h(n) for all n >= n0'. So consider any number n that is at least as large as both n0 and n0'. We have f(n) <= c*g(n) <= c*c'*h(n), and so f(n) <= c*c'*h(n) for all n >= max(n0, n0'). This latter inequality is exactly what is required for showing that f = O(h).

Combining parts (a) and (b) of (2.2), we can obtain a similar result for asymptotically tight bounds. Suppose we know that f = Θ(g) and that g = Θ(h). Then since f = O(g) and g = O(h), we know from part (a) that f = O(h); and since f = Ω(g) and g = Ω(h), we know from part (b) that f = Ω(h). It follows that f = Θ(h). Thus we have shown

(2.3) If f = Θ(g) and g = Θ(h), then f = Θ(h).

Sums of Functions. It is also useful to have results that quantify the effect of adding two functions. First, if we have an asymptotic upper bound that applies to each of two functions f and g, then it applies to their sum.

(2.4) Suppose that f and g are two functions such that for some other function h, we have f = O(h) and g = O(h). Then f + g = O(h).

Proof. We're given that for some constants c and n0, we have f(n) <= c*h(n) for all n >= n0. Also, for some (potentially different) constants c' and n0', we have g(n) <= c'*h(n) for all n >= n0'. So consider any number n that is at least as large as both n0 and n0'. We have f(n) + g(n) <= c*h(n) + c'*h(n). Thus f(n) + g(n) <= (c + c')h(n) for all n >= max(n0, n0'), which is exactly what is required for showing that f + g = O(h).

There is a generalization of this to sums of a fixed constant number of functions k, where k may be larger than two. The result can be stated precisely as follows; we omit the proof, since it is essentially the same as the proof of (2.4), adapted to sums consisting of k terms rather than just two.

(2.5) Let k be a fixed constant, and let f1, f2, ..., fk and h be functions such that fi = O(h) for all i. Then f1 + f2 + ... + fk = O(h).

There is also a consequence of (2.4) that covers the following kind of situation. It frequently happens that we're analyzing an algorithm with two high-level parts, and it is easy to show that one of the two parts is slower than the other. We'd like to be able to say that the running time of the whole algorithm is asymptotically comparable to the running time of the slow part. Since the overall running time is a sum of two functions (the running times of the two parts), results on asymptotic bounds for sums of functions are directly relevant.
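Both the limit criterion (2.1) and the behavior of sums are easy to observe numerically. In the following Python sketch (the functions are examples of ours), f(n)/g(n) tends to the constant 5, and (g(n) + slow(n))/g(n) tends to 1 when slow = O(g), previewing the tight-bound claim that comes next:

    import math

    def f(n):    return 5 * n**2 + 3 * n   # f/g converges to c = 5 > 0
    def g(n):    return n**2
    def slow(n): return n * math.log(n)    # slow = O(g)

    for n in [10, 1_000, 1_000_000]:
        print(f"n = {n:>9}: f/g = {f(n) / g(n):.6f}, "
              f"(g + slow)/g = {(g(n) + slow(n)) / g(n):.6f}")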

(2.6) Suppose that f and g are two functions (taking nonnegative values) such that g = O(f). Then f + g = Θ(f). In other words, f is an asymptotically tight bound for the combined function f + g.

Proof. Clearly f + g = Ω(f), since for all n >= 0, we have f(n) + g(n) >= f(n). So to complete the proof, we need to show that f + g = O(f). But this is a direct consequence of (2.4): we're given the fact that g = O(f), and also f = O(f) holds for any function, so by (2.4) we have f + g = O(f).

This result also extends to the sum of any fixed, constant number of functions: the most rapidly growing among the functions is an asymptotically tight bound for the sum.

Asymptotic Bounds for Some Common Functions

There are a number of functions that come up repeatedly in the analysis of algorithms, and it is useful to consider the asymptotic properties of some of the most basic of these: polynomials, logarithms, and exponentials.

Polynomials. Recall that a polynomial is a function that can be written in the form f(n) = a0 + a1*n + a2*n^2 + ... + ad*n^d for some integer constant d > 0, where the final coefficient ad is nonzero. This value d is called the degree of the polynomial. For example, the functions of the form pn^2 + qn + r (with p != 0) that we considered earlier are polynomials of degree 2.

A basic fact about polynomials is that their asymptotic rate of growth is determined by their "high-order term"--the one that determines the degree. We state this more formally in the following claim. Since we are concerned here only with functions that take nonnegative values, we will restrict our attention to polynomials for which the high-order term has a positive coefficient ad > 0.

(2.7) Let f be a polynomial of degree d, in which the coefficient ad is positive. Then f = O(n^d).

Proof. We write f = a0 + a1*n + a2*n^2 + ... + ad*n^d, where ad > 0. The upper bound is a direct application of (2.5). First, notice that coefficients aj for j < d may be negative, but in any case we have aj*n^j <= |aj|*n^d for all n >= 1. Thus each term in the polynomial is O(n^d). Since f is a sum of a constant number of functions, each of which is O(n^d), it follows from (2.5) that f is O(n^d).

One can also show that under the conditions of (2.7), we have f = Ω(n^d), and hence it follows that in fact f = Θ(n^d).

This is a good point at which to discuss the relationship between these types of asymptotic bounds and the notion of polynomial time, which we arrived at in the previous section as a way to formalize the more elusive concept of efficiency. Using O(·) notation, it's easy to formally define polynomial time: a polynomial-time algorithm is one whose running time T(n) is O(n^d) for some constant d, where d is independent of the input size. So algorithms with running-time bounds like O(n^2) and O(n^3) are polynomial-time algorithms. But it's important to realize that an algorithm can be polynomial time even if its running time is not written as n raised to some integer power. To begin with, a number of algorithms have running times of the form O(n^x) for some number x that is not an integer. For example, in Chapter 5 we will see an algorithm whose running time is O(n^1.59); we will also see exponents less than 1, as in bounds like O(sqrt(n)) = O(n^(1/2)). To take another common kind of example, we will see many algorithms whose running times have the form O(n log n). Such algorithms are also polynomial time: log n <= n for all n >= 1, and hence n log n <= n^2 for all n >= 1; so if an algorithm has running time O(n log n), then it also has running time O(n^2), and so it is a polynomial-time algorithm.

Logarithms. Recall that log_b n is the number x such that b^x = n. One way to get an approximate sense of how fast log_b n grows is to note that, if we round it down to the nearest integer, it is one less than the number of digits in the base-b representation of the number n. (Thus, for example, 1 + log2 n, rounded down, is the number of bits needed to represent n.)

One can directly translate between logarithms of different bases using the following fundamental identity:

  log_a n = (log_b n) / (log_b a)

This equation explains why you'll often notice people writing bounds like O(log n) without indicating the base of the logarithm. This is not sloppy usage: the identity above says that log_a n = (1 / log_b a) * log_b n, so the point is that log_a n = Θ(log_b n), and the base of the logarithm is not important when writing bounds using asymptotic notation.

Logarithms are very slowly growing functions. In particular, for every base b, the function log_b n is asymptotically bounded by every function of the form n^x, even for (noninteger) values of x arbitrarily close to 0.

(2.8) For every b > 1 and every x > 0, we have log_b n = O(n^x).
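The base-conversion identity above is easy to confirm numerically; a small Python sketch:

    import math

    # log_a n = (log_b n) / (log_b a): changing the base only changes
    # the logarithm by a constant factor.
    n, a, b = 10_000, 2, 10
    lhs = math.log(n, a)
    rhs = math.log(n, b) / math.log(a, b)
    print(lhs, rhs)                 # equal up to floating-point error
    assert abs(lhs - rhs) < 1e-9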

Exponentials. Exponential functions are functions of the form f(n) = r^n for some constant base r. Here we will be concerned with the case in which r > 1, which results in a very fast-growing function. In particular, where polynomials raise n to a fixed exponent, exponentials raise a fixed number to n as a power; this leads to much faster rates of growth. One way to summarize the relationship between polynomials and exponentials is as follows.

(2.9) For every r > 1 and every d > 0, we have n^d = O(r^n).

In other words, every exponential grows faster than every polynomial. And as we saw in Table 2.1, when you plug in actual values of n, the differences in growth rates are really quite impressive.

Just as people write O(log n) without specifying the base, you'll also see people write "The running time of this algorithm is exponential," without specifying which exponential function they have in mind. Unlike the liberal use of log n, which is justified by ignoring constant factors, this generic use of the term "exponential" is somewhat sloppy. In particular, for different bases r > s > 1, it is never the case that r^n = Θ(s^n). Indeed, this would require that for some constant c > 0, we would have r^n <= c*s^n for all sufficiently large n. But rearranging this inequality would give (r/s)^n <= c for all sufficiently large n. Since r > s, the expression (r/s)^n is tending to infinity with n, and so it cannot possibly remain bounded by a fixed constant c. So asymptotically speaking, exponential functions are all different. Still, it's usually clear what people intend when they inexactly write "The running time of this algorithm is exponential"--they typically mean that the running time grows at least as fast as some exponential function, and all exponentials grow so fast that we can effectively dismiss this algorithm without working out further details of the exact running time. This is not entirely fair; occasionally there's more going on with an exponential algorithm than first appears, as we'll see, for example, in Chapter 10. But as we argued in the first section of this chapter, it's a reasonable rule of thumb.

Taken together, then, logarithms, polynomials, and exponentials serve as useful landmarks in the range of possible functions that you encounter when analyzing running times. Logarithms grow more slowly than polynomials, and polynomials grow more slowly than exponentials.

2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays

We've now seen a general approach for expressing bounds on the running time of an algorithm. In order to asymptotically analyze the running time of an algorithm expressed in a high-level fashion--as we expressed the Gale-Shapley Stable Matching algorithm in Chapter 1, for example--one doesn't have to actually program, compile, and execute it, but one does have to think about how the data will be represented and manipulated in an implementation of the algorithm, so as to bound the number of computational steps it takes.

The implementation of basic algorithms using data structures is something that you probably have had some experience with, and our implementation here provides a good chance to review the use of these basic data structures as well. In this book, data structures will be covered in the context of implementing specific algorithms, and so we will encounter different data structures based on the needs of the algorithms we are developing. An important issue to note here is that the choice of data structure is up to the algorithm designer; for each algorithm we will choose data structures that make it efficient and easy to implement. In some cases, this may involve preprocessing the input to convert it from its given input representation into a data structure that is more appropriate for the problem being solved.

To get this process started, we consider an implementation of the Gale-Shapley Stable Matching algorithm. We showed earlier that the algorithm terminates in at most n^2 iterations, and our implementation here provides a corresponding worst-case running time of O(n^2), counting actual computational steps rather than simply the total number of iterations. To get such a bound for the Stable Matching algorithm, we will only need to use two of the simplest data structures: lists and arrays.

In the Stable Matching Problem, each man and each woman has a ranking of all members of the opposite gender. The very first question we need to discuss is how such a ranking will be represented. Further, the algorithm maintains a matching and will need to know at each step which men and women are free, and who is matched with whom. In order to implement the algorithm, we need to decide which data structures we will use for all these things.

Arrays and Lists

To start our discussion we will focus on a single list, such as the list of women in order of preference by a single man. Maybe the simplest way to keep a list of n elements is to use an array A of length n, and have A[i] be the ith element of the list. Such an array is simple to implement in essentially all standard programming languages, and it has the following properties.

o We can answer a query of the form "What is the ith element on the list?" in O(1) time, by a direct access to the value A[i].

o If we want to determine whether a particular element e belongs to the list (i.e., whether it is equal to A[i] for some i), we need to check the elements one by one in O(n) time, assuming we don't know anything about the order in which the elements appear in A.

o If the array elements are sorted in some clear way (either numerically or alphabetically), then we can determine whether an element e belongs to the list in O(log n) time using binary search; we will not need to use binary search for any part of our stable matching implementation, but we will have more to say about it in the next section.
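As a small illustration of these operations, here is a Python sketch (Python's bisect module stands in for a hand-written binary search):

    from bisect import bisect_left

    A = [2, 3, 5, 8, 13, 21]            # a sorted array of n elements

    print(A[3])                          # O(1): direct access to A[i]

    def contains_scan(A, e):             # O(n): check elements one by one
        return any(x == e for x in A)

    def contains_sorted(A, e):           # O(log n): binary search on a sorted array
        i = bisect_left(A, e)
        return i < len(A) and A[i] == e

    print(contains_scan(A, 8), contains_sorted(A, 8))   # True True
    print(contains_scan(A, 7), contains_sorted(A, 7))   # False False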

An array is less good for dynamically maintaining a list of elements that changes over time, such as the list of free men in the Stable Matching algorithm; since men go from being free to engaged, and potentially back again, a list of free men needs to grow and shrink during the execution of the algorithm. It is generally cumbersome to frequently add or delete elements to a list that is maintained as an array.

An alternate, and often preferable, way to maintain such a dynamic set of elements is via a linked list. In a linked list, the elements are sequenced together by having each element point to the next in the list. Thus, for each element v on the list, we need to maintain a pointer to the next element; we set this pointer to null if v is the last element.

A generic way to implement such a linked list, when the set of possible elements may not be fixed in advance, is to allocate a record e for each element that we want to include in the list. Such a record would contain a field e.val that contains the value of the element, and a field e.Next that contains a pointer to the next element in the list. We can create a doubly linked list, which is traversable in both directions, by also having a field e.Prev that contains a pointer to the previous element in the list (e.Prev = null if e is the first element). We also include a pointer First that points to the first element and, analogously, a pointer Last that points to the last element in the list. A schematic illustration of part of such a list is shown in the first line of Figure 2.1.

A doubly linked list can be modified as follows.

o Deletion. To delete the element e from a doubly linked list, we can just "splice it out" by having the previous element, referenced by e.Prev, and the next element, referenced by e.Next, point directly to each other. The deletion operation is illustrated in Figure 2.1.

o Insertion. To insert element e between elements d and f in a list, we "splice it in" by updating d.Next and f.Prev to point to e, and the Next and Prev pointers of e to point to d and f, respectively. This operation is essentially the reverse of deletion, and indeed one can see this operation at work by reading Figure 2.1 from bottom to top.

Figure 2.1 A schematic representation of a doubly linked list, showing the deletion of an element e.

Inserting or deleting e at the beginning of the list involves updating the First pointer, rather than updating the record of the element before e.

While lists are good for maintaining a dynamically changing set, they also have disadvantages. Unlike arrays, we cannot find the ith element of the list in O(1) time: to find the ith element, we have to follow the Next pointers starting from the beginning of the list, which takes a total of O(i) time. (By starting at First and repeatedly following pointers to the next element until we reach null, we can thus traverse the entire contents of the list in time proportional to its length.)

Given the relative advantages and disadvantages of arrays and lists, it may happen that we receive the input to a problem in one of the two formats and want to convert it into the other. As discussed earlier, such preprocessing is often useful, and in this case it is easy to convert between the array and list representations in O(n) time. This allows us to freely choose the data structure that suits the algorithm better and not be constrained by the way the information is given as input.
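As a concrete illustration, here is a minimal Python sketch of such a doubly linked list (class and field names are chosen for the sketch; val, next, prev, and first correspond to the text's e.val, e.Next, e.Prev, and First):

    class Node:
        def __init__(self, val):
            self.val = val
            self.prev = None     # pointer to the previous element, or None
            self.next = None     # pointer to the next element, or None

    class DoublyLinkedList:
        def __init__(self):
            self.first = None    # the First pointer

        def insert_front(self, node):
            # O(1): splice node in before the current first element.
            node.next = self.first
            node.prev = None
            if self.first is not None:
                self.first.prev = node
            self.first = node

        def delete(self, node):
            # O(1): splice node out by having its neighbors point to
            # each other (updating first if node is at the front).
            if node.prev is not None:
                node.prev.next = node.next
            else:
                self.first = node.next
            if node.next is not None:
                node.next.prev = node.prev

    # For example, maintaining a list of free men:
    free = DoublyLinkedList()
    m1, m2 = Node("m1"), Node("m2")
    free.insert_front(m1)
    free.insert_front(m2)
    free.delete(m1)              # m1 becomes engaged and leaves the list
    print(free.first.val)        # -> m2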

Implementing the Stable Matching Algorithm

Next we will use arrays and linked lists to implement the Stable Matching algorithm from Chapter 1. We have already shown that the algorithm terminates in at most n^2 iterations, and this provides a type of upper bound on the running time. However, if we actually want to implement the G-S algorithm so that it runs in time proportional to n^2, we need to be able to implement each iteration in constant time. We discuss how to do this now.

For simplicity, assume that the sets of men and women are both {1, ..., n}. To ensure this, we can order the men and women (say, alphabetically) and associate number i with the ith man mi or ith woman wi in this order. This assumption (or notation) allows us to define an array indexed by all men or all women. We need to have a preference list for each man and for each woman. To do this we will have two arrays, one for women's preference lists and one for the men's preference lists; we will use ManPref[m, i] to denote the ith woman on man m's preference list, and similarly WomanPref[w, i] to be the ith man on the preference list of woman w. Note that the amount of space needed to give the preferences for all 2n individuals is O(n^2), as each person has a list of length n.

We need to consider each step of the algorithm and understand what data structure allows us to implement it efficiently. Essentially, we need to be able to do each of four things in constant time.

1. We need to be able to identify a free man.
2. We need, for a man m, to be able to identify the highest-ranked woman to whom he has not yet proposed.
3. For a woman w, we need to decide if w is currently engaged, and if she is, we need to identify her current partner.
4. For a woman w and two men m and m', we need to be able to decide, again in constant time, which of m or m' is preferred by w.

First, consider selecting a free man. We will do this by maintaining the set of free men as a linked list. When we need to select a free man, we take the first man m on this list. We delete m from the list if he becomes engaged, and possibly insert a different man m', if some other man m' becomes free. In this case, m' can be inserted at the front of the list, again in constant time.

Next, consider a man m. We need to identify the highest-ranked woman to whom he has not yet proposed. To do this we will need to maintain an extra array Next that indicates for each man m the position of the next woman he will propose to on his list. We initialize Next[m] = 1 for all men m. If a man m needs to propose to a woman, he'll propose to w = ManPref[m, Next[m]], and once he proposes to w, we increment the value of Next[m] by one, regardless of whether or not w accepts the proposal.

Now assume man m proposes to woman w; we need to be able to identify the man m' that w is engaged to (if there is such a man). We can do this by maintaining an array Current of length n, where Current[w] is the woman w's current partner m'. We set Current[w] to a special null symbol when we need to indicate that woman w is not currently engaged; at the start of the algorithm, Current[w] is initialized to this null symbol for all women w.

To sum up, the data structures we have set up thus far can implement the operations (1)-(3) in O(1) time each.

Maybe the trickiest question is how to maintain women's preferences to keep step (4) efficient. Consider a step of the algorithm, when man m proposes to a woman w. Assume w is already engaged, and her current partner is m' = Current[w]. We would like to decide in O(1) time if woman w prefers m or m'. Keeping the women's preferences in an array WomanPref, analogous to the one we used for men, does not work, as we would need to walk through w's list one by one, taking O(n) time to find m and m' on the list. While O(n) is still polynomial, we can do a lot better if we build an auxiliary data structure at the beginning.

At the start of the algorithm, we create an n x n array Ranking, where Ranking[w, m] contains the rank of man m in the sorted order of w's preferences. By a single pass through w's preference list, we can create this array in linear time for each woman, for a total initial time investment proportional to n^2. Then, to decide which of m or m' is preferred by w, we simply compare the values Ranking[w, m] and Ranking[w, m']. This allows us to execute step (4) in constant time, and hence we have everything we need to obtain the desired running time.

(2.10) The data structures described above allow us to implement the G-S algorithm in O(n^2) time.
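To make the preprocessing step behind (2.10) concrete, here is a minimal Python sketch of building the Ranking structure (the names woman_pref and ranking mirror the text's WomanPref and Ranking; 0-based indices replace the text's 1-based convention):

    n = 3
    # woman_pref[w][i] is the ith man on woman w's preference list.
    woman_pref = [
        [0, 1, 2],
        [2, 0, 1],
        [1, 2, 0],
    ]

    # One linear pass per woman: O(n^2) total preprocessing time.
    ranking = [[0] * n for _ in range(n)]
    for w in range(n):
        for i, m in enumerate(woman_pref[w]):
            ranking[w][m] = i

    # Step (4) is now a constant-time comparison:
    w, m, m_prime = 1, 0, 2
    print(ranking[w][m] < ranking[w][m_prime])   # False: woman 1 ranks man 2 first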

2.4 A Survey of Common Running Times

When trying to analyze a new algorithm, it helps to have a rough sense of the "landscape" of different running times. Indeed, there are styles of analysis that recur frequently, and so when one sees running-time bounds like O(n), O(n log n), and O(n^2) appearing over and over, it's often for one of a very small number of distinct reasons. Learning to recognize these common styles of analysis is a long-term goal. To get things under way, we offer the following survey of common running-time bounds and some of the typical approaches that lead to them.

In approaching a new problem, it often helps to think about two kinds of bounds: one on the running time you hope to achieve, and the other on the size of the problem's natural search space (and hence on the running time of a brute-force algorithm for the problem). Earlier we discussed the notion that most problems have a natural "search space"--the set of all possible solutions--and we noted that a unifying theme in algorithm design is the search for algorithms whose performance is more efficient than a brute-force enumeration of this search space. The discussion of running times in this section will begin in many cases with an analysis of the brute-force algorithm, since it is a useful way to get one's bearings with respect to a problem; the task of improving on such algorithms will be our goal in most of the book.

Linear Time

An algorithm that runs in O(n), or linear, time has a very natural property: its running time is at most a constant factor times the size of the input. One basic way to get an algorithm with this running time is to process the input in a single pass, spending a constant amount of time on each item of input encountered. Other algorithms achieve a linear time bound for more subtle reasons. To illustrate some of the ideas here, we consider two simple linear-time algorithms as examples.

Computing the Maximum. Computing the maximum of n numbers, for example, can be performed in the basic "one-pass" style. Suppose the numbers are provided as input in either a list or an array. We process the numbers a1, a2, ..., an in order, keeping a running estimate of the maximum as we go. Each time we encounter a number ai, we check whether ai is larger than our current estimate, and if so we update the estimate to ai.

  max = a1
  For i = 2 to n
    If ai > max then
      set max = ai
    Endif
  Endfor

In this way, we do constant work per element, for a total running time of O(n).

Sometimes the constraints of an application force this kind of one-pass algorithm on you. For example, an algorithm running on a high-speed switch on the Internet may see a stream of packets flying past it, and it can try computing anything it wants to as this stream passes by, but it can only perform a constant amount of computational work on each packet, and it can't save the stream so as to make subsequent scans through it. Two different subareas of algorithms, online algorithms and data stream algorithms, have developed to study this model of computation.

Merging Two Sorted Lists. Often, an algorithm has a running time of O(n), but the reason is more complex. We now describe an algorithm for merging two sorted lists that stretches the one-pass style of design just a little, but still has a linear running time.

Suppose we are given two lists of n numbers each, a1, a2, ..., an and b1, b2, ..., bn, and each is already arranged in ascending order. We'd like to merge these into a single list c1, c2, ..., c2n that is also arranged in ascending order. For example, merging the lists 2, 3, 11, 19 and 4, 9, 16, 25 results in the output 2, 3, 4, 9, 11, 16, 19, 25.

To do this, we could just throw the two lists together, ignore the fact that they're separately arranged in ascending order, and run a sorting algorithm. But this clearly seems wasteful; we'd like to make use of the existing order in the input. One way to think about designing a better algorithm is to imagine performing the merging of the two lists by hand: suppose you're given two piles of numbered cards, each arranged in ascending order, and you'd like to produce a single ordered pile containing all the cards. If you look at the top card on each stack, you know that the smaller of these two should go first on the output pile; so you could remove this card, place it on the output, and now iterate on what's left.

In other words, we have the following algorithm.

  Maintain a Current pointer into each list, initialized to
    point to the front elements
  While both lists are nonempty:
    Let ai and bj be the elements pointed to by the Current pointer
    Append the smaller of these two to the output list
    Advance the Current pointer in the list from which the
      smaller element was selected
  EndWhile
  Once one list is empty, append the remainder of the other
    list to the output

Figure 2.2 To merge sorted lists A and B, we repeatedly extract the smaller item from the front of the two lists and append it to the output. (See Figure 2.2 for a picture of this process.)
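Here is one possible rendering of this merging algorithm as runnable Python (indices stand in for the Current pointers):

    def merge(a, b):
        """Merge two sorted lists in O(len(a) + len(b)) time."""
        i, j, out = 0, 0, []
        while i < len(a) and j < len(b):
            # Append the smaller front element and advance that pointer.
            if a[i] <= b[j]:
                out.append(a[i])
                i += 1
            else:
                out.append(b[j])
                j += 1
        # Once one list is empty, append the remainder of the other.
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    print(merge([2, 3, 11, 19], [4, 9, 16, 25]))
    # -> [2, 3, 4, 9, 11, 16, 19, 25]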

Now, to show a linear-time bound for this algorithm, one is tempted to describe an argument like what worked for the maximum-finding algorithm: "We do constant work per element, for a total running time of O(n)." But it is actually not true that we do only constant work per element. Suppose that n is an even number, and consider the lists A = 1, 3, 5, ..., 2n-1 and B = n, n+2, n+4, ..., 3n-2. The number b1 at the front of list B will sit at the front of the list for n/2 iterations while elements from A are repeatedly being selected, and hence it will be involved in Ω(n) comparisons. Now, it is true that each element can be involved in at most O(n) comparisons (at worst, it is compared with each element in the other list), and if we sum this over all elements we get a running-time bound of O(n^2). This is a correct bound, but we can show something much stronger.

The better way to argue is to bound the number of iterations of the While loop by an "accounting" scheme. Suppose we charge the cost of each iteration to the element that is selected and added to the output list. An element can be charged only once, since at the moment it is first charged, it is added to the output and never seen again by the algorithm. But there are only 2n elements total, and the cost of each iteration is accounted for by a charge to some element, so there can be at most 2n iterations. Each iteration involves a constant amount of work, so the total running time is O(n), as desired.

While this merging algorithm iterated through its input lists in order, the "interleaved" way in which it processed the lists necessitated a slightly subtle running-time analysis. In Chapter 3 we will see linear-time algorithms for graphs that have an even more complex flow of control: they spend a constant amount of time on each node and edge in the underlying graph, but the order in which they process the nodes and edges depends on the structure of the graph.

O(n log n) Time

O(n log n) is also a very common running time, and in Chapter 5 we will see one of the main reasons for its prevalence: it is the running time of any algorithm that splits its input into two equal-sized pieces, solves each piece recursively, and then combines the two solutions in linear time.

Sorting is perhaps the most well-known example of a problem that can be solved this way. Specifically, the Mergesort algorithm divides the set of input numbers into two equal-sized pieces, sorts each half recursively, and then merges the two sorted halves into a single sorted output list. We have just seen that the merging can be done in linear time, and Chapter 5 will discuss how to analyze the recursion so as to get a bound of O(n log n) on the overall running time.

One also frequently encounters O(n log n) as a running time simply because there are many algorithms whose most expensive step is to sort the input. For example, suppose we are given a set of n time-stamps x1, x2, ..., xn on which copies of a file arrived at a server, and we'd like to find the largest interval of time between the first and last of these time-stamps during which no copy of the file arrived. A simple solution to this problem is to first sort the time-stamps x1, x2, ..., xn and then process them in sorted order, determining the sizes of the gaps between each number and its successor in ascending order. The largest of these gaps is the desired subinterval. Note that this algorithm requires O(n log n) time to sort the numbers, and then it spends constant work on each number in ascending order; the remainder of the algorithm after sorting follows the basic recipe for linear time that we discussed earlier.

Quadratic Time

Here's a basic problem: suppose you are given n points in the plane, each specified by (x, y) coordinates, and you'd like to find the pair of points that are closest together. The natural brute-force algorithm for this problem would enumerate all pairs of points, compute the distance between each pair, and then choose the pair for which this distance is smallest.

What is the running time of this algorithm? The number of pairs of points is (n choose 2) = n(n-1)/2, and since this quantity is bounded by (1/2)n^2, it is O(n^2). The distance between points (xi, yi) and (xj, yj) can be computed by the formula sqrt((xi - xj)^2 + (yi - yj)^2) in constant time, so the overall running time is O(n^2).

This example illustrates a very common way in which a running time of O(n^2) arises: performing a search over all pairs of input items and spending constant time per pair. More crudely, the number of pairs is O(n^2) because we multiply the number of ways of choosing the first member of the pair (at most n) by the number of ways of choosing the second member of the pair (also at most n). Multiplying these two factors of n together gives the running time.

Quadratic time also arises naturally from a pair of nested loops: an algorithm consists of a loop with O(n) iterations, and each iteration of the loop launches an internal loop that takes O(n) time. The brute-force algorithm for finding the closest pair of points can be written in an equivalent way with two nested loops:

  For each input point (xi, yi)
    For each other input point (xj, yj)
      Compute distance d = sqrt((xi - xj)^2 + (yi - yj)^2)
      If d is less than the current minimum, update minimum to d
    Endfor
  Endfor

Note how the "inner" loop, over (xj, yj), has O(n) iterations, each taking constant time; and the "outer" loop, over (xi, yi), has O(n) iterations, each invoking the inner loop once.
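A direct Python rendering of these nested loops follows (a sketch; unlike the pseudo-code, the inner loop here starts at i + 1 so that each unordered pair is examined once, which changes the constant but not the O(n^2) bound):

    import math

    def closest_pair(points):
        best, best_dist = None, math.inf
        for i, (xi, yi) in enumerate(points):
            for (xj, yj) in points[i + 1:]:
                d = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)
                if d < best_dist:            # update the current minimum
                    best_dist = d
                    best = ((xi, yi), (xj, yj))
        return best, best_dist

    print(closest_pair([(0, 0), (5, 5), (1, 1), (9, 2)]))
    # -> (((0, 0), (1, 1)), 1.4142...)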

It's important to notice that the algorithm we've been discussing for the Closest-Pair Problem really is just the brute-force approach: the natural search space for this problem has size O(n^2), and we're simply enumerating it. At first, one feels there is a certain inevitability about this quadratic algorithm--we have to measure all the distances, don't we?--but in fact this is an illusion. In Chapter 5 we describe a very clever algorithm that finds the closest pair of points in the plane in only O(n log n) time, and in Chapter 13 we show how randomization can be used to reduce the running time to O(n).

Cubic Time

More elaborate sets of nested loops often lead to algorithms that run in O(n^3) time. Consider, for example, the following problem. We are given sets S1, S2, ..., Sn, each of which is a subset of {1, 2, ..., n}, and we would like to know whether some pair of these sets is disjoint--in other words, has no elements in common.

What is the running time needed to solve this problem? Let's suppose that each set Si is represented in such a way that the elements of Si can be listed in constant time per element, and we can also check in constant time whether a given number p belongs to Si. The following is a direct way to approach the problem.

  For each pair of sets Si and Sj
    Determine whether Si and Sj have an element in common
  Endfor

This is a concrete algorithm, but to reason about its running time it helps to open it up (at least conceptually) into three nested loops.

  For each set Si
    For each other set Sj
      For each element p of Si
        Determine whether p also belongs to Sj
      Endfor
      If no element of Si belongs to Sj then
        Report that Si and Sj are disjoint
      Endif
    Endfor
  Endfor

Each of the sets has maximum size O(n), so the innermost loop takes time O(n). Looping over the sets Sj involves O(n) iterations around this innermost loop, and looping over the sets Si involves O(n) iterations around this. Multiplying these three factors of n together, we get the running time of O(n^3).

For this problem, there are algorithms that improve on O(n^3) running time, but they are quite complicated. Furthermore, it is not clear whether the improved algorithms for this problem are practical on inputs of reasonable size.
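A runnable Python version of the three nested loops might look as follows (a sketch; each set is stored as a Python set, so the constant-time membership test assumed above holds in expectation):

    def find_disjoint_pair(sets):
        n = len(sets)
        for i in range(n):                       # outer loop: O(n)
            for j in range(n):                   # middle loop: O(n)
                if i == j:
                    continue
                # Innermost loop: O(n) membership tests.
                if not any(p in sets[j] for p in sets[i]):
                    return i, j                  # S_i and S_j are disjoint
        return None

    print(find_disjoint_pair([{1, 2}, {2, 3}, {4}]))   # -> (0, 2)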

O(n^k) Time

In the same way that we obtained a running time of O(n^2) by performing brute-force search over all pairs formed from a set of n items, we obtain a running time of O(n^k) for any constant k when we search over all subsets of size k.

Consider, for example, the problem of finding independent sets in a graph, which we discussed in Chapter 1. Recall that a set of nodes is independent if no two are joined by an edge. Suppose, in particular, that for some fixed constant k, we would like to know if a given n-node input graph G has an independent set of size k. The natural brute-force algorithm for this problem would enumerate all subsets of k nodes, and for each subset S it would check whether there is an edge joining any two members of S. That is:

  For each subset S of k nodes
    Check whether S constitutes an independent set
    If S is an independent set then
      Stop and declare success
    Endif
  Endfor
  If no k-node independent set was found then
    Declare failure
  Endif

To understand the running time of this algorithm, we need to consider two quantities. First, the total number of k-element subsets in an n-element set is

  (n choose k) = n(n-1)(n-2) ... (n-k+1) / k(k-1)(k-2) ... (2)(1) <= n^k / k!

Since we are treating k as a constant, this quantity is O(n^k). Thus, the outer loop in the algorithm above will run for O(n^k) iterations as it tries all k-node subsets of the n nodes of the graph.

Inside this loop, we need to test whether a given set S of k nodes constitutes an independent set. The definition of an independent set tells us that we need to check, for each pair of nodes, whether there is an edge joining them. Hence this is a search over pairs, like we saw earlier in the discussion of quadratic time; it requires looking at (k choose 2), that is, O(k^2), pairs and spending constant time on each. Thus the total running time is O(k^2 n^k). Since we are treating k as a constant here, and since constants can be dropped in O(·) notation, we can write this running time as O(n^k).

Independent Set is a principal example of a problem believed to be computationally hard, and in particular it is believed that no algorithm to find k-node independent sets in arbitrary graphs can avoid having some dependence on k in the exponent. However, as we will discuss in Chapter 10 in the context of a related problem, even once we've conceded that brute-force search over k-element subsets is necessary, there can be different ways of going about this that lead to significant differences in the efficiency of the computation.
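For concreteness, here is a Python sketch of this brute-force search (the graph encoding--a node count plus an edge list--is chosen for the sketch; itertools.combinations enumerates the k-element subsets):

    from itertools import combinations

    def has_independent_set(n, edges, k):
        edge_set = {frozenset(e) for e in edges}
        # Outer loop: O(n^k) subsets; inner check: O(k^2) pairs each.
        for S in combinations(range(n), k):
            if all(frozenset((u, v)) not in edge_set
                   for u, v in combinations(S, 2)):
                return True          # success: S is an independent set
        return False                 # failure: no k-node independent set

    print(has_independent_set(4, [(0, 1), (1, 2), (2, 3)], k=2))  # True, e.g. {0, 2}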

O(log n) arises as a time bohnd whenever we’re dealing with an algorithm that does a constant amount of work in order to throw away a constant fraction of the input.5 A More Complex Data Structure: Priority Queues Our primary goal in this book was expressed at the outset of the chapter: we seek algorithms that improve qualitatively on brute-force search. we will see that Traveling Salesman is another problem that. we want to take the one with highest priority. and we’re shrinking the size of this region by a factor of two with every probe. If q > p. A motivating application for priority queues. how long will it take for the size of the active region-to be reduced to a constant? We need k to be large enough so that (½)k = O(1/n). at which point the recursion bottoms out and we can search the remainder of the array directly in constant time. at which point the problem can generally be solved directly. where each element v ~ S has an associated value key(v) that denotes the priority of element v. achieving a polynomial-time solution to a nontrivial problem is not something that depends on fine-grained implementation details. and each time we need to select an element from S. Priority queues will be useful when we describe how to implement some of the graph algorithms developed later in the book. Since it takes linear time just to read the input. So how large is the "active" region of A after k probes? It starts at size n. however. it is a useful illustration of the analysis of a data structure that. A priority queue is designed for applications in which elements have a priority value. belongs to the class of NPcomplete problems and is believed to have no efficient solution. For our purposes here. then in order for p to belong to the array A. leading to a search space of size (n. because of this successive shrinking of the search region. Our implementation of priority queues will also support some additional operations that we summarize at the end of the section. The point is that in each step.1 cities. 2. is the problem of managing . In Chapter 8. In particular. Some complex data structures are essentially tailored for use in a single kind of algorithm.5 A More Complex Data Structure: Priority Queues 57 implicit search over all orders of the remaining n . We could do this by reading the entire array. and in general we use polynomial-time solvability as the concrete formulation of this. there are cases where one encounters running times that are asymptotically smaller than linear. If q = p. rather. the size of the active region has been reduced to a constant. when k = log2 n. and the goal is to minimize the amount of querying that must be done. ~ The Problem In the implementation of the Stable Matching algorithm in Section 2. smaller keys represent higher priorities. and to do this we can choose k = log2 n. In such situations. Priority queues support the addition and deletion of elements from the set. and one that is useful to keep in mind when considering their general function. we describe one of the most broadly useful sophisticated data structures. and we want to be able to select an element from S when the algorithm calls for it. like Independent Set. unlike lists and arrays. by carefully probing particular entries. we’d like to determine whether a given number p belongs to the array. these situations tend to arise in a model of computation where the input can be "queried" indirectly rather than read completely. and sometimes by using more complex data structures. or key. Typically. 
2.5 A More Complex Data Structure: Priority Queues

Our primary goal in this book was expressed at the outset of the chapter: we seek algorithms that improve qualitatively on brute-force search, and in general we use polynomial-time solvability as the concrete formulation of this. Typically, achieving a polynomial-time solution to a nontrivial problem is not something that depends on fine-grained implementation details; rather, the difference between exponential and polynomial is based on overcoming higher-level obstacles. Once one has an efficient algorithm to solve a problem, however, it is often possible to achieve further improvements in running time by being careful with the implementation details, and sometimes by using more complex data structures. Some complex data structures are essentially tailored for use in a single kind of algorithm, while others are more generally applicable. In this section, we describe one of the most broadly useful sophisticated data structures, the priority queue. Priority queues will be useful when we describe how to implement some of the graph algorithms developed later in the book. For our purposes here, the priority queue is a useful illustration of the analysis of a data structure that, unlike lists and arrays, must perform some nontrivial processing each time it is invoked.

The Problem

In the implementation of the Stable Matching algorithm in Section 2.3, we discussed the need to maintain a dynamically changing set S (such as the set of all free men in that case). In such situations, we want to be able to add elements to and delete elements from the set S, and we want to be able to select an element from S when the algorithm calls for it. A priority queue is designed for applications in which elements have a priority value, or key, and each time we need to select an element from S, we want to take the one with highest priority.

A priority queue is a data structure that maintains a set of elements S, where each element v in S has an associated value key(v) that denotes the priority of element v; smaller keys represent higher priorities. Priority queues support the addition and deletion of elements from the set, and also the selection of the element with smallest key. Our implementation of priority queues will also support some additional operations that we summarize at the end of the section.

A motivating application for priority queues, and one that is useful to keep in mind when considering their general function, is the problem of managing real-time events such as the scheduling of processes on a computer.

Each process has a priority, or urgency, but processes do not arrive in order of their priorities. Rather, we have a current set of active processes, and we want to be able to extract the one with the currently highest priority and run it. We can maintain the set of processes in a priority queue, with the key of a process representing its priority value. Scheduling the highest-priority process corresponds to selecting the element with minimum key from the priority queue; concurrent with this, we will also be inserting new processes as they arrive, according to their priority values.

How efficiently do we hope to be able to execute the operations in a priority queue? We will show how to implement a priority queue containing at most n elements at any time so that elements can be added and deleted, and the element with minimum key selected, in O(log n) time per operation.

Before discussing the implementation, let us point out a very basic application of priority queues that highlights why O(log n) time per operation is essentially the "right" bound to aim for.

(2.11) A sequence of O(n) priority queue operations can be used to sort a set of n numbers.

Proof. Set up a priority queue H, and insert each number into H with its value as a key. Then extract the smallest number one by one until all numbers have been extracted; this way, the numbers will come out of the priority queue in sorted order. ∎

Thus, with a priority queue that can perform insertion and the extraction of minima in O(log n) per operation, we can sort n numbers in O(n log n) time. It is known that, in a comparison-based model of computation (when each operation accesses the input only by comparing a pair of numbers), the time needed to sort must be at least proportional to n log n; so (2.11) highlights a sense in which O(log n) time per operation is the best we can hope for. We should note that the situation is a bit more complicated than this: implementations of priority queues more sophisticated than the one we present here can improve the running time needed for certain operations, and add extra functionality. But (2.11) shows that any sequence of priority queue operations that results in the sorting of n numbers must take time at least proportional to n log n in total.
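As a quick illustration of (2.11), here is a short Python sketch (ours) using the standard heapq module, whose heappush and heappop functions provide Insert and ExtractMin on an array-based heap.

    import heapq

    def pq_sort(numbers):
        # n Insert operations followed by n ExtractMin operations: O(n log n) total.
        H = []
        for x in numbers:
            heapq.heappush(H, x)    # Insert, with the value serving as its key
        return [heapq.heappop(H) for _ in range(len(numbers))]   # ExtractMin, n times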
Before we discuss the structure of heaps, we should consider what happens with some simpler, more natural approaches to implementing the functions of a priority queue. We could just have the elements in a list, and separately have a pointer labeled Min to the one with minimum key. This makes adding new elements easy, but extraction of the minimum hard: finding the minimum is quick--we just consult the Min pointer--but after removing this minimum element, we need to update the Min pointer to be ready for the next operation, and this would require a scan of all elements in O(n) time to find the new minimum.

This complication suggests that we should perhaps maintain the elements in the sorted order of the keys. This makes it easy to extract the element with smallest key, but now how do we add a new element to our set? Should we have the elements in an array, or a linked list? Suppose we want to add s with key value key(s). If the set S is maintained as a sorted array, we can use binary search to find the array position where s should be inserted in O(log n) time; but to insert s in the array, we would have to move all later elements one position to the right, and this would take O(n) time. On the other hand, if we maintain the set as a sorted doubly linked list, we could insert s in O(1) time into any position, but the doubly linked list would not support binary search, and hence we may need up to O(n) time to find the position where s should be inserted. So in all these simple approaches, at least one of the operations can take up to O(n) time--much more than the O(log n) per operation that we're hoping for. This is where heaps come in.

The Definition of a Heap

The heap data structure combines the benefits of a sorted array and list for purposes of this application. Conceptually, we think of a heap as a balanced binary tree as shown on the left of Figure 2.3: the tree will have a root, and each node can have up to two children, a left and a right child. The keys in such a binary tree are said to be in heap order if the key of any element is at least as large as the key of the element at its parent node in the tree. In other words,

Heap order: For every element v, at a node i, the element w at i's parent satisfies key(w) ≤ key(v).

In Figure 2.3 the numbers in the nodes are the keys of the corresponding elements.

Before we discuss how to work with a heap, we need to consider what data structure should be used to represent it. We could use pointers: each node of the heap could keep the element it stores, its key, and three pointers pointing to the two children and the parent of the heap node. We can avoid using pointers, however, if a bound N is known in advance on the total number of elements that will ever be in the heap at any one time. Such heaps can be maintained in an array H indexed by i = 1, ..., N. We will think of the heap nodes as corresponding to the positions in this array: H[1] is the root, and for any node at position i, the children are the nodes at positions leftChild(i) = 2i and rightChild(i) = 2i + 1, and the parent is at position parent(i) = ⌊i/2⌋. So the two children of the root are at positions 2 and 3. If the heap has n < N elements at some time, we will use the first n positions of the array to store the n heap elements, and use length(H) to denote the number of elements in H. This representation keeps the heap balanced at all times. (See the right-hand side of Figure 2.3 for the array representation of the heap on the left-hand side; the arrows show the children for the top three nodes in the tree.)

Figure 2.3 Values in a heap shown as a binary tree on the left, and represented as an array on the right.
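In Python, the index arithmetic of the array representation can be written directly; the following sketch (ours) is 1-indexed, with H[0] unused, to match the text.

    def parent(i):      return i // 2        # floor(i/2)
    def left_child(i):  return 2 * i
    def right_child(i): return 2 * i + 1

    def is_heap(H):
        # H[1..n] holds the keys; check the heap order at every non-root node.
        return all(H[i // 2] <= H[i] for i in range(2, len(H)))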

Implementing the Heap Operations

The heap element with smallest key is at the root, so it takes O(1) time to identify the minimal element. How do we add or delete heap elements? First consider adding a new heap element v, and assume that our heap H has n < N elements so far; now it will have n + 1 elements. To start with, we can add the new element v to the final position i = n + 1, by setting H[i] = v. Unfortunately, this does not maintain the heap property, as the key of element v may be smaller than the key of its parent. So we now have something that is almost a heap, except for a small "damaged" part where v was pasted on at the end.

We say that H is almost a heap with the key of H[i] too small, if there is a value α ≥ key(v) such that raising the value of key(v) to α would make the resulting array satisfy the heap property. (In other words, element v in H[i] is too small, but raising it to α would fix the problem.) One important point to note is that if H is almost a heap with the key of the root (i.e., H[1]) too small, then in fact it is a heap. To see why this is true, consider that if raising the value of H[1] to α would make H a heap, then the value of H[1] must already be smaller than both its children, and hence it already has the heap-order property.

We will use the procedure Heapify-up to fix our heap. Let j = parent(i) = ⌊i/2⌋ be the parent of the node i, and assume H[j] = w. If key[v] < key[w], then we will simply swap the positions of v and w. This will fix the heap property at position i, but the resulting structure will possibly fail to satisfy the heap property at position j--in other words, the site of the "damage" has moved upward from i to j. We thus call the process recursively from position j = parent(i) to continue fixing the heap by pushing the damaged part upward.

Heapify-up(H, i):
  If i > 1 then
    let j = parent(i) = ⌊i/2⌋
    If key[H[i]] < key[H[j]] then
      swap the array entries H[i] and H[j]
      Heapify-up(H, j)
    Endif
  Endif

To see why Heapify-up works, eventually restoring the heap order, it helps to understand more fully the structure of our slightly damaged heap in the middle of this process: the Heapify-up process is moving element v toward the root, and with each swap the heap violation moves one step closer to the root of the tree. Figure 2.4 shows the first two steps of the process after an insertion.

Figure 2.4 The Heapify-up process. Key 3 (at position 16) is too small (on the left). After swapping keys 3 and 11, the heap violation moves one step closer to the root of the tree (on the right).
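An iterative Python version of Heapify-up (ours; it stores bare keys in a 1-indexed array with H[0] unused) might look as follows.

    def heapify_up(H, i):
        # H is almost a heap with the key of H[i] too small; push the damage upward.
        while i > 1 and H[i] < H[i // 2]:
            H[i], H[i // 2] = H[i // 2], H[i]
            i = i // 2      # the site of the "damage" has moved up to the parent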

(2.12) The procedure Heapify-up(H, i) fixes the heap property in O(log i) time, assuming that the array H is almost a heap with the key of H[i] too small. Using Heapify-up we can insert a new element in a heap of n elements in O(log n) time.

Proof. We prove the statement by induction on i. If i = 1 there is nothing to prove, since we have already argued that in this case H is actually a heap. Now consider the case in which i > 1. Let v = H[i], j = parent(i), w = H[j], and β = key(w). Swapping the array elements w and v takes O(1) time. We claim that after the swap, the array H is either a heap or almost a heap with the key of H[j] (which now holds v) too small. This is true, as setting the key value at node j to β would make H a heap. So by the induction hypothesis, applying Heapify-up(j) recursively will produce a heap as required. The process follows the tree-path from position i to the root, so it takes O(log i) time. ∎

To insert a new element in a heap, we first add it as the last element. If the new element has a very large key value, then the array is a heap; otherwise, it is almost a heap with the key of the new element too small, and we use Heapify-up to fix the heap property.

Now consider deleting an element. Many applications of priority queues don't require the deletion of arbitrary elements, but only the extraction of the minimum. In a heap, this corresponds to identifying the key at the root (which will be the minimum) and then deleting it; we will refer to this operation as ExtractMin(H). Here we will implement a more general operation Delete(H, i), which will delete the element in position i. Assume the heap currently has n elements. After deleting the element H[i], the heap will have only n - 1 elements; and not only is the heap-order property potentially violated, there is actually a "hole" at position i, since H[i] is now empty. So as a first step, to patch the hole in H, we move the element w in position n to position i. After doing this, H at least has the property that its n - 1 elements are in the first n - 1 positions, as required, but we may well still not have the heap-order property.

However, the only place in the heap where the order might be violated is position i, as the key of element w may be either too small or too big for the position i. If the key is too small (that is, the violation of the heap property is between node i and its parent), then we can use Heapify-up(i) to reestablish the heap order. If key[w] is too big, the heap property may be violated between i and one or both of its children. We say that H is almost a heap with the key of H[i] too big, if there is a value α ≤ key(w) such that lowering the value of key(w) to α would make the resulting array satisfy the heap property. Note that if H[i] corresponds to a leaf in the heap (i.e., it has no children), and H is almost a heap with the key of H[i] too big, then in fact H is a heap: indeed, if lowering the value in H[i] would make H a heap, then H[i] is already larger than its parent and hence it already has the heap-order property.

In the case where the key is too big, we will use a procedure called Heapify-down, closely analogous to Heapify-up, which swaps the element at position i with one of its children and proceeds down the tree recursively. The Heapify-down process is moving element w down, toward the leaves; Figure 2.5 shows the first steps of this process.

Heapify-down(H, i):
  Let n = length(H)
  If 2i > n then
    Terminate with H unchanged
  Else if 2i < n then
    Let left = 2i, and right = 2i + 1
    Let j be the index that minimizes key[H[left]] and key[H[right]]
  Else if 2i = n then
    Let j = 2i
  Endif
  If key[H[j]] < key[H[i]] then
    swap the array entries H[i] and H[j]
    Heapify-down(H, j)
  Endif

Figure 2.5 The Heapify-down process. Key 21 (at position 3) is too big (on the left). After swapping keys 21 and 7, the heap violation moves one step closer to the bottom of the tree (on the right).
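An iterative Python version of Heapify-down (ours, with the same 1-indexed convention as before):

    def heapify_down(H, i):
        # H is almost a heap with the key of H[i] too big; push the damage downward.
        n = len(H) - 1                       # H[0] is unused
        while 2 * i <= n:
            j = 2 * i
            if j + 1 <= n and H[j + 1] < H[j]:
                j = j + 1                    # the child with the smaller key
            if H[j] < H[i]:
                H[i], H[j] = H[j], H[i]
                i = j
            else:
                break

Deleting H[i] then amounts to moving the last element into position i and calling whichever of heapify_up or heapify_down applies.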

(2.13) The procedure Heapify-down(H, i) fixes the heap property in O(log n) time, assuming that H is almost a heap with the key value of H[i] too big. Using Heapify-up or Heapify-down we can delete an element in a heap of n elements in O(log n) time.

Proof. Let n be the number of elements in the heap, and let v denote the element at position i. We prove that the process fixes the heap by reverse induction on the value i. If 2i > n, then H[i] corresponds to a leaf; since H is almost a heap with the key of H[i] too big, it is in fact a heap, as we just argued above, and hence there is nothing to prove. Otherwise, let j be the child of i with smaller key value (when 2i = n, the node has the single child j = 2i), and let w = H[j]. If key(w) ≥ key(v), the array is already a heap. Otherwise swapping the array elements w and v takes O(1) time, and we claim that after the swap, the resulting array is either a heap or almost a heap with the key of H[j] = v too big. This is true, as setting key(v) = key(w) would make H a heap. Now j ≥ 2i, so by the induction hypothesis, the recursive call to Heapify-down fixes the heap property. The algorithm repeatedly swaps the element originally at position i down, following a tree-path toward the leaves, so in O(log n) iterations the process results in a heap. ∎

Implementing Priority Queues with Heaps

The heap data structure with the Heapify-down and Heapify-up operations can efficiently implement a priority queue that is constrained to hold at most N elements at any point in time. Here we summarize the operations we will use.

o StartHeap(N) returns an empty heap H that is set up to store at most N elements. This operation takes O(N) time, as it involves initializing the array that will hold the heap.
o Insert(H, v) inserts the item v into heap H: we add v as the last element and use Heapify-up to fix the heap property. If the heap currently has n elements, this takes O(log n) time.
o FindMin(H) identifies the minimum element in the heap H but does not remove it. Since the element with smallest key is at the root, this takes O(1) time.
o Delete(H, i) deletes the element in heap position i: we replace H[i] with the last element in the array and use Heapify-up or Heapify-down to fix the heap property. This is implemented in O(log n) time for heaps that have n elements.
o ExtractMin(H) identifies and deletes an element with minimum key value from a heap. This is a combination of the preceding two operations, and so it takes O(log n) time.

There is a second class of operations in which we want to operate on elements by name, rather than by their position in the heap. For example, in a number of graph algorithms that use heaps, the heap elements are nodes of the graph with key values that are computed during the algorithm. At various points in these algorithms, we want to operate on a particular node, regardless of where it happens to be in the heap.

To be able to access given elements of the priority queue efficiently, we simply maintain an additional array Position that stores the current position of each element (each node) in the heap. Maintaining this array does not increase the overall running time, and it allows us to implement the following further operations.

o To delete the element v, we apply Delete(H, Position[v]). Using the Position array, we can thus delete an element v from a heap with n nodes in O(log n) time.
o An additional operation that is used by some algorithms is ChangeKey(H, v, α), which changes the key value of element v to key(v) = α. To implement this operation in O(log n) time, we first need to be able to identify the position of element v in the array, which we do by using the array Position. Once we have identified the position of element v, we change the key and then apply Heapify-up or Heapify-down as appropriate; this takes O(log n) time.
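Putting the pieces together, here is a compact Python sketch (ours; the class and method names are illustrative, not the book's interface) of a heap-based priority queue with a Position map; it stores (key, element) pairs in a 1-indexed array.

    class HeapPQ:
        def __init__(self):
            self.H = [None]                  # H[0] unused; heap occupies H[1..n]
            self.position = {}               # element -> current index in H

        def _swap(self, i, j):
            H = self.H
            H[i], H[j] = H[j], H[i]
            self.position[H[i][1]] = i
            self.position[H[j][1]] = j

        def _heapify_up(self, i):
            while i > 1 and self.H[i][0] < self.H[i // 2][0]:
                self._swap(i, i // 2)
                i //= 2

        def _heapify_down(self, i):
            n = len(self.H) - 1
            while 2 * i <= n:
                j = 2 * i
                if j + 1 <= n and self.H[j + 1][0] < self.H[j][0]:
                    j += 1                   # child with the smaller key
                if self.H[j][0] < self.H[i][0]:
                    self._swap(i, j)
                    i = j
                else:
                    break

        def insert(self, v, key):            # O(log n)
            self.H.append((key, v))
            self.position[v] = len(self.H) - 1
            self._heapify_up(len(self.H) - 1)

        def find_min(self):                  # O(1)
            return self.H[1][1]

        def extract_min(self):               # O(log n)
            v = self.H[1][1]
            self.delete(v)
            return v

        def delete(self, v):                 # O(log n), by name via Position
            i = self.position.pop(v)
            last = self.H.pop()
            if i < len(self.H):              # patch the hole with the last element
                self.H[i] = last
                self.position[last[1]] = i
                self._heapify_up(i)          # at most one of these two calls
                self._heapify_down(i)        # actually moves the element
        def change_key(self, v, new_key):    # O(log n)
            i = self.position[v]
            self.H[i] = (new_key, v)
            self._heapify_up(i)
            self._heapify_down(i)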

Solved Exercises

Solved Exercise 1

Take the following list of functions and arrange them in ascending order of growth rate. That is, if function g(n) immediately follows function f(n) in your list, then it should be the case that f(n) is O(g(n)).

f1(n) = 10^n
f2(n) = n^(1/3)
f3(n) = n^n
f4(n) = log2 n
f5(n) = 2^(sqrt(log2 n))

Solution We can deal with functions f1, f2, and f3 very easily, since they belong to the basic families of exponentials, polynomials, and logarithms. In particular, we have f2(n) = O(f1(n)), and f4(n) = O(f2(n)).

Now, the function f3 isn't so hard to deal with. It starts out smaller than 10^n, but once n ≥ 10, then clearly 10^n ≤ n^n. This is exactly what is required for the definition of O(-) notation: for all n ≥ 10, we have 10^n ≤ c · n^n, where in this case c = 1, and so 10^n = O(n^n).

Finally, we come to function f5, which is admittedly kind of strange-looking. A useful rule of thumb in such situations is to try taking logarithms to see whether this makes things clearer. In this case, log2 f5(n) = sqrt(log2 n) = (log2 n)^(1/2). What do the logarithms of the other functions look like? log2 f4(n) = log2 log2 n, while log2 f2(n) = (1/3) log2 n. All of these can be viewed as functions of log2 n, and so using the notation z = log2 n, we can write

log2 f2(n) = (1/3) z
log2 f4(n) = log2 z
log2 f5(n) = z^(1/2)

Now it's easier to see what's going on. First, for z ≥ 16, we have log2 z ≤ z^(1/2). But the condition z ≥ 16 is the same as n ≥ 2^16 = 65,536; thus once n ≥ 2^16 we have log2 f4(n) ≤ log2 f5(n), and so f4(n) ≤ f5(n). Thus we can write f4(n) = O(f5(n)). Similarly we have z^(1/2) ≤ (1/3) z once z ≥ 9--in other words, once n ≥ 2^9 = 512. For n above this bound we have log2 f5(n) ≤ log2 f2(n), and hence f5(n) ≤ f2(n); so we can write f5(n) = O(f2(n)). Essentially, we have discovered that 2^(sqrt(log2 n)) is a function whose growth rate lies somewhere between that of logarithms and polynomials.

Since we have sandwiched f5 between f4 and f2, this finishes the task of putting the functions in order.

Solved Exercise 2

Let f and g be two functions that take nonnegative values, and suppose that f = O(g). Show that g = Ω(f).

Solution This exercise is a way to formalize the intuition that O(-) and Ω(-) are in a sense opposites. It is, in fact, not difficult to prove; it is just a matter of unwinding the definitions. We're given that, for some constants c and n0, we have f(n) ≤ c · g(n) for all n ≥ n0. Dividing both sides by c, we can conclude that g(n) ≥ (1/c) f(n) for all n ≥ n0. But this is exactly what is required to show that g = Ω(f): we have established that g(n) is at least a constant multiple of f(n) (where the constant is 1/c), for all sufficiently large n (at least n0).

Exercises

1. Suppose you have algorithms with the five running times listed below. (Assume these are the exact number of operations performed as a function of the input size n.) How much slower do each of these algorithms get when you (a) double the input size, or (b) increase the input size by one?
(a) n^2  (b) n^3  (c) 100n^2  (d) n log n  (e) 2^n

2. Suppose you have algorithms with the six running times listed below. (Assume these are the exact running times.) Suppose you have a computer that can perform 10^10 operations per second, and you need to compute a result in at most an hour of computation. For each of the algorithms, what is the largest input size n for which you would be able to get the result within an hour?
(a) n^2  (b) n^3  (c) 100n^2  (d) n log n  (e) 2^n  (f) 2^(2^n)

3. Take the following list of functions and arrange them in ascending order of growth rate. That is, if function g(n) immediately follows function f(n) in your list, then it should be the case that f(n) is O(g(n)).
f1(n) = n^(2.5)
f2(n) = sqrt(2n)
f3(n) = n + 10
f4(n) = 10^n
f5(n) = 100^n
f6(n) = n^2 log n

4. Take the following list of functions and arrange them in ascending order of growth rate. That is, if function g(n) immediately follows function f(n) in your list, then it should be the case that f(n) is O(g(n)).
g1(n) = 2^(sqrt(log n))
g2(n) = 2^n
g3(n) = n (log n)^3
g4(n) = n^(4/3)
g5(n) = n^(log n)
g6(n) = 2^(2^n)
g7(n) = 2^(n^2)

5. Assume you have functions f and g such that f(n) is O(g(n)). For each of the following statements, decide whether you think it is true or false and give a proof or counterexample.
(a) log2 f(n) is O(log2 g(n)).
(b) 2^(f(n)) is O(2^(g(n))).
(c) f(n)^2 is O(g(n)^2).

6. Consider the following basic problem. You're given an array A consisting of n integers A[1], A[2], ..., A[n]. You'd like to output a two-dimensional n-by-n array B in which B[i, j] (for i < j) contains the sum of array entries A[i] through A[j]--that is, the sum A[i] + A[i + 1] + ... + A[j]. (The value of array entry B[i, j] is left unspecified whenever i ≥ j, so it doesn't matter what is output for these values.) Here's a simple algorithm to solve this problem.

For i = 1, 2, ..., n
  For j = i + 1, i + 2, ..., n
    Add up array entries A[i] through A[j]
    Store the result in B[i, j]
  Endfor
Endfor

(a) For some function f that you should choose, give a bound of the form O(f(n)) on the running time of this algorithm on an input of size n (i.e., a bound on the number of operations performed by the algorithm).
(b) For this same function f, show that the running time of the algorithm on an input of size n is also Ω(f(n)). (This shows an asymptotically tight bound of Θ(f(n)) on the running time.)
(c) Although the algorithm you analyzed in parts (a) and (b) is the most natural way to solve the problem--after all, it just iterates through the relevant entries of the array B, filling in a value for each--it contains some highly unnecessary sources of inefficiency. Give a different algorithm to solve this problem, with an asymptotically better running time. In other words, you should design an algorithm with running time O(g(n)), where lim_(n→∞) g(n)/f(n) = 0.

7. There's a class of folk songs and holiday songs in which each verse consists of the previous verse, with one extra line added on. "The Twelve Days of Christmas" has this property; for example, when you get to the fifth verse, you sing about the five golden rings and then, reprising the lines from the fourth verse, also cover the four calling birds, the three French hens, the two turtle doves, and of course the partridge in the pear tree. The Aramaic song "Had gadya" from the Passover Haggadah works like this as well, as do many other songs. These songs tend to last a long time, despite having relatively short scripts: you can convey the words plus instructions for one of these songs by specifying just the new line that is added in each verse, without having to write out all the previous lines each time. (So the phrase "five golden rings" only has to be written once, even though it will appear in verses five and onward.) There's something asymptotic that can be analyzed here. Suppose, for concreteness, that each line has a length that is bounded by a constant c, and suppose that the song, when sung out loud, runs for n words total. Show how to encode such a song using a script that has length f(n), for a function f(n) that grows as slowly as possible.

8. You're doing some stress-testing on various models of glass jars to determine the height from which they can be dropped and still not break. The setup for this experiment, on a particular type of jar, is as follows. You have a ladder with n rungs, and you want to find the highest rung from which you can drop a copy of the jar and not have it break. We call this the highest safe rung.

It might be natural to try binary search: drop a jar from the middle rung, see if it breaks, and then recursively try from rung n/4 or 3n/4 depending on the outcome. But this has the drawback that you could break a lot of jars in finding the answer. If your primary goal were to conserve jars, on the other hand, you could try the following strategy. Start by dropping a jar from the first rung, then the second rung, and so forth, climbing one higher each time until the jar breaks. In this way, you only need a single jar--at the moment it breaks, you have the correct answer--but you may have to drop it n times (rather than log n as in the binary search solution).

So here is the trade-off: it seems you can perform fewer drops if you're willing to break more jars. To understand better how this trade-off works at a quantitative level, let's consider how to run this experiment given a fixed "budget" of k ≥ 1 jars. In other words, you have to determine the correct answer--the highest safe rung--and can use at most k jars in doing so.

(a) Suppose you are given a budget of k = 2 jars. Describe a strategy for finding the highest safe rung that requires you to drop a jar at most f(n) times, for some function f(n) that grows slower than linearly. (In other words, it should be the case that lim_(n→∞) f(n)/n = 0.)
(b) Now suppose you have a budget of k > 2 jars, for some given k. Describe a strategy for finding the highest safe rung using at most k jars. If f_k(n) denotes the number of times you need to drop a jar according to your strategy, then the functions f_1, f_2, f_3, ... should have the property that each grows asymptotically slower than the previous one: lim_(n→∞) f_k(n)/f_(k-1)(n) = 0 for each k.

Notes and Further Reading

Polynomial-time solvability emerged as a formal notion of efficiency by a gradual process, motivated by the work of a number of researchers including Cobham, Edmonds, Rabin, Hartmanis, and Stearns. The survey by Sipser (1992) provides both a historical and technical perspective on these developments. Similarly, the use of asymptotic order of growth notation to bound the running time of algorithms--as opposed to working out exact formulas with leading coefficients and lower-order terms--is a modeling decision that was quite non-obvious at the time it was introduced; Tarjan's Turing Award lecture (1987) offers an interesting perspective on the early thinking of researchers including Hopcroft, Tarjan, and others on this issue. Further discussion of asymptotic notation and the growth of basic functions can be found in Knuth (1997a).

The implementation of priority queues using heaps, and the application to sorting, is generally credited to Williams (1964) and Floyd (1964). The priority queue is an example of a nontrivial data structure with many applications; in later chapters we will discuss other data structures as they become useful for the implementation of particular algorithms. We will consider the Union-Find data structure in Chapter 4 for implementing an algorithm to find minimum-cost spanning trees, and we will discuss randomized hashing in Chapter 13. A number of other data structures are discussed in the book by Tarjan (1983). The LEDA library (Library of Efficient Datatypes and Algorithms) of Mehlhorn and Näher (1999) offers an extensive library of data structures useful in combinatorial and geometric applications.

Notes on the Exercises Exercise 8 is based on a problem we learned from Sam Toueg.

3 Graphs

Our focus in this book is on problems with a discrete flavor. Just as continuous mathematics is concerned with certain basic structures such as real numbers, vectors, and matrices, discrete mathematics has developed basic combinatorial structures that lie at the heart of the subject. One of the most fundamental and expressive of these is the graph. The more one works with graphs, the more one tends to see them everywhere. Thus, we begin by introducing the basic definitions surrounding graphs, and list a spectrum of different algorithmic settings where graphs arise naturally. We then discuss some basic algorithmic primitives for graphs, beginning with the problem of connectivity and developing some fundamental graph search techniques.

3.1 Basic Definitions and Applications

Recall from Chapter 1 that a graph G is simply a way of encoding pairwise relationships among a set of objects: it consists of a collection V of nodes and a collection E of edges, each of which "joins" two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = {u, v} for some u, v ∈ V, where we call u and v the ends of e.

Edges in a graph indicate a symmetric relationship between their ends. Often we want to encode asymmetric relationships, and for this we use the closely related notion of a directed graph. A directed graph G' consists of a set of nodes V and a set of directed edges E'. Each e' ∈ E' is an ordered pair (u, v); in other words, the roles of u and v are not interchangeable, and we call u the tail of the edge and v the head. We will also say that edge e' leaves node u and enters node v.
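Concretely, one might encode these conventions in Python as follows (a sketch of ours, with illustrative values): an undirected edge is an unordered two-element set, while a directed edge is an ordered (tail, head) pair.

    V = {1, 2, 3, 4}
    E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}   # undirected edges
    E_directed = {(1, 2), (2, 3), (3, 4)}                           # (tail, head) pairs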

When we want to emphasize that the graph we are considering is not directed, we will call it an undirected graph; by default, however, the term "graph" will mean an undirected graph. It is also worth mentioning two warnings in our use of graph terminology. First, although an edge e in an undirected graph should properly be written as a set of nodes {u, v}, one will more often see it written (even in this book) in the notation used for ordered pairs: e = (u, v). Second, a node in a graph is also frequently called a vertex; in this context, the two words have exactly the same meaning.

Examples of Graphs

Graphs are very simple to define: we just take a collection of things and join some of them by edges. But at this level of abstraction, it's hard to appreciate the typical kinds of situations in which they arise. Thus, we propose the following list of specific contexts in which graphs serve as important models. The list covers a lot of ground, and it's not important to remember everything on it; rather, it will provide us with a lot of useful examples against which to check the basic definitions and algorithmic problems that we'll be encountering later in the chapter. Also, in going through the list, it's useful to digest the meaning of the nodes and the meaning of the edges in the context of the application. In some cases the nodes and edges both correspond to physical objects in the real world, in others the nodes are real objects while the edges are virtual, and in still others both nodes and edges are pure abstractions.

1. Transportation networks. The map of routes served by an airline carrier naturally forms a graph: the nodes are airports, and there is an edge from u to v if there is a nonstop flight that departs from u and arrives at v. Described this way, the graph is directed; but in practice when there is an edge (u, v), there is almost always an edge (v, u), so we would not lose much by treating the airline route map as an undirected graph with edges joining pairs of airports that have nonstop flights each way. Looking at such a graph (you can generally find them depicted in the backs of in-flight airline magazines), we'd quickly notice a few things: there are often a small number of hubs with a very large number of incident edges, and it's possible to get between any two nodes in the graph via a very small number of intermediate stops. Other transportation networks can be modeled in a similar way. For example, we could take a rail network and have a node for each terminal, and an edge joining u and v if there's a section of railway track that goes between them without stopping at any intermediate terminal. The standard depiction of the subway map in a major city is a drawing of such a graph.

2. Communication networks. A collection of computers connected via a communication network can be naturally modeled as a graph in a few different ways. First, we could have a node for each computer and an edge joining u and v if there is a direct physical link connecting them. Alternatively, for studying the large-scale structure of the Internet, people often define a node to be the set of all machines controlled by a single Internet service provider, with an edge joining u and v if there is a direct peering relationship between them--roughly, an agreement to exchange data under the standard BGP protocol that governs global Internet routing. Note that this latter network is more "virtual" than the former, since the links indicate a formal agreement in addition to a physical connection. In studying wireless networks, one typically defines a graph where the nodes are computing devices situated at locations in physical space, and there is an edge from u to v if u is close enough to v to receive a signal from it. Note that it's often useful to view such a graph as directed, since it may be the case that u can hear v's signal but v cannot hear u's signal (if, for example, u has a stronger transmitter). These graphs are also interesting from a geometric perspective, since they roughly correspond to putting down points in the plane and then joining pairs that are close together.

3. Information networks. The World Wide Web can be naturally viewed as a directed graph, in which nodes correspond to Web pages and there is an edge from u to v if u has a hyperlink to v. The directedness of the graph is crucial here; many pages, for example, link to popular news sites, but these sites clearly do not reciprocate all these links. The structure of all these hyperlinks can be used by algorithms to try inferring the most important pages on the Web, a technique employed by most current search engines. The hypertextual structure of the Web is anticipated by a number of information networks that predate the Internet by many decades. These include the network of cross-references among articles in an encyclopedia or other reference work, and the network of bibliographic citations among scientific papers.

4. Social networks. Given any collection of people who interact (the employees of a company, the students in a high school, or the residents of a small town), we can define a network whose nodes are people, with an edge joining u and v if they are friends with one another. We could have the edges mean a number of different things instead of friendship: the undirected edge (u, v) could mean that u and v have had a romantic relationship or a financial relationship; the directed edge (u, v) could mean that u seeks advice from v, or that u lists v in his or her e-mail address book.

One can also imagine bipartite social networks based on a notion of affiliation: given a set X of people and a set Y of organizations, we could define an edge between u ∈ X and v ∈ Y if person u belongs to organization v. Networks such as this are used extensively by sociologists to study the dynamics of interaction among people. They can be used to identify the most "influential" people in a company or organization, to model trust relationships in a financial or political setting, and to track the spread of fads, rumors, jokes, diseases, and e-mail viruses.

5. Dependency networks. It is natural to define directed graphs that capture the interdependencies among a collection of objects. For example, given the list of courses offered by a college or university, we could have a node for each course and an edge from u to v if u is a prerequisite for v. Given a list of functions or modules in a large software system, we could have a node for each function and an edge from u to v if u invokes v by a function call. Or given a set of species in an ecosystem, we could define a graph--a food web--in which the nodes are the different species and there is an edge from u to v if u consumes v.

This is far from a complete list, too far to even begin tabulating its omissions. It is meant simply to suggest some examples that are useful to keep in mind when we start thinking about graphs in an algorithmic context.

Paths and Connectivity

One of the fundamental operations in a graph is that of traversing a sequence of nodes connected by edges. In the examples just listed, such a traversal could correspond to a user browsing Web pages by following hyperlinks, a rumor passing by word of mouth from you to someone halfway around the world, or an airline passenger traveling from San Francisco to Rome on a sequence of flights. With this notion in mind, we define a path in an undirected graph G = (V, E) to be a sequence P of nodes v1, v2, ..., v_{k-1}, v_k with the property that each consecutive pair v_i, v_{i+1} is joined by an edge in G. P is often called a path from v1 to v_k, or a v1-v_k path. A path is called simple if all its vertices are distinct from one another. A cycle is a path v1, v2, ..., v_{k-1}, v_k in which k > 2, the first k - 1 nodes are all distinct, and v1 = v_k--in other words, the sequence of nodes "cycles back" to where it began.

All of these definitions carry over naturally to directed graphs, with the following change: in a directed path or cycle, each pair of consecutive nodes has the property that (v_i, v_{i+1}) is an edge. In other words, the sequence of nodes in the path or cycle must respect the directionality of edges.

We say that an undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v. Choosing how to define connectivity of a directed graph is a bit more subtle, since it's possible for u to have a path to v while v has no path to u. We say that a directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u.

In addition to simply knowing about the existence of a path between some pair of nodes u and v, we may also want to know whether there is a short path. Thus we define the distance between two nodes u and v to be the minimum number of edges in a u-v path. (We can designate some symbol like ∞ to denote the distance between nodes that are not connected by a path.) The term distance here comes from imagining G as representing a communication or transportation network; if we want to get from u to v, we may well want a route with as few "hops" as possible.

Trees

We say that an undirected graph is a tree if it is connected and does not contain a cycle. For example, the two graphs pictured in Figure 3.1 are trees. In a strong sense, trees are the simplest kind of connected graph: deleting any edge from a tree will disconnect it.

For thinking about the structure of a tree T, it is useful to root it at a particular node r. Physically, this is the operation of grabbing T at the node r and letting the rest of it hang downward under the force of gravity, like a mobile. More precisely, we "orient" each edge of T away from r; for each other node v, we declare the parent of v to be the node u that directly precedes v on its path from r, and we declare w to be a child of v if v is the parent of w. More generally, we say that w is a descendant of v (or v is an ancestor of w) if v lies on the path from the root to w, and we say that a node x is a leaf if it has no descendants. Thus, for example, the two pictures in Figure 3.1 correspond to the same tree T--the same pairs of nodes are joined by edges--but the drawing on the right represents the result of rooting T at node 1.

Figure 3.1 Two drawings of the same tree. On the right, the tree is rooted at node 1.
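The path and cycle definitions translate directly into code; the following Python sketch (ours) assumes G is given as a dictionary mapping each node to the set of its neighbors.

    def is_path(G, P):
        # Each consecutive pair of nodes must be joined by an edge of G.
        return all(P[k + 1] in G[P[k]] for k in range(len(P) - 1))

    def is_simple_path(G, P):
        return is_path(G, P) and len(set(P)) == len(P)

    def is_cycle(G, P):
        # k > 2 nodes, the first k-1 distinct, and the sequence cycles back.
        return (len(P) > 2 and P[0] == P[-1]
                and len(set(P[:-1])) == len(P) - 1 and is_path(G, P))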

Rooted trees are fundamental objects in computer science, because they encode the notion of a hierarchy. For example, we can imagine the rooted tree in Figure 3.1 as corresponding to the organizational structure of a tiny nine-person company: employees 3 and 4 report to employee 2; employees 2, 5, and 7 report to employee 1; and so on. Many Web sites are organized according to a tree-like structure, to facilitate navigation. A typical computer science department's Web site will have an entry page as the root; the People page is a child of this entry page (as is the Courses page); pages entitled Faculty and Students are children of the People page; individual professors' home pages are children of the Faculty page; and so on.

For our purposes here, rooting a tree T can make certain questions about T conceptually easy to answer. For example, given a tree T on n nodes, how many edges does it have? Each node other than the root has a single edge leading "upward" to its parent; and conversely, each edge leads upward from precisely one non-root node. Thus we have very easily proved the following fact.

(3.1) Every n-node tree has exactly n - 1 edges.

In fact, the following stronger statement is true, although we do not prove it here.

(3.2) Let G be an undirected graph on n nodes. Any two of the following statements imply the third.
(i) G is connected.
(ii) G does not contain a cycle.
(iii) G has n - 1 edges.

We now turn to the role of trees in the fundamental algorithmic idea of graph traversal.

3.2 Graph Connectivity and Graph Traversal

Having built up some fundamental notions regarding graphs, we turn to a very basic algorithmic question: node-to-node connectivity. Suppose we are given a graph G = (V, E) and two particular nodes s and t. We'd like to find an efficient algorithm that answers the question: Is there a path from s to t in G? We will call this the problem of determining s-t connectivity.

For very small graphs, this question can often be answered easily by visual inspection. But for large graphs, it can take some work to search for a path. Indeed, the s-t Connectivity Problem could also be called the Maze-Solving Problem. If we imagine G as a maze with a room corresponding to each node, and a hallway corresponding to each edge that joins nodes (rooms) together, then the problem is to start in a room s and find your way to another designated room t. How efficient an algorithm can we design for this task?

In this section, we describe two natural algorithms for this problem at a high level: breadth-first search (BFS) and depth-first search (DFS). In the next section we discuss how to implement each of these efficiently, building on a data structure for representing a graph as the input to an algorithm.

Breadth-First Search

Perhaps the simplest algorithm for determining s-t connectivity is breadth-first search (BFS), in which we explore outward from s in all possible directions, adding nodes one "layer" at a time. Thus we start with s and include all nodes that are joined by an edge to s--this is the first layer of the search. We then include all additional nodes that are joined by an edge to any node in the first layer--this is the second layer. We continue in this way until no new nodes are encountered.

In the example of Figure 3.2, starting with node 1 as s, the first layer of the search would consist of nodes 2 and 3; the second layer would consist of nodes 4, 5, 7, and 8; and the third layer would consist just of node 6. At this point the search would stop, since there are no further nodes that could be added (and in particular, note that nodes 9 through 13 are never reached by the search).

Figure 3.2 In this graph, node 1 has paths to nodes 2 through 8, but not to nodes 9 through 13.

There is a natural physical interpretation to the algorithm. Essentially, we start at s and "flood" the graph with an expanding wave that grows to visit all nodes that it can reach. The layer containing a node represents the point in time at which the node is reached. We can define the layers L1, L2, L3, ... constructed by the BFS algorithm more precisely as follows.

Layer L1 consists of all nodes that are neighbors of s. (For notational reasons, we will sometimes use layer L0 to denote the set consisting just of s.) Assuming that we have defined layers L1, ..., Lj, then layer L_{j+1} consists of all nodes that do not belong to an earlier layer and that have an edge to a node in layer Lj.

Recalling our definition of the distance between two nodes as the minimum number of edges on a path joining them, we see that layer L1 is the set of all nodes at distance 1 from s, and more generally layer Lj is the set of all nodes at distance exactly j from s. A node fails to appear in any of the layers if and only if there is no path to it. Thus, BFS is not only determining the nodes that s can reach, it is also computing shortest paths to them. We sum this up in the following fact.

(3.3) For each j ≥ 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.

A further property of breadth-first search is that it produces, in a very natural way, a tree T rooted at s on the set of nodes reachable from s. Specifically, for each such node v (other than s), consider the moment when v is first "discovered" by the BFS algorithm; this happens when some node u in layer Lj is being examined, and we find that it has an edge to the previously unseen node v. At this moment, we add the edge (u, v) to the tree T--u becomes the parent of v, representing the fact that u is "responsible" for completing the path to v. We call the tree T that is produced in this way a breadth-first search tree.

Figure 3.3 depicts the construction of a BFS tree rooted at node 1 for the graph in Figure 3.2, with (a), (b), and (c) depicting the successive layers that are added. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T. The execution of BFS that produces this tree can be described as follows.

(a) Starting from node 1, layer L1 consists of the nodes {2, 3}.
(b) Layer L2 is then grown by considering the nodes in layer L1 in order (say, first 2, then 3). Thus we discover nodes 4 and 5 as soon as we look at 2, so 2 becomes their parent. When we consider node 2, we also discover an edge to 3, but this isn't added to the BFS tree, since we already know about node 3. We first discover nodes 7 and 8 when we look at node 3. On the other hand, the edge from 3 to 5 is another edge of G that does not end up in the BFS tree, because by the time we look at this edge out of node 3, we already know about node 5.
(c) We then consider the nodes in layer L2 in order, but the only new node discovered when we look through L2 is node 6, which is added to layer L3. Note that the edges (4, 5) and (7, 8) don't get added to the BFS tree, because they don't result in the discovery of new nodes.
(d) No new nodes are discovered when node 6 is examined, so nothing is put in layer L4, and the algorithm terminates. The full BFS tree is depicted in Figure 3.3(c).

We notice that as we ran BFS on this graph, the nontree edges all either connected nodes in the same layer, or connected nodes in adjacent layers. We now prove that this is a property of BFS trees in general.

(3.4) Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.

Proof. Suppose by way of contradiction that i and j differed by more than 1; in particular, suppose i < j - 1. Now consider the point in the BFS algorithm when the edges incident to x were being examined. Since x belongs to layer Li, the only nodes discovered from x belong to layers L_{i+1} and earlier; hence, if y is a neighbor of x, then it should have been discovered by this point at the latest and hence should belong to layer L_{i+1} or earlier--contradicting our assumption that j > i + 1. ∎
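The layer-by-layer description translates almost line for line into Python. In this sketch (ours), G maps each node to its neighbors, and the returned parent map encodes the BFS tree.

    def bfs(G, s):
        parent = {s: None}          # discovered nodes; also encodes the BFS tree
        layers = [[s]]              # layers[j] will be L_j, with L_0 = {s}
        while layers[-1]:
            next_layer = []
            for u in layers[-1]:
                for v in G[u]:
                    if v not in parent:      # v is discovered for the first time
                        parent[v] = u        # u is "responsible" for reaching v
                        next_layer.append(v)
            layers.append(next_layer)
        return layers[:-1], parent           # drop the final empty layer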

Exploring a Connected Component

The set of nodes discovered by the BFS algorithm is precisely those reachable from the starting node s. We will refer to this set R as the connected component of G containing s; and once we know the connected component containing s, we can simply check whether t belongs to it so as to answer the question of s-t connectivity.

Now, if one thinks about it, it's clear that BFS is just one possible way to produce this component. At a more general level, we can build the component R by "exploring" G in any order, starting from s. To start off, we define R = {s}. Then at any point in time, if we find an edge (u, v) where u ∈ R and v ∉ R, we can add v to R. Indeed, if there is a path P from s to u, then there is a path from s to v obtained by first following P and then following the edge (u, v). Figure 3.4 illustrates this basic step in growing the component R.

Figure 3.4 When growing the connected component containing s, we look for nodes like v that have not yet been visited, joined by an edge to the current component containing s.

Suppose we continue growing the set R until there are no more edges leading out of R; in other words, we run the following algorithm.

R will consist of nodes to which s has a path
Initially R = {s}
While there is an edge (u, v) where u ∈ R and v ∉ R
  Add v to R
Endwhile

Here is the key property of this algorithm.

(3.5) The set R produced at the end of the algorithm is precisely the connected component of G containing s.

Proof. We have already argued that for any node v ∈ R, there is a path from s to v. Now, consider a node w ∉ R, and suppose by way of contradiction that there is an s-w path P in G. Since s ∈ R but w ∉ R, there must be a first node v on P that does not belong to R, and this node v is not equal to s. Thus there is a node u immediately preceding v on P, so (u, v) is an edge. Moreover, since v is the first node on P that does not belong to R, we must have u ∈ R. It follows that (u, v) is an edge where u ∈ R and v ∉ R; this contradicts the stopping rule for the algorithm. ∎

For any node t in the component R, observe that it is easy to recover the actual path from s to t along the lines of the argument above: we simply record, for each node v, the edge (u, v) that was considered in the iteration in which v was added to R. Then, by tracing these edges backward from t, we proceed through a sequence of nodes that were added in earlier and earlier iterations, eventually reaching s; this defines an s-t path.
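A direct Python rendering of this generic procedure (ours) records, for each node added, the edge used to reach it--exactly what the path-recovery argument above needs.

    def connected_component(G, s):
        R = {s}
        via = {}                # via[v] = the node u such that edge (u, v) added v
        frontier = [s]
        while frontier:
            u = frontier.pop()  # any order of exploration is allowed here
            for v in G[u]:
                if v not in R:
                    R.add(v)
                    via[v] = u
                    frontier.append(v)
        return R, via

    def recover_path(via, s, t):
        # Trace recorded edges backward from t to s (assumes t is in the component).
        path = [t]
        while path[-1] != s:
            path.append(via[path[-1]])
        return path[::-1]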
To conclude, we notice that the general algorithm we have defined to grow R is underspecified: how do we decide which edge to consider next? The BFS algorithm arises, in particular, from one way of ordering the nodes we visit--in successive layers, based on their distance from s. But there are other natural ways to grow the component, several of which lead to efficient algorithms for the connectivity problem while producing search patterns with different structures. We now go on to discuss a different one of these algorithms, depth-first search, and develop some of its basic properties.

Depth-First Search

Another natural method to find the nodes reachable from s is the approach you might take if the graph G were truly a maze of interconnected rooms and you were walking around in it. You'd start from s and try the first edge leading out of it, to a node v. You'd then follow the first edge leading out of v, and continue in this way until you reached a "dead end"--a node for which you had already explored all its neighbors. You'd then backtrack until you got to a node with an unexplored neighbor, and resume from there. We call this algorithm depth-first search (DFS), since it explores G by going as deeply as possible and only retreating when necessary.

DFS is also a particular implementation of the generic component-growing algorithm that we introduced earlier. It is most easily described in recursive form: we can invoke DFS from any starting point but maintain global knowledge of which nodes have already been explored. Thus we run the following procedure.

DFS(u):
  Mark u as "Explored" and add u to R
  For each edge (u, v) incident to u
    If v is not marked "Explored" then
      Recursively invoke DFS(v)
    Endif
  Endfor

To apply this to s-t connectivity, we simply declare all nodes initially to be not explored, and invoke DFS(s).

There are some fundamental similarities and some fundamental differences between DFS and BFS. The similarities are based on the fact that they both build the connected component containing s, and we will see in the next section that they achieve qualitatively similar levels of efficiency. While DFS ultimately visits exactly the same set of nodes as BFS, it typically does so in a very different order: it probes its way down long paths, potentially getting very far from s, before backing up to try nearer unexplored nodes.

As in the case of BFS, the DFS algorithm yields a natural rooted tree T on the component containing s, but the tree will generally have a very different structure. We make s the root of the tree T, and make u the parent of v when u is responsible for the discovery of v. That is, whenever DFS(v) is invoked directly during the call to DFS(u), we add the edge (u, v) to T. The resulting tree is called a depth-first search tree of the component R.

Figure 3.5 depicts the construction of a DFS tree rooted at node 1 for the graph in Figure 3.2, with (a) through (g) depicting the nodes as they are discovered in sequence. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T. The execution of DFS begins by building a path on nodes 1, 2, 3, 5, 4. The execution reaches a dead end at 4, and so it "backs up" to 5, finds node 6, backs up again to 3, and finds nodes 7 and 8. At this point there are no new nodes to find in the connected component, so all the pending recursive DFS calls terminate, one by one, and the execution comes to an end. The full DFS tree is depicted in Figure 3.5(g).

Figure 3.5 The construction of a depth-first search tree T for the graph in Figure 3.2, with (a) through (g) depicting the nodes as they are discovered in sequence. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T.

This example suggests the characteristic way in which DFS trees look different from BFS trees. Rather than having root-to-leaf paths that are as short as possible, they tend to be quite narrow and deep. However, as in the case of BFS, we can say something quite strong about the way in which the nontree edges of G must be arranged relative to the edges of a DFS tree T: as in the figure, nontree edges can only connect ancestors of T to descendants.

To establish this, we first observe the following property of the DFS algorithm and the tree that it produces.

(3.6) For a given recursive call DFS(u), all nodes that are marked "Explored" between the invocation and end of this recursive call are descendants of u in T.

Using (3.6), we prove

(3.7) Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.

Proof. Suppose that (x, y) is an edge of G that is not an edge of T, and suppose without loss of generality that x is reached first by the DFS algorithm. When the edge (x, y) is examined during the execution of DFS(x), it is not added to T because y is marked "Explored." Since y was not marked "Explored" when DFS(x) was first invoked, it is a node that was discovered between the invocation and end of the recursive call DFS(x). It follows from (3.6) that y is a descendant of x. ∎
To establish this, we first observe the following property of the DFS algorithm and the tree that it produces.

(3.6) For a given recursive call DFS(u), all nodes that are marked "Explored" between the invocation and end of this recursive call are descendants of u in T.

Using (3.6), we prove

(3.7) Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.

Proof. Suppose without loss of generality that x is reached first by the DFS algorithm. When the edge (x, y) is examined during the execution of DFS(x), it is not added to T because y is marked "Explored." Since y was not marked "Explored" when DFS(x) was first invoked, it is a node that was discovered between the invocation and end of the recursive call DFS(x). It follows from (3.6) that y is a descendant of x. ∎

The Set of All Connected Components

So far we have been talking about the connected component containing a particular node s. But there is a connected component associated with each node in the graph. What is the relationship between these components? In fact, this relationship is highly structured and is expressed in the following claim.

(3.8) For any two nodes s and t in a graph, their connected components are either identical or disjoint.

This is a statement that is very clear intuitively, if one looks at a graph like the example in Figure 3.2. The graph is divided into multiple pieces with no edges between them; the largest piece is the connected component of nodes 1 through 8, the medium piece is the connected component of nodes 11, 12, and 13, and the smallest piece is the connected component of nodes 9 and 10. To prove the statement in general, we just need to show how to define these "pieces" precisely for an arbitrary graph.

Proof. Consider any two nodes s and t in a graph G with the property that there is a path between s and t. We claim that the connected components containing s and t are the same set. Indeed, for any node v in the component of s, the node v must also be reachable from t by a path: we can just walk from t to s, and then on from s to v. The same reasoning works with the roles of s and t reversed, and so a node is in the component of one if and only if it is in the component of the other.

On the other hand, if there is no path between s and t, then there cannot be a node v that is in the connected component of each. For if there were such a node v, then we could walk from s to v and then on to t, constructing a path between s and t. Thus, if there is no path between s and t, then their connected components are disjoint. ∎

This proof suggests a natural algorithm for producing all the connected components of a graph, by growing them one component at a time. We start with an arbitrary node s, and we use BFS (or DFS) to generate its connected component. We then find a node v (if any) that was not visited by the search from s, and iterate, using BFS (or DFS) starting from v to generate its connected component, which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.
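A short Python sketch of this component-by-component search, under the adjacency list convention used earlier; bfs_component and all_components are our names, not the book's:

from collections import deque

def bfs_component(s, adj):
    # Standard BFS from s; returns the set of nodes in s's component.
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def all_components(adj):
    components, visited = [], set()
    for s in adj:                  # pick the next node not yet visited
        if s not in visited:
            comp = bfs_component(s, adj)
            components.append(comp)
            visited |= comp        # by (3.8), comp is disjoint from the rest
    return components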
3.3 Implementing Graph Traversal Using Queues and Stacks

So far we have been discussing basic algorithmic primitives for working with graphs without mentioning any implementation details. Here we discuss how to use lists and arrays to represent graphs, and we discuss the trade-offs between the different representations. Then we use these data structures to implement the graph traversal algorithms breadth-first search (BFS) and depth-first search (DFS) efficiently. We will see that BFS and DFS differ essentially only in that one uses a queue and the other uses a stack, two simple data structures that we will describe later in this section.

A graph G = (V, E) has two natural input parameters, the number of nodes |V| and the number of edges |E|. We will use n = |V| and m = |E| to denote these, and running times will be given in terms of both of these parameters. As usual, we will aim for polynomial running times, and lower-degree polynomials are better. However, with two parameters in the running time, the comparison is not always so clear. Is O(m^2) or O(n^3) a better running time? This depends on the relation between n and m. With at most one edge between any pair of nodes, the number of edges m can be at most n(n - 1)/2 < n^2. On the other hand, in many applications the graphs of interest are connected, and by (3.1), connected graphs must have at least m >= n - 1 edges. But these comparisons do not always tell us which of two running times (such as m^2 and n^3) is better, so we will tend to keep the running times in terms of both of these parameters. In this section we aim to implement the basic graph search algorithms in time O(m + n). We will refer to this as linear time, since it takes O(m + n) time simply to read the input. Note that when we work with connected graphs, a running time of O(m + n) is the same as O(m), since m >= n - 1.

Representing Graphs

There are two basic ways to represent graphs: by an adjacency matrix and by an adjacency list representation. Throughout the book we will use the adjacency list representation; we start, however, by reviewing both of these representations and discussing the trade-offs between them. In what follows, consider a graph G = (V, E) with n nodes, and assume the set of nodes is V = {1, ..., n}.

The simplest way to represent a graph is by an adjacency matrix, which is an n x n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric, with A[u, v] = A[v, u] for all nodes u, v in V. The adjacency matrix representation allows us to check in O(1) time if a given edge (u, v) is present in the graph. However, the representation has two basic disadvantages.

o The representation takes Θ(n^2) space. When the graph has many fewer edges than n^2, more compact representations are possible.

o Many graph algorithms need to examine all edges incident to a given node v. In the adjacency matrix representation, doing this involves considering all other nodes w and checking the matrix entry A[v, w] to see whether the edge (v, w) is present, which takes Θ(n) time. In the worst case, v may have Θ(n) incident edges, in which case checking all these edges will take Θ(n) time regardless of the representation. But many graphs in practice have significantly fewer edges incident to most nodes, and so it would be good to be able to find all these incident edges more efficiently.

The representation of graphs used throughout the book is the adjacency list, which works better for sparse graphs, that is, those with many fewer than n^2 edges. In the adjacency list representation there is a record for each node v, containing a list of the nodes to which v has edges. To be precise, we have an array Adj, where Adj[v] is a record containing a list of all nodes adjacent to node v. For an undirected graph G = (V, E), each edge e = (v, w) in E occurs on two adjacency lists: node w appears on the list for node v, and node v appears on the list for node w.

Let's compare the adjacency matrix and adjacency list representations. First consider the space required by the representation. An adjacency matrix requires O(n^2) space, since it uses an n x n matrix. In contrast, we claim that the adjacency list representation requires only O(m + n) space. Here is why: first, we need an array of pointers of length n to set up the lists in Adj, and then we need space for all the lists. Now, the lengths of these lists may differ from node to node, but we argued in the previous paragraph that overall, each edge e = (v, w) appears in exactly two of the lists: the one for v and the one for w. Thus the total length of all lists is 2m = O(m).

Another (essentially equivalent) way to justify this bound is as follows. We define the degree n_v of a node v to be the number of incident edges it has. The length of the list at Adj[v] is n_v, so the total length over all nodes is O(Σ_{v in V} n_v). Now, the sum of the degrees in a graph is a quantity that often comes up in the analysis of graph algorithms, so it is useful to work out what this sum is.

(3.9) Σ_{v in V} n_v = 2m.

Proof. Each edge e = (v, w) contributes exactly twice to this sum: once in the quantity n_v and once in the quantity n_w. Since the sum is the total of the contributions of each edge, it is 2m. ∎

Now we consider the ease of accessing the information stored in these two different representations. Recall that in an adjacency matrix we can check in O(1) time if a particular edge (u, v) is present in the graph. In the adjacency list representation, this can take time proportional to the degree n_u: we have to follow the pointers on u's adjacency list to see if node v occurs on the list. On the other hand, the adjacency list is a natural representation for exploring graphs with algorithms such as DFS and BFS: if the algorithm is currently looking at a node u, it can read the list of u's neighbors in constant time per neighbor, and it can move to a neighbor v once it encounters it on this list in constant time. The list representation thus corresponds to a physical notion of "exploring" the graph, in which you learn the neighbors of a node u once you arrive at u.

We sum up the comparison between adjacency matrices and adjacency lists as follows.

(3.10) The adjacency matrix representation of a graph requires O(n^2) space, while the adjacency list representation requires only O(m + n) space. Since we have already argued that m <= n^2, the bound O(m + n) is never worse than O(n^2), and it is much better when the underlying graph is sparse.
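As a small illustration of the adjacency list representation and of (3.9), here is a Python sketch; make_adjacency_list and the example edge list are ours:

def make_adjacency_list(n, edges):
    # Nodes are 1..n; each undirected edge (v, w) is recorded on both lists.
    adj = {v: [] for v in range(1, n + 1)}
    for v, w in edges:
        adj[v].append(w)
        adj[w].append(v)
    return adj

edges = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5), (5, 6), (3, 7), (3, 8), (7, 8)]
adj = make_adjacency_list(8, edges)
# The sum of the degrees is exactly twice the number of edges, as in (3.9).
assert sum(len(adj[v]) for v in adj) == 2 * len(edges)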

Queues and Stacks

Many algorithms have an inner step in which they need to process a set of elements, such as the set of all edges adjacent to a node in a graph, the set of visited nodes in BFS and DFS, or the set of all free men in the Stable Matching algorithm. For this purpose, it is natural to maintain the set of elements to be considered in a linked list, as we have done for maintaining the set of free men in the Stable Matching algorithm.

One important issue that arises is the order in which to consider the elements in such a list. In the Stable Matching algorithm, the order in which we considered the free men did not affect the outcome, although this required a fairly subtle proof to verify. In many other algorithms, the order in which elements are considered is crucial.

Two of the simplest and most natural options are to maintain a set of elements as either a queue or a stack. A queue is a set from which we extract elements in first-in, first-out (FIFO) order: we select elements in the same order in which they were added. A stack is a set from which we extract elements in last-in, first-out (LIFO) order: each time we select an element, we choose the one that was added most recently. Both queues and stacks can be easily implemented via a doubly linked list. In both cases, we always select the first element on our list; the difference is in where we insert a new element. In a queue a new element is added to the end of the list as the last element, while in a stack a new element is placed in the first position on the list. Recall that a doubly linked list has explicit First and Last pointers to the beginning and end, respectively, so each of these insertions can be done in constant time.

Next we will discuss how to implement the search algorithms of the previous section in linear time. We will see that BFS can be thought of as using a queue to select which node to consider next, while DFS is effectively using a stack.

Implementing Breadth-First Search

The adjacency list data structure is ideal for implementing breadth-first search. The algorithm examines the edges leaving a given node one by one. When we are scanning the edges leaving u and come to an edge (u, v), we need to know whether or not node v has been previously discovered by the search. To make this simple, we maintain an array Discovered of length n and set Discovered[v] = true as soon as our search first sees v. The algorithm, as described in the previous section, constructs layers of nodes L1, L2, ..., where Li is the set of nodes at distance i from the source s. To maintain the nodes in a layer Li, we have a list L[i] for each i = 0, 1, 2, ....

BFS(s):
  Set Discovered[s] = true and Discovered[v] = false for all other v
  Initialize L[0] to consist of the single element s
  Set the layer counter i = 0
  Set the current BFS tree T to be empty
  While L[i] is not empty
    Initialize an empty list L[i + 1]
    For each node u in L[i]
      Consider each edge (u, v) incident to u
      If Discovered[v] = false then
        Set Discovered[v] = true
        Add edge (u, v) to the tree T
        Add v to the list L[i + 1]
      Endif
    Endfor
    Increment the layer counter i by one
  Endwhile

In this implementation it does not matter whether we manage each list L[i] as a queue or a stack, since the algorithm is allowed to consider the nodes in a layer Li in any order.

(3.11) The above implementation of the BFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.

Proof. As a first step, it is easy to bound the running time of the algorithm by O(n^2), a weaker bound than our claimed O(m + n). To see this, note that there are at most n lists L[i] that we need to set up, so this takes O(n) time. Now we need to consider the nodes u on these lists. Each node occurs on at most one list, so the For loop runs at most n times over all iterations of the While loop. When we consider a node u, we need to look through all edges (u, v) incident to u. There can be at most n such edges, and we spend O(1) time considering each edge, so the total time spent on one iteration of the For loop is at most O(n). We've thus concluded that there are at most n iterations of the For loop, and that each iteration takes at most O(n) time, so the total time is at most O(n^2).

To get the improved O(m + n) time bound, we need to observe that the For loop processing a node u can take less than O(n) time if u has only a few neighbors. As before, let n_u denote the degree of node u, the number of edges incident to u. The time spent in the For loop considering edges incident to node u is O(n_u), so the total over all nodes is O(Σ_{u in V} n_u). Recall from (3.9) that Σ_{u in V} n_u = 2m, and so the total time spent considering edges over the whole algorithm is O(m). We need O(n) additional time to set up lists and manage the array Discovered. So the total time spent is O(m + n) as claimed. ∎

We described the algorithm using up to n separate lists L[i], one for each layer Li. Instead of all these distinct lists, we can implement the algorithm using a single list L that we maintain as a queue. In this way, the algorithm processes nodes in the order they are first discovered: each time a node is discovered, it is added to the end of the queue, and the algorithm always processes the edges out of the node that is currently first in the queue. If we maintain the discovered nodes in this order, then all nodes in layer Li will appear in the queue ahead of all nodes in layer Li+1; thus, all nodes in layer Li will be considered in a contiguous sequence, followed by all nodes in layer Li+1, and so forth. Hence this implementation in terms of a single queue will produce the same result as the BFS implementation above.
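Here is a minimal Python sketch of this single-queue implementation; the names are ours, and it returns the BFS tree edges along with each node's layer number:

from collections import deque

def bfs(s, adj):
    discovered = {s}
    layer = {s: 0}
    tree = []
    queue = deque([s])                # one queue stands in for all lists L[i]
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in discovered:
                discovered.add(v)
                layer[v] = layer[u] + 1   # v belongs to the next layer
                tree.append((u, v))
                queue.append(v)
    return tree, layer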

Implementing Depth-First Search

We now consider the depth-first search algorithm. In the previous section we presented DFS as a recursive procedure, which is a natural way to specify it. However, it can also be viewed as almost identical to BFS, with the difference that it maintains the nodes to be processed in a stack, rather than in a queue. Essentially, the recursive structure of DFS can be viewed as pushing nodes onto a stack for later processing, while moving on to more freshly discovered nodes. We now show how to implement DFS by maintaining this stack of nodes to be processed explicitly.

In both BFS and DFS, there is a distinction between the act of discovering a node v (the first time it is seen, when the algorithm finds an edge leading to v) and the act of exploring a node v (when all the incident edges to v are scanned, resulting in the potential discovery of further nodes). The difference between BFS and DFS lies in the way in which discovery and exploration are interleaved.

In BFS, once we started to explore a node u in layer Li, we added all its newly discovered neighbors to the next layer Li+1, and we deferred actually exploring these neighbors until we got to the processing of layer Li+1. In contrast, DFS is more impulsive: when it explores a node u, it scans the neighbors of u until it finds the first not-yet-explored node v (if any), and then it immediately shifts attention to exploring v.

To implement the exploration strategy of DFS, we first add all of the nodes adjacent to u to our list of nodes to be considered, but after doing this we proceed to explore a new neighbor v of u. As we explore v, in turn, we add the neighbors of v to the list we're maintaining, but we do so in stack order, so that these neighbors will be explored before we return to explore the other neighbors of u. We only come back to other nodes adjacent to u when there are no other nodes left.

In addition, we use an array Explored analogous to the Discovered array we used for BFS. The difference is that we only set Explored[v] to be true when we scan v's incident edges (when the DFS search is at v), while BFS sets Discovered[v] to true as soon as v is first discovered. The implementation in full looks as follows.

DFS(s):
  Initialize S to be a stack with one element s
  While S is not empty
    Take a node u from S
    If Explored[u] = false then
      Set Explored[u] = true
      For each edge (u, v) incident to u
        Add v to the stack S
      Endfor
    Endif
  Endwhile

There is one final wrinkle to mention. Depth-first search is underspecified, since the adjacency list of a node being explored can be processed in any order. Note that the above algorithm, because it pushes all adjacent nodes onto the stack before considering any of them, in fact processes each adjacency list in the reverse order relative to the recursive version of DFS in the previous section. We thus have

(3.12) The above algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order).

If we want the algorithm to also find the DFS tree, we need to have each node u on the stack S maintain the node that "caused" u to get added to the stack. This can be easily done by using an array parent and setting parent[v] = u when we add node v to the stack due to edge (u, v). When we mark a node u not equal to s as Explored, we can also add the edge (u, parent[u]) to the tree T. Note that a node v may be in the stack S multiple times, as it can be adjacent to multiple nodes u that we explore, and each such node adds a copy of v to the stack S. However, we will only use one of these copies to explore node v, the copy that we add last. As a result, it suffices to maintain one value parent[v] for each node v by simply overwriting the value parent[v] every time we add a new copy of v to the stack S.

The main step in the algorithm is to add and delete nodes to and from the stack S, which takes O(1) time. Thus, to bound the running time, we need to bound the number of these operations. To count the number of stack operations, it suffices to count the number of nodes added to S, as each node needs to be added once for every time it can be deleted from S. How many elements ever get added to S? As before, let n_v denote the degree of node v. Node v will be added to the stack S every time one of its n_v adjacent nodes is explored, so the total number of nodes added to S is at most Σ_v n_v = 2m. This proves the desired O(m + n) bound on the running time of DFS.

(3.13) The above implementation of the DFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.
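A Python sketch of this stack-based implementation, including the parent bookkeeping described above; the names are ours:

def dfs_iterative(s, adj):
    explored = set()
    parent = {}
    tree = []
    stack = [s]                     # a Python list used as a stack
    while stack:
        u = stack.pop()
        if u not in explored:
            explored.add(u)
            if u != s:
                tree.append((parent[u], u))
            for v in adj[u]:
                parent[v] = u       # overwrite: the last copy pushed wins
                stack.append(v)
    return tree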

Finding the Set of All Connected Components

In the previous section we talked about how one can use BFS (or DFS) to find all connected components of a graph: we start with an arbitrary node s, use BFS (or DFS) to generate its connected component, then find a node v (if any) that was not visited by the search from s, and iterate, using BFS (or DFS) starting from v to generate its connected component, which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.

Although we earlier expressed the running time of BFS and DFS as O(m + n), where m and n are the total number of edges and nodes in the graph, both BFS and DFS in fact spend work only on edges and nodes in the connected component containing the starting node. (They never see any of the other nodes or edges.) Thus the above algorithm, although it may run BFS or DFS a number of times, only spends a constant amount of work on a given edge or node in the iteration when the connected component it belongs to is under consideration. Hence the overall running time of this algorithm is still O(m + n).

3.4 Testing Bipartiteness: An Application of Breadth-First Search

Recall the definition of a bipartite graph: it is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. To make the discussion a little smoother, we can imagine that the nodes in the set X are colored red, and the nodes in the set Y are colored blue. With this imagery, we can say a graph is bipartite if it is possible to color its nodes red and blue so that every edge has one red end and one blue end.

The Problem

In the earlier chapters, we saw examples of bipartite graphs. Here we start by asking: What are some natural examples of a nonbipartite graph, one where no such partition of V is possible?

Clearly a triangle is not bipartite, since we can color one node red, another one blue, and then we can't do anything with the third node. More generally, consider a cycle C of odd length, with nodes numbered 1, 2, 3, ..., 2k, 2k + 1. If we color node 1 red, then we must color node 2 blue, and then we must color node 3 red, and so on, coloring odd-numbered nodes red and even-numbered nodes blue. But then we must color node 2k + 1 red, and it has an edge to node 1, which is also red. This demonstrates that there's no way to partition C into red and blue nodes as required. More generally, if a graph G simply contains an odd cycle, then we can apply the same argument; thus we have established the following.

(3.14) If a graph G is bipartite, then it cannot contain an odd cycle.

It is easy to recognize that a graph is bipartite when appropriate sets X and Y (i.e., red and blue nodes) have actually been identified for us, and in many settings where bipartite graphs arise, this is natural. But suppose we encounter a graph G with no annotation provided, and we'd like to determine for ourselves whether it is bipartite; that is, whether there exists a partition into red and blue nodes, as required. How difficult is this? We see from (3.14) that an odd cycle is one simple "obstacle" to a graph's being bipartite. Are there other, more complex obstacles to bipartiteness?

Designing the Algorithm

In fact, there is a very simple procedure to test for bipartiteness, and its analysis can be used to show that odd cycles are the only obstacle. First we assume the graph G is connected, since otherwise we can first compute its connected components and analyze each of them separately. Next we pick any node s in V and color it red; there is no loss in doing this, since s must receive some color. It follows that all the neighbors of s must be colored blue, so we do this. It then follows that all the neighbors of these nodes must be colored red, their neighbors blue, and so on, until the whole graph is colored. At this point, either we have a valid red/blue coloring of G, in which every edge has ends of opposite colors, or there is some edge with ends of the same color. In this latter case, it seems clear that there's nothing we could have done: G simply is not bipartite. We now want to argue this point precisely and also work out an efficient way to perform the coloring.

The first thing to notice is that the coloring procedure we have just described is essentially identical to the description of BFS: we move outward from s, coloring nodes as soon as we first encounter them. Indeed, another way to describe the coloring algorithm is as follows: we perform BFS, coloring s red, all of layer L1 blue, all of layer L2 red, and so on, coloring odd-numbered layers blue and even-numbered layers red.

We can implement this on top of BFS, by simply taking the implementation of BFS and adding an extra array Color over the nodes. Whenever we get to a step in BFS where we are adding a node v to a list L[i + 1], we assign Color[v] = red if i + 1 is an even number, and Color[v] = blue if i + 1 is an odd number. At the end of this procedure, we simply scan all the edges and determine whether there is any edge for which both ends received the same color. Thus, the total running time for the coloring algorithm is O(m + n), just as it is for BFS.
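A compact Python sketch of this layered two-coloring; the names are ours, and it assumes G is connected, per the discussion above:

from collections import deque

def two_color(s, adj):
    color = {s: "red"}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:        # color v when it is first discovered
                color[v] = "blue" if color[u] == "red" else "red"
                queue.append(v)
    # Final scan: bipartite iff no edge has both ends the same color.
    ok = all(color[u] != color[v] for u in adj for v in adj[u])
    return color, ok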

Analyzing the Algorithm

We now prove a claim that shows this algorithm correctly determines whether G is bipartite; the claim also shows that we can find an odd cycle in G whenever it is not bipartite.

(3.15) Let G be a connected graph, and let L1, L2, ... be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold.

(i) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red, and the nodes in odd-numbered layers can be colored blue.

(ii) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.

Proof. First consider case (i), where we suppose that there is no edge joining two nodes of the same layer. By (3.4), we know that every edge of G joins nodes either in the same layer or in adjacent layers. Our assumption for case (i) is precisely that the first of these two alternatives never happens, so this means that every edge joins two nodes in adjacent layers. But our coloring procedure gives nodes in adjacent layers opposite colors, and so every edge has ends with opposite colors. Thus this coloring establishes that G is bipartite.

Now suppose we are in case (ii); why must G contain an odd cycle? We are told that G contains an edge joining two nodes of the same layer. Suppose this is the edge e = (x, y), with x, y in Lj. Also, for notational reasons, recall that L0 ("layer 0") is the set consisting of just s. Now consider the BFS tree T produced by our algorithm, and let z be the node that is an ancestor of both x and y in T and whose layer number is as large as possible; for obvious reasons, we can call z the lowest common ancestor of x and y. Suppose z is in Li, where i < j. We now have the situation pictured in Figure 3.6. We consider the cycle C defined by following the z-x path in T, then the edge e, and then the y-z path in T. Adding the lengths of its three parts separately, the length of this cycle is (j - i) + 1 + (j - i), which equals 2(j - i) + 1, an odd number. Thus G contains an odd-length cycle, and so it cannot be bipartite. ∎

Figure 3.6 If two nodes x and y in the same layer Lj are joined by an edge, then the cycle through x, y, and their lowest common ancestor z (in layer Li) has odd length, demonstrating that the graph cannot be bipartite.

3.5 Connectivity in Directed Graphs

Thus far, we have been looking at problems on undirected graphs; we now consider the extent to which these ideas carry over to the case of directed graphs. Recall that in a directed graph, the edge (u, v) has a direction: it goes from u to v. In this way, the relationship between u and v is asymmetric, and this has qualitative effects on the structure of the resulting graph. In Section 3.1, for example, we discussed the World Wide Web as an instance of a large, complex directed graph whose nodes are pages and whose edges are hyperlinks. The act of browsing the Web is based on following a sequence of edges in this directed graph, and the directionality is crucial, since it's not generally possible to browse "backwards" by following hyperlinks in the reverse direction.

At the same time, a number of basic definitions and algorithms have natural analogues in the directed case. This includes the adjacency list representation and graph search algorithms such as BFS and DFS. We now discuss these in turn.

Representing Directed Graphs

In order to represent a directed graph for purposes of designing algorithms, we use a version of the adjacency list representation that we employed for undirected graphs. Now, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges, and a second list consists of nodes from which it has edges. Thus an algorithm that is currently looking at a node u can read off the nodes reachable by going one step forward on a directed edge, as well as the nodes that would be reachable if one went one step in the reverse direction on an edge from u.

The Graph Search Algorithms

Breadth-first search and depth-first search are almost the same in directed graphs as they are in undirected graphs. We will focus here on BFS. We start at a node s, define a first layer of nodes to consist of all those to which s has an edge, define a second layer to consist of all additional nodes to which these first-layer nodes have an edge, and so forth. In this way, we discover nodes layer by layer as they are reached in this outward search from s, and the nodes in layer j are precisely those for which the shortest path from s has exactly j edges. As in the undirected case, this algorithm performs at most constant work for each node and edge, resulting in a running time of O(m + n).

It is important to understand what this directed version of BFS is computing. In directed graphs, it is possible for a node s to have a path to a node t even though t has no path to s, and what directed BFS is computing is the set of all nodes t with the property that s has a path to t. Such nodes may or may not have paths back to s.

There is a natural analogue of depth-first search as well, which also runs in linear time and computes the same set of nodes. It is again a recursive procedure that tries to explore as deeply as possible, in this case only following edges according to their inherent direction. Thus, when DFS is at a node u, it recursively launches a depth-first search, in order, for each node to which u has an edge.

Suppose that, for a given node s, we wanted the set of nodes with paths to s, rather than the set of nodes to which s has paths. An easy way to do this would be to define a new directed graph, Grev, that we obtain from G simply by reversing the direction of every edge. We could then run BFS or DFS in Grev; a node has a path from s in Grev if and only if it has a path to s in G.

Strong Connectivity

Recall that a directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. It's worth also formulating some terminology for the property at the heart of this definition: let's say that two nodes u and v in a directed graph are mutually reachable if there is a path from u to v and also a path from v to u. (So a graph is strongly connected if every pair of nodes is mutually reachable.)

Mutual reachability has a number of nice properties, many of them stemming from the following simple fact.

(3.16) If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable.

Proof. To construct a path from u to w, we first go from u to v (along the path guaranteed by the mutual reachability of u and v), and then on from v to w (along the path guaranteed by the mutual reachability of v and w). To construct a path from w to u, we just reverse this reasoning: we first go from w to v (along the path guaranteed by the mutual reachability of v and w), and then on from v to u (along the path guaranteed by the mutual reachability of u and v). ∎

There is a simple linear-time algorithm to test if a directed graph is strongly connected, implicitly based on (3.16). We pick any node s and run BFS in G starting from s. We then also run BFS starting from s in Grev. Now, if one of these two searches fails to reach every node, then clearly G is not strongly connected. But suppose we find that s has a path to every node, and that every node has a path to s. Then s and v are mutually reachable for every v, and so it follows that every two nodes u and v are mutually reachable: s and u are mutually reachable, and s and v are mutually reachable, so by (3.16) we also have that u and v are mutually reachable.

By analogy with connected components in an undirected graph, we can define the strong component containing a node s in a directed graph to be the set of all v such that s and v are mutually reachable. If one thinks about it, the algorithm in the previous paragraph is really computing the strong component containing s: we run BFS starting from s both in G and in Grev; the set of nodes reached by both searches is the set of nodes with paths to and from s, and hence this set is the strong component containing s.

There are further similarities between the notion of connected components in undirected graphs and strong components in directed graphs. Recall that connected components naturally partitioned the graph, since any two were either identical or disjoint. Strong components have this property as well, and for essentially the same reason, based on (3.16).

(3.17) For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Proof. Consider any two nodes s and t that are mutually reachable; we claim that the strong components containing s and t are identical. Indeed, for any node v, if s and v are mutually reachable, then by (3.16), t and v are mutually reachable as well. Similarly, if t and v are mutually reachable, then again by (3.16), s and v are mutually reachable.

On the other hand, if s and t are not mutually reachable, then there cannot be a node v that is in the strong component of each. For if there were such a node v, then s and v would be mutually reachable, and v and t would be mutually reachable, and so by (3.16) it would follow that s and t were mutually reachable. ∎

In fact, although we will not discuss the details here, with more work it is possible to compute the strong components for all nodes in a total time of O(m + n).
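A sketch of the two-search test in Python; the names are ours, and adj maps each node to the list of nodes to which it has a directed edge:

def reachable(s, adj):
    # Iterative search following edges in their inherent direction.
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return seen

def strongly_connected(adj):
    s = next(iter(adj))                  # any starting node works
    rev = {u: [] for u in adj}           # build Grev by reversing every edge
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    n = len(adj)
    return len(reachable(s, adj)) == n and len(reachable(s, rev)) == n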

3.6 Directed Acyclic Graphs and Topological Ordering

If an undirected graph has no cycles, then it has an extremely simple structure: each of its connected components is a tree. But it is possible for a directed graph to have no (directed) cycles and still have a very rich structure. For example, such graphs can have a large number of edges: if we start with the node set {1, 2, ..., n} and include an edge (i, j) whenever i < j, then the resulting directed graph has n(n - 1)/2 edges but no cycles. If a directed graph has no cycles, we call it (naturally enough) a directed acyclic graph, or a DAG for short. (The term DAG is typically pronounced as a word, not spelled out as an acronym.) In Figure 3.7(a) we see an example of a DAG, although it may take some checking to convince oneself that it really has no directed cycles.

The Problem

DAGs are a very common structure in computer science, because many kinds of dependency networks of the type we discussed in Section 3.1 are acyclic; thus DAGs can be used to encode precedence relations or dependencies in a natural way. Suppose we have a set of tasks labeled {1, 2, ..., n} that need to be performed, and there are dependencies among them stipulating, for certain pairs i and j, that i must be performed before j. For example, the tasks may be courses, with prerequisite requirements stating that certain courses must be taken before others. Or the tasks may correspond to a pipeline of computing jobs, with assertions that the output of job i is used in determining the input to job j, and hence job i must be done before job j.

We can represent such an interdependent set of tasks by introducing a node for each task, and a directed edge (i, j) whenever i must be done before j. If the precedence relation is to be at all meaningful, the resulting graph G must be a DAG. Indeed, if it contained a cycle C, there would be no way to do any of the tasks in C: since each task in C cannot begin until some other one completes, no task in C could ever be done, since none could be done first.

Let's continue a little further with this picture of DAGs as precedence relations. Given a set of tasks with dependencies, it would be natural to seek a valid order in which the tasks could be performed, so that all dependencies are respected. Specifically, for a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, ..., vn so that for every edge (vi, vj), we have i < j; in other words, all edges point "forward" in the ordering. A topological ordering on tasks provides an order in which they can be safely performed: when we come to the task vj, all the tasks that are required to precede it have already been done. In Figure 3.7(b) we've labeled the nodes of the DAG from part (a) with a topological ordering; note that each edge indeed goes from a lower-indexed node to a higher-indexed node.

In fact, we can view a topological ordering of G as providing an immediate "proof" that G has no cycles, via the following.

(3.18) If G has a topological ordering, then G is a DAG.

Proof. Suppose, by way of contradiction, that G has a topological ordering v1, v2, ..., vn, and also has a cycle C. Let vi be the lowest-indexed node on C, and let vj be the node on C just before vi, so that (vj, vi) is an edge. But by our choice of i, we have j > i, which contradicts the assumption that v1, v2, ..., vn was a topological ordering. ∎

The proof of acyclicity that a topological ordering provides can be very useful, even visually. In Figure 3.7(c), we have drawn the same graph as in (a) and (b), but with the nodes laid out in the topological ordering; it is immediately clear that the graph in (c) is a DAG, since each edge goes from left to right.

Figure 3.7 (a) A directed acyclic graph. (b) The same DAG with a topological ordering, specified by the labels on each node. (c) A different drawing of the same DAG, arranged so as to emphasize the topological ordering: all edges point from left to right.
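The defining property in (3.18) is easy to check directly. Here is a small Python sketch, with names of our choosing, that verifies that a proposed ordering of the nodes is topological:

def is_topological(order, adj):
    # Every directed edge (u, v) must point "forward" in the ordering.
    position = {v: i for i, v in enumerate(order)}
    return all(position[u] < position[v] for u in adj for v in adj[u])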

Computing a Topological Ordering

The main question we consider here is the converse of (3.18): Does every DAG have a topological ordering, and if so, how do we find one efficiently? A method to do this for every DAG would be very useful: it would show that for any precedence relation on a set of tasks without cycles, there is an efficiently computable order in which to perform the tasks.

Designing and Analyzing the Algorithm

In fact, the converse of (3.18) does hold, and we establish this via an efficient algorithm to compute a topological ordering. The key to this lies in finding a way to get started: which node do we put at the beginning of the topological ordering? Such a node v1 would need to have no incoming edges, since any such incoming edge would violate the defining property of the topological ordering, that all edges point forward. Thus, we need to prove the following fact.

(3.19) In every DAG G, there is a node v with no incoming edges.

Proof. Let G be a directed graph in which every node has at least one incoming edge. We show how to find a cycle in G; this will prove the claim. We pick any node v, and begin following edges backward from v: since v has at least one incoming edge (u, v), we can walk backward to u; then, since u has at least one incoming edge (x, u), we can walk backward to x; and so on. We can continue this process indefinitely, since every node we encounter has an incoming edge. But after n + 1 steps, we will have visited some node w twice. If we let C denote the sequence of nodes encountered between successive visits to w, then clearly C forms a cycle. ∎

In fact, the existence of such a node v is all we need to produce a topological ordering of G by induction. Specifically, let us claim by induction that every DAG has a topological ordering. This is clearly true for DAGs on one or two nodes. Now suppose it is true for DAGs with up to some number of nodes n. Then, given a DAG G on n + 1 nodes, we find a node v with no incoming edges, as guaranteed by (3.19). We place v first in the topological ordering; this is safe, since all edges out of v will point forward. Now G - {v} is a DAG, since deleting v cannot create any cycles that weren't there previously. Also, G - {v} has n nodes, so we can apply the induction hypothesis to obtain a topological ordering of G - {v}. We append the nodes of G - {v} in this order after v. This is an ordering of G in which all edges point forward, and hence it is a topological ordering.

Thus we have proved the desired converse of (3.18).

(3.20) If G is a DAG, then G has a topological ordering.

The inductive proof contains the following algorithm to compute a topological ordering of G.

To compute a topological ordering of G:
  Find a node v with no incoming edges and order it first
  Delete v from G
  Recursively compute a topological ordering of G - {v}
    and append this order after v

In Figure 3.8 we show the sequence of node deletions that occurs when this algorithm is applied to the graph in Figure 3.7. The shaded nodes in each iteration are those with no incoming edges; the crucial point, which is what (3.19) guarantees, is that when we apply this algorithm to a DAG, there will always be at least one such node available to delete at every stage of the algorithm's execution.

Figure 3.8 Starting from the graph in Figure 3.7, nodes are deleted one by one so as to be added to a topological ordering. The shaded nodes are those with no incoming edges.

To bound the running time of this algorithm, we note that identifying a node v with no incoming edges, and deleting it from G, can be done in O(n) time. Since the algorithm runs for n iterations, the total running time is O(n^2).

This is not a bad running time, and if G is very dense, containing Θ(n^2) edges, then it is linear in the size of the input. But we may well want something better when the number of edges m is much less than n^2. In such a case, a running time of O(m + n) could be a significant improvement over Θ(n^2).

In fact, we can achieve a running time of O(m + n) using the same high-level algorithm, iteratively deleting nodes with no incoming edges; we simply have to be more efficient in finding these nodes. We declare a node to be "active" if it has not yet been deleted by the algorithm, and we explicitly maintain two things: (a) for each node w, the number of incoming edges that w has from active nodes; and (b) the set S of all active nodes in G that have no incoming edges from other active nodes.

At the start, all nodes are active, so we can initialize (a) and (b) with a single pass through the nodes and edges. Then, each iteration consists of selecting a node v from the set S and deleting it. After deleting v, we go through all nodes w to which v had an edge, and subtract one from the number of active incoming edges that we are maintaining for w. If this causes the number of active incoming edges to w to drop to zero, then we add w to the set S. Proceeding in this way, we keep track of nodes that are eligible for deletion at all times, while spending constant work per edge over the course of the whole algorithm.
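Here is a Python sketch of this O(m + n) implementation; the names are ours, and adj maps each node to the list of nodes its outgoing edges point to. If the input actually contains a cycle, the nodes on the cycle never enter S, so the returned order simply comes up short of n nodes.

def topological_order(adj):
    # (a) count, for each node, its incoming edges from active nodes
    incoming = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            incoming[v] += 1
    # (b) the set S of active nodes with no active incoming edges
    S = [u for u in adj if incoming[u] == 0]
    order = []
    while S:
        v = S.pop()
        order.append(v)
        for w in adj[v]:            # "delete" v: update its successors
            incoming[w] -= 1
            if incoming[w] == 0:
                S.append(w)
    return order                    # len(order) < len(adj) iff G has a cycle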

Solved Exercises

Solved Exercise 1

Consider the directed acyclic graph G in Figure 3.9. How many topological orderings does it have?

Figure 3.9 How many topological orderings does this graph have?

Solution Recall that a topological ordering of G is an ordering of the nodes as v1, v2, ..., vn so that all edges point "forward": for every edge (vi, vj), we have i < j.

So one way to answer this question would be to write down all 5 · 4 · 3 · 2 · 1 = 120 possible orderings and check whether each is a topological ordering. But this would take a while.

Instead, we think about this as follows. As we saw in the text (or reasoning directly from the definition), the first node in a topological ordering must be one that has no edge coming into it. Analogously, the last node must be one that has no edge leaving it. Thus, in every topological ordering of G, the node a must come first and the node e must come last.

Now we have to figure out how the nodes b, c, and d can be arranged in the middle of the ordering. The edge (c, d) enforces the requirement that c must come before d; but b can be placed anywhere relative to these two: before both, between c and d, or after both. This exhausts all the possibilities, and so we conclude that there are three possible topological orderings:

a, b, c, d, e
a, c, b, d, e
a, c, d, b, e

Solved Exercise 2

Some friends of yours are working on techniques for coordinating groups of mobile robots. Each robot has a radio transmitter that it uses to communicate with a base station, and your friends find that if the robots get too close to one another, then there are problems with interference among the transmitters. So a natural problem arises: how to plan the motion of the robots in such a way that each robot gets to its intended destination, but in the process the robots don't come close enough together to cause interference problems.

We can model this problem abstractly as follows. Suppose that we have an undirected graph G = (V, E), representing the floor plan of a building, and there are two robots initially located at nodes a and b in the graph. The robot at node a wants to travel to node c along a path in G, and the robot at node b wants to travel to node d. This is accomplished by means of a schedule: at each time step, the schedule specifies that one of the robots moves across a single edge, from one node to a neighboring node. At the end of the schedule, the robot from node a should be sitting on c, and the robot from b should be sitting on d.

A schedule is interference-free if there is no point at which the two robots occupy nodes that are at a distance <= r from one another in the graph, for a given parameter r. We'll assume that the two starting nodes a and b are at a distance greater than r, and so are the two ending nodes c and d.

Give a polynomial-time algorithm that decides whether there exists an interference-free schedule by which each robot can get to its destination.

Solution This is a problem of the following general flavor. We have a set of possible configurations for the robots, where we define a configuration to be a choice of location for each one. We are trying to get from a given starting configuration (a, b) to a given ending configuration (c, d), subject to constraints on how we can move between configurations (we can only change one robot's location to a neighboring node), and also subject to constraints on which configurations are "legal."

This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots, that is, the current location of each one, it's not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we're trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.

Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u, v) and (u', v') will be joined by an edge in H if one of the pairs u, u' or v, v' are equal, and the other pair corresponds to an edge in G.

We can already observe that paths in H from (a, b) to (c, d) correspond to schedules for the robots: such a path consists precisely of a sequence of configurations in which, at each step, one robot crosses a single edge in G. However, we have not yet encoded the notion that the schedule should be interference-free.

To do this, we simply delete from H all nodes that correspond to configurations in which there would be interference. Thus we define H' to be the graph obtained from H by deleting all nodes (u, v) for which the distance between u and v in G is at most r. The full algorithm is then as follows. We construct the graph H', and then run the connectivity algorithm from the text to determine whether there is a path from (a, b) to (c, d). The correctness of the algorithm follows from the fact that paths in H' correspond to schedules, and the nodes in H' correspond precisely to the configurations in which there is no interference.

Finally, we need to consider the running time. Let n denote the number of nodes in G, and m denote the number of edges in G. We'll analyze the running time by doing three things: (1) bounding the size of H' (which will in general be larger than G), (2) bounding the time it takes to construct H', and (3) bounding the time it takes to search for a path from (a, b) to (c, d) in H'.

First, then, let's consider the size of H'. H' has at most n^2 nodes, since its nodes correspond to pairs of nodes in G. Now, how many edges does H' have? A node (u, v) will have edges to (u', v) for each neighbor u' of u in G, and to (u, v') for each neighbor v' of v in G. A simple upper bound says that there can be at most n choices for (u', v), and at most n choices for (u, v'), so there are at most 2n edges incident to each node of H'. Summing over the (at most) n^2 nodes of H', we have O(n^3) edges. (We can actually give a better bound of O(mn) on the number of edges in H', by using the bound (3.9) we proved in Section 3.3 on the sum of the degrees in a graph. We'll leave this as a further exercise.)

Now we bound the time needed to construct H'. We first build H by enumerating all pairs of nodes in G in time O(n^2), and constructing edges using the definition above in time O(n) per node, for a total of O(n^3). Now we need to figure out which nodes to delete from H so as to produce H'. We can do this as follows. For each node u in G, we run a breadth-first search from u and identify all nodes v within distance r of u. We list all these pairs (u, v) and delete them from H. Each breadth-first search in G takes time O(m + n), and we're doing one from each node, so the total time for this part is O(mn + n^2).

Now we have H', and we just need to decide whether there is a path from (a, b) to (c, d). This can be done using the connectivity algorithm from the text in time that is linear in the number of nodes and edges of H'. Since H' has O(n^2) nodes and O(n^3) edges, this final step takes polynomial time as well, and so the overall algorithm runs in polynomial time.
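A hedged Python sketch of this configuration-space search; all names are ours, the BFS routines follow the style used earlier in the chapter, and the edges of H' are generated on the fly rather than stored:

from collections import deque

def within_r(s, adj, r):
    # Depth-bounded BFS: returns all nodes at distance <= r from s.
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def interference_free_schedule_exists(adj, a, b, c, d, r):
    close = {u: within_r(u, adj, r) for u in adj}   # one BFS per node of G
    # Legal configurations: pairs at distance > r in G (the nodes of H').
    legal = {(u, v) for u in adj for v in adj if v not in close[u]}
    start, goal = (a, b), (c, d)
    if start not in legal:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        u, v = queue.popleft()
        if (u, v) == goal:
            return True
        # Neighboring configurations: move exactly one robot along an edge.
        for cfg in [(u2, v) for u2 in adj[u]] + [(u, v2) for v2 in adj[v]]:
            if cfg in legal and cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False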

" it is the case that i andj have the same label. So more concretely. u). of the distance between a and ~. then G cannot contain anY edges that do not belong to Exercises 109 the following property: at all times. w}.]) labeled "same. E) is the minimum number of edges in a path joining them. while apd(G) = [dist(u. and obtain a tree T that includes all nodes of G. Prove that G = T. then G is connected. and obtain the same tree T. as their human owners move around). and for each pair (i. We say that the diameter of G is the maximum distance between any pair of nodes. and so they’ve constrained the motion of the devices to satisfy Here’s a simple example to convince yourself that there are graphs G for which diam(G) # apd(G). We define apd(G) to be the average. They’d like to know if this data is consistent with the idea that each butterfly is from one of species A or B. If every node of G has degree at least hi2. when one of them realizes that you probably have an algorithm that would answer this question right away. where n is an even number.) Some friends of yours work on wireless networks. We have a connected graph G = (V.) What they’d like to know is: Does this property by itself guarantee that the network will remain connected? Here’s a concrete way to formulate the question as a claim about graphs. and there is an edge between device i and device j ff the physical locations of i andj are no more than 500 meters apart. w) + dist(u. They also have the option of rendering no judgment on a given pair. (We’ll assume n is an even number. That is. A number of stories In the press about the structure of the Internet and the Web have focused on some version of the following question: How far apart are typical nodes in these networks? If you read these stories carefully. E). they often jump back and forth between these concepts as though they’re the same thing. and they’re currently studying the properties of a network of n mobile devices. T. eac~ device i is within 500 meters of at least n/2 of the other devices. As in the text. A binary tree is a rooted tree in which each node has at most two children. we’]] denote this by dist(a. As the devices move around (actually. you find that many of them are confused about the difference between the diameter of a network and the average distance in a network. If they’re confident enough in their judgment. .108 Chapter 3 Graphs For each pair of specimens i and j. (In other words. v. Show by induction that in any binary tree the number of nodes with two children is exactly one less than the number of leaves. Decide whether you think the claim is true or false. Then diam(G) = dist(a." it is the case that i andj have different labels. we say that i and ] are "in range" of each other. v) + dist(a. and give a proof of either the claim or its negation.j) labeled "different. Give an algorithm with running time O(m + n) that determines whether the m judgments are consistent. they study them carefully side by side. which we’H ca]] the average pairwise distance In G (denoted apd(G)). and with the two edges {a. over all (~) sets of two distinct nodes a and u. and we’H denote this quantity by diam(G). then they 1abe! the pair (i. we say that the distance between two nodes a and v in a graph G = (V. 
5. A binary tree is a rooted tree in which each node has at most two children. Show by induction that in any binary tree the number of nodes with two children is exactly one less than the number of leaves.

6. We have a connected graph G = (V, E), and a specific vertex u in V. Suppose we compute a depth-first search tree rooted at u, and obtain a tree T that includes all nodes of G. Suppose we then compute a breadth-first search tree rooted at u, and obtain the same tree T. Prove that G = T. (In other words, if T is both a depth-first search tree and a breadth-first search tree rooted at u, then G cannot contain any edges that do not belong to T.)

7. Some friends of yours work on wireless networks, and they're currently studying the properties of a network of n mobile devices. As the devices move around (actually, as their human owners move around), they define a graph at any point in time as follows: there is a node representing each of the n devices, and there is an edge between device i and device j if the physical locations of i and j are no more than 500 meters apart. (If this is so, we say that i and j are "in range" of each other.)

They'd like it to be the case that the network of devices is connected at all times, and so they've constrained the motion of the devices to satisfy the following property: at all times, each device i is within 500 meters of at least n/2 of the other devices. (We'll assume n is an even number.) What they'd like to know is: Does this property by itself guarantee that the network will remain connected?

Here's a concrete way to formulate the question as a claim about graphs.

Claim: Let G be a graph on n nodes, where n is an even number. If every node of G has degree at least n/2, then G is connected.

Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

8. A number of stories in the press about the structure of the Internet and the Web have focused on some version of the following question: How far apart are typical nodes in these networks? If you read these stories carefully, you find that many of them are confused about the difference between the diameter of a network and the average distance in a network; they often jump back and forth between these concepts as though they're the same thing.

As in the text, we say that the distance between two nodes u and v in a graph G = (V, E) is the minimum number of edges in a path joining them; we'll denote this by dist(u, v). We say that the diameter of G is the maximum distance between any pair of nodes, and we'll denote this quantity by diam(G).

Let's define a related quantity, which we'll call the average pairwise distance in G (denoted apd(G)). We define apd(G) to be the average, over all (n choose 2) sets of two distinct nodes u and v, of the distance between u and v.

Here's a simple example to convince yourself that there are graphs G for which diam(G) is not equal to apd(G). Let G be a graph with three nodes u, v, w, and with the two edges {u, v} and {v, w}. Then

    diam(G) = dist(u, w) = 2,

while

    apd(G) = [dist(u, v) + dist(u, w) + dist(v, w)]/3 = 4/3.

Of course, these two numbers aren't all that far apart in the case of this three-node graph, and so it's natural to ask whether there's always a close relation between them. Here's a claim that tries to make this precise.

Claim: There exists a positive natural number c so that for all connected graphs G, it is the case that diam(G)/apd(G) is at most c.

Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

9. There's a natural intuition that two nodes that are far apart in a communication network--separated by many hops--have a more tenuous connection than two nodes that are close together. There are a number of algorithmic results that are based to some extent on different ways of making this notion precise. Here's one that involves the susceptibility of paths to the deletion of nodes.

Suppose that an n-node undirected graph G = (V, E) contains two nodes s and t such that the distance between s and t is strictly greater than n/2. Show that there must exist some node v, not equal to either s or t, such that deleting v from G destroys all s-t paths. (In other words, the graph obtained from G by deleting v contains no path from s to t.) Give an algorithm with running time O(m + n) to find such a node v.

10. A number of art museums around the country have been featuring work by an artist named Mark Lombardi (1951-2000), consisting of a set of intricately rendered graphs. Building on a great deal of research, these graphs encode the relationships among people involved in major political scandals over the past several decades: the nodes correspond to participants, and each edge indicates some type of relationship between a pair of participants. And so, if you peer closely enough at the drawings, you can trace out ominous-looking paths from a high-ranking U.S. government official, to a former business partner, to a bank in Switzerland, to a shadowy arms dealer.

Such pictures form striking examples of social networks, which, as we discussed in Section 3.1, have nodes representing people and organizations, and edges representing relationships of various kinds. And the short paths that abound in these networks have attracted considerable attention recently, as people ponder what they mean. In the case of Mark Lombardi's graphs, they hint at the short set of steps that can carry you from the reputable to the disreputable.

Of course, a single, spurious short path between nodes v and w in such a network may be more coincidental than anything else; a large number of short paths between v and w can be much more convincing. So in addition to the problem of computing a single shortest v-w path in a graph G, social networks researchers have looked at the problem of determining the number of shortest v-w paths. This turns out to be a problem that can be solved efficiently.

Suppose we are given an undirected graph G = (V, E), and we identify two nodes v and w in G. Give an algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.

11. You're helping some security analysts monitor a collection of networked computers, tracking the spread of an online virus. There are n computers in the system, labeled C1, C2, ..., Cn, and as input you're given a collection of trace data indicating the times at which pairs of computers communicated. Thus the data is a sequence of ordered triples (Ci, Cj, tk); such a triple indicates that Ci and Cj exchanged bits at time tk. There are m triples total. We'll assume that the triples are presented to you in sorted order of time. For purposes of simplicity, we'll assume that each pair of computers communicates at most once during the interval you're observing.

The security analysts you're working with would like to be able to answer questions of the following form: If the virus was inserted into computer Ca at time x, could it possibly have infected computer Cb by time y? The mechanics of infection are simple: if an infected computer Ci communicates with an uninfected computer Cj at time tk (in other words, if one of the triples (Ci, Cj, tk) or (Cj, Ci, tk) appears in the trace data), then computer Cj becomes infected as well, starting at time tk. Infection can thus spread from one machine to another across a sequence of communications, provided that no step in this sequence involves a move backward in time. Thus, if Ci is infected by time tk, and the trace data contains triples (Ci, Cj, tk) and (Cj, Cq, tr), where tk is at most tr, then Cq will become infected via Cj. (Note that it is okay for tk to be equal to tr; this would mean that Cj had open connections to both Ci and Cq at the same time, and so a virus could move from Ci to Cq.)

For example, suppose n = 4; the trace data consists of the triples (C1, C2, 4), (C2, C4, 8), (C3, C4, 8), (C1, C4, 12); and the virus was inserted into computer C1 at time 2. Then C3 would be infected at time 8 by a sequence of three steps: first C2 becomes infected at time 4, then C4 gets the virus from C2 at time 8, and then C3 gets the virus from C4 at time 8. On the other hand, if the trace data were (C2, C3, 8), (C1, C4, 12), (C1, C2, 14), and again the virus was inserted into computer C1 at time 2, then C3 would not become infected during the period of observation: although C2 becomes infected at time 14, we see that C3 only communicates with C2 before C2 was infected. There is no sequence of communications moving forward in time by which the virus could get from C1 to C3 in this second example.

Design an algorithm that answers questions of this type: given a collection of trace data, the algorithm should decide whether a virus introduced at computer Ca at time x could have infected computer Cb by time y. The algorithm should run in time O(m + n).
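To give a feel for the mechanics of Exercise 11, here is a direct simulation (a sketch of one possible approach, not the book's intended solution; the function name and representation are hypothetical). To honor the rule that equal timestamps let the virus pass through simultaneously open connections, it spreads infection across whole connected components of each timestamp's communications:

    from collections import defaultdict, deque

    def could_infect(trace, a, x, b, y):
        # trace: list of (ci, cj, t) triples, sorted by time t.
        infected = {a}
        i, m = 0, len(trace)
        while i < m and trace[i][2] <= y:
            t = trace[i][2]
            group = []                    # all communications at timestamp t
            while i < m and trace[i][2] == t:
                group.append(trace[i])
                i += 1
            if t < x:
                continue                  # the virus has not been inserted yet
            adj = defaultdict(list)
            for u, v, _ in group:
                adj[u].append(v)
                adj[v].append(u)
            # Spread across this timestamp's connected components by BFS.
            queue = deque(u for u in adj if u in infected)
            seen = set(queue)
            while queue:
                u = queue.popleft()
                infected.add(u)
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        queue.append(v)
        return b in infected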

12. You're helping a group of ethnographers analyze some oral history data they've collected by interviewing members of a village to learn about the lives of people who've lived there over the past two hundred years. From these interviews, they've learned about a set of n people (all of them now deceased), whom we'll denote P1, ..., Pn. They've also collected facts about when these people lived relative to one another. Each fact has one of the following two forms:

* For some i and j, person Pi died before person Pj was born; or

* for some i and j, the life spans of Pi and Pj overlapped at least partially.

Naturally, they're not sure that all these facts are correct; memories are not so good, and a lot of this was passed down by word of mouth. So what they'd like you to determine is whether the data they've collected is at least internally consistent, in the sense that there could have existed a set of people for which all the facts they've learned simultaneously hold.

Give an efficient algorithm to do this: either it should produce proposed dates of birth and death for each of the n people so that all the facts hold true, or it should report (correctly) that no such dates can exist--that is, the facts collected by the ethnographers are not internally consistent.

Notes and Further Reading

The theory of graphs is a large topic, encompassing both algorithmic and nonalgorithmic issues. It is generally considered to have begun with a paper by Euler (1736), grown through interest in graph representations of maps and chemical compounds in the nineteenth century, and emerged as a systematic area of study in the twentieth century, first as a branch of mathematics and later also through its applications to computer science. The books by Berge (1976), Bollobas (1998), and Diestel (2000) provide substantial further coverage of graph theory. Recently, extensive data has become available for studying large networks that arise in the physical, biological, and social sciences, and there has been interest in understanding properties of networks that span all these different domains. The books by Barabasi (2002) and Watts (2002) discuss this emerging area of research, with presentations aimed at a general audience.

The basic graph traversal techniques covered in this chapter have numerous applications. We will see a number of these in subsequent chapters, and we refer the reader to the book by Tarjan (1983) for further results.

Notes on the Exercises Exercise 12 is based on a result of Martin Golumbic and Ron Shamir.

Greedy Algorithms

In Wall Street, that iconic movie of the 1980s, Michael Douglas gets up in front of a room full of stockholders and proclaims, "Greed... is good. Greed is right. Greed works." In this chapter, we'll be taking a much more understated perspective as we investigate the pros and cons of short-sighted greed in the design of algorithms. Indeed, our aim is to approach a number of different computational problems with a recurring set of questions: Is greed good? Does greed work?

It is hard, if not impossible, to define precisely what is meant by a greedy algorithm. An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion. One can often design many different greedy algorithms for the same problem, each one locally, incrementally optimizing some different measure on its way to a solution.

It's easy to invent greedy algorithms for almost any problem; finding cases in which they work well, and proving that they work well, is the interesting challenge. When a greedy algorithm succeeds in solving a nontrivial problem optimally, it typically implies something interesting and useful about the structure of the problem itself: there is a local decision rule that one can use to construct optimal solutions. And as we'll see later, in Chapter 11, the same is true of problems in which a greedy algorithm can produce a solution that is guaranteed to be close to optimal, even if it does not achieve the precise optimum. These are the kinds of issues we'll be dealing with in this chapter.

The first two sections of this chapter will develop two basic methods for proving that a greedy algorithm produces an optimal solution to a problem. One can view the first approach as establishing that the greedy algorithm stays ahead. By this we mean that if one measures the greedy algorithm's progress

in a step-by-step fashion, one sees that it does better than any other algorithm at each step; it then follows that it produces an optimal solution. The second approach is known as an exchange argument, and it is more general: one considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality. Again, it will follow that the greedy algorithm must have found a solution that is at least as good as any other solution.

Following our introduction of these two styles of analysis, we focus on several of the most well-known applications of greedy algorithms: shortest paths in a graph, the Minimum Spanning Tree Problem, and the construction of Huffman codes for performing data compression. They each provide nice examples of our analysis techniques. We also explore an interesting relationship between minimum spanning trees and the long-studied problem of clustering. Finally, we consider a more complex application, the Minimum-Cost Arborescence Problem, which further extends our notion of what a greedy algorithm is.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

Let's recall the Interval Scheduling Problem, which was the first of the five representative problems we considered in Chapter 1. We have a set of requests {1, 2, ..., n}; the ith request corresponds to an interval of time starting at s(i) and finishing at f(i). (Note that we are slightly changing the notation from Section 1.2, where we used si rather than s(i) and fi rather than f(i). This change of notation will make things easier to talk about in the proofs.) We'll say that a subset of the requests is compatible if no two of them overlap in time, and our goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.

Designing a Greedy Algorithm

Using the Interval Scheduling Problem, we can make our discussion of greedy algorithms much more concrete. The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i1. Once a request i1 is accepted, we reject all requests that are not compatible with i1. We then select the next request i2 to be accepted, and again reject all requests that are not compatible with i2. We continue in this fashion until we run out of requests. The challenge in designing a good greedy algorithm is in deciding which simple rule to use for the selection--and there are many natural rules for this problem that do not give good solutions.

Let's try to think of some of the most natural rules and see how they work.

o The most obvious rule might be to always select the available request that starts earliest--that is, the one with minimal start time s(i). This way our resource starts being used as quickly as possible. This method does not yield an optimal solution. If the earliest request i is for a very long interval, then by accepting request i we may have to reject a lot of requests for shorter time intervals. Since our goal is to satisfy as many requests as possible, we will end up with a suboptimal solution. In a really bad case--say, when the finish time f(i) is the maximum among all requests--the accepted request i keeps our resource occupied for the whole time. Such a situation is depicted in Figure 4.1(a), where our greedy method would accept a single request, while the optimal solution could accept many.

o This might suggest that we should start out by accepting the request that requires the smallest interval of time--namely, the request for which f(i) - s(i) is as small as possible. As it turns out, this is a somewhat better rule than the previous one, but it still can produce a suboptimal schedule. For example, in Figure 4.1(b), accepting the short interval in the middle would prevent us from accepting the other two, which form an optimal solution.

Figure 4.1 Some instances of the Interval Scheduling Problem on which natural greedy algorithms fail to find the optimal solution.

o In the previous greedy rule, our problem was that the second request competes with both the first and the third--that is, accepting this request made us reject two other requests. We could design a greedy algorithm that is based on this idea: for each request, we count the number of other requests that are not compatible, and accept the request that has the fewest number of noncompatible requests. (In other words, we select the interval with the fewest "conflicts.") This greedy choice would lead to the optimum solution in the previous example. In fact, it is quite a bit harder to design a bad example for this rule; but it can be done, and we've drawn an example in Figure 4.1(c). The unique optimal solution in this example is to accept the four requests in the top row. The greedy method suggested here accepts the middle request in the second row and thereby ensures a solution of size no greater than three.
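These failures are easy to reproduce concretely. The toy driver below runs the generic greedy template with a pluggable selection rule; the instance is a hypothetical stand-in in the spirit of Figure 4.1(a), not the book's own data:

    def greedy(requests, choose):
        # Generic template: repeatedly pick a request by the given rule, then
        # discard everything that overlaps the pick (the pick itself included).
        chosen, remaining = [], list(requests)
        while remaining:
            pick = choose(remaining)
            chosen.append(pick)
            remaining = [r for r in remaining
                         if r[0] >= pick[1] or r[1] <= pick[0]]
        return chosen

    long_first = [(0, 10), (1, 2), (3, 4), (5, 6)]   # one long early interval
    print(len(greedy(long_first, lambda rs: min(rs, key=lambda r: r[0]))))  # 1
    print(len(greedy(long_first, lambda rs: min(rs, key=lambda r: r[1]))))  # 3

Selecting by earliest start time accepts only the long interval, while selecting by earliest finish time (the rule introduced next) accepts three.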

A greedy rule that does lead to the optimal solution is based on a fourth idea: we should accept first the request that finishes first, that is, the request i for which f(i) is as small as possible. This is also quite a natural idea: we ensure that our resource becomes free as soon as possible while still satisfying one request. In this way we can maximize the time left to satisfy other requests. Our intuition for the greedy method came from wanting our resource to become free again as soon as possible after satisfying the first request.

Let us state the algorithm a bit more formally. We will use R to denote the set of requests that we have neither accepted nor rejected yet, and use A to denote the set of accepted requests.

    Initially let R be the set of all requests, and let A be empty
    While R is not yet empty
        Choose a request i in R that has the smallest finishing time
        Add request i to A
        Delete all requests from R that are not compatible with request i
    EndWhile
    Return the set A as the set of accepted requests

For an example of how the algorithm runs, see Figure 4.2.

Figure 4.2 Sample run of the Interval Scheduling Algorithm, on nine intervals numbered in order of their finish points. At each step the selected intervals are darker lines, and the intervals deleted at the corresponding step are indicated with dashed lines; the algorithm selects intervals 1, 3, 5, and 8.

Analyzing the Algorithm

While this greedy method is quite natural, it is certainly not obvious that it returns an optimal set of intervals. Indeed, it would only be sensible to reserve judgment on its optimality: the ideas that led to the previous nonoptimal versions of the greedy method also seemed promising at first.

As a start, we can immediately declare that the intervals in the set A returned by the algorithm are all compatible.

(4.1) A is a compatible set of requests.

What we need to show is that this solution is optimal. So, for purposes of comparison, let O be an optimal set of intervals. Ideally one might want to show that A = O, but this is too much to ask: there may be many optimal solutions, and at best A is equal to a single one of them. So instead we will simply show that |A| = |O|, that is, that A contains the same number of intervals as O and hence is also an optimal solution.

The idea underlying the proof, as we suggested initially, will be to find a sense in which our greedy algorithm "stays ahead" of this solution O. We will compare the partial solutions that the greedy algorithm constructs to initial segments of the solution O, and show that the greedy algorithm is doing better in a step-by-step fashion.

We introduce some notation to help with this proof. Let i1, ..., ik be the set of requests in A in the order they were added to A. Note that |A| = k. Similarly, let the set of requests in O be denoted by j1, ..., jm. Our goal is to prove that k = m. Assume that the requests in O are also ordered in the natural left-to-right order of the corresponding intervals, that is, in the order of the start and finish points. Note that the requests in O are compatible, which implies that the start points have the same order as the finish points.

We now prove that for each r at least 1, the rth accepted request in the algorithm's schedule finishes no later than the rth request in the optimal schedule. This is the sense in which we want to show that our greedy rule "stays ahead"--that each of its intervals finishes at least as soon as the corresponding interval in the set O.

(4.2) For all indices r at most k we have f(ir) at most f(jr).

Proof. We will prove this statement by induction. For r = 1 the statement is clearly true: the algorithm starts by selecting the request i1 with minimum finish time, and so our greedy rule guarantees that f(i1) is at most f(j1).

Now let r > 1. We will assume as our induction hypothesis that the statement is true for r - 1, and we will try to prove it for r. As shown in Figure 4.3, the induction hypothesis lets us assume that f(i_{r-1}) is at most f(j_{r-1}). In order for the algorithm's rth interval not to finish earlier as well, it would need to "fall behind" as shown. But there is a simple reason why this could not happen: rather than choose a later-finishing interval, the greedy algorithm always has the option (at worst) of choosing jr and thus fulfilling the induction step.

We can make this argument precise as follows. We know (since O consists of compatible intervals) that f(j_{r-1}) is at most s(jr). Combining this with the induction hypothesis f(i_{r-1}) at most f(j_{r-1}), we get f(i_{r-1}) at most s(jr). Thus the interval jr is in the set R of available intervals at the time when the greedy algorithm selects ir. The greedy algorithm selects the available interval with smallest finish time; since interval jr is one of these available intervals, we have f(ir) at most f(jr). This completes the induction step. ■

Figure 4.3 The inductive step in the proof that the greedy algorithm stays ahead: can the greedy algorithm's interval really finish later?

Thus we have formalized the sense in which the greedy algorithm is remaining ahead of O: for each r, the rth interval it selects finishes at least as soon as the rth interval in O. We now see why this implies the optimality of the greedy algorithm's set A.

(4.3) The greedy algorithm returns an optimal set A.

Proof. We will prove the statement by contradiction. If A is not optimal, then an optimal set O must have more requests; that is, we must have m > k. Applying (4.2) with r = k, we get that f(ik) is at most f(jk). Since m > k, there is a request j_{k+1} in O. This request starts after request jk ends, and hence after ik ends. So after deleting all requests that are not compatible with requests i1, ..., ik, the set of possible requests R still contains j_{k+1}. But the greedy algorithm stops with request ik, and it is only supposed to stop when R is empty--a contradiction. ■

Implementation and Running Time

We can make our algorithm run in time O(n log n) as follows. We begin by sorting the n requests in order of finishing time and labeling them in this order; that is, we will assume that f(i) is at most f(j) when i < j. This takes time O(n log n). In an additional O(n) time, we construct an array S[1..n] with the property that S[i] contains the value s(i).

We now select requests by processing the intervals in order of increasing f(i). We always select the first interval; we then iterate through the intervals in order until reaching the first interval j for which s(j) is at least f(1); we then select this one as well. More generally, if the most recent interval we've selected ends at time f, we continue iterating through subsequent intervals until we reach the first j for which s(j) is at least f. In this way, we implement the greedy algorithm analyzed above in one pass through the intervals, spending constant time per interval. Thus this part of the algorithm takes time O(n).
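Rendered in Python, this implementation might look as follows (a sketch; requests are (start, finish) pairs, the sort dominates at O(n log n), and the scan is O(n)):

    def schedule_intervals(requests):
        accepted = []
        last_finish = float("-inf")
        for start, finish in sorted(requests, key=lambda r: r[1]):
            if start >= last_finish:    # compatible with everything accepted
                accepted.append((start, finish))
                last_finish = finish
        return accepted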

Extensions

The Interval Scheduling Problem we considered here is a quite simple scheduling problem. There are many further complications that could arise in practical settings. The following point out issues that we will see later in the book in various forms.

o In defining the problem, we assumed that all requests were known to the scheduling algorithm when it was choosing the compatible subset. It would also be natural, of course, to think about the version of the problem in which the scheduler needs to make decisions about accepting or rejecting certain requests before knowing about the full set of requests. Customers (requestors) may well be impatient, and they may give up and leave if the scheduler waits too long to gather information about all other requests. An active area of research is concerned with such online algorithms, which must make decisions as time proceeds, without knowledge of future input.

o Our goal was to maximize the number of satisfied requests. But we could picture a situation in which each request has a different value to us. For example, each request i could also have a value vi (the amount gained by satisfying request i), and the goal would be to maximize our income: the sum of the values of all satisfied requests. This leads to the Weighted Interval Scheduling Problem, the second of the representative problems we described in Chapter 1.

There are many other variants and combinations that can arise. We now discuss one of these further variants in more detail, since it forms another case in which a greedy algorithm can be used to produce an optimal solution.

A Related Problem: Scheduling All Intervals

The Problem In the Interval Scheduling Problem, there is a single resource and many requests in the form of time intervals, so we must choose which requests to accept and which to reject. A related problem arises if we have many identical resources available and we wish to schedule all the requests using as few resources as possible. Because the goal here is to partition all intervals across multiple resources, we will refer to this as the Interval Partitioning Problem. (The problem is also referred to as the Interval Coloring Problem; the terminology arises from thinking of the different resources as having distinct colors--all the intervals assigned to a particular resource are given the corresponding color.)

For example, suppose that each request corresponds to a lecture that needs to be scheduled in a classroom for a particular interval of time. We wish to satisfy all these requests, using as few classrooms as possible. The classrooms at our disposal are thus the multiple resources, and the basic constraint is that any two lectures that overlap in time must be scheduled in different classrooms. Equivalently, the interval requests could be jobs that need to be processed for a specific period of time, and the resources are machines capable of handling these jobs. Much later in the book, in Chapter 10, we will see a different application of this problem in which the intervals are routing requests that need to be allocated bandwidth on a fiber-optic cable.

As an illustration of the problem, consider the sample instance in Figure 4.4(a). The requests in this example can all be scheduled using three resources; this is indicated in Figure 4.4(b), where the requests are rearranged into three rows, each containing a set of nonoverlapping intervals: the first row contains all the intervals assigned to the first resource, the second row contains all those assigned to the second resource, and so forth.

Figure 4.4 (a) An instance of the Interval Partitioning Problem with ten intervals (a through j). (b) A solution in which all intervals are scheduled using three resources: each row represents a set of intervals that can all be scheduled on a single resource.

Now, is there any hope of using just two resources in this sample instance? Clearly the answer is no. We need at least three resources since, for example, intervals a, b, and c all pass over a common point on the time-line, and hence they all need to be scheduled on different resources. In general, one can make this last argument for any instance of Interval Partitioning. Suppose we define the depth of a set of intervals to be the maximum number that pass over any single point on the time-line. Then we claim

(4.4) In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.

Proof. Suppose a set of intervals has depth d, and let I1, ..., Id all pass over a common point on the time-line. Then each of these intervals must be scheduled on a different resource, so the whole instance needs at least d resources. ■

We now consider two questions, which turn out to be closely related. First, can we design an efficient algorithm that schedules all intervals using the minimum possible number of resources? Second, is there always a schedule using a number of resources that is equal to the depth? In effect, a positive answer to this second question would say that the only obstacles to partitioning intervals are purely local--a set of intervals all piled over the same point. It's not immediately clear that there couldn't exist other, "long-range" obstacles that push the number of required resources even higher.
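Although the text takes the depth as given, it is worth noting that it can be computed by a simple sweep over the interval endpoints. The sketch below follows the convention used here that intervals sharing only an endpoint are compatible:

    def depth(intervals):
        # +1 at each start, -1 at each finish; sorting puts a finish before
        # a coinciding start, so touching intervals do not count as overlapping.
        events = []
        for s, f in intervals:
            events.append((s, 1))
            events.append((f, -1))
        events.sort()
        best = cur = 0
        for _, delta in events:
            cur += delta
            best = max(best, cur)
        return best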

Designing the Algorithm

Let d be the depth of the set of intervals; we now design a simple greedy algorithm that schedules all intervals using a number of resources equal to d. This immediately implies the optimality of the algorithm: in view of (4.4), no solution could use a number of resources that is smaller than the depth. The analysis of our algorithm will therefore illustrate another general approach to proving optimality: one finds a simple, "structural" bound asserting that every possible solution must have at least a certain value, and then one shows that the algorithm under consideration always achieves this bound.

The algorithm we use for this is a simple one-pass greedy strategy that orders intervals by their starting times. We go through the intervals in this order, and try to assign to each interval we encounter a label that hasn't already been assigned to any previous interval that overlaps it, where the labels come from the set of numbers {1, 2, ..., d}. Specifically, we have the following description.

    Sort the intervals by their start times, breaking ties arbitrarily
    Let I1, I2, ..., In denote the intervals in this order
    For j = 1, 2, 3, ..., n
        For each interval Ii that precedes Ij in sorted order and overlaps it
            Exclude the label of Ii from consideration for Ij
        Endfor
        If there is any label from {1, 2, ..., d} that has not been excluded then
            Assign a nonexcluded label to Ij
        Else
            Leave Ij unlabeled
        Endif
    Endfor

In this way, we show how to assign a label to each interval, since we can interpret each number as the name of a resource, and the label of each interval as the name of the resource to which it is assigned.

Analyzing the Algorithm

We claim the following.

(4.5) If we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.

Proof. First let's argue that no interval ends up unlabeled. Consider one of the intervals Ij, and suppose there are t intervals earlier in the sorted order that overlap it. These t intervals, together with Ij, form a set of t + 1 intervals that all pass over a common point on the time-line (namely, the start time of Ij), and so t + 1 is at most d. Thus t is at most d - 1. It follows that at least one of the d labels is not excluded by this set of t intervals, and so there is a label that can be assigned to Ij.

Next we claim that no two overlapping intervals are assigned the same label. Indeed, consider any two intervals I and I' that overlap, and suppose I precedes I' in the sorted order. Then when I' is considered by the algorithm, I is in the set of intervals whose labels are excluded from consideration; consequently, the algorithm will not assign to I' the label that it used for I. ■

The algorithm and its analysis are very simple. Essentially, if you have d labels at your disposal, then as you sweep through the intervals from left to right, assigning an available label to each interval you encounter, you can never reach a point where all the labels are currently in use.

Since our algorithm is using d labels, we can use (4.4) to conclude that it is, in fact, always using the minimum possible number of labels; and the assignment has the property that overlapping intervals are labeled with different numbers. We sum this up as follows.

(4.6) The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
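One way the labeling scheme might be implemented is sketched below. It departs slightly from the pseudocode above: instead of excluding labels pairwise, which is quadratic, it recycles freed labels with a heap, while still assigning at most depth-many labels overall:

    import heapq

    def partition_intervals(intervals):
        # intervals: list of (start, finish) pairs; returns a label per interval.
        order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
        labels = [None] * len(intervals)
        busy = []            # heap of (finish_time, label) for labels in use
        free = []            # heap of labels released by finished intervals
        next_label = 1
        for i in order:
            start, finish = intervals[i]
            while busy and busy[0][0] <= start:   # release finished intervals
                _, lab = heapq.heappop(busy)
                heapq.heappush(free, lab)
            if free:
                labels[i] = heapq.heappop(free)
            else:
                labels[i] = next_label
                next_label += 1
            heapq.heappush(busy, (finish, labels[i]))
        return labels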

4.2 Scheduling to Minimize Lateness: An Exchange Argument

We now discuss a scheduling problem related to the one with which we began the chapter.

The Problem Consider again a situation in which we have a single resource and a set of n requests to use the resource for an interval of time. Assume that the resource is available starting at time s. In contrast to the previous problem, however, each request is now more flexible. Instead of a start time and finish time, the request i has a deadline di, and it requires a contiguous time interval of length ti, but it is willing to be scheduled at any time before the deadline. Each accepted request must be assigned an interval of time of length ti, and different requests must be assigned nonoverlapping intervals.

There are many objective functions we might seek to optimize when faced with this situation, and some are computationally much more difficult than others. Here we consider a very natural goal that can be optimized by a greedy algorithm. Suppose that we plan to satisfy each request, but we are allowed to let certain requests run late. Thus, beginning at our overall start time s, we will assign each request i an interval of time of length ti; let us denote this interval by [s(i), f(i)], with f(i) = s(i) + ti. Unlike the previous problem, then, the algorithm must actually determine a start time (and hence a finish time) for each interval.

We say that a request i is late if it misses the deadline, that is, if f(i) > di. The lateness of such a request i is defined to be li = f(i) - di. We will say that li = 0 if request i is not late. The goal in our new optimization problem will be to schedule all requests, using nonoverlapping intervals, so as to minimize the maximum lateness, L = max li. This problem arises naturally when scheduling jobs that need to use a single machine, and so we will refer to our requests as jobs.

Figure 4.5 shows a sample instance of this problem, consisting of three jobs: the first has length t1 = 1 and deadline d1 = 2; the second has t2 = 2 and d2 = 4; and the third has t3 = 3 and d3 = 6. It is not hard to check that scheduling the jobs in the order 1, 2, 3 incurs a maximum lateness of 0.

Figure 4.5 A sample instance of scheduling to minimize lateness. In the solution, Job 1 (length 1, deadline 2) is done at time 1; Job 2 (length 2, deadline 4) is done at time 1 + 2 = 3; and Job 3 (length 3, deadline 6) is done at time 1 + 2 + 3 = 6.

Designing the Algorithm

What would a greedy algorithm for this problem look like? There are several natural greedy approaches in which we look at the data (ti, di) about the jobs and use this to order them according to some simple rule.

o One approach would be to schedule the jobs in order of increasing length ti, so as to get the short jobs out of the way quickly. This immediately looks too simplistic, since it completely ignores the deadlines of the jobs. And indeed, consider a two-job instance where the first job has t1 = 1 and d1 = 100, while the second job has t2 = 10 and d2 = 10. Then the second job has to be started right away if we want to achieve lateness L = 0, and scheduling the second job first is indeed the optimal solution.

o The previous example suggests that we should be concerned about jobs whose available slack time di - ti is very small--they're the ones that need to be started with minimal delay. So a more natural greedy algorithm would be to sort jobs in order of increasing slack di - ti. Unfortunately, this greedy rule fails as well. Consider a two-job instance where the first job has t1 = 1 and d1 = 2, while the second job has t2 = 10 and d2 = 10. Sorting by increasing slack would place the second job first in the schedule, and the first job would incur a lateness of 9. (It finishes at time 11, nine units beyond its deadline.) On the other hand, if we schedule the first job first, then it finishes on time and the second job incurs a lateness of only 1.

There is, however, an equally basic greedy algorithm that always produces an optimal solution. We simply sort the jobs in increasing order of their deadlines di, and schedule them in this order. (This rule is often called Earliest Deadline First.) There is an intuitive basis to this rule: we should make sure that jobs with earlier deadlines get completed earlier. At the same time, it's a little hard to believe that this algorithm always produces optimal solutions, specifically because it never looks at the lengths of the jobs. Earlier we were skeptical of the approach that sorted by length on the grounds that it threw away half the input data (i.e., the deadlines); but now we're considering a solution that throws away the other half of the data. Nevertheless, Earliest Deadline First does produce optimal solutions, and we will now prove this.

First we specify some notation that will be useful in talking about the algorithm. By renaming the jobs if necessary, we can assume that the jobs are labeled in the order of their deadlines, that is, we have d1 at most ... at most dn. We will simply schedule all jobs in this order. Again, let s be the start time for all jobs. Job 1 will start at time s = s(1) and end at time f(1) = s(1) + t1; Job 2 will start at time s(2) = f(1) and end at time f(2) = s(2) + t2; and so forth. We will use f to denote the finishing time of the last scheduled job. We write this algorithm here.

    Order the jobs in order of their deadlines
    Assume for simplicity of notation that d1 <= ... <= dn
    Initially, f = s
    Consider the jobs i = 1, ..., n in this order
        Assign job i to the time interval from s(i) = f to f(i) = f + ti
        Let f = f + ti
    End
    Return the set of scheduled intervals [s(i), f(i)] for i = 1, ..., n

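In Python, the whole algorithm is only a few lines (a sketch; jobs are (length, deadline) pairs and s is the common start time):

    def schedule_edf(jobs, s=0):
        # Earliest Deadline First: sort by deadline, pack jobs back to back.
        f = s
        intervals, max_lateness = [], 0
        for t, d in sorted(jobs, key=lambda job: job[1]):
            intervals.append((f, f + t))
            max_lateness = max(max_lateness, f + t - d)
            f += t
        return intervals, max_lateness

On the instance of Figure 4.5, schedule_edf([(1, 2), (2, 4), (3, 6)]) returns a maximum lateness of 0, matching the discussion above.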
Analyzing the Algorithm

To reason about the optimality of the algorithm, we first observe that the schedule it produces has no "gaps"--times when the machine is not working yet there are jobs left. The time that passes during a gap will be called idle time: there is work to be done, yet for some reason the machine is sitting idle. Not only does the schedule A produced by our algorithm have no idle time; it is also very easy to see that there is an optimal schedule with this property. We do not write down a proof for this.

(4.7) There is an optimal schedule with no idle time.

Now, how can we prove that our schedule A is optimal, that is, that its maximum lateness L is as small as possible? As in previous analyses, we will start by considering an optimal schedule O. Our plan here is to gradually modify O, preserving its optimality at each step, but eventually transforming it into a schedule that is identical to the schedule A found by the greedy algorithm. We refer to this type of analysis as an exchange argument, and we will see that it is a powerful way to think about greedy algorithms in general.

We first try characterizing schedules in the following way. We say that a schedule A' has an inversion if a job i with deadline di is scheduled before another job j with earlier deadline dj < di. Notice that, by definition, the schedule A produced by our algorithm has no inversions. If there are jobs with identical deadlines then there can be many different schedules with no inversions. However, we can show that all these schedules have the same maximum lateness L.

(4.8) All schedules with no inversions and no idle time have the same maximum lateness.

Proof. If two different schedules have neither inversions nor idle time, then they might not produce exactly the same order of jobs; but they can only differ in the order in which jobs with identical deadlines are scheduled. Consider such a deadline d. In both schedules, the jobs with deadline d are all scheduled consecutively (after all jobs with earlier deadlines and before all jobs with later deadlines). Among the jobs with deadline d, the last one has the greatest lateness, and this lateness does not depend on the order of the jobs. ■

The main step in showing the optimality of our algorithm is to establish that there is an optimal schedule that has no inversions and no idle time. To do this, we will start with any optimal schedule having no idle time; we will then convert it into a schedule with no inversions without increasing its maximum lateness. Thus the resulting schedule after this conversion will be optimal as well.

(4.9) There is an optimal schedule that has no inversions and no idle time.

Proof. By (4.7), there is an optimal schedule O with no idle time. The proof will consist of a sequence of statements. The first of these is simple to establish.

(a) If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i and has dj < di.

Indeed, consider an inversion in which a job a is scheduled sometime before a job b, and da > db. If we advance in the scheduled order of jobs from a to b one at a time, there has to come a point at which the deadline we see decreases for the first time. This corresponds to a pair of consecutive jobs that form an inversion.

Now suppose O has at least one inversion, and by (a), let i and j be a pair of inverted requests that are consecutive in the scheduled order. We will decrease the number of inversions in O by swapping the requests i and j in the schedule O. The pair (i, j) formed an inversion in O; this inversion is eliminated by the swap, and no new inversions are created. Thus we have

(b) After swapping i and j we get a schedule with one less inversion.

The hardest part of this proof is to argue that the inverted schedule is also optimal.

(c) The new swapped schedule has a maximum lateness no larger than that of O.

It is clear that if we can prove (c), then we are done. The initial schedule O can have at most (n choose 2) inversions (if all pairs are inverted), and hence after at most (n choose 2) swaps we get an optimal schedule with no inversions. So we now conclude by proving (c).

Proof of (c). We invent some notation to describe the schedule O: assume that each request r is scheduled for the time interval [s(r), f(r)] and has lateness l'r. Let L' = max l'r denote the maximum lateness of this schedule. Let Ō denote the swapped schedule; we will use s̄(r), f̄(r), l̄r, and L̄ to denote the corresponding quantities in the swapped schedule.

Now recall our two adjacent, inverted jobs i and j; the situation is roughly as pictured in Figure 4.6. Only the finishing times of i and j are affected by the swap. The finishing time of j before the swap is exactly equal to the finishing time of i after the swap. Thus all jobs other than jobs i and j finish at the same time in the two schedules. Moreover, job j will get finished earlier in the new schedule, and hence the swap does not increase the lateness of job j.

Figure 4.6 The effect of swapping two consecutive, inverted jobs.

Thus the only thing to worry about is job i: its lateness may have been increased, and what if this actually raises the maximum lateness of the whole schedule? After the swap, job i finishes at time f(j), when job j was finished in the schedule O. If job i is late in this new schedule, its lateness is l̄i = f̄(i) - di = f(j) - di. But the crucial point is that i cannot be more late in the schedule Ō than j was in the schedule O. Specifically, our assumption di > dj implies that

    l̄i = f(j) - di < f(j) - dj = l'j.

Since the lateness of the schedule O was L' at least l'j > l̄i, this shows that the swap does not increase the maximum lateness of the schedule. ■

The optimality of our greedy algorithm now follows immediately.

(4.10) The schedule A produced by the greedy algorithm has optimal maximum lateness L.

Proof. Statement (4.9) proves that an optimal schedule with no inversions exists. Now by (4.8) all schedules with no inversions (and no idle time) have the same maximum lateness, and so the schedule obtained by the greedy algorithm is optimal. ■

Extensions

There are many possible generalizations of this scheduling problem. For example, we assumed that all jobs were available to start at the common start time s. A natural, but harder, version of this problem would contain requests i that, in addition to the deadline di and the requested time ti, would also have an earliest possible starting time ri. This earliest possible starting time is usually referred to as the release time. Problems with release times arise naturally in scheduling problems where requests can take the form: Can I reserve the room for a two-hour lecture, sometime between 1 P.M. and 5 P.M.? Our proof that the greedy algorithm finds an optimal solution relied crucially on the fact that all jobs were available at the common start time s. (Do you see where?) Unfortunately, as we will see later in the book, in Chapter 8, this more general version of the problem is much more difficult to solve optimally.

4.3 Optimal Caching: A More Complex Exchange Argument

We now consider a problem that involves processing a sequence of requests of a different form, and we develop an algorithm whose analysis requires a more subtle use of the exchange argument. The problem is that of cache maintenance.

The Problem To motivate caching, consider the following situation. You're working on a long research paper, and your draconian library will only allow you to have eight books checked out at once. You know that you'll probably need more than this over the course of working on the paper; but at any point in time, you'd like to have ready access to the eight books that are most relevant at that time. How should you decide which books to check out, and when should you return some in exchange for others, to minimize the number of times you have to exchange a book at the library?

This is precisely the problem that arises when dealing with a memory hierarchy: there is a small amount of data that can be accessed very quickly, and a large amount of data that requires more time to access; and you must decide which pieces of data to have close at hand.

Memory hierarchies have been a ubiquitous feature of computers since very early in their history. To begin with, data in the main memory of a processor can be accessed much more quickly than the data on its hard disk; but the disk has much more storage capacity. Thus, it is important to keep the most regularly used pieces of data in main memory, and go to disk as infrequently as possible. The same phenomenon, qualitatively, occurs with on-chip caches in modern processors. These can be accessed in a few cycles, and so data can be retrieved from cache much more quickly than it can be retrieved from main memory. This is another level of hierarchy: small caches have faster access time than main memory, which in turn is smaller and faster to access than disk. And one can see extensions of this hierarchy in many other settings. When one uses a Web browser, the disk often acts as a cache for frequently visited Web pages, since going to disk is still much faster than downloading something over the Internet. (Much as your desk acts as a cache for the campus library, and the assorted facts you're able to remember without looking them up constitute a cache for the books on your desk.)

Caching is a general term for the process of storing a small amount of data in a fast memory so as to reduce the amount of time spent interacting with a slow memory. In the previous examples, the on-chip cache reduces the need to fetch data from main memory, the main memory acts as a cache for the disk, and the disk acts as a cache for the Internet. For caching to be as effective as possible, it should generally be the case that when you go to access a piece of data, it is already in the cache. To achieve this, a cache maintenance algorithm determines what to keep in the cache and what to evict from the cache when new data needs to be brought in.

Of course, as the caching problem arises in different settings, it involves various different considerations based on the underlying technology. For our purposes here, though, we take an abstract view of the problem that underlies most of these settings. We consider a set U of n pieces of data stored in main memory. We also have a faster memory, the cache, that can hold k < n pieces of data at any one time. We will assume that the cache initially holds some set of k items. A sequence of data items D = d1, d2, ..., dm drawn from U is presented to us--this is the sequence of memory references we must process--and in processing them we must decide at all times which k items to keep in the cache. When item di is presented, we can access it very quickly if it is already in the cache; otherwise, we are required to bring it from main memory into the cache and, if the cache is full, to evict some other piece of data that is currently in the cache to make room for di. This is called a cache miss, and we want to have as few of these as possible.

Thus, on a particular sequence of memory references, a cache maintenance algorithm determines an eviction schedule--specifying which items should be evicted from the cache at which points in the sequence--and this determines the contents of the cache and the number of misses over time. Let's consider an example of this process.

Suppose we have three items {a, b, c}, the cache size is k = 2, and we are presented with the sequence

    a, b, c, b, c, a, b.

Suppose that the cache initially contains the items a and b. Then on the third item in the sequence, we could evict a so as to bring in c; and on the sixth item we could evict c so as to bring in a; we thereby incur two cache misses over the whole sequence. After thinking about it, one concludes that any eviction schedule for this sequence must include at least two cache misses.

Under real operating conditions, cache maintenance algorithms must process memory references d1, d2, ... without knowledge of what's coming in the future; but for purposes of evaluating the quality of these algorithms, systems researchers very early on sought to understand the nature of the optimal solution to the caching problem. Given a full sequence S of memory references, what is the eviction schedule that incurs as few cache misses as possible?

Designing and Analyzing the Algorithm

In the 1960s, Les Belady showed that the following simple rule will always incur the minimum number of misses:

    When di needs to be brought into the cache,
    evict the item that is needed the farthest into the future.

We will call this the Farthest-in-Future Algorithm.
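A direct simulation of the rule is easy to write down (a sketch: it rescans the rest of the sequence on every miss, so it is quadratic, but the point here is the rule rather than efficiency):

    def farthest_in_future(sequence, cache, k):
        # Simulate Belady's rule; return the number of cache misses.
        cache = set(cache)
        misses = 0
        for step, item in enumerate(sequence):
            if item in cache:
                continue
            misses += 1
            if len(cache) == k:
                def next_use(c):
                    rest = sequence[step + 1:]
                    # Items never used again count as infinitely far away.
                    return rest.index(c) if c in rest else float("inf")
                cache.remove(max(cache, key=next_use))
            cache.add(item)
        return misses

On the small example above, farthest_in_future(list("abcbcab"), {"a", "b"}, 2) incurs exactly the two misses we argued were necessary.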

This is a very natural algorithm. At the same time, the fact that it is optimal on all sequences is somewhat more subtle than it first appears. Why evict the item that is needed farthest in the future, as opposed, for example, to the one that will be used least frequently in the future? Moreover, consider a sequence like

    a, b, c, d, a, d, e, a, d, b, c

with k = 3 and items {a, b, c} initially in the cache. The Farthest-in-Future rule will produce a schedule S that evicts c on the fourth step and b on the seventh step. But there are other eviction schedules that are just as good. Consider the schedule S' that evicts b on the fourth step and c on the seventh step, incurring the same number of misses. So in fact it's easy to find cases where schedules produced by rules other than Farthest-in-Future are also optimal; and given this flexibility, why might a deviation from Farthest-in-Future early on not yield an actual savings farther along in the sequence? For example, on the seventh step in our example, the schedule S' is actually evicting an item (c) that is needed farther into the future than the item evicted at this point by Farthest-in-Future, since Farthest-in-Future gave up c earlier on. These are some of the kinds of things one should worry about before concluding that Farthest-in-Future really is optimal.

In thinking about the example above, we quickly appreciate that it doesn't really matter whether b or c is evicted at the fourth step, since the other one should be evicted at the seventh step; so given a schedule where b is evicted first, we can swap the choices of b and c without changing the cost. This reasoning--swapping one decision for another--forms the first outline of an exchange argument that proves the optimality of Farthest-in-Future.

Before delving into this analysis, let's clear up one important issue. All the cache maintenance algorithms we've been considering so far produce schedules that only bring an item d into the cache in a step i if there is a request to d in step i, and d is not already in the cache. Let us call such a schedule reduced: it does the minimal amount of work necessary in a given step. But in general one could imagine an algorithm that produced schedules that are not reduced, by bringing in items in steps when they are not requested. We now show that for every nonreduced schedule, there is an equally good reduced schedule.

Let S be a schedule that may not be reduced. We define a new schedule S̄, the reduction of S, as follows. In any step i where S brings in an item d that has not been requested, our construction of S̄ "pretends" to do this but actually leaves d in main memory. It only really brings d into the cache in the next step j after this in which d is requested. In this way, the cache miss incurred by S̄ in step j can be charged to the earlier cache operation performed by S in step i, when it brought in d. Hence we have the following fact.

(4.11) S̄ is a reduced schedule that brings in at most as many items as the schedule S.

Note that for any reduced schedule, the number of items that are brought in is exactly the number of misses.

Proving the Optimality of Farthest-in-Future

We now proceed with the exchange argument showing that Farthest-in-Future is optimal. Consider an arbitrary sequence D of memory references; let SFF denote the schedule produced by Farthest-in-Future, and let S* denote a schedule that incurs the minimum possible number of misses. We will now gradually "transform" the schedule S* into the schedule SFF, one eviction decision at a time, without increasing the number of misses.

Here is the basic fact we use to perform one step in the transformation.

(4.12) Let S be a reduced schedule that makes the same eviction decisions as SFF through the first j items in the sequence, for a number j. Then there is a reduced schedule S' that makes the same eviction decisions as SFF through the first j + 1 items, and incurs no more misses than S does.

Proof. Consider the (j + 1)st request, to item d = d_{j+1}. Since S and SFF have agreed up to this point, they have the same cache contents. If d is in the cache for both, then no eviction decision is necessary (both schedules are reduced), and so S in fact agrees with SFF through step j + 1, and we can set S' = S. Similarly, if d needs to be brought into the cache, but S and SFF both evict the same item to make room for d, then we can again set S' = S.

So the interesting case arises when d needs to be brought into the cache, and to do this S evicts item f while SFF evicts item e different from f. Here S and SFF do not already agree through step j + 1, since S has e in cache while SFF has f in cache. Hence we must actually do something nontrivial to construct S'.

As a first step, we should have S' evict e rather than f. Now we need to further ensure that S' incurs no more misses than S. An easy way to do this would be to have S' agree with S for the remainder of the sequence; but this is no longer possible, since S and S' have slightly different caches from this point onward. So instead we'll have S' try to get its cache back to the same state as S as quickly as possible, while not incurring unnecessary misses. Specifically, from request j + 2 onward, S' behaves exactly like S until one of the following things happens for the first time.

(i) There is a request to an item g, different from e and f, that is not in the cache of S, and S evicts e to make room for it. Since S' and S only differ on e and f, it must be that g is not in the cache of S' either; so we can have S' evict f, and now the caches of S and S' are the same. We can then have S' behave exactly like S for the rest of the sequence.

(ii) There is a request to f, and S evicts an item e'. If e' = e, then we're all set: S' can simply access f from the cache, and after this step the caches of S and S' will be the same.

4.4 Shortest Paths in a Graph

Some of the basic algorithms for graphs are based on greedy design principles. Here we apply a greedy algorithm to the problem of finding shortest paths, and in the next section we look at the construction of minimum-cost spanning trees.

The Problem

As we've seen, graphs are often used to model networks in which one travels from one point to another--traversing a sequence of highways through interchanges, or traversing a sequence of communication links through intermediate routers. As a result, a basic algorithmic problem is to determine the shortest path between nodes in a graph. We may ask this as a point-to-point question: Given nodes u and v, what is the shortest u-v path? Or we may ask for more information: Given a start node s, what is the shortest path from s to each other node?

The concrete setup of the shortest paths problem is as follows. We are given a directed graph G = (V, E), with a designated start node s. We assume that s has a path to every other node in G. Each edge e has a length ℓ_e ≥ 0, indicating the time (or distance, or cost) it takes to traverse e. For a path P, the length of P--denoted ℓ(P)--is the sum of the lengths of all edges in P. Our goal is to determine the shortest path from s to every other node in the graph. We should mention that although the problem is specified for a directed graph, we can handle the case of an undirected graph by simply replacing each undirected edge e = (u, v) of length ℓ_e by two directed edges (u, v) and (v, u), each of length ℓ_e.

Designing the Algorithm

In 1959, Edsger Dijkstra proposed a very simple greedy algorithm to solve the single-source shortest-paths problem. We begin by describing an algorithm that just determines the length of the shortest path from s to each other node in the graph; it is then easy to produce the paths as well. The algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) from s; this is the "explored" part of the graph. Initially S = {s}, and d(s) = 0. Now, for each node v ∈ V − S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u ∈ S, followed by the single edge (u, v). That is, we consider the quantity d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e. We choose the node v ∈ V − S for which this quantity is minimized, add v to S, and define d(v) to be the value d'(v).

  Dijkstra's Algorithm (G, ℓ)
    Let S be the set of explored nodes
      For each u ∈ S, we store a distance d(u)
    Initially S = {s} and d(s) = 0
    While S ≠ V
      Select a node v ∉ S with at least one edge from S for which
        d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e is as small as possible
      Add v to S and define d(v) = d'(v)
    EndWhile

It is simple to produce the s-u paths corresponding to the distances found by Dijkstra's Algorithm. As each node v is added to the set S, we simply record the edge (u, v) on which it achieved the value min_{e=(u,v): u∈S} d(u) + ℓ_e. The path P_v is implicitly represented by these edges: if (u, v) is the edge we have stored for v, then P_v is just (recursively) the path P_u followed by the single edge (u, v). In other words, to construct P_v, we simply start at v, follow the edge we have stored for v in the reverse direction to u, then follow the edge we have stored for u in the reverse direction to its predecessor, and so on until we reach s. Note that s must be reached, since our backward walk from v visits nodes that were added to S earlier and earlier.
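As a concrete companion to the pseudocode, here is a direct Python transcription--our own sketch, not from the text. The representation is an assumption: a dict mapping each node (every node must appear as a key) to a list of (neighbor, length) pairs for its outgoing edges, with all lengths nonnegative. Each iteration scans every edge leaving S, the straightforward implementation that is improved with a priority queue later in this section.

    def dijkstra_basic(graph, s):
        d = {s: 0}                    # d(u) for u in the explored set S
        pred = {}                     # pred[v] = the edge (u, v) stored for v
        while len(d) < len(graph):
            best = None
            # Consider every edge (u, v) with u in S and v not in S,
            # and pick the v minimizing d'(v) = d(u) + l_e.
            for u in d:
                for v, length in graph[u]:
                    if v not in d and (best is None or d[u] + length < best[0]):
                        best = (d[u] + length, u, v)
            dprime, u, v = best       # assumes s has a path to every node
            d[v] = dprime             # add v to S and define d(v) = d'(v)
            pred[v] = (u, v)          # record the edge achieving the minimum
        return d, pred

    def path_to(v, pred, s):
        # Follow the stored edges backward from v to s to produce the s-v path.
        path = [v]
        while path[-1] != s:
            u, _ = pred[path[-1]]
            path.append(u)
        return path[::-1]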

To get a better sense of what the algorithm is doing, consider the snapshot of its execution depicted in Figure 4.7. At the point the picture is drawn, two iterations have been performed: the first added node u, and the second added node v. In the iteration that is about to be performed, the node x will be added because it achieves the smallest value of d'(x); thanks to the edge (u, x), we have d'(x) = d(u) + ℓ_ux = 2. Note that attempting to add y or z to the set S at this point would lead to an incorrect value for their shortest-path distances; ultimately, they will be added because of their edges from x.

Figure 4.7 A snapshot of the execution of Dijkstra's Algorithm. The next node that will be added to the set S is x, due to the path through u.

Analyzing the Algorithm

We see in this example that Dijkstra's Algorithm is doing the right thing and avoiding recurring pitfalls: growing the set S by the wrong node can lead to an overestimate of the shortest-path distance to that node. The question becomes: Is it always true that when Dijkstra's Algorithm adds a node v, we get the true shortest-path distance to v?

We now answer this by proving the correctness of the algorithm, showing that the paths P_v really are shortest paths. Dijkstra's Algorithm is greedy in the sense that we always form the shortest new s-v path we can make from a path in S followed by a single edge. We prove its correctness using a variant of our first style of analysis: we show that it "stays ahead" of all other solutions by establishing, inductively, that each time it selects a path to a node v, that path is shorter than every other possible path to v.

(4.14) Consider the set S at any point in the algorithm's execution. For each u ∈ S, the path P_u is a shortest s-u path.

Note that this fact immediately establishes the correctness of Dijkstra's Algorithm, since we can apply it when the algorithm terminates, at which point S includes all nodes.

Proof. We prove this by induction on the size of S. The case |S| = 1 is easy, since then we have S = {s} and d(s) = 0. Suppose the claim holds when |S| = k for some value of k ≥ 1; we now grow S to size k + 1 by adding the node v. Let (u, v) be the final edge on our s-v path P_v.

By induction hypothesis, P_u is the shortest s-u path for each u ∈ S. Now consider any other s-v path P; we wish to show that it is at least as long as P_v. In order to reach v, this path P must leave the set S somewhere; let y be the first node on P that is not in S, and let x ∈ S be the node just before y.

The situation is now as depicted in Figure 4.8, and the crux of the proof is very simple: P cannot be shorter than P_v because it is already at least as long as P_v by the time it has left the set S. Indeed, in iteration k + 1, Dijkstra's Algorithm must have considered adding node y to the set S via the edge (x, y) and rejected this option in favor of adding v. This means that there is no path from s to y through x that is shorter than P_v. But the subpath of P up to y is such a path, and so this subpath is at least as long as P_v. Since edge lengths are nonnegative, the full path P is at least as long as this subpath, and hence at least as long as P_v.

This is a complete proof; one can also spell out the argument in the previous paragraph using the following inequalities. Let P' be the subpath of P from s to x. Since x ∈ S, we know by the induction hypothesis that P_x is a shortest s-x path (of length d(x)), and so ℓ(P') ≥ ℓ(P_x) = d(x). Thus the subpath of P out to node y has length ℓ(P') + ℓ(x, y) ≥ d(x) + ℓ(x, y) ≥ d'(y), and the full path P is at least as long as this subpath. Finally, since Dijkstra's Algorithm selected v in this iteration, we know that d'(y) ≥ d'(v) = ℓ(P_v). Combining these inequalities shows that ℓ(P) ≥ ℓ(P') + ℓ(x, y) ≥ ℓ(P_v). []

Figure 4.8 The shortest path P_v and an alternate s-v path P through the node y: the alternate path is already too long by the time it has left the set S.

Here are two observations about Dijkstra's Algorithm and its analysis. First, the algorithm does not always find shortest paths if some of the edges can have negative lengths. (Do you see where the proof breaks?) Many shortest-path applications involve negative edge lengths, and a more complex algorithm--due to Bellman and Ford--is required for this case. We will see this algorithm when we consider the topic of dynamic programming.

The second observation is that Dijkstra's Algorithm is, in a sense, even simpler than we've described here. Dijkstra's Algorithm is really a "continuous" version of the standard breadth-first search algorithm for traversing a graph, and it can be motivated by the following physical intuition. Suppose the edges of G formed a system of pipes filled with water, joined together at the nodes; each edge e has length ℓ_e and a fixed cross-sectional area. Now suppose an extra droplet of water falls at node s and starts a wave from s. As the wave expands out of node s at a constant speed, the expanding wavefront reaches nodes in increasing order of their distance from s. It is easy to believe (and also true) that the path taken by the wavefront to get to any node v is a shortest path. Indeed, it is easy to see that this is exactly the path to v found by Dijkstra's Algorithm, and that the nodes are discovered by the expanding water in the same order that they are discovered by Dijkstra's Algorithm.

Implementation and Running Time

To conclude our discussion of Dijkstra's Algorithm, we consider its running time. There are n − 1 iterations of the While loop for a graph with n nodes, as each iteration adds a new node v to S. Selecting the correct node v efficiently is a more subtle issue. One's first impression is that each iteration would have to consider each node v ∉ S, and go through all the edges between S and v to determine the minimum min_{e=(u,v): u∈S} d(u) + ℓ_e, so that we can select the node v for which this minimum is smallest. For a graph with m edges, computing all these minima can take O(m) time, so this would lead to an implementation that runs in O(mn) time.

We can do considerably better if we use the right data structures. First, we will explicitly maintain the values of the minima d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e for each node v ∈ V − S, rather than recomputing them in each iteration. We can further improve the efficiency by keeping the nodes V − S in a priority queue with d'(v) as their keys. Priority queues were discussed in Chapter 2; they are data structures designed to maintain a set of n elements, each with a key. A priority queue can efficiently insert elements, delete elements, change an element's key, and extract the element with the minimum key. We will need the third and fourth of these operations: ChangeKey and ExtractMin.

How do we implement Dijkstra's Algorithm using a priority queue? We put the nodes V in a priority queue with d'(v) as the key for v ∈ V. To select the node v that should be added to the set S, we need the ExtractMin operation. To see how to update the keys, consider an iteration in which node v is added to S, and let w ∉ S be a node that remains in the priority queue. What do we have to do to update the value of d'(w)? If (v, w) is not an edge, then we don't have to do anything: the set of edges considered in the minimum min_{e=(u,w): u∈S} d(u) + ℓ_e is exactly the same before and after adding v to S. If e' = (v, w) ∈ E, on the other hand, then the new value for the key is min(d'(w), d(v) + ℓ_e'). If d'(w) > d(v) + ℓ_e', then we need to use the ChangeKey operation to decrease the key of node w appropriately. This ChangeKey operation can occur at most once per edge, when the tail of the edge e' is added to S. In summary, we have the following result.

(4.15) Using a priority queue, Dijkstra's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations. Using the heap-based priority queue implementation discussed in Chapter 2, each priority queue operation can be made to run in O(log n) time. Thus the overall time for the implementation is O(m log n).
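A heap-based sketch in Python follows--our own transcription, not from the text. Python's standard heapq module provides no ChangeKey operation, so instead of decreasing a key we push a fresh entry and skip stale ones when they are extracted ("lazy deletion"); since at most one entry is pushed per edge, each heap operation still costs O(log m) = O(log n), preserving the O(m log n) bound.

    import heapq

    def dijkstra_heap(graph, s):
        d = {}                            # final shortest-path distances
        heap = [(0, s)]                   # entries are (tentative d'(v), v)
        while heap:
            dv, v = heapq.heappop(heap)   # ExtractMin
            if v in d:
                continue                  # stale entry; v already explored
            d[v] = dv
            for w, length in graph[v]:    # relax keys of neighbors outside S
                if w not in d:
                    heapq.heappush(heap, (dv + length, w))
        return d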

4.5 The Minimum Spanning Tree Problem

We now apply an exchange argument in the context of a second fundamental problem on graphs: the Minimum Spanning Tree Problem.

The Problem

Suppose we have a set of locations V = {v_1, v_2, ..., v_n}, and we want to build a communication network on top of them. The network should be connected--there should be a path between every pair of nodes--but subject to this requirement, we wish to build it as cheaply as possible.

For certain pairs (v_i, v_j), we may build a direct link between v_i and v_j for a certain cost c(v_i, v_j) > 0. Thus we can represent the set of possible links that may be built using a graph G = (V, E), with a positive cost c_e associated with each edge e = (v_i, v_j). The problem is to find a subset of the edges T ⊆ E so that the graph (V, T) is connected, and the total cost Σ_{e∈T} c_e is as small as possible. (We will assume that the full graph G is connected; otherwise, no solution is possible.)

Here is a basic observation.

(4.16) Let T be a minimum-cost solution to the network design problem defined above. Then (V, T) is a tree.

Proof. By definition, (V, T) must be connected; we show that it also will contain no cycles. Indeed, suppose it contained a cycle C, and let e be any edge on C. We claim that (V, T − {e}) is still connected, since any path that previously used the edge e can now go "the long way" around the remainder of the cycle C instead. It follows that (V, T − {e}) is also a valid solution to the problem, and it is cheaper--a contradiction. []

If we allow some edges to have 0 cost (that is, if we assume only that the costs c_e are nonnegative), then a minimum-cost solution to the network design problem may have extra edges--edges that have 0 cost and could optionally be deleted. But even in this case, there is always a minimum-cost solution that is a tree: starting from any optimal solution, we could keep deleting edges on cycles until we had a tree, and with nonnegative edges the cost would not increase during this process.

We will call a subset T ⊆ E a spanning tree of G if (V, T) is a tree. Statement (4.16) says that the goal of our network design problem can be rephrased as that of finding the cheapest spanning tree of the graph; for this reason, it is generally called the Minimum Spanning Tree Problem. Unless G is a very simple graph, it will have exponentially many different spanning trees, whose structures may look very different from one another. So it is not at all clear how to efficiently find the cheapest tree from among all these options.

Designing Algorithms

As with the previous problems we've seen, it is easy to come up with a number of natural greedy algorithms for the problem. But curiously, and fortunately, this is a case where many of the first greedy algorithms one tries turn out to be correct: they each solve the problem optimally. We will review a few of these algorithms now and then discover, via a nice pair of exchange arguments, some of the underlying reasons for this plethora of simple, optimal algorithms. Here are three greedy algorithms, each of which correctly finds a minimum spanning tree.

o One simple algorithm starts without any edges at all and builds a spanning tree by successively inserting edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added to the edges we've already inserted. If, on the other hand, inserting e would result in a cycle, then we simply discard e and continue. This approach is called Kruskal's Algorithm.

o Another simple greedy algorithm can be designed by analogy with Dijkstra's Algorithm for paths, although, in fact, it is even simpler to specify than Dijkstra's Algorithm. We start with a root node s and try to greedily grow a tree from s outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have. More concretely, we maintain a set S ⊆ V on which a spanning tree has been constructed so far. Initially, S = {s}. In each iteration, we grow S by one node, adding the node v that minimizes the "attachment cost" min_{e=(u,v): u∈S} c_e, and including the edge e = (u, v) that achieves this minimum in the spanning tree. This approach is called Prim's Algorithm.

o Finally, we can design a greedy algorithm by running sort of a "backward" version of Kruskal's Algorithm. Specifically, we start with the full graph (V, E) and begin deleting edges in order of decreasing cost. As we get to each edge e (starting from the most expensive), we delete it as long as doing so would not actually disconnect the graph we currently have. For want of a better name, this approach is generally called the Reverse-Delete Algorithm (as far as we can tell, it's never been named after a specific person).

The fact that each of these algorithms is guaranteed to produce an optimal solution suggests a certain "robustness" to the Minimum Spanning Tree Problem--there are many ways to get to the answer. Next we explore some of the underlying reasons why so many different algorithms produce minimum-cost spanning trees.

For example, Figure 4.9 shows the first four edges added by Prim's and Kruskal's Algorithms respectively, on the same input.

Figure 4.9 Sample run of the Minimum Spanning Tree Algorithms of (a) Prim and (b) Kruskal, on a geometric instance of the Minimum Spanning Tree Problem in which the cost of each edge is proportional to the geometric distance in the plane. The first 4 edges added to the spanning tree are indicated by solid lines; the next edge to be added is a dashed line.

Analyzing the Algorithms

All these algorithms work by repeatedly inserting or deleting edges from a partial solution. So, to analyze them, it would be useful to have in hand some basic facts saying when it is "safe" to include an edge in the minimum spanning tree and, correspondingly, when it is safe to eliminate an edge on the grounds that it couldn't possibly be in the minimum spanning tree. For purposes of the analysis, we will make the simplifying assumption that all edge costs are distinct from one another (i.e., no two are equal). This assumption makes it easier to express the arguments that follow, and we will show later in this section how it can be easily eliminated.

When Is It Safe to Include an Edge in the Minimum Spanning Tree?

The crucial fact about edge insertion is the following statement, which we will refer to as the Cut Property.

(4.17) Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V − S. Then every minimum spanning tree contains the edge e.

Proof. Let T be a spanning tree that does not contain e; we need to show that T does not have the minimum possible cost. We'll do this using an exchange argument: we'll identify an edge e' in T that is more expensive than e, and with the property that exchanging e for e' results in another spanning tree. This resulting spanning tree will then be cheaper than T, as desired.

The crux is therefore to find an edge that can be successfully exchanged with e. Recall that the ends of e are v and w. T is a spanning tree, so there must be a path P in T from v to w. Starting at v, suppose we follow the nodes of P in sequence; there is a first node w' on P that is in V − S. Let v' ∈ S be the node just before w' on P, and let e' = (v', w') be the edge joining them. Thus, e' is an edge of T with one end in S and the other in V − S. See Figure 4.10 for the situation at this stage in the proof.

If we exchange e for e', we get a set of edges T' = T − {e'} ∪ {e}. We claim that T' is a spanning tree. Clearly (V, T') is connected, since (V, T) is connected, and any path in (V, T) that used the edge e' = (v', w') can now be "rerouted" in (V, T') to follow the portion of P from v' to v, then the edge e, and then the portion of P from w to w'. To see that (V, T') is also acyclic, note that the only cycle in (V, T' ∪ {e'}) is the one composed of e and the path P, and this cycle is not present in (V, T') due to the deletion of e'.

We noted above that the edge e' has one end in S and the other in V − S. But e is the cheapest edge with this property, and so c_e < c_e'. (The inequality is strict since no two edges have the same cost.) Thus the total cost of T' is less than that of T, as desired. []

Figure 4.10 Swapping the edge e for the edge e' in the spanning tree T, as described in the proof of (4.17). (e can be swapped for e'.)

The proof of (4.17) is a bit more subtle than it may first appear. To appreciate this subtlety, consider the following shorter but incorrect argument for (4.17). Let T be a spanning tree that does not contain e. Since T is a spanning tree, it must contain an edge f with one end in S and the other in V − S. Since e is the cheapest edge with this property, we have c_e < c_f, and hence T − {f} ∪ {e} is a spanning tree that is cheaper than T.

The problem with this argument is not in the claim that f exists, or that T − {f} ∪ {e} is cheaper than T. The difficulty is that T − {f} ∪ {e} may not be a spanning tree, as shown by the example of the edge f in Figure 4.10. The point is that we can't prove (4.17) by simply picking any edge in T that crosses from S to V − S; some care must be taken to find the right one.
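The Cut Property is easy to check by brute force on a tiny instance. The following Python sketch is our own illustration, not part of the text: it enumerates all spanning trees of a small graph with distinct edge costs and verifies that, for a chosen cut (S, V − S), the cheapest crossing edge appears in every minimum spanning tree.

    from itertools import combinations

    def is_spanning_tree(nodes, edges):
        # A tree on n nodes has n - 1 edges and is connected.
        if len(edges) != len(nodes) - 1:
            return False
        reached, frontier = set(), [next(iter(nodes))]
        while frontier:
            u = frontier.pop()
            if u in reached:
                continue
            reached.add(u)
            for a, b, _ in edges:
                if a == u and b not in reached: frontier.append(b)
                if b == u and a not in reached: frontier.append(a)
        return reached == nodes

    nodes = {1, 2, 3, 4}
    edges = [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 1, 4), (1, 3, 5)]

    trees = [t for t in combinations(edges, 3) if is_spanning_tree(nodes, t)]
    best = min(sum(c for _, _, c in t) for t in trees)
    msts = [t for t in trees if sum(c for _, _, c in t) == best]

    S = {1, 2}
    crossing = [e for e in edges if (e[0] in S) != (e[1] in S)]
    e_min = min(crossing, key=lambda e: e[2])
    assert all(e_min in t for t in msts)   # (4.17): e_min is in every MST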

The Optimality of Kruskal's and Prim's Algorithms

We can now easily prove the optimality of both Kruskal's Algorithm and Prim's Algorithm. The point is that both algorithms only include an edge when it is justified by the Cut Property (4.17).

(4.18) Kruskal's Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) added by Kruskal's Algorithm, and let S be the set of all nodes to which v has a path at the moment just before e is added. Clearly v ∈ S, but w ∉ S, since adding e does not create a cycle. Moreover, no edge from S to V − S has been encountered yet, since any such edge could have been added without creating a cycle, and hence would have been added by Kruskal's Algorithm. Thus e is the cheapest edge with one end in S and the other in V − S, and so by the Cut Property (4.17) it belongs to every minimum spanning tree.

So if we can show that the output (V, T) of Kruskal's Algorithm is in fact a spanning tree of G, then we will be done. Clearly (V, T) contains no cycles, since the algorithm is explicitly designed to avoid creating cycles. Further, if (V, T) were not connected, then there would exist a nonempty subset of nodes S (not equal to all of V) such that there is no edge from S to V − S. But this contradicts the behavior of the algorithm: we know that since G is connected, there is at least one edge between S and V − S, and the algorithm will add the first of these that it encounters. []

(4.19) Prim's Algorithm produces a minimum spanning tree of G.

Proof. For Prim's Algorithm, it is also very easy to show that it only adds edges belonging to every minimum spanning tree. Indeed, in each iteration of the algorithm, there is a set S ⊆ V on which a partial spanning tree has been constructed, and a node v and edge e are added that minimize the quantity min_{e=(u,v): u∈S} c_e. By definition, e is the cheapest edge with one end in S and the other end in V − S, and so by the Cut Property (4.17) it is in every minimum spanning tree. It is also straightforward to show that Prim's Algorithm produces a spanning tree of G, and hence it produces a minimum spanning tree. []

When Can We Guarantee an Edge Is Not in the Minimum Spanning Tree?

The crucial fact about edge deletion is the following statement, which we will refer to as the Cycle Property.

(4.20) Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.

Proof. Let T be a spanning tree that contains e; we need to show that T does not have the minimum possible cost. By analogy with the proof of the Cut Property (4.17), we'll do this with an exchange argument, swapping e for a cheaper edge in such a way that we still have a spanning tree.

So again the question is: How do we find a cheaper edge that can be exchanged in this way with e? Let's begin by deleting e from T; this partitions the nodes into two components: S, containing node v; and V − S, containing node w. Now, the edge we use in place of e should have one end in S and the other in V − S, so as to stitch the tree back together.

We can find such an edge by following the cycle C. The edges of C other than e form, by definition, a path P with one end at v and the other at w. If we follow P from v to w, we begin in S and end up in V − S, so there is some edge e' on P that crosses from S to V − S. See Figure 4.11 for an illustration of this.

Now consider the set of edges T' = T − {e} ∪ {e'}. Arguing just as in the proof of the Cut Property (4.17), the graph (V, T') is connected and has no cycles, so T' is a spanning tree of G. Moreover, since e is the most expensive edge on the cycle C, and e' belongs to C, it must be that e' is cheaper than e, and hence T' is cheaper than T, as desired. []

Figure 4.11 Swapping the edge e' for the edge e in the spanning tree T, as described in the proof of (4.20). (e' can be swapped for e.)

The Optimality of the Reverse-Delete Algorithm

Now that we have the Cycle Property (4.20), it is easy to prove that the Reverse-Delete Algorithm produces a minimum spanning tree. The basic idea is analogous to the optimality proofs for the previous two algorithms: Reverse-Delete only removes an edge when it is justified by the Cycle Property (4.20).

(4.21) The Reverse-Delete Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) removed by Reverse-Delete. At the time that e is removed, it lies on a cycle C; and since it is the first edge encountered by the algorithm in decreasing order of edge costs, it must be the most expensive edge on C. Thus by the Cycle Property (4.20), e does not belong to any minimum spanning tree.

So if we show that the output (V, T) of Reverse-Delete is a spanning tree of G, we will be done. Clearly (V, T) is connected, since the algorithm never removes an edge when this will disconnect the graph. Now, suppose by way of contradiction that (V, T) contains a cycle C. Consider the most expensive edge e on C, which would be the first one encountered by the algorithm. This edge should have been removed, since its removal would not have disconnected the graph, and this contradicts the behavior of Reverse-Delete. []

While we will not explore this further here, the combination of the Cut Property (4.17) and the Cycle Property (4.20) implies that something even more general is going on. Any algorithm that builds a spanning tree by repeatedly including edges when justified by the Cut Property and deleting edges when justified by the Cycle Property--in any order at all--will end up with a minimum spanning tree. This principle allows one to design natural greedy algorithms for this problem beyond the three we have considered here, and it provides an explanation for why so many greedy algorithms produce optimal solutions for this problem.

Eliminating the Assumption that All Edge Costs Are Distinct

Thus far, we have assumed that all edge costs are distinct, and this assumption has made the analysis cleaner in a number of places. Now, suppose we are given an instance of the Minimum Spanning Tree Problem in which certain edges have the same cost--how can we conclude that the algorithms we have been discussing still provide optimal solutions?

There turns out to be an easy way to do this: we simply take the instance and perturb all edge costs by different, extremely small numbers, so that they all become distinct. Now, any two costs that differed originally will still have the same relative order, since the perturbations are so small; and since all of our algorithms are based on just comparing edge costs, the perturbations effectively serve simply as "tie-breakers" to resolve comparisons among costs that used to be equal.

Moreover, we claim that any minimum spanning tree T for the new, perturbed instance must have also been a minimum spanning tree for the original instance. To see this, we note that if T cost more than some tree T* in the original instance, then for small enough perturbations, the change in the cost of T cannot be enough to make it better than T* under the new costs. Thus, if we run any of our minimum spanning tree algorithms, using the perturbed costs for comparing edges, we will produce a minimum spanning tree T that is also optimal for the original instance.
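In code, one concrete way to realize the perturbation idea--our own sketch, not from the text--is to avoid tiny numbers entirely and compare edges by the pair (cost, index), where each edge's position in the input acts as the tie-breaker. Costs that differ keep their relative order, and ties are resolved consistently, so the algorithms behave exactly as if all costs were distinct.

    edges = [(3, 'a'), (3, 'b'), (1, 'c'), (3, 'd')]
    # Sort by cost, breaking ties by the edge's original position.
    perturbed = sorted(enumerate(edges), key=lambda t: (t[1][0], t[0]))
    print([e for _, e in perturbed])   # [(1, 'c'), (3, 'a'), (3, 'b'), (3, 'd')]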

Implementing Prim's Algorithm

We next discuss how to implement the algorithms we have been considering so as to obtain good running-time bounds. We will see that both Prim's and Kruskal's Algorithms can be implemented, with the right choice of data structures, to run in O(m log n) time. We will see how to do this for Prim's Algorithm here, and defer discussing the implementation of Kruskal's Algorithm to the next section. Obtaining a running time close to this for the Reverse-Delete Algorithm is difficult, so we do not focus on Reverse-Delete in this discussion.

For Prim's Algorithm, while the proof of correctness was quite different from the proof for Dijkstra's Algorithm for the Shortest-Path Problem, the implementations of Prim and Dijkstra are almost identical. By analogy with Dijkstra's Algorithm, we need to be able to decide which node v to add next to the growing set S, by maintaining the attachment costs a(v) = min_{e=(u,v): u∈S} c_e for each node v ∈ V − S. As before, we keep the nodes in a priority queue with these attachment costs a(v) as the keys; we select a node with an ExtractMin operation, and update the attachment costs using ChangeKey operations. There are n − 1 iterations in which we perform ExtractMin, and we perform ChangeKey at most once for each edge. Thus we have

(4.22) Using a priority queue, Prim's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations. As with Dijkstra's Algorithm, if we use a heap-based priority queue we can implement both ExtractMin and ChangeKey in O(log n) time, and so get an overall running time of O(m log n).
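A heap-based Python sketch of Prim's Algorithm follows--our own transcription, with the same lazy-deletion workaround for ChangeKey used for Dijkstra above. The representation is an assumption: a dict mapping each node to a list of (neighbor, cost) pairs, with every edge listed in both directions, and the graph assumed connected.

    import heapq

    def prim(graph, s):
        tree = []                                  # edges of the spanning tree
        in_S = {s}
        heap = [(c, s, v) for v, c in graph[s]]    # (attachment cost a(v), u, v)
        heapq.heapify(heap)
        while heap and len(in_S) < len(graph):
            c, u, v = heapq.heappop(heap)          # ExtractMin on a(v)
            if v in in_S:
                continue                           # stale entry; v already attached
            in_S.add(v)
            tree.append((u, v, c))                 # include the cheapest attachment
            for w, cw in graph[v]:
                if w not in in_S:
                    heapq.heappush(heap, (cw, v, w))
        return tree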
Extensions

The minimum spanning tree problem emerged as a particular formulation of a broader network design goal--finding a good way to connect a set of sites by installing edges between them. A minimum spanning tree optimizes a particular goal, achieving connectedness with minimum total edge cost. But there are a range of further goals one might consider as well.

We may, for example, be concerned about point-to-point distances in the spanning tree we build, and be willing to reduce these even if we pay more for the set of edges. This raises new issues, since it is not hard to construct examples where the minimum spanning tree does not minimize point-to-point distances, suggesting some tension between these goals.

Alternately, we may care more about the congestion on the edges. Given traffic that needs to be routed between pairs of nodes, one could seek a spanning tree in which no single edge carries more than a certain amount of this traffic. Here too, it is easy to find cases in which the minimum spanning tree ends up concentrating a lot of traffic on a single edge.

More generally, it is reasonable to ask whether a spanning tree is even the right kind of solution to our network design problem. A tree has the property that destroying any one edge disconnects it, which means that trees are not at all robust against failures. One could instead make resilience an explicit goal, for example seeking the cheapest connected network on the set of sites that remains connected after the deletion of any one edge.

All of these extensions lead to problems that are computationally much harder than the basic Minimum Spanning Tree Problem, though due to their importance in practice there has been research on good heuristics for them.

4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure

One of the most basic graph problems is to find the set of connected components. In Chapter 3 we discussed linear-time algorithms using BFS or DFS for finding the connected components of a graph.

In this section, we consider the scenario in which a graph evolves through the addition of edges. That is, the graph has a fixed population of nodes, but it grows over time by having edges appear between certain pairs of nodes. Our goal is to maintain the set of connected components of such a graph throughout this evolution process. When an edge is added to the graph, we don't want to have to recompute the connected components from scratch. Rather, we will develop a data structure that we call the Union-Find structure, which will store a representation of the components in a way that supports rapid searching and updating.

This is exactly the data structure needed to implement Kruskal's Algorithm efficiently. As each edge e = (v, w) is considered, we need to efficiently find the identities of the connected components containing v and w. If these components are different, then there is no path from v to w, and hence edge e should be included; but if the components are the same, then there is a v-w path on the edges already included, and so e should be omitted. In the event that e is included, the data structure should also support the efficient merging of the components of v and w into a single new component.

The Problem

The Union-Find data structure allows us to maintain disjoint sets (such as the components of a graph) in the following sense. Given a node u, the operation Find(u) will return the name of the set containing u. This operation can be used to test if two nodes u and v are in the same set, by simply checking if Find(u) = Find(v). The data structure will also implement an operation Union(A, B) to take two sets A and B and merge them to a single set.

These operations can be used to maintain connected components of an evolving graph G = (V, E) as edges are added. The sets will be the connected components of the graph. For a node u, the operation Find(u) will return the name of the component containing u. If we add an edge (u, v) to the graph, then we first test if u and v are already in the same connected component (by testing if Find(u) = Find(v)). If they are not, then Union(Find(u), Find(v)) can be used to merge the two components into one. It is important to note that the Union-Find data structure can only be used to maintain components of a graph as we add edges; it is not designed to handle the effects of edge deletion, which may result in a single component being "split" into two.

To summarize, the Union-Find data structure will support three operations.

o MakeUnionFind(S) for a set S will return a Union-Find data structure on set S where all elements are in separate sets. This corresponds, for example, to the connected components of a graph with no edges. Our goal will be to implement MakeUnionFind in time O(n), where n = |S|.

o For an element u ∈ S, the operation Find(u) will return the name of the set containing u. Our goal will be to implement Find(u) in O(log n) time. Some implementations that we discuss will in fact take only O(1) time for this operation.

o For two sets A and B, the operation Union(A, B) will change the data structure by merging the sets A and B into a single set. Our goal will be to implement Union in O(log n) time.

Let's briefly discuss what we mean by the name of a set--for example, as returned by the Find operation. There is a fair amount of flexibility in defining the names of the sets; they should simply be consistent in the sense that Find(v) and Find(w) should return the same name if v and w belong to the same set, and different names otherwise. In our implementations, we will name each set using one of the elements it contains.

A Simple Data Structure for Union-Find

Maybe the simplest possible way to implement a Union-Find data structure is to maintain an array Component that contains the name of the set currently containing each element. Let S be a set, and assume it has n elements denoted {1, ..., n}. We will set up an array Component of size n, where Component[s] is the name of the set containing s. To implement MakeUnionFind(S), we set up the array and initialize it to Component[s] = s for all s ∈ S. This implementation makes Find(v) easy: it is a simple lookup and takes only O(1) time. However, Union(A, B) for two sets A and B can take as long as O(n) time, as we have to update the values of Component[s] for all elements in sets A and B.

To improve this bound, we will do a few simple optimizations. First, it is useful to explicitly maintain the list of elements in each set, so we don't have to look through the whole array to find the elements that need updating. Further, we save some time by choosing the name for the union to be the name of one of the sets, say, set A: this way we only have to update the values Component[s] for s ∈ B, but not for any s ∈ A. Of course, if set B is large, this idea by itself doesn't help very much. Thus we add one further optimization: when set B is big, we may want to keep its name and change Component[s] for all s ∈ A instead. More generally, we can maintain an additional array size of length n, where size[A] is the size of set A, and when a Union(A, B) operation is performed, we use the name of the larger set for the union. This way, fewer elements need to have their Component values updated.

Even with these optimizations, the worst case for a Union operation is still O(n) time; this happens if we take the union of two large sets A and B, each containing a constant fraction of all the elements. However, such bad cases for Union cannot happen very often, as the resulting set A ∪ B is even bigger. How can we make this statement more precise? Instead of bounding the worst-case running time of a single Union operation, we can bound the total (or average) running time of a sequence of k Union operations.

(4.23) Consider the array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.

Proof. The claims about the MakeUnionFind and Find operations are easy to verify. Now consider a sequence of k Union operations. The only part of a Union operation that takes more than O(1) time is updating the array Component. Instead of bounding the time spent on one Union operation, we will bound the total time spent updating Component[v] for an element v throughout the sequence of k operations.

Recall that we start the data structure from a state when all n elements are in their own separate sets. A single Union operation can consider at most two of these original one-element sets, so after any sequence of k Union operations, all but at most 2k elements of S have been completely untouched. Now consider a particular element v. As v's set is involved in a sequence of Union operations, its size grows. It may be that in some of these Unions, the value of Component[v] is updated, and in others it is not. But our convention is that the union uses the name of the larger set, so in every update to Component[v] the size of the set containing v at least doubles. The size of v's set starts out at 1, and the maximum possible size it can reach is 2k (since we argued above that all but at most 2k elements are untouched by Union operations). Thus Component[v] gets updated at most log2(2k) times throughout the process. Moreover, at most 2k elements are involved in any Union operations at all, so we get a bound of O(k log k) for the time spent updating Component values in a sequence of k Union operations. []
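The array-based structure with these optimizations is short to write down. The following Python sketch is our own rendering of it, with the class and attribute names as assumptions; elements are 0, ..., n − 1, and set names are elements.

    class ArrayUnionFind:
        def __init__(self, n):                 # MakeUnionFind: O(n)
            self.component = list(range(n))    # Component[s] = name of s's set
            self.members = [[s] for s in range(n)]
            self.size = [1] * n

        def find(self, u):                     # O(1): a single array lookup
            return self.component[u]

        def union(self, a, b):                 # a, b are set names
            if self.size[a] < self.size[b]:
                a, b = b, a                    # keep the name of the larger set
            for s in self.members[b]:          # relabel only the smaller set
                self.component[s] = a
            self.members[a] += self.members[b]
            self.members[b] = []
            self.size[a] += self.size[b]
            self.size[b] = 0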

While this bound on the average running time for a sequence of k operations is good enough in many applications, including implementing Kruskal's Algorithm, we will try to do better and reduce the worst-case time required. We'll do this at the expense of raising the time required for the Find operation to O(log n).

A Better Data Structure for Union-Find

The data structure for this alternate implementation uses pointers. Each node v ∈ S will be contained in a record with an associated pointer to the name of the set that contains v. As before, we will use the elements of the set S as possible set names, naming each set after one of its elements. For the MakeUnionFind(S) operation, we initialize a record for each element v ∈ S with a pointer that points to itself (or is defined as a null pointer), to indicate that v is in its own set.

Consider a Union operation for two sets A and B, and assume that the name we used for set A is a node v ∈ A, while set B is named after node u ∈ B. The idea is to have either u or v be the name of the combined set; assume we select v as the name. To indicate that we took the union of the two sets, and that the name of the union set is v, we simply update u's pointer to point to v. We do not update the pointers at the other nodes of set B.

As a result, for elements w ∈ B other than u, the name of the set they belong to must be computed by following a sequence of pointers, first leading them to the "old name" u and then via the pointer from u to the "new name" v. See Figure 4.12 for what such a representation looks like. For example, the two sets in Figure 4.12 could be the outcome of the following sequence of Union operations: Union(w, u), Union(s, u), Union(t, v), Union(z, v), Union(i, x), Union(y, j), Union(x, j), and Union(u, v).

Figure 4.12 A Union-Find data structure using pointers. The data structure has only two sets at the moment, named after nodes v and j. The dashed arrow from u to v is the result of the last Union operation. To answer a Find query, we follow the arrows until we get to a node that has no outgoing arrow. For example, answering the query Find(i) would involve following the arrows i to x, and then x to j.

This pointer-based data structure implements Union in O(1) time: all we have to do is to update one pointer. But a Find operation is no longer constant time, as we have to follow a sequence of pointers through a history of old names the set had, in order to get to the current name. How long can a Find(u) operation take? The number of steps needed is exactly the number of times the set containing node u had to change its name, that is, the number of times the Component[u] array position would have been updated in our previous array representation. This can be as large as O(n) if we are not careful with choosing set names. To reduce the time required for a Find operation, we will use the same optimization we used before: keep the name of the larger set as the name of the union. The sequence of Unions that produced the data structure in Figure 4.12 followed this convention. To implement this choice efficiently, we will maintain an additional field with the nodes: the size of the corresponding set.

(4.24) Consider the above pointer-based implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time.

Proof. The statements about Union and MakeUnionFind are easy to verify. The time to evaluate Find(v) for a node v is the number of times the set containing node v changes its name during the process. By the convention that the union keeps the name of the larger set, it follows that every time the name of the set containing node v changes, the size of this set at least doubles. Since the set containing v starts at size 1 and is never larger than n, its size can double at most log2 n times, and so there can be at most log2 n name changes. []
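In Python, the "pointer" at each node can be simulated by a parent dictionary; a node whose parent is itself is the name of its set. The following sketch is our own rendering of the pointer-based structure with union by size.

    class PointerUnionFind:
        def __init__(self, elements):          # MakeUnionFind: O(n)
            self.parent = {v: v for v in elements}
            self.size = {v: 1 for v in elements}

        def find(self, u):                     # follow pointers: O(log n)
            while self.parent[u] != u:
                u = self.parent[u]
            return u

        def union(self, a, b):                 # a, b are set names: O(1)
            if self.size[a] < self.size[b]:
                a, b = b, a                    # keep the name of the larger set
            self.parent[b] = a                 # one pointer update
            self.size[a] += self.size[b]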

Further Improvements

Next we will briefly discuss a natural optimization in the pointer-based Union-Find data structure that has the effect of speeding up the Find operations. Strictly speaking, this improvement will not be necessary for our purposes in this book: for all the applications of Union-Find data structures that we consider, the O(log n) time per operation is good enough in the sense that further improvement in the time for operations would not translate to improvements in the overall running time of the algorithms where we use them. (The Union-Find operations will not be the only computational bottleneck in the running time of these algorithms.)

To motivate the improved version of the data structure, let us first discuss a bad case for the running time of the pointer-based Union-Find data structure. First we build up a structure where one of the Find operations takes about log n time; to do this, we can repeatedly take Unions of equal-sized sets. Assume v is a node for which the Find(v) operation takes about log n time. Now we can issue Find(v) repeatedly, and it takes log n for each such call. Having to follow the same sequence of log n pointers every time for finding the name of the set containing v is quite redundant: after the first request for Find(v), we already "know" the name x of the set containing v, and we also know that all other nodes that we touched during our path from v to the current name are all contained in the set x. So in the improved implementation, we will compress the path we followed after every Find operation by resetting all pointers along the path to point to the current name of the set. No information is lost by doing this, and it makes subsequent Find operations run more quickly. See Figure 4.13 for a Union-Find data structure and the result of Find(v) using path compression.

Figure 4.13 (a) An instance of a Union-Find data structure; and (b) the result of the operation Find(v) on this structure, using path compression. Everything on the path from v to x now points directly to x.

Now consider the running time of the operations in the resulting implementation. As before, a Union operation takes O(1) time and MakeUnionFind(S) takes O(n) time to set up a data structure for a set of size n. How did the time required for a Find(v) operation change? Some Find operations can still take up to log n time; and for some Find operations we actually increase the time, since after finding the name x of the set containing v, we have to go back through the same path of pointers from v to x, and reset each of these pointers to point to x directly. But this additional work can at most double the time required, and so does not change the fact that a Find takes at most O(log n) time. The real gain from compression is in making subsequent calls to Find cheaper, and this can be made precise by the same type of argument we used in (4.23): bounding the total time for a sequence of n Find operations, rather than the worst-case time for any one of them. Although we do not go into the details here, a sequence of n Find operations employing compression requires an amount of time that is extremely close to linear in n; the actual upper bound is O(nα(n)), where α(n) is an extremely slow-growing function of n called the inverse Ackermann function. (In particular, α(n) ≤ 4 for any value of n that could be encountered in practice.)

Implementing Kruskal's Algorithm

Now we'll use the Union-Find data structure to implement Kruskal's Algorithm. First we need to sort the edges by cost. This takes time O(m log m); since we have at most one edge between any pair of nodes, we have m ≤ n^2 and hence this running time is also O(m log n).

After the sorting operation, we use the Union-Find data structure to maintain the connected components of (V, T) as edges are added. As each edge e = (v, w) is considered, we compute Find(v) and Find(w) and test if they are equal to see if v and w belong to different components. We use Union(Find(v), Find(w)) to merge the two components, if the algorithm decides to include edge e in the tree T.

We are doing a total of at most 2m Find and n − 1 Union operations over the course of Kruskal's Algorithm. We can use either (4.23) for the array-based implementation of Union-Find, or (4.24) for the pointer-based implementation, to conclude that this is a total of O(m log n) time. (While more efficient implementations of the Union-Find data structure are possible, this would not help the running time of Kruskal's Algorithm, which has an unavoidable O(m log n) term due to the initial sorting of the edges by cost.) To sum up, we have

(4.25) Kruskal's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time.
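Putting the pieces together, here is a Python sketch of Kruskal's Algorithm on top of Union-Find--our own rendering, not from the text. The find below adds path compression to the pointer-based structure: after locating the name x, every node on the traversed path is re-pointed directly at x. Edges are assumed to be (cost, v, w) triples with comparable node labels, so sorting them orders by cost with ties broken consistently.

    def find(parent, u):
        path = []
        while parent[u] != u:
            path.append(u)
            u = parent[u]
        for w in path:                      # path compression
            parent[w] = u
        return u

    def kruskal(nodes, edges):
        parent = {v: v for v in nodes}
        size = {v: 1 for v in nodes}
        tree = []
        for cost, v, w in sorted(edges):    # O(m log m) = O(m log n)
            a, b = find(parent, v), find(parent, w)
            if a == b:
                continue                    # a v-w path already exists; skip e
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a                   # Union: merge the two components
            size[a] += size[b]
            tree.append((v, w, cost))
        return tree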

4.7 Clustering

We motivated the construction of minimum spanning trees through the problem of finding a low-cost network connecting a set of sites. But minimum spanning trees arise in a range of different settings, several of which appear on the surface to be quite different from one another. An appealing example is the role that minimum spanning trees play in the area of clustering.

The Problem

Clustering arises whenever one has a collection of objects--say, a set of photographs, documents, or microorganisms--that one is trying to classify or organize into coherent groups. Faced with such a situation, it is natural to look first for measures of how similar or dissimilar each pair of objects is. One common approach is to define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other. For points in the physical world, distance may actually be related to their physical distance; but in many applications, distance takes on a much more abstract meaning. For example, we could define the distance between two species to be the number of years since they diverged in the course of evolution; or we could define the distance between two images in a video stream as the number of corresponding pixels at which their intensity values differ by at least some threshold.

Now, given a distance function on the objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are "close," and objects in different groups are "far apart." Starting from this vague set of goals, the field of clustering branches into a vast number of technically different approaches, each seeking to formalize this general notion of what a good set of groups might look like.

Clusterings of Maximum Spacing

Minimum spanning trees play a role in one of the most basic formalizations, which we describe here. Suppose we are given a set U of n objects, labeled p_1, p_2, ..., p_n. For each pair, p_i and p_j, we have a numerical distance d(p_i, p_j). We require only that d(p_i, p_i) = 0; that d(p_i, p_j) > 0 for distinct p_i and p_j; and that distances are symmetric: d(p_i, p_j) = d(p_j, p_i).

Suppose we are seeking to divide the objects in U into k groups, for a given parameter k. We say that a k-clustering of U is a partition of U into k nonempty sets C_1, C_2, ..., C_k. We define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters. Given that we want points in different clusters to be far apart from one another, a natural goal is to seek the k-clustering with the maximum possible spacing.

The question now becomes the following. There are exponentially many different k-clusterings of a set U; how can we efficiently find the one that has maximum spacing?

Designing the Algorithm

To find a clustering of maximum spacing, we consider growing a graph on the vertex set U. The connected components will be the clusters, and we will try to bring nearby points together into the same cluster as rapidly as possible. (This way, they don't end up as points in different clusters that are very close together.) Thus we start by drawing an edge between the closest pair of points. We then draw an edge between the next closest pair of points. We continue adding edges between pairs of points, in order of increasing distance d(p_i, p_j).

In this way, we are growing a graph H on U edge by edge, with connected components corresponding to clusters. Notice that we are only interested in the connected components of the graph H, not the full set of edges; so if we are about to add the edge (p_i, p_j) and find that p_i and p_j already belong to the same cluster, we will refrain from adding the edge--it's not necessary, because it won't change the set of components. In this way, our graph-growing process will never create a cycle, so H will actually be a union of trees. Each time we add an edge that spans two distinct components, it is as though we have merged the two corresponding clusters. In the clustering literature, the iterative merging of clusters in this way is often termed single-link clustering, a special case of hierarchical agglomerative clustering. (Agglomerative here means that we combine clusters; single-link means that we do so as soon as a single link joins them together.) See Figure 4.14 for an example of an instance with k = 3 clusters where this algorithm partitions the points into an intuitively natural grouping.

Figure 4.14 An example of single-linkage clustering with k = 3 clusters. The clusters are formed by adding edges between points in order of increasing distance.

What is the connection to minimum spanning trees? It's very simple: although our graph-growing procedure was motivated by this cluster-merging idea, our procedure is precisely Kruskal's Minimum Spanning Tree Algorithm. We are doing exactly what Kruskal's Algorithm would do if given a graph G on U in which there was an edge of cost d(p_i, p_j) between each pair of nodes (p_i, p_j). The only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.

In other words, we are running Kruskal's Algorithm but stopping it just before it adds its last k − 1 edges. This is equivalent to taking the full minimum spanning tree T (as Kruskal's Algorithm would have produced it), deleting the k − 1 most expensive edges (the ones that we never actually added), and defining the k-clustering to be the resulting connected components C_1, C_2, ..., C_k. Thus, iteratively merging clusters is equivalent to computing a minimum spanning tree and deleting the most expensive edges.
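Single-link clustering is therefore just the truncated Kruskal computation. The Python sketch below--our own illustration, with the function name and representation as assumptions--adds edges in order of increasing distance and stops once k components remain; the components are the clusters.

    def single_link_clusters(points, dist, k):
        # points: a list of objects; dist(p, q): their numerical distance.
        parent = {p: p for p in points}
        size = {p: 1 for p in points}

        def find(u):
            while parent[u] != u:
                u = parent[u]
            return u

        pairs = sorted((dist(p, q), p, q)
                       for i, p in enumerate(points) for q in points[i + 1:])
        components = len(points)
        for d, p, q in pairs:
            if components == k:
                break                  # stop before the last k - 1 merges
            a, b = find(p), find(q)
            if a == b:
                continue               # same cluster already; edge not needed
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a              # merge the two clusters
            size[a] += size[b]
            components -= 1
        clusters = {}
        for p in points:
            clusters.setdefault(find(p), []).append(p)
        return list(clusters.values())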

(4.26) The components C1, C2, ..., Ck formed by deleting the k - 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

Proof. Let C denote the clustering C1, C2, ..., Ck. The spacing of C is precisely the length d* of the (k - 1)st most expensive edge in the minimum spanning tree; this is the length of the edge that Kruskal's Algorithm would have added next, at the moment we stopped it.

Now consider some other k-clustering C', which partitions U into nonempty sets C'1, C'2, ..., C'k. We must show that the spacing of C' is at most d*.

Since the two clusterings C and C' are not the same, it must be that one of our clusters Cr is not a subset of any of the k sets C's in C'. Hence there are points pi, pj ∈ Cr that belong to different clusters in C', say pi ∈ C's and pj ∈ C't ≠ C's.

Now consider the picture in Figure 4.15. Since pi and pj belong to the same component Cr, it must be that Kruskal's Algorithm added all the edges of a pi-pj path P before we stopped it; in particular, this means that each edge on P has length at most d*. Now, we know that pi ∈ C's but pj ∉ C's; so let p' be the first node on P that does not belong to C's, and let p be the node on P that comes just before p'. We have just argued that d(p, p') ≤ d*, since the edge (p, p') was added by Kruskal's Algorithm. But p and p' belong to different sets in the clustering C', and hence the spacing of C' is at most d(p, p') ≤ d*. This completes the proof. ∎

Figure 4.14 An example of single-linkage clustering with k = 3 clusters. The clusters are formed by adding edges between points in order of increasing distance.

Figure 4.15 An illustration of the proof of (4.26), showing that the spacing of any other clustering can be no larger than that of the clustering found by the single-linkage algorithm.

4.8 Huffman Codes and Data Compression

In the Shortest-Path and Minimum Spanning Tree Problems, we've seen how greedy algorithms can be used to commit to certain parts of a solution (edges in a graph, in these cases), based entirely on relatively short-sighted considerations. We now consider a problem in which this style of "committing" is carried out in an even looser sense: a greedy rule is used, essentially, to shrink the size of the problem instance, so that an equivalent smaller problem can then be solved by recursion. The greedy operation here is proved to be "safe," in the sense that solving the smaller instance still leads to an optimal solution for the original instance, but the global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete.

The problem itself is one of the basic questions in the area of data compression, an area that forms part of the foundations for digital communication.

The Problem

Encoding Symbols Using Bits. Since computers ultimately operate on sequences of bits (i.e., sequences consisting only of the symbols 0 and 1), one needs encoding schemes that take text written in richer alphabets (such as the alphabets underpinning human languages) and convert this text into long strings of bits.

The simplest way to do this would be to use a fixed number of bits for each symbol in the alphabet, and then just concatenate the bit strings for each symbol to form the text. To take a basic example, suppose we wanted to encode the 26 letters of English, plus the space (to separate words) and five punctuation characters: comma, period, question mark, exclamation point, and apostrophe. This would give us 32 symbols in total to be encoded.

Now, you can form 2^b different sequences out of b bits, and so if we use 5 bits per symbol, then we can encode 2^5 = 32 symbols, just enough for our purposes. So, for example, we could let the bit string 00000 represent a, the bit string 00001 represent b, and so forth up to 11111, which could represent the apostrophe. Note that the mapping of bit strings to symbols is arbitrary; the point is simply that five bits per symbol is sufficient. In fact, encoding schemes like ASCII work precisely this way, except that they use a larger number of bits per symbol so as to handle larger character sets, including capital letters, parentheses, and all those other special symbols you see on a typewriter or computer keyboard.

Let's think about our bare-bones example with just 32 symbols. Is there anything more we could ask for from an encoding scheme? We couldn't ask to encode each symbol using just four bits, since 2^4 is only 16, not enough for the number of symbols we have. Nevertheless, it's not clear that over large stretches of text, we really need to be spending an average of five bits per symbol. If we think about it, the letters in most human alphabets do not get used equally frequently. In English, the letters e, t, a, o, i, and n get used much more frequently than q, j, x, and z (by more than an order of magnitude). So it's really a tremendous waste to translate them all into the same number of bits; instead we could use a small number of bits for the frequent letters, and a larger number of bits for the less frequent ones, and hope to end up using fewer than five bits per letter when we average over a long string of typical text.

This issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression. When large files need to be shipped across communication networks, or stored on hard disks, it's important to represent them as compactly as possible, subject to the requirement that a subsequent reader of the file should be able to correctly reconstruct it. A huge amount of research is devoted to the design of compression algorithms that can take files as input and reduce their space through efficient encoding schemes.

We now describe one of the fundamental ways of formulating this issue, building up to the question of how we might construct the optimal way to take advantage of the nonuniform frequencies of the letters. In one sense, such an optimal solution is a very appealing answer to the problem of compressing data: it squeezes all the available gains out of nonuniformities in the frequencies. At the end of the section, we will discuss how one can make further progress in compression, taking advantage of features other than nonuniform frequencies.

Variable-Length Encoding Schemes. Before the Internet, before the digital computer, before the radio and telephone, there was the telegraph. Communicating by telegraph was a lot faster than the contemporary alternatives of hand-delivering messages by railroad or on horseback. But telegraphs were only capable of transmitting pulses down a wire, and so if you wanted to send a message, you needed a way to encode the text of your message as a sequence of pulses.

To deal with this issue, the pioneer of telegraphic communication, Samuel Morse, developed Morse code, translating each letter into a sequence of dots (short pulses) and dashes (long pulses). For our purposes, we can think of dots and dashes as zeros and ones, and so this is simply a mapping of symbols into bit strings, just as in ASCII. Morse understood the point that one could communicate more efficiently by encoding frequent letters with short strings, and so this is the approach he took. (He consulted local printing presses to get frequency estimates for the letters in English.) Thus, Morse code maps e to 0 (a single dot), t to 1 (a single dash), a to 01 (dot-dash), and in general maps more frequent letters to shorter bit strings.

In fact, Morse code uses such short strings for the letters that the encoding of words becomes ambiguous. For example, just using what we know about the encoding of e, t, and a, we see that the string 0101 could correspond to any of the sequences of letters eta, aa, etet, or aet. (There are other possibilities as well, involving other letters.) To deal with this ambiguity, Morse code transmissions involve short pauses between letters (so the encoding of aa would actually be dot-dash-pause-dot-dash-pause). This is a reasonable solution, using very short bit strings and then introducing pauses, but it means that we haven't actually encoded the letters using just 0 and 1; we've actually encoded them using a three-letter alphabet of 0, 1, and "pause." Thus, if we really needed to encode everything using only the bits 0 and 1, there would need to be some further encoding in which the pause got mapped to bits.
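The ambiguity is easy to check mechanically. The following small sketch (purely our own illustration) enumerates every way of reading a bit string using just the encodings of e, t, and a, with no pauses available:

    def parses(bits, code):
        """All ways to split `bits` into codewords from `code`."""
        if not bits:
            return ['']
        results = []
        for letter, enc in code.items():
            if bits.startswith(enc):
                results += [letter + rest
                            for rest in parses(bits[len(enc):], code)]
        return results

    morse = {'e': '0', 't': '1', 'a': '01'}
    print(parses('0101', morse))   # ['etet', 'eta', 'aet', 'aa']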

Prefix Codes. The ambiguity problem in Morse code arises because there exist pairs of letters where the bit string that encodes one letter is a prefix of the bit string that encodes another. To eliminate this problem, and hence to obtain an encoding scheme that has a well-defined interpretation for every sequence of bits, it is enough to map letters to bit strings in such a way that no encoding is a prefix of any other.

Concretely, we say that a prefix code for a set S of letters is a function γ that maps each letter x ∈ S to some sequence of zeros and ones, in such a way that for distinct x, y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y).

Now suppose we have a text consisting of a sequence of letters x1 x2 x3 ... xn. We can convert this to a sequence of bits by simply encoding each letter as a bit sequence using γ and then concatenating all these bit sequences together: γ(x1) γ(x2) ... γ(xn). If we then hand this message to a recipient who knows the function γ, they will be able to reconstruct the text according to the following rule.

o Scan the bit sequence from left to right.
o As soon as you've seen enough bits to match the encoding of some letter, output this as the first letter of the text. This must be the correct first letter, since no shorter or longer prefix of the bit sequence could encode any other letter.
o Now delete the corresponding set of bits from the front of the message and iterate.

In this way, the recipient can produce the correct set of letters without our having to resort to artificial devices like pauses to separate the letters.

For example, suppose we are trying to encode the set of five letters S = {a, b, c, d, e}. The encoding γ1 specified by

    γ1(a) = 11
    γ1(b) = 01
    γ1(c) = 001
    γ1(d) = 10
    γ1(e) = 000

is a prefix code, since we can check that no encoding is a prefix of any other. Now, for example, the string cecab would be encoded as 0010000011101. A recipient of this message, knowing γ1, would begin reading from left to right. Neither 0 nor 00 encodes a letter, but 001 does, so the recipient concludes that the first letter is c. This is a safe decision, since no longer sequence of bits beginning with 001 could encode a different letter. The recipient now iterates on the rest of the message, 0000011101; next they will conclude that the second letter is e, encoded as 000.

Optimal Prefix Codes. We've been doing all this because some letters are more frequent than others, and we want to take advantage of the fact that more frequent letters can have shorter encodings. To make this objective precise, we now introduce some notation to express the frequencies of letters.

Suppose that for each letter x ∈ S, there is a frequency fx, representing the fraction of letters in the text that are equal to x. In other words, assuming there are n letters total, nfx of these letters are equal to x. We notice that the frequencies sum to 1; that is, Σx∈S fx = 1.

Now, if we use a prefix code γ to encode the given text, what is the total length of our encoding? This is simply the sum, over all letters x ∈ S, of the number of times x occurs times the length of the bit string γ(x) used to encode x. Using |γ(x)| to denote the length of γ(x), we can write this as

    encoding length = Σx∈S nfx · |γ(x)| = n Σx∈S fx · |γ(x)|.

Dropping the leading coefficient of n from the final expression gives us Σx∈S fx · |γ(x)|, the average number of bits required per letter. We denote this quantity by ABL(γ).

To continue the earlier example, suppose we have a text with the letters S = {a, b, c, d, e}, and their frequencies are as follows:

    fa = .32, fb = .25, fc = .20, fd = .18, fe = .05.

Then the average number of bits per letter using the prefix code γ1 defined previously is

    .32 · 2 + .25 · 2 + .20 · 3 + .18 · 2 + .05 · 3 = 2.25.

It is interesting to compare this to the average number of bits per letter using a fixed-length encoding. (Note that a fixed-length encoding is a prefix code: if all letters have encodings of the same length, then clearly no encoding can be a prefix of any other.) With a set S of five letters, we would need three bits per letter for a fixed-length encoding, since two bits could only encode four letters. Thus, using the code γ1 reduces the bits per letter from 3 to 2.25, a savings of 25 percent. And, in fact, γ1 is not the best we can do in this example.
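The left-to-right decoding rule translates directly into code. Here is a minimal sketch (names ours) that encodes and decodes with γ1; the asserts check the cecab example above.

    def encode(text, gamma):
        """Concatenate the codeword gamma[x] for each letter x of the text."""
        return ''.join(gamma[x] for x in text)

    def decode(bits, gamma):
        """Scan left to right; as soon as the bits seen so far match a
        codeword, that letter must be correct (no codeword is a prefix
        of another), so output it and continue with the rest."""
        reverse = {code: letter for letter, code in gamma.items()}
        out, current = [], ''
        for b in bits:
            current += b
            if current in reverse:
                out.append(reverse[current])
                current = ''
        return ''.join(out)

    gamma1 = {'a': '11', 'b': '01', 'c': '001', 'd': '10', 'e': '000'}
    assert encode('cecab', gamma1) == '0010000011101'
    assert decode('0010000011101', gamma1) == 'cecab'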

Consider the prefix code γ2 given by

    γ2(a) = 11
    γ2(b) = 10
    γ2(c) = 01
    γ2(d) = 001
    γ2(e) = 000

The average number of bits per letter using γ2 is

    .32 · 2 + .25 · 2 + .20 · 2 + .18 · 3 + .05 · 3 = 2.23.

So now it is natural to state the underlying question. Given an alphabet and a set of frequencies for the letters, we would like to produce a prefix code that is as efficient as possible, namely, a prefix code that minimizes the average number of bits per letter ABL(γ) = Σx∈S fx · |γ(x)|. We will call such a prefix code optimal.

Designing the Algorithm

The search space for this problem is fairly complicated: it includes all possible ways of mapping letters to bit strings, subject to the defining property of prefix codes. For alphabets consisting of an extremely small number of letters, it is feasible to search this space by brute force, but this rapidly becomes infeasible. We now describe a greedy method to construct an optimal prefix code very efficiently.

Representing Prefix Codes Using Binary Trees. As a first step, it is useful to develop a tree-based means of representing prefix codes that exposes their structure more clearly than simply the lists of function values we used in our previous examples.

Suppose we take a rooted tree T in which each node that is not a leaf has at most two children; we call such a tree a binary tree. Further suppose that the number of leaves is equal to the size of the alphabet S, and we label each leaf with a distinct letter in S.

Such a labeled binary tree T naturally describes a prefix code, as follows. For each letter x ∈ S, we follow the path from the root to the leaf labeled x; each time the path goes from a node to its left child, we write down a 0, and each time the path goes from a node to its right child, we write down a 1. We take the resulting string of bits as the encoding of x.

Now we observe

(4.27) The encoding of S constructed from T is a prefix code.

Proof. In order for the encoding of x to be a prefix of the encoding of y, the path from the root to x would have to be a prefix of the path from the root to y. But this is the same as saying that x would lie on the path from the root to y, which isn't possible if x is a leaf. ∎

This relationship between binary trees and prefix codes works in the other direction as well. Given a prefix code γ, we can build a binary tree recursively as follows. We start with a root; all letters x ∈ S whose encodings begin with a 0 will be leaves in the left subtree of the root, and all letters y ∈ S whose encodings begin with a 1 will be leaves in the right subtree of the root. We then build these two subtrees recursively using this rule.

For example, the labeled tree in Figure 4.16(a) corresponds to the prefix code γ0 specified by

    γ0(a) = 1
    γ0(b) = 011
    γ0(c) = 010
    γ0(d) = 001
    γ0(e) = 000

To see this, note that the leaf labeled a is obtained by simply taking the right-hand edge out of the root (resulting in an encoding of 1); the leaf labeled e is obtained by taking three successive left-hand edges starting from the root; and analogous explanations apply for b, c, and d. By similar reasoning, one can see that the labeled tree in Figure 4.16(b) corresponds to the prefix code γ1 defined earlier, and the labeled tree in Figure 4.16(c) corresponds to the prefix code γ2 defined earlier. Note also that the binary trees for the two prefix codes γ1 and γ2 are identical in structure; only the labeling of the leaves is different. The tree for γ0, on the other hand, has a different structure.

Thus the search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter. Moreover, this average quantity has a natural interpretation in terms of the structure of T: the length of the encoding of a letter x ∈ S is simply the length of the path from the root to the leaf labeled x. We will refer to the length of this path as the depth of the leaf, and we will denote the depth of a leaf v in T simply by depthT(v). (As two bits of notational convenience, we will drop the subscript T when it is clear from context, and we will often use a letter x ∈ S to also denote the leaf that is labeled by it.) Thus we are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: Σx∈S fx · depthT(x). We will use ABL(T) to denote this quantity.
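Under this representation, computing ABL(T) is a short recursion over the tree. In the sketch below (our own encoding: a leaf is a letter, an internal node a (left, right) pair), t2 is the tree for γ2:

    def abl(tree, freq, depth=0):
        """Sum of freq[x] * depth(x) over the leaves of the tree."""
        if isinstance(tree, str):          # a leaf, labeled with a letter
            return freq[tree] * depth
        left, right = tree
        return abl(left, freq, depth + 1) + abl(right, freq, depth + 1)

    freq = {'a': .32, 'b': .25, 'c': .20, 'd': .18, 'e': .05}
    t2 = ((('e', 'd'), 'c'), ('b', 'a'))   # the tree of Figure 4.16(c)
    print(abl(t2, freq))                   # approximately 2.23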

As a first step in considering algorithms for this problem, let's note a simple fact about the optimal tree. For this fact, we need a definition: we say that a binary tree is full if each node that is not a leaf has two children. (In other words, there are no nodes with exactly one child.) Note that all three binary trees in Figure 4.16 are full.

Figure 4.16 Parts (a), (b), and (c) of the figure depict three different prefix codes for the alphabet S = {a, b, c, d, e}.

(4.28) The binary tree corresponding to the optimal prefix code is full.

Proof. This is easy to prove using an exchange argument. Let T denote the binary tree corresponding to the optimal prefix code, and suppose it contains a node u with exactly one child v. Now convert T into a tree T' by replacing node u with v.

To be precise, we need to distinguish two cases. If u was the root of the tree, we simply delete node u and use v as the root. If u is not the root, let w be the parent of u in T. Now we delete node u and make v be a child of w in place of u. This change decreases the number of bits needed to encode any leaf in the subtree rooted at node u, and it does not affect other leaves. So the prefix code corresponding to T' has a smaller average number of bits per letter than the prefix code for T, contradicting the optimality of T. ∎

A First Attempt: The Top-Down Approach. Intuitively, our goal is to produce a labeled binary tree in which the leaves are as close to the root as possible. This is what will give us a small average leaf depth.

A natural way to do this would be to try building a tree from the top down by "packing" the leaves as tightly as possible. So suppose we try to split the alphabet S into two sets S1 and S2, such that the total frequency of the letters in each set is exactly ½. If such a perfect split is not possible, then we can try for a split that is as nearly balanced as possible. We then recursively construct prefix codes for S1 and S2 independently, and make these the two subtrees of the root. (In terms of bit strings, this would mean sticking a 0 in front of the encodings we produce for S1, and sticking a 1 in front of the encodings we produce for S2.)

It is not entirely clear how we should concretely define this "nearly balanced" split of the alphabet, but there are ways to make this precise. The resulting encoding schemes are called Shannon-Fano codes, named after Claude Shannon and Robert Fano, two of the major early figures in the area of information theory, which deals with representing and encoding digital information. These types of prefix codes can be fairly good in practice, but for our present purposes they represent a kind of dead end: no version of this top-down splitting strategy is guaranteed to always produce an optimal prefix code.

Consider again our example with the five-letter alphabet S = {a, b, c, d, e} and frequencies fa = .32, fb = .25, fc = .20, fd = .18, fe = .05. There is a unique way to split the alphabet into two sets of equal frequency: {a, d} and {b, c, e}. For {a, d}, we can use a single bit to encode each. For {b, c, e}, we need to continue recursively, and again there is a unique way to split the set into two subsets of equal frequency. The resulting code corresponds to the code γ1, given by the labeled tree in Figure 4.16(b); and we've already seen that γ1 is not as efficient as the prefix code γ2 corresponding to the labeled tree in Figure 4.16(c).

Shannon and Fano knew that their approach did not always yield the optimal prefix code, but they didn't see how to compute the optimal code without brute-force search. The problem was solved a few years later by David Huffman, at the time a graduate student who learned about the question in a class taught by Fano. We now describe the ideas leading up to the greedy approach that Huffman discovered for producing optimal prefix codes.
in which :the two lowest-frequency letters are assigned to leaves that are Siblings in T*. Proof. that one knows something partial about the optimal solution. We now describe the ideas leading up to the greedy approach that Huffrnan discovered for producing optimal prefix codes. it is useful to ask: What if someone gave us the binary tree T* that corresponded to an optimal prefix code. How hard is this? In fact. leaf u is labeled with y ~ S and leaf v is labeled with z ~ S. The leaves at this level will get the lowest-frequency letters. but about the very end.fz). We first take all leaves of depth 1 (if there are an. we would need to figure out which letter should label which leaf of T*.29). If w were not a leaf. Specifically. we will see in fact that this technique is a main underpinning of the dynamic programming approach to designing algorithms. and since there are exponentially many possible trees (in the size of the alphabet). we aren’t going to be able to perform a brute-force search over all of them.29) gives us the following intuitively natura!. Then fy >_ fz. If fy < fz. ~. such that depth(u) < depth(v).8 Huffman Codes and Data Compression Statement (4. m We can see the idea behind (4.depth(u))(fy . contradicting the supposed optimality of the prefix code that we had before the exchange. But how is all this helping us? We don’t have the structure of the optimal tree T*. this change is a negative number. at the time a graduate student who learned about the question in a class taught by Fano. in Chapter 6. suppose that in a labeling of T* corresponding to an optimal prefix code. will get to the level containing v and w last. In fact. but not the labeling of the leaves? To complete the solution.170 Chapter 4 Greedy Algorithms Shannon and Fano knew that their approach did not always yield the optimal prefix code.28) T* is a till binary tree. with the leaves of maximum depth--the ones that receive the letters with lowest frequency. If ~fy < fz. (4.~BL(T*)= ~x~S fx depth(x). But then w’ would have a depth greater than that of v. so u has another child w.29) in Figure 4. the effect of this exchange is as follows: the multiplier on fy increases (from depth(u) to depth(v)). as a thought experiment. but they didn’t see how to compute the optimal code without brute-force search. and by (4. Proof. there would be some leaf w’ in the subtree below it.30) w is a leaf of T*. with corresponding tree T*. since they have a common parent. Now. among the labels we assign to a block of leaves all at the same depth. assigning letters in order of decreasing frequency. with the leaves of minimum depth. We begin by formulating the following basic fact.29) rules out for an optimal solution. Since we have already argued that the order in which we assign these letters to the leaves within this level doesn’t matter. then consider the code obtained by exchanging the labels at the nodes u and v. and then to see how one would make use of this partial knowledge in finding the complete solution. we have (4. We then take all leaves of depth 2 (if there are any) and label them with the next-highest-frequency letters in any order. and the multiplier on fz decreases by the same amount (from depth(v) to depth(u)). and optimal. Thus the change to the overall sum is (depth(v) . this is quite easy. Further. way to label the tree T* if someone should give it to us. and so the choice of assignment among leaves of the same depth doesn’t affect the average number of bits per letter. Since the depths are all the same.) 
What If We Knew the Tree Structure of the Optimal Prefix Code? A technique that is often helpful in searching for an efficient algorithm is to assume, as a thought experiment, that one knows something partial about the optimal solution, and then to see how one would make use of this partial knowledge in finding the complete solution. (Later, in Chapter 6, we will see that this technique is in fact a main underpinning of the dynamic programming approach to designing algorithms.)

For the current problem, it is useful to ask: What if someone gave us the binary tree T* that corresponded to an optimal prefix code, but not the labeling of the leaves? To complete the solution, we would need to figure out which letter should label which leaf of T*, and then we'd have our code. How hard is this?

In fact, this is quite easy. We begin by formulating the following basic fact.

(4.29) Suppose that u and v are leaves of T*, such that depth(u) < depth(v). Further, suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y ∈ S and leaf v is labeled with z ∈ S. Then fy ≥ fz.

Proof. This has a quick proof using an exchange argument. If fy < fz, then consider the code obtained by exchanging the labels at the nodes u and v. In the expression for the average number of bits per letter, ABL(T*) = Σx∈S fx · depth(x), the effect of this exchange is as follows: the multiplier on fy increases (from depth(u) to depth(v)), and the multiplier on fz decreases by the same amount (from depth(v) to depth(u)). Thus the change to the overall sum is (depth(v) - depth(u))(fy - fz). If fy < fz, this change is a negative number, contradicting the supposed optimality of the prefix code that we had before the exchange. ∎

We can see the idea behind (4.29) in Figure 4.16(b): a quick way to see that the code here is not optimal is to notice that it can be improved by exchanging the positions of the labels c and d. Having a lower-frequency letter at a strictly smaller depth than some other higher-frequency letter is precisely what (4.29) rules out for an optimal solution.

Statement (4.29) gives us the following intuitively natural, and optimal, way to label the tree T* if someone should give it to us. We first take all leaves of depth 1 (if there are any) and label them with the highest-frequency letters in any order. We then take all leaves of depth 2 (if there are any) and label them with the next-highest-frequency letters in any order. We continue through the leaves in order of increasing depth, assigning letters in order of decreasing frequency. The point is that this can't lead to a suboptimal labeling of T*, since any supposedly better labeling would be susceptible to the exchange in (4.29). It is also crucial to note that, among the labels we assign to a block of leaves all at the same depth, it doesn't matter which label we assign to which leaf. Since the depths are all the same, the corresponding multipliers in the expression Σx∈S fx · |γ(x)| are the same, and so the choice of assignment among leaves of the same depth doesn't affect the average number of bits per letter.

But how is all this helping us? We don't have the structure of the optimal tree T*, and since there are exponentially many possible trees (in the size of the alphabet), we aren't going to be able to perform a brute-force search over all of them.

In fact, our reasoning about T* becomes very useful if we think not about the very beginning of this labeling process, with the leaves of minimum depth, but about the very end, with the leaves of maximum depth, the ones that receive the letters with lowest frequency. Specifically, consider a leaf v in T* whose depth is as large as possible. Leaf v has a parent u, and by (4.28) T* is a full binary tree, so u has another child w. We refer to v and w as siblings, since they have a common parent. Now, we have

(4.30) w is a leaf of T*.

Proof. If w were not a leaf, there would be some leaf w' in the subtree below it. But then w' would have a depth greater than that of v, contradicting our assumption that v is a leaf of maximum depth in T*. ∎

So v and w are sibling leaves that are as deep as possible in T*. Thus our level-by-level process of labeling T*, as justified by (4.29), will get to the level containing v and w last. The leaves at this level will get the lowest-frequency letters. Since we have already argued that the order in which we assign these letters to the leaves within this level doesn't matter, there is an optimal labeling in which v and w get the two lowest-frequency letters of all. We sum this up in the following claim.

(4.31) There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.

An Algorithm to Construct an Optimal Prefix Code. Suppose that y* and z* are the two lowest-frequency letters in S. (We can break ties in the frequencies arbitrarily.) Statement (4.31) is important because it tells us something about where y* and z* go in the optimal solution; it says that it is safe to "lock them together" in thinking about the solution, because we know they end up as sibling leaves below a common parent. In effect, this common parent acts like a "meta-letter" whose frequency is the sum of the frequencies of y* and z*.

This directly suggests an algorithm: we replace y* and z* with this meta-letter, obtaining an alphabet that is one letter smaller. We recursively find a prefix code for the smaller alphabet, and then "open up" the meta-letter back into y* and z* to obtain a prefix code for S. This recursive strategy is depicted in Figure 4.17.

Figure 4.17 There is an optimal solution in which the two lowest-frequency letters label sibling leaves; deleting them and labeling their parent with a new letter having the combined frequency yields an instance with a smaller alphabet.

A concrete description of the algorithm is as follows.

    To construct a prefix code for an alphabet S, with given frequencies:
        If S has two letters then
            Encode one letter using 0 and the other letter using 1
        Else
            Let y* and z* be the two lowest-frequency letters
            Form a new alphabet S' by deleting y* and z* and replacing
                them with a new letter ω of frequency fy* + fz*
            Recursively construct a prefix code γ' for S', with tree T'
            Define a prefix code for S as follows:
                Start with T'
                Take the leaf labeled ω and add two children below it
                    labeled y* and z*
        Endif

We refer to this as Huffman's Algorithm, and the prefix code that it produces for a given alphabet is accordingly referred to as a Huffman code. In general, it is clear that this algorithm always terminates, since it simply invokes a recursive call on an alphabet that is one letter smaller. Moreover, using (4.31), it will not be difficult to prove that the algorithm in fact produces an optimal prefix code. Before doing this, however, we pause to note some further observations about the algorithm.

First let's consider the behavior of the algorithm on our sample instance with S = {a, b, c, d, e} and frequencies fa = .32, fb = .25, fc = .20, fd = .18, fe = .05. The algorithm would first merge d and e into a single letter, let's denote it (de), of frequency .18 + .05 = .23. We now have an instance of the problem on the four letters S' = {a, b, c, (de)}. The two lowest-frequency letters in S' are c and (de), so in the next step we merge these into the single letter (cde) of frequency .20 + .23 = .43. This gives us the three-letter alphabet {a, b, (cde)}. Next we merge a and b, and this gives us a two-letter alphabet, at which point we invoke the base case of the recursion. If we unfold the result back through the recursive calls, we get the tree pictured in Figure 4.16(c).

It is interesting to note how the greedy rule underlying Huffman's Algorithm, the merging of the two lowest-frequency letters, fits into the structure of the algorithm as a whole. Essentially, at the time we merge these two letters, we don't know exactly how they will fit into the overall code; rather, we simply commit to having them be children of the same parent, and this is enough to produce a new, equivalent problem with one less letter. Moreover, the algorithm forms a natural contrast with the earlier approach that led to the suboptimal Shannon-Fano codes. That approach was based on a top-down strategy that worried first and foremost about the top-level split in the binary tree, namely the two subtrees directly below the root. Huffman's Algorithm, on the other hand, follows a bottom-up approach: it focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion.
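As a sanity check on this description, here is a direct, hedged rendering of the recursion in Python. Representing the meta-letter ω by the pair (y*, z*) makes the final "opening up" step automatic, since ω already is the two-leaf subtree; the names and tie-breaking are our own choices.

    def huffman(freq):
        """Recursive Huffman: freq maps letters to frequencies; returns a
        labeled binary tree with internal nodes as (left, right) pairs."""
        if len(freq) == 2:
            a, b = freq                # base case: one bit for each letter
            return (a, b)
        y, z = sorted(freq, key=freq.get)[:2]   # two lowest-frequency letters
        rest = {x: f for x, f in freq.items() if x not in (y, z)}
        omega = (y, z)                 # the meta-letter, already a subtree
        rest[omega] = freq[y] + freq[z]
        return huffman(rest)

    freq = {'a': .32, 'b': .25, 'c': .20, 'd': .18, 'e': .05}
    # (('c', ('e', 'd')), ('b', 'a')): ABL = 2.23; left/right order is arbitrary
    print(huffman(freq))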

Analyzing the Algorithm

The Optimality of the Algorithm. We first prove the optimality of Huffman's Algorithm. Since the algorithm operates recursively, invoking itself on smaller and smaller alphabets, it is natural to try establishing optimality by induction on the size of the alphabet. Clearly it is optimal for all two-letter alphabets (since it uses only one bit per letter). So suppose by induction that it is optimal for all alphabets of size k - 1, and consider an input instance consisting of an alphabet S of size k.

Let's quickly recap the behavior of the algorithm on this instance. The algorithm merges the two lowest-frequency letters y*, z* ∈ S into a single letter ω, calls itself recursively on the smaller alphabet S' (in which y* and z* are replaced by ω), and by induction produces an optimal prefix code for S', represented by a labeled binary tree T'. It then extends this into a tree T for S, by attaching leaves labeled y* and z* as children of the node in T' labeled ω.

There is a close relationship between ABL(T) and ABL(T'). (Note that the former quantity is the average number of bits used to encode letters in S, while the latter quantity is the average number of bits used to encode letters in S'.)

(4.32) ABL(T') = ABL(T) - fω.

Proof. The depth of each letter x other than y*, z* is the same in both T and T'. Also, the depths of y* and z* in T are each one greater than the depth of ω in T'. Using this, plus the fact that fω = fy* + fz*, we have

    ABL(T) = Σx∈S fx · depthT(x)
           = fy* · depthT(y*) + fz* · depthT(z*) + Σx≠y*,z* fx · depthT(x)
           = (fy* + fz*) · (1 + depthT'(ω)) + Σx≠y*,z* fx · depthT'(x)
           = fω · (1 + depthT'(ω)) + Σx≠y*,z* fx · depthT'(x)
           = fω + fω · depthT'(ω) + Σx≠y*,z* fx · depthT'(x)
           = fω + Σx∈S' fx · depthT'(x)
           = fω + ABL(T'). ∎

Using this, we now prove optimality as follows.

(4.33) The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.

Proof. Suppose by way of contradiction that the tree T produced by our greedy algorithm is not optimal. This means that there is some labeled binary tree Z such that ABL(Z) < ABL(T); and by (4.31), there is such a tree Z in which the leaves representing y* and z* are siblings.

It is now easy to get a contradiction, as follows. If we delete the leaves labeled y* and z* from Z, and label their former parent with ω, we get a tree Z' that defines a prefix code for S'. In the same way that T is obtained from T', the tree Z is obtained from Z' by adding leaves for y* and z* below ω; thus the identity in (4.32) applies to Z and Z' as well: ABL(Z') = ABL(Z) - fω.

But we have assumed that ABL(Z) < ABL(T); subtracting fω from both sides of this inequality we get ABL(Z') < ABL(T'), which contradicts the optimality of T' as a prefix code for S'. ∎

Implementation and Running Time. It is clear that Huffman's Algorithm can be made to run in polynomial time in k, the number of letters in the alphabet. The recursive calls of the algorithm define a sequence of k - 1 iterations over smaller and smaller alphabets, and each iteration except the last consists simply of identifying the two lowest-frequency letters and merging them into a single letter that has the combined frequency. Even without being careful about the implementation, identifying the lowest-frequency letters can be done in a single scan of the alphabet, in time O(k), and so summing this over the k - 1 iterations gives O(k²) time.

But in fact Huffman's Algorithm is an ideal setting in which to use a priority queue. Recall that a priority queue maintains a set of k elements, each with a numerical key, and it allows for the insertion of new elements and the extraction of the element with the minimum key. Thus we can maintain the alphabet S in a priority queue, using each letter's frequency as its key. In each iteration we just extract the minimum twice (this gives us the two lowest-frequency letters), and then we insert a new letter whose key is the sum of these two minimum frequencies. Our priority queue now contains a representation of the alphabet that we need for the next iteration. Using an implementation of priority queues via heaps, as in Chapter 2, we can make each insertion and extraction of the minimum run in time O(log k); hence, each iteration, which performs just three of these operations, takes time O(log k). Summing over all k iterations, we get a total running time of O(k log k).
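Following the priority-queue discussion, a heap-based version runs in O(k log k). Below is a sketch using Python's standard heapq module (tree format as in the earlier sketch; the counter breaks frequency ties so the heap never compares trees directly):

    import heapq

    def huffman_pq(freq):
        """Huffman's Algorithm with a heap: extract the minimum twice and
        insert the merged meta-letter, k - 1 times in all."""
        heap = [(f, i, x) for i, (x, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            fy, _, y = heapq.heappop(heap)   # two lowest-frequency letters
            fz, _, z = heapq.heappop(heap)
            heapq.heappush(heap, (fy + fz, count, (y, z)))
            count += 1
        return heap[0][2]                    # the root of the Huffman tree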

Extensions

The structure of optimal prefix codes, which has been our focus here, stands as a fundamental result in the area of data compression. But it is important to understand that this optimality result does not by any means imply that we have found the best way to compress data under all circumstances.

What more could we want beyond an optimal prefix code? First, consider an application in which we are transmitting black-and-white images: each image is a 1,000-by-1,000 array of pixels, and each pixel takes one of the two values black or white. Further suppose that a typical image is almost entirely white: roughly 1,000 of the million pixels are black, and the rest are white. Now, if we wanted to compress such an image, the whole approach of prefix codes has very little to say: we have a text of length one million over the two-letter alphabet {black, white}, and the text is already encoded using one bit per letter, the lowest possible in our framework. It is clear, though, that such images should be highly compressible. Intuitively, one ought to be able to use a "fraction of a bit" for each white pixel, since they are so overwhelmingly frequent, at the cost of using multiple bits for each black pixel. (In an extreme version, sending a list of (x, y) coordinates for each black pixel would be an improvement over sending the image as a text with a million bits.) The challenge here is to define an encoding scheme where the notion of using fractions of bits is well-defined. There are results in the area of data compression that do just this; arithmetic coding and a range of other techniques have been developed to handle settings like this.

A second drawback of prefix codes, as defined here, is that they cannot adapt to changes in the text. Again let's consider a simple example. Suppose we are trying to encode the output of a program that produces a long sequence of letters from the set {a, b, c, d}. Further suppose that for the first half of this sequence, the letters a and b occur equally frequently, while c and d do not occur at all; but in the second half of this sequence, the letters c and d occur equally frequently, while a and b do not occur at all. In the framework developed in this section, we are trying to compress a text over the four-letter alphabet {a, b, c, d} in which all letters are equally frequent; thus each would be encoded with two bits.

But what's really happening in this example is that the frequencies remain stable for half the text, and then they change radically. So one could get away with just one bit per letter, plus a bit of extra overhead, as follows.

o Begin with an encoding in which the bit 0 represents a and the bit 1 represents b.
o Halfway into the sequence, insert some kind of instruction that says, "We're changing the encoding now. From now on, the bit 0 represents c and the bit 1 represents d."
o Use this new encoding for the rest of the sequence.

The point is that investing a small amount of space to describe a new encoding can pay off many times over if it reduces the average number of bits per letter over a long run of text that follows. Such approaches, which change the encoding in midstream, are called adaptive compression schemes, and for many kinds of data they lead to significant improvements over the static method we've considered here.

These issues suggest some of the directions in which work on data compression has proceeded. In many of these cases, there is a trade-off between the power of the compression technique and its computational cost. In particular, many of the improvements to Huffman codes just described come with a corresponding increase in the computational effort needed both to produce the compressed version of the data and also to decompress it and restore the original text. Finding the right balance among these trade-offs is a topic of active research.

4.9 Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm

As we've seen more and more examples of greedy algorithms, we've come to appreciate that there can be considerable diversity in the way they operate. Many greedy algorithms make some sort of an initial "ordering" decision on the input, and then process everything in a one-pass fashion. Others make more incremental decisions, still local and opportunistic, but without a global "plan" in advance. In this section, we consider a problem that stresses our intuitive view of greedy algorithms still further.

The Problem

The problem is to compute a minimum-cost arborescence of a directed graph. This is essentially an analogue of the Minimum Spanning Tree Problem for directed, rather than undirected, graphs; we will see that the move to directed graphs introduces significant new complications. At the same time, the style of the algorithm has a strongly greedy flavor, since it still constructs a solution according to a local, myopic rule.

We begin with the basic definitions. Let G = (V, E) be a directed graph in which we've distinguished one node r ∈ V as a root. An arborescence (with respect to r) is essentially a directed spanning tree rooted at r. Specifically, it is a subgraph T = (V, F) such that T is a spanning tree of G if we ignore the direction of edges, and there is a path in T from r to each other node v ∈ V if we take the direction of edges into account. Figure 4.18 gives an example of two different arborescences in the same directed graph.

Figure 4.18 A directed graph can have many different arborescences. Parts (b) and (c) depict two different arborescences, both rooted at node r.

There is a useful equivalent way to characterize arborescences, and this is as follows.

(4.34) A subgraph T = (V, F) of G is an arborescence with respect to root r if and only if T has no cycles, and for each node v ≠ r, there is exactly one edge in F that enters v.

Proof. If T is an arborescence with root r, then indeed every other node v has exactly one edge entering it: this is simply the last edge on the unique r-v path.

Conversely, suppose T has no cycles, and each node v ≠ r has exactly one entering edge. In order to establish that T is an arborescence, we need only show that there is a directed path from r to each other node v. Here is how to construct such a path. We start at v and repeatedly follow edges in the backward direction. Since T has no cycles, we can never return to a node we've previously visited, and thus this process must terminate. But r is the only node without incoming edges, and so the process must in fact terminate by reaching r. The sequence of nodes thus visited yields a path (in the reverse direction) from r to v. ∎

It is easy to see that, just as every connected graph has a spanning tree, a directed graph has an arborescence rooted at r provided that r can reach every node. For example, the edges in a breadth-first search tree rooted at r will form an arborescence.

(4.35) A directed graph G has an arborescence rooted at r if and only if there is a directed path from r to each other node.

The basic problem we consider here is the following. We are given a directed graph G = (V, E), with a distinguished root node r and with a nonnegative cost ce ≥ 0 on each edge, and we wish to compute an arborescence rooted at r of minimum total cost. (We will refer to this as an optimal arborescence.) We will assume throughout that G at least has an arborescence rooted at r; by (4.35), this can be easily checked at the outset.
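Checking the condition in (4.35) at the outset is a single graph search. A minimal sketch (our names; nodes taken to be 0, ..., n-1):

    from collections import deque

    def has_arborescence(n, edges, r):
        """True iff r reaches every node, i.e., iff G has an arborescence
        rooted at r by (4.35).  The BFS tree edges themselves would form
        one such arborescence (ignoring costs)."""
        out = [[] for _ in range(n)]
        for u, v in edges:
            out[u].append(v)
        seen, queue = {r}, deque([r])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return len(seen) == n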

Designing the Algorithm

Given the relationship between arborescences and trees, the minimum-cost arborescence problem certainly has a strong initial resemblance to the Minimum Spanning Tree Problem for undirected graphs. Thus it's natural to start by asking whether the ideas we developed for that problem can be carried over directly to this setting. For example, must the minimum-cost arborescence contain the cheapest edge in the whole graph? Can we safely delete the most expensive edge on a cycle, confident that it cannot be in the optimal arborescence?

Clearly the cheapest edge e in G will not belong to the optimal arborescence if e enters the root, since the arborescence we're seeking is not supposed to have any edges entering the root. But even if the cheapest edge in G belongs to some arborescence rooted at r, it need not belong to the optimal one, as the example of Figure 4.19 shows. Including the edge of cost 1 in Figure 4.19 would prevent us from including the edge of cost 2 out of the root r (since there can only be one entering edge per node), and this in turn would force us to incur an unacceptable cost of 10 when we included one of the other edges out of r. This kind of argument never clouded our thinking in the Minimum Spanning Tree Problem, where it was always safe to plunge ahead and include the cheapest edge; it suggests that finding the optimal arborescence may be a significantly more complicated task. (It's worth noticing that the optimal arborescence in Figure 4.19 also includes the most expensive edge on a cycle; with a different construction, one can even cause the optimal arborescence to include the most expensive edge in the whole graph.)

Figure 4.19 (a) A directed graph with costs on its edges, and (b) an optimal arborescence rooted at r for this graph.

Despite this, it is possible to design a greedy type of algorithm for this problem; it's just that our myopic rule for choosing edges has to be a little more sophisticated. First let's consider a little more carefully what goes wrong with the general strategy of including the cheapest edges. Here's a particular version of this strategy: for each node v ≠ r, select the cheapest edge entering v (breaking ties arbitrarily), and let F* be this set of n - 1 edges. Now consider the subgraph (V, F*). Since we know that the optimal arborescence needs to have exactly one edge entering each node v ≠ r, and (V, F*) represents the cheapest possible way of making these choices, we have the following fact.

(4.36) If (V, F*) is an arborescence, then it is a minimum-cost arborescence.

So the difficulty is that (V, F*) may not be an arborescence. In this case, (4.34) implies that (V, F*) must contain a cycle C, which does not include the root. We now must decide how to proceed in this situation.

To make matters somewhat clearer, we begin with the following observation. Every arborescence contains exactly one edge entering each node v ≠ r; so if we pick some node v and subtract a uniform quantity from the cost of every edge entering v, then the total cost of every arborescence changes by exactly the same amount. This means, essentially, that the actual cost of the cheapest edge entering v is not important; what matters is the cost of all other edges entering v relative to this. Thus let yv denote the minimum cost of any edge entering v. For each edge e = (u, v), with cost ce ≥ 0, we define its modified cost c'e to be ce - yv. Note that since ce ≥ yv, all the modified costs are still nonnegative. More crucially, our discussion motivates the following fact.

(4.37) T is an optimal arborescence in G subject to costs {ce} if and only if it is an optimal arborescence subject to the modified costs {c'e}.

Proof. Consider an arbitrary arborescence T. The difference between its cost with costs {ce} and {c'e} is exactly Σv≠r yv; that is,

    Σe∈T ce - Σe∈T c'e = Σv≠r yv.

This is because an arborescence has exactly one edge entering each node v ≠ r in the sum. Since the difference between the two costs is independent of the choice of the arborescence T, we see that T has minimum cost subject to {ce} if and only if it has minimum cost subject to {c'e}. ∎

We now consider the problem in terms of the costs {c'e}. All the edges in our set F* have cost 0 under these modified costs, and so if (V, F*) contains a cycle C, we know that all edges in C have cost 0. This suggests that we can afford to use as many edges from C as we want (consistent with producing an arborescence), since including edges from C doesn't raise the cost.

Thus our algorithm continues as follows. We contract C into a single supernode, obtaining a smaller graph G' = (V', E'). Here, V' contains the nodes of V - C, plus a single node c* representing C. We transform each edge e ∈ E to an edge e' ∈ E' by replacing each end of e that belongs to C with the new node c*. This can result in G' having parallel edges (i.e., edges with the same ends), which is fine; however, we delete self-loops from E', edges that have both ends equal to c*. We recursively find an optimal arborescence in this smaller graph G', subject to the costs {c'e}. The arborescence returned by this recursive call can be converted into an arborescence of G by including all but one edge on the cycle C.

In summary, here is the full algorithm.

    For each node v ≠ r
        Let yv be the minimum cost of an edge entering node v
        Modify the costs of all edges e entering v to c'e = ce - yv
    Choose one 0-cost edge entering each v ≠ r, obtaining a set F*
    If F* forms an arborescence, then return it
    Else there is a directed cycle C ⊆ F*
        Contract C to a single supernode, yielding a graph G' = (V', E')
        Recursively find an optimal arborescence (V', F') in G' with costs {c'e}
        Extend (V', F') to an arborescence (V, F) in G by adding all but one edge of C
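For concreteness, here is a hedged sketch of the whole multi-phase algorithm in Python, following the summary above phase by phase: choose a cheapest edge into each node, look for a cycle among the chosen edges, contract it under the modified costs, recurse, and patch in all but one edge of C. The representation and helper names are our own, and the code assumes every node is reachable from the root.

    def optimal_arborescence(nodes, edges, root):
        """edges: list of (u, v, cost) triples.  Returns the edges of an
        optimal arborescence rooted at root, with their original costs."""
        y, best = {}, {}               # y[v]: cheapest cost entering v
        for u, v, c in edges:
            if v != root and u != v and (v not in y or c < y[v]):
                y[v], best[v] = c, (u, v, c)

        # Walk backward along the chosen edges F* to detect a cycle.
        parent = {v: e[0] for v, e in best.items()}
        cycle, state = None, {}
        for s in parent:
            chain, v = [], s
            while v in parent and v not in state:
                state[v] = 'active'
                chain.append(v)
                v = parent[v]
            if state.get(v) == 'active':
                cycle = set(chain[chain.index(v):])   # the 0-cost cycle C
            for w in chain:
                state[w] = 'done'
            if cycle:
                break
        if cycle is None:
            return list(best.values())   # F* is an arborescence: done, by (4.36)

        # Contract C to a supernode c*, with modified costs c' = c - y[v].
        cstar = ('supernode', len(nodes))
        sub_nodes = [x for x in nodes if x not in cycle] + [cstar]
        sub_edges, back = [], []
        for u, v, c in edges:
            nu = cstar if u in cycle else u
            nv = cstar if v in cycle else v
            if nu == nv:
                continue               # delete self-loops inside C
            sub_edges.append((nu, nv, c - y[v] if v != root else c))
            back.append((u, v, c))

        sub = optimal_arborescence(sub_nodes, sub_edges, root)
        chosen = [back[sub_edges.index(e)] for e in sub]
        entry = next(v for _, v, _ in chosen if v in cycle)
        # Add all edges of C except the one entering the node where the
        # recursive solution enters C.
        chosen += [best[v] for v in cycle if v != entry]
        return chosen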

Analyzing the Algorithm

It is easy to implement this algorithm so that it runs in polynomial time. But does it lead to an optimal arborescence? Before concluding that it does, we need to worry about the following point: not every arborescence in G corresponds to an arborescence in the contracted graph G'. Could we perhaps "miss" the true optimal arborescence in G by focusing on G'? What is true is the following. The arborescences of G' are in one-to-one correspondence with arborescences of G that have exactly one edge entering the cycle C, and these corresponding arborescences have the same cost with respect to {c'e}, since C consists of 0-cost edges. (We say that an edge e = (u, v) enters C if v belongs to C but u does not.) So to prove that our algorithm finds an optimal arborescence in G, we must prove that G has an optimal arborescence with exactly one edge entering C. We do this now.

(4.38) Let C be a cycle in G consisting of edges of cost 0, such that r ∉ C. Then there is an optimal arborescence rooted at r that has exactly one edge entering C.

Proof. Consider an optimal arborescence T in G. Since r has a path in T to every node, there is at least one edge of T that enters C. If T enters C exactly once, then we are done. Otherwise, suppose that T enters C more than once. We show how to modify it to obtain an arborescence of no greater cost that enters C exactly once.

Let e = (a, b) be an edge entering C that lies on as short a path as possible from r; this means in particular that no edges on the path from r to a can enter C. We delete all edges of T that enter C, except for the edge e. We add in all edges of C except for the one edge that enters b, the head of edge e. Let T' denote the resulting subgraph of G.

We claim that T' is also an arborescence. This will establish the result, since the cost of T' is clearly no greater than that of T: the only edges of T' that do not also belong to T have cost 0, since C consists of 0-cost edges.

So why is T' an arborescence? Observe that T' has exactly one edge entering each node v ≠ r, and no edge entering r. So T' has exactly n - 1 edges; hence if we can show there is an r-v path in T' for each v, then T' must be connected in an undirected sense, and hence a tree. Thus it would satisfy our initial definition of an arborescence.

So consider any node v ≠ r; we must show there is an r-v path in T'. If v ∈ C, we can use the fact that the path in T from r to e has been preserved in the construction of T'; thus we can reach v by first reaching e and then following the edges of the cycle C. Now suppose that v ∉ C, and let P denote the r-v path in T. If P did not touch C, then it still exists in T'. Otherwise, let w be the last node in P ∩ C, and let P' be the subpath of P from w to v. Observe that all the edges in P' still exist in T'. We have already argued that w is reachable from r in T', since it belongs to C. Concatenating this path to w with the subpath P' gives us a path to v as well. ∎

We can now put all the pieces together to argue that our algorithm is correct.

(4.39) The algorithm finds an optimal arborescence rooted at r in G.

Proof. The proof is by induction on the number of nodes in G. If the edges of F* form an arborescence, then the algorithm returns an optimal arborescence by (4.36). Otherwise, we consider the problem with the modified costs {c'e}, which is equivalent by (4.37). After contracting a 0-cost cycle C to obtain a smaller graph G', the algorithm produces an optimal arborescence in G' by the inductive hypothesis. Finally, by (4.38), there is an optimal arborescence in G that corresponds to the optimal arborescence computed for G'. ∎

Solved Exercises

Solved Exercise 1

Suppose that three of your friends, inspired by repeated viewings of the horror-movie phenomenon The Blair Witch Project, have decided to hike the Appalachian Trail this summer. They want to hike as much as possible per day but, for obvious reasons, not after dark. On a map they've identified a large set of good stopping points for camping, and they're considering the following system for deciding when to stop for the day. Each time they come to a potential stopping point, they determine whether they can make it to the next one before nightfall. If they can make it, then they keep hiking; otherwise, they stop.

Despite many significant drawbacks, they claim this system does have one good feature. "Given that we're only hiking in the daylight," they claim, "it minimizes the number of camping stops we have to make."

Is this true? The proposed system is a greedy algorithm, and we wish to determine whether it minimizes the number of stops needed.

To make this question precise, let's make the following set of simplifying assumptions. We'll model the Appalachian Trail as a long line segment of length L, and assume that your friends can hike d miles per day (independent of terrain, weather conditions, and so forth). We'll assume that the potential stopping points are located at distances x1, x2, ..., xn from the start of the trail. We'll also assume (very generously) that your friends are always correct when they estimate whether they can make it to the next stopping point before nightfall.

We'll say that a set of stopping points is valid if the distance between each adjacent pair is at most d, the first is at distance at most d from the start of the trail, and the last is at distance at most d from the end of the trail. Thus a set of stopping points is valid if one could camp only at these places and still make it across the whole trail. We'll assume, naturally, that the full set of n stopping points is valid; otherwise, there would be no way to make it the whole way.

We can now state the question as follows. Is your friends' greedy algorithm, hiking as long as possible each day, optimal, in the sense that it finds a valid set whose size is as small as possible?

Solution

Often a greedy algorithm looks correct when you first encounter it, so before succumbing too deeply to its intuitive appeal, it's useful to ask: why might it not work? What should we be worried about?

There's a natural concern with this algorithm: Might it not help to stop early on some day, so as to get better synchronized with camping opportunities on future days? But if you think about it, you start to wonder whether this could really happen. Could there really be an alternate solution that intentionally lags behind the greedy solution, and then puts on a burst of speed and passes the greedy solution? How could it pass it, given that the greedy solution travels as far as possible each day?

This last consideration starts to look like the outline of an argument based on the "staying ahead" principle from Section 4.1. Perhaps we can show that as long as the greedy camping strategy is ahead on a given day, no other solution can catch up and overtake it the next day.

We now turn this into a proof showing the algorithm is indeed optimal, identifying a natural sense in which the stopping points it chooses "stay ahead" of any other legal set of stopping points. Although we are following the style of proof from Section 4.1, it's worth noting an interesting contrast with the Interval Scheduling Problem: there we needed to prove that a greedy algorithm maximized a quantity of interest, whereas here we seek to minimize a certain quantity.

Let R = {x_{p_1}, ..., x_{p_k}} denote the set of stopping points chosen by the greedy algorithm, and suppose by way of contradiction that there is a smaller valid set of stopping points; let's call this smaller set S = {x_{q_1}, ..., x_{q_m}}, with m < k.

To obtain a contradiction, we first show that the stopping point reached by the greedy algorithm on each day j is farther than the stopping point reached under the alternate solution. That is,

(4.40) For each j = 1, 2, ..., m, we have x_{p_j} ≥ x_{q_j}.

Proof. We prove this by induction on j. The case j = 1 follows directly from the definition of the greedy algorithm: your friends travel as long as possible on the first day before stopping. Now let j > 1 and assume that the claim is true for all i < j. Then

    x_{q_j} − x_{q_{j-1}} ≤ d,

since S is a valid set of stopping points, and

    x_{q_j} − x_{p_{j-1}} ≤ x_{q_j} − x_{q_{j-1}},

since x_{p_{j-1}} ≥ x_{q_{j-1}} by the induction hypothesis. Combining these two inequalities, we have

    x_{q_j} − x_{p_{j-1}} ≤ d.

This means that your friends have the option of hiking all the way from x_{p_{j-1}} to x_{q_j} in one day, and hence the location x_{p_j} at which they finally stop can only be farther along than x_{q_j}. (Note the similarity with the corresponding proof for the Interval Scheduling Problem: here too the greedy algorithm is staying ahead because, at each step, the choice made by the alternate solution is one of its valid options.) ∎

Statement (4.40) implies in particular that x_{q_m} ≤ x_{p_m}. Now, if m < k, then we must have x_{p_m} < L − d, for otherwise your friends would never have needed to stop at the location x_{p_{m+1}}. Combining these two inequalities, we have concluded that x_{q_m} < L − d; but this contradicts the assumption that S is a valid set of stopping points. Consequently, we cannot have m < k, and so we have proved that the greedy algorithm produces a valid set of stopping points of minimum possible size.
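The greedy rule itself is a few lines to simulate. A small sketch (names ours; assumes the stopping-point list is sorted and that the full set is valid):

    def greedy_stops(x, L, d):
        """Hike as far as possible each day: from the current position,
        continue to the farthest stopping point within d miles, and camp
        there unless the end of the trail is already within reach."""
        stops, position, i = [], 0.0, 0
        while L - position > d:            # can't finish today
            last = None
            while i < len(x) and x[i] - position <= d:
                last = x[i]                # farthest point reachable today
                i += 1
            stops.append(last)
            position = last
        return stops

    print(greedy_stops([5, 11, 12, 19, 26], L=30, d=10))   # [5, 12, 19, 26]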
Solved Exercise 2

Your friends are starting a security company that needs to obtain licenses for n different pieces of cryptographic software. Due to regulations, they can only obtain these licenses at the rate of at most one per month. Each license is currently selling for a price of $100; however, they are all becoming more expensive according to exponential growth curves: in particular, the cost of license j increases by a factor of r_j > 1 each month, where r_j is a given parameter. This means that if license j is purchased t months from now, it will cost 100·r_j^t. We will assume that all the price growth rates are distinct; that is, r_i ≠ r_j for licenses i ≠ j (even though they start at the same price of $100).

The question is: Given that the company can only buy at most one license a month, in which order should it buy the licenses so that the total amount of money it spends is as small as possible? Give an algorithm that takes the n rates of price growth r_1, r_2, ..., r_n, and computes an order in which to buy the licenses so that the total amount of money spent is minimized. The running time of your algorithm should be polynomial in n.

Solution Two natural guesses for a good sequence would be to sort the r_i in decreasing order, or to sort them in increasing order. Faced with alternatives like this, it's perfectly reasonable to work out a small example and see if the example eliminates at least one of them. Here we could try r_1 = 2, r_2 = 3, and r_3 = 4. Buying the licenses in increasing order results in a total cost of 100(2 + 3^2 + 4^3) = 7,500, while buying them in decreasing order results in a total cost of 100(4 + 3^2 + 2^3) = 2,100. This tells us that increasing order is not the way to go. (On the other hand, it doesn't tell us immediately that decreasing order is the right answer, but our goal was just to eliminate one of the two options.)

Let's try proving that sorting the r_i in decreasing order in fact always gives the optimal solution. When a greedy algorithm works for problems like this, in which we put a set of things in an optimal order, we've seen in the text that it's often effective to try proving correctness using an exchange argument. To do this here, let's suppose that there is an optimal solution O that differs from our solution S. (S consists of the licenses sorted in decreasing order.) So this optimal solution O must contain an inversion--that is, there must exist two neighboring months t and t + 1 such that the price increase rate of the license bought in month t (let us denote it by r_t) is less than that bought in month t + 1 (similarly, we use r_{t+1} to denote this). That is, r_t < r_{t+1}.

We claim that by exchanging these two purchases, we can strictly improve our optimal solution, which contradicts the assumption that O was optimal. Notice that if we swap these two purchases, the rest of the purchases are identically priced. In O, the amount paid during the two months involved in the swap is 100(r_t^t + r_{t+1}^{t+1}). On the other hand, if we swapped these two purchases, we would pay 100(r_{t+1}^t + r_t^{t+1}). Since the constant 100 is common to both expressions, we want to show that the second term is less than the first one. So we want to show that

r_{t+1}^t + r_t^{t+1} < r_t^t + r_{t+1}^{t+1},

which is equivalent to

r_t^{t+1} - r_t^t < r_{t+1}^{t+1} - r_{t+1}^t,

that is, to

r_t^t (r_t - 1) < r_{t+1}^t (r_{t+1} - 1).

But this last inequality is true simply because r_i > 1 for all i and since r_t < r_{t+1}. This concludes the proof of correctness: an optimal solution can have no inversion, so it must coincide with our decreasing-order solution. The running time of the algorithm is O(n log n), since the sorting takes that much time and the rest (outputting the order) is linear.
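The algorithm itself is just a sort. As a quick illustration (not from the text--the names and the month-numbering convention are assumptions, with the first purchase made one month from now, matching the example above), one can compare the two candidate orders in code:

    def total_cost(rates_in_order):
        """Total price paid if the license with rate r is bought in month t = 1, 2, ..."""
        return 100 * sum(r ** t for t, r in enumerate(rates_in_order, start=1))

    def best_order(rates):
        """Greedy rule from the exchange argument: buy in decreasing order of rate."""
        return sorted(rates, reverse=True)   # O(n log n)

    rates = [2, 3, 4]
    print(total_cost(sorted(rates)))         # increasing order: 7500
    print(total_cost(best_order(rates)))     # decreasing order: 2100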
Note: It's interesting to note that things become much less straightforward if we vary this question even a little. Suppose that instead of buying licenses whose prices increase, you're trying to sell off equipment whose cost is depreciating. Item i depreciates at a factor of r_i < 1 per month, starting from $100, so if you sell it t months from now you will receive 100·r_i^t. (In other words, the exponential rates are now less than 1, instead of greater than 1.) If you can only sell one item per month, what is the optimal order in which to sell them? Here, it turns out that there are cases in which the optimal solution doesn't put the rates in either increasing or decreasing order.

Solved Exercise 3

Suppose you are given a connected graph G, with edge costs that you may assume are all distinct. G has n vertices and m edges. A particular edge e of G is specified. Give an algorithm with running time O(m + n) to decide whether e is contained in a minimum spanning tree of G.

Solution From the text, we know of two rules by which we can conclude whether an edge e belongs to a minimum spanning tree: the Cut Property (4.17) says that e is in every minimum spanning tree when it is the cheapest edge crossing from some set S to the complement V - S, and the Cycle Property (4.20) says that e is in no minimum spanning tree if it is the most expensive edge on some cycle C. Let's see if we can make use of these two rules as part of an algorithm that solves this problem in linear time.

Both the Cut and Cycle Properties are essentially talking about how e relates to the set of edges that are cheaper than e. The Cut Property can be viewed as asking: Is there some set S ⊆ V so that in order to get from S to V - S without using e, we need to use an edge that is more expensive than e? And if we think about the cycle C in the statement of the Cycle Property, going the other way around C--that is, avoiding e--gives a route between the ends of e that uses only cheaper edges.
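The direction these two properties point in can be made concrete: with distinct costs, e = (u, v) belongs to the minimum spanning tree exactly when u and v are not connected using only edges cheaper than e. The sketch below (hypothetical names; a BFS over the cheaper edges) is one way to realize the test--it is offered as an illustration, not as the text's own worked solution.

    from collections import deque, defaultdict

    def in_mst(n, edges, e):
        """Decide whether edge e = (u, v, cost) belongs to the (unique) MST.

        edges: list of (u, v, cost) with all costs distinct; nodes are 0..n-1.
        By the Cut and Cycle Properties, e is in the MST exactly when its
        endpoints are NOT connected using only edges cheaper than e.
        """
        u, v, c = e
        adj = defaultdict(list)
        for a, b, w in edges:
            if w < c:                      # keep only edges cheaper than e
                adj[a].append(b)
                adj[b].append(a)
        seen = {u}                         # BFS from u over the cheaper edges
        q = deque([u])
        while q:
            x = q.popleft()
            if x == v:
                return False               # cheaper u-v route exists: e tops a cycle
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    q.append(y)
        return True                        # u, v separated: e is the cheapest crossing edge

The running time is O(m + n): one pass over the edges plus one BFS.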

Exercises

Prove that, for a given set of boxes with specified weights, the greedy algorithm currently in use actually minimizes the number of trucks that are needed. Your proof should follow the type of analysis we used for the Interval Scheduling Problem: it should establish the optimality of this greedy packing algorithm by identifying a measure under which it "stays ahead" of all other solutions.

4. Some of your friends have gotten into the burgeoning field of time-series data mining, in which one looks for patterns in sequences of events that occur over time. Purchases at stock exchanges--what's being bought--are one source of data with a natural ordering in time. So, given a long sequence S of such events, your friends want an efficient way to detect certain "patterns" in them--for example, they may want to know if the four events

buy Yahoo, buy eBay, buy Yahoo, buy Oracle

occur in this sequence S, in order but not necessarily consecutively. They begin with a collection of possible events (e.g., the possible transactions) and a sequence S of n of these events. A given event may occur multiple times in S (e.g., Yahoo stock may be bought many times in a single sequence S). We will say that a sequence S' is a subsequence of S if there is a way to delete certain of the events from S so that the remaining events, in order, are equal to the sequence S'. So, for example, the sequence of four events above is a subsequence of the sequence

buy Amazon, buy Yahoo, buy eBay, buy Yahoo, buy Yahoo, buy Oracle

Their goal is to be able to dream up short sequences and quickly detect whether they are subsequences of S. So this is the problem they pose to you: Give an algorithm that takes two sequences of events--S' of length m and S of length n, each possibly containing an event more than once--and decides in time O(m + n) whether S' is a subsequence of S.
5. Let's consider a long, quiet country road with houses scattered very sparsely along it. (We can picture the road as a long line segment, with an eastern endpoint and a western endpoint.) Further, let's suppose that despite the bucolic setting, the residents of all these houses are avid cell phone users. You want to place cell phone base stations at certain points along the road, so that every house is within four miles of one of the base stations. Give an efficient algorithm that achieves this goal, using as few base stations as possible.

6. Your friend is working as a camp counselor, and he is in charge of organizing activities for a set of junior-high-school-age campers. One of his plans is the following mini-triathlon exercise: each contestant must swim 20 laps of a pool, then bike 10 miles, then run 3 miles. The plan is to send the contestants out in a staggered fashion, via the following rule: the contestants must use the pool one at a time. In other words, first one contestant swims the 20 laps, gets out, and starts biking. As soon as this first person is out of the pool, a second contestant begins swimming the 20 laps; as soon as he or she is out and starts biking, a third contestant begins swimming, and so on. Each contestant has a projected swimming time (the expected time it will take him or her to complete the 20 laps), a projected biking time (the expected time it will take him or her to complete the 10 miles of bicycling), and a projected running time (the time it will take him or her to complete the 3 miles of running). Your friend wants to decide on a schedule for the triathlon: an order in which to sequence the starts of the contestants. Let's say that the completion time of a schedule is the earliest time at which all contestants will be finished with all three legs of the triathlon, assuming they each spend exactly their projected swimming, biking, and running times on the three parts. (Again, note that participants can bike and run simultaneously, but at most one person can be in the pool at any time.) What's the best order for sending people out, if one wants the whole competition to be over as early as possible? More precisely, give an efficient algorithm that produces a schedule whose completion time is as small as possible.

7. The wildly popular Spanish-language search engine El Goog needs to do a serious amount of computation every time it recompiles its index. Fortunately, the company has at its disposal a single large supercomputer, together with an essentially unlimited supply of high-end PCs. They've broken the overall computation into n distinct jobs, labeled J1, J2, ..., Jn, which can be performed completely independently of one another. Each job consists of two stages: first it needs to be preprocessed on the supercomputer, and then it needs to be finished on one of the PCs. Let's say that job Ji needs pi seconds of time on the supercomputer, followed by fi seconds of time on a PC. Since there are at least n PCs available on the premises, the finishing of the jobs can be performed fully in parallel--all the jobs can be processed at the same time. However, the supercomputer can only work on a single job at a time, so the system managers need to work out an order in which to feed the jobs to the supercomputer. As soon as the first job

in order is done on the supercomputer, it can be handed off to a PC for finishing; at that point in time a second job can be fed to the supercomputer; when the second job is done on the supercomputer, it can proceed to a PC regardless of whether or not the first job is done (since the PCs work in parallel); and so on. Let's say that a schedule is an ordering of the jobs for the supercomputer, and the completion time of the schedule is the earliest time at which all jobs will have finished processing on the PCs. This is an important quantity to minimize, since it determines how rapidly El Goog can generate a new index. Give a polynomial-time algorithm that finds a schedule with as small a completion time as possible.

8. Suppose you are given a connected graph G, with edge costs that you may assume are all distinct. Prove that G has a unique minimum spanning tree.

9. One of the basic motivations behind the Minimum Spanning Tree Problem is the goal of designing a spanning network for a set of nodes with minimum total cost. Here we explore another type of objective: designing a spanning network for which the most expensive edge is as cheap as possible. Specifically, let G = (V, E) be a connected graph with n vertices, m edges, and positive edge costs that you may assume are all distinct. Let T = (V, E') be a spanning tree of G; we define the bottleneck edge of T to be the edge of T with the greatest cost. A spanning tree T of G is a minimum-bottleneck spanning tree if there is no spanning tree T' of G with a cheaper bottleneck edge.

(a) Is every minimum-bottleneck tree of G a minimum spanning tree of G? Prove or give a counterexample.
(b) Is every minimum spanning tree of G a minimum-bottleneck tree of G? Prove or give a counterexample.

10. Let G = (V, E) be an (undirected) graph with costs ce ≥ 0 on the edges e ∈ E. Assume you are given a minimum-cost spanning tree T in G. Now assume that a new edge is added to G, connecting two nodes v, w ∈ V with cost c.

(a) Give an efficient algorithm to test if T remains the minimum-cost spanning tree with the new edge added to G (but not to the tree T). Make your algorithm run in time O(|E|). Can you do it in O(|V|) time? Please note any assumptions you make about what data structure is used to represent the tree T and the graph G.

(b) Suppose T is no longer the minimum-cost spanning tree. Give a linear-time algorithm (time O(|E|)) to update the tree T to the new minimum-cost spanning tree.

11. In an earlier problem, we saw that when all edge costs are distinct, G has a unique minimum spanning tree. However, G may have many minimum spanning trees when the edge costs are not all distinct. Here we formulate the question: Can Kruskal's Algorithm be made to find all the minimum spanning trees of G? Recall that Kruskal's Algorithm sorted the edges in order of increasing cost, then greedily processed edges one by one, adding an edge e as long as it did not form a cycle. When some edges have the same cost, the phrase "in order of increasing cost" has to be specified a little more carefully: we'll say that an ordering of the edges is valid if the corresponding sequence of edge costs is nondecreasing. We'll say that a valid execution of Kruskal's Algorithm is one that begins with a valid ordering of the edges of G. For any graph G, and any minimum spanning tree T of G, is there a valid execution of Kruskal's Algorithm on G that produces T as output? Give a proof or a counterexample.

12. Suppose you have n video streams that need to be sent, one after another, over a communication link. Stream i consists of a total of bi bits that need to be sent, at a constant rate, over a period of ti seconds. You cannot send two streams at the same time, so you need to determine a schedule for the streams: an order in which to send them. Whichever order you choose, there cannot be any delays between the end of one stream and the start of the next. Suppose your schedule starts at time 0 (and therefore ends at time Σi ti, whichever order you choose). We assume that all the values bi and ti are positive integers. Now, because you're just one user, the link does not want you taking up too much bandwidth, so it imposes the following constraint, using a fixed parameter r:

(∗) For each natural number t > 0, the total number of bits you send over the time interval from 0 to t cannot exceed rt.

Note that this constraint is only imposed for time intervals that start at 0, not for time intervals that start at any other value. We say that a schedule is valid if it satisfies the constraint (∗) imposed by the link.

The Problem. Given a set of n streams, each specified by its number of bits bi and its time duration ti, as well as the link parameter r, determine whether there exists a valid schedule.

Example. Suppose we have n = 3 streams, with (b1, t1) = (2000, 1), (b2, t2) = (6000, 2), (b3, t3) = (2000, 1), and suppose the link's parameter is r = 5000. Then the schedule that runs the streams in the order 1, 2, 3 is valid, since the constraint (∗) is satisfied:

t = 1: the whole first stream has been sent, and 2000 ≤ 5000 · 1
t = 2: half of the second stream has also been sent, and 2000 + 3000 ≤ 5000 · 2
Similar calculations hold for t = 3 and t = 4.

(a) Consider the following claim:

Claim: There exists a valid schedule if and only if each stream i satisfies bi ≤ rti.

Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

(b) Give an algorithm that takes a set of n streams, each specified by its number of bits bi and its time duration ti, as well as the link parameter r, and determines whether there exists a valid schedule. The running time of your algorithm should be polynomial in n.

13. A small business--say, a photocopying service with a single large machine--faces the following scheduling problem. Each morning they get a set of jobs from customers. They want to do the jobs on their single machine in an order that keeps their customers happiest. Customer i's job will take ti time to complete. Given a schedule (i.e., an ordering of the jobs), let Ci denote the finishing time of job i. For example, if job j is the first to be done, we would have Cj = tj; and if job j is done right after job i, we would have Cj = Ci + tj. Each customer i also has a given weight wi that represents his or her importance to the business. The happiness of customer i is expected to be dependent on the finishing time of i's job. So the company decides that they want to order the jobs to minimize the weighted sum of the completion times, Σⁿi=1 wiCi. Design an efficient algorithm to solve this problem. That is, you are given a set of n jobs with a processing time ti and a weight wi for each job, and you want to order the jobs so as to minimize the weighted sum of the completion times.

Example. Suppose there are two jobs: the first takes time t1 = 1 and has weight w1 = 10, while the second job takes time t2 = 3 and has weight w2 = 2. Then doing job 1 first would yield a weighted completion time of 10 · 1 + 2 · 4 = 18, while doing the second job first would yield the larger weighted completion time of 10 · 4 + 2 · 3 = 46.

14. You're working with a group of security consultants who are helping to monitor a large computer system. There's particular interest in keeping track of processes that are labeled "sensitive." Each such process has a designated start time and finish time, and it runs continuously between these times; the consultants have a list of the planned start and finish times of all sensitive processes that will be run that day. As a simple first step,
they've written a program called status_check that, when invoked, runs for a few seconds and records various pieces of logging information about all the sensitive processes running on the system at that moment. (We'll model each invocation of status_check as lasting for only this single point in time.) What they'd like to do is to run status_check as few times as possible during the day, but enough that for each sensitive process P, status_check is invoked at least once during the execution of process P.

(a) Give an efficient algorithm that, given the start and finish times of all the sensitive processes, finds as small a set of times as possible at which to invoke status_check, subject to the requirement that status_check is invoked at least once during each sensitive process P.

(b) While you were designing your algorithm, the security consultants were engaging in a little back-of-the-envelope reasoning. "Suppose we can find a set of k sensitive processes with the property that no two are ever running at the same time. Then clearly your algorithm will need to invoke status_check at least k times: no one invocation of status_check can handle more than one of these processes." This is true, of course, and after some further discussion, you all begin wondering whether something stronger is true as well, a kind of converse to the above argument. Suppose that k* is the largest value of k such that one can find a set of k sensitive processes with no two ever running at the same time. Is it the case that there must be a set of k* times at which you can run status_check so that some invocation occurs during the execution of each sensitive process? (In other words, the kind of argument in the previous paragraph is really the only thing forcing you to need a lot of invocations of status_check.) Decide whether you think this claim is true or false, and give a proof or a counterexample.

15. You have a processor that can operate 24 hours a day, every day. People submit requests to run daily jobs on the processor. Each such job comes with a start time and an end time; if the job is accepted to run on the processor, it must run continuously, every day, for the period between its start and end times. (Note that certain jobs can begin before midnight and end after midnight; this makes for a type of situation different from what we saw in the Interval Scheduling Problem.) Given a list of n such jobs, your goal is to accept as many jobs as possible (regardless of their length), subject to the constraint that the processor can run at most one job at any given point in time. Provide an algorithm to do this with a running time that is polynomial in n. You may assume for simplicity that no two jobs have the same start or end times.

Example.
Consider the following four jobs, specified by (start-time, end-time) pairs: (6 P.M., 6 A.M.), (9 P.M., 4 A.M.), (3 A.M., 2 P.M.), (1 P.M., 7 P.M.). The optimal solution would be to pick the two jobs (9 P.M., 4 A.M.) and (1 P.M., 7 P.M.), which can be scheduled without overlapping.

16. The manager of a large student union on campus comes to you with the following problem. She's in charge of a group of n students, each of whom is scheduled to work one shift during the week. There are different jobs associated with these shifts (tending the main desk, helping with package delivery, rebooting cranky information kiosks, etc.), but we can view each shift as a single contiguous interval of time. There can be multiple shifts going on at once. She's trying to choose a subset of these n students to form a supervising committee that she can meet with once a week. She considers such a committee to be complete if, for every student not on the committee, that student's shift overlaps (at least partially) the shift of some student who is on the committee. In this way, each student's performance can be observed by at least one person who's serving on the committee. Give an efficient algorithm that takes the schedule of n shifts and produces a complete supervising committee containing as few students as possible.

Example. Suppose n = 3, and the shifts are Monday 4 P.M.-Monday 8 P.M., Monday 6 P.M.-Monday 10 P.M., and Monday 9 P.M.-Monday 11 P.M. Then the smallest complete supervising committee would consist of just the second student, since the second shift overlaps both the first and the third.

17. Some security consultants working in the financial domain are currently advising a client who is investigating a potential money-laundering scheme. The investigation thus far has indicated that n suspicious transactions took place in recent days, each involving money transferred into a single account. Unfortunately, the sketchy nature of the evidence to date means that they don't know the identity of the account, the amounts of the transactions, or the exact times at which the transactions took place. What they do have is an approximate time-stamp for each transaction; the evidence indicates that transaction i took place at time ti ± ei, for some "margin of error" ei. (In other words, it took place sometime between ti - ei and ti + ei.) Note that different transactions may have different margins of error. In the last day or so, they've come across a bank account that (for other reasons we don't need to go into here) they suspect might be the one involved in the crime. There are n recent events involving the account, which took place at times x1, x2, ..., xn. To see whether it's plausible that this really is the account they're looking for, they want to know if the activity on the account lines up with the suspicious transactions to within the margin of error; the tricky part here is that they don't know which account event to associate with which suspicious transaction. So they're wondering whether it's possible to associate each of the account's n events with a distinct one of the n suspicious transactions in such a way that, if the account event at time xi is associated with the suspicious transaction that occurred approximately at time tj, then |tj - xi| ≤ ej. Give an efficient algorithm that takes the given data and decides whether such an association exists. If possible, you should make the running time be at most O(n²).

18. Your friends are planning an expedition to a small town deep in the Canadian north next winter break.

They've researched all the travel options and have drawn up a directed graph whose nodes represent intermediate destinations and edges represent the roads between them. In the course of this, they've also learned that extreme weather causes roads in this part of the world to become quite slow in the winter and may cause large travel delays. They've found an excellent travel Web site that can accurately predict how fast they'll be able to travel along the roads; however, the speed of travel depends on the time of year. More precisely, the Web site answers queries of the following form: given an edge e = (v, w) connecting two sites v and w, and given a proposed starting time t from location v, the site will return a value fe(t), the predicted arrival time at w. The Web site guarantees that fe(t) ≥ t for all edges e and all times t (you can't travel backward in time), and that fe(t) is a monotone increasing function of t (that is, you do not arrive earlier by starting later). Other than that, the functions fe(t) may be arbitrary. For example, in areas where the travel time does not vary with the season, we would have fe(t) = t + ℓe, where ℓe is the time needed to travel from the beginning to the end of edge e. Your friends want to use the Web site to determine the fastest way to travel through the directed graph from their starting point to their intended destination. (You should assume that they start at time 0, and that all predictions made by the Web site are completely correct.) Give a polynomial-time algorithm to do this,
where we treat a single query to the Web site (based on a specific edge e and a time t) as taking a single computational step.

19. A group of network designers at the communications company CluNet find themselves facing the following problem. They have a connected graph G = (V, E), in which the nodes represent sites that want to communicate. Each edge e is a communication link, with a given available bandwidth be. For each pair of nodes u, v ∈ V, they want to select a single u-v path P on which this pair will communicate. The bottleneck rate b(P) of this path P is the minimum bandwidth of any edge it contains; that is, b(P) = min_{e∈P} be. The best achievable bottleneck rate for the pair u, v in G is simply the maximum, over all u-v paths P in G, of the value b(P). It's getting to be very complicated to keep track of a path for each pair of nodes, and so one of the network designers makes a bold suggestion: Maybe one can find a spanning tree T of G so that for every pair of nodes u, v, the unique u-v path in the tree actually attains the best achievable bottleneck rate for the pair u, v in G. (In other words, even if you could choose any u-v path in the whole graph, you couldn't do better than the u-v path in T.) This idea is roundly heckled in the offices of CluNet for a few days, and there's a natural reason for the skepticism: each pair of nodes might want a very different-looking path to maximize its bottleneck rate; why should there be a single tree that simultaneously makes everybody happy? But after some failed attempts to rule out the idea, people begin to suspect it could be possible. Show that such a tree exists, and give an efficient algorithm to find one. That is, give an algorithm constructing a spanning tree T in which, for each u, v ∈ V, the bottleneck rate of the u-v path in T is equal to the best achievable bottleneck rate for the pair u, v in G.

20. Every September, somewhere in a far-away mountainous part of the world, the county highway crews get together and decide which roads to keep clear through the coming winter. There are n towns in this county, and the road system can be viewed as a (connected) graph G = (V, E) on this set of towns, each edge representing a road joining two of them. In the winter, people are high enough up in the mountains that they stop worrying about the length of roads and start worrying about their altitude--this is really what determines how difficult the trip will be. So each road--each edge e in the graph--is annotated with a number ae that gives the altitude of the highest point on the road. We'll assume that no two edges have exactly the same altitude value ae. The height of a path P in the graph is then the maximum of ae over all edges e on P, and a path between towns i and j is declared to be winter-optimal if it achieves the minimum possible height over all paths from i to j. The highway crews are going to select a set E' ⊆ E of the roads to keep clear through the winter; the rest will be left unmaintained and kept off limits to travelers. They all agree that whichever subset of roads E' they decide to keep clear, it should have the property that (V, E') is a connected subgraph; and, more strongly, for every pair of towns i and j, the height of the winter-optimal path in (V, E') should be no greater than it is in the full graph G = (V, E). We'll say that (V, E') is a minimum-altitude connected subgraph if it has this property. Given that they're going to maintain this key property, however, they otherwise want to keep as few roads clear as possible. One year, they hit upon the following conjecture:

The minimum spanning tree of G, with respect to the edge weights ae, is a minimum-altitude connected subgraph.

(In an earlier problem, we claimed that there is a unique minimum spanning tree when the edge weights are distinct. Thus, thanks to the assumption that all ae are distinct, it is okay for us to speak of the minimum spanning tree.) Initially, this conjecture is somewhat counterintuitive, since the minimum spanning tree is trying to minimize the sum of the values ae, while the goal of minimizing altitude seems to be asking for a fully different thing. But lacking an argument to the contrary, they begin considering an even bolder second conjecture:

A subgraph (V, E') is a minimum-altitude connected subgraph if and only if it contains the edges of the minimum spanning tree.

Note that this second conjecture would immediately imply the first one, since a minimum spanning tree contains its own edges. So here's the question.

(a) Is the first conjecture true, for all choices of G and distinct altitudes ae? Give a proof or a counterexample with explanation.

(b) Is the second conjecture true, for all choices of G and distinct altitudes ae? Give a proof or a counterexample with explanation.

21. Let us say that a graph G = (V, E) is a near-tree if it is connected and has at most n + 8 edges, where n = |V|. Give an algorithm with running time O(n) that takes a near-tree G with costs on its edges, and returns a minimum spanning tree of G. You may assume that all the edge costs are distinct.

22. Consider the Minimum Spanning Tree Problem on an undirected graph G = (V, E), with a cost ce ≥ 0 on each edge, where the costs may not all be different. If the costs are not all distinct, there can in general be many distinct minimum-cost solutions. Suppose we are given a spanning tree T ⊆ E with the guarantee that for every e ∈ T, e belongs to some minimum-cost spanning tree in G. Can we conclude that T itself must be a minimum-cost spanning tree in G? Give a proof or a counterexample with explanation.

23. Recall the problem of computing a minimum-cost arborescence in a directed graph G = (V, E), with a cost ce ≥ 0 on each edge.
Here we will consider the case in which G is a directed acyclic graph--that is, it contains no directed cycles. As in general directed graphs, there can be many distinct minimum-cost solutions. Suppose we are given a directed acyclic graph G, and an arborescence A ⊆ E with the guarantee that for every e ∈ A, e belongs to some minimum-cost arborescence in G. Can we conclude that A itself must be a minimum-cost arborescence in G? Give a proof or a counterexample with explanation.

24. Timing circuits are a crucial component of VLSI chips. Here's a simple model of such a timing circuit. Consider a complete balanced binary tree with n leaves, where n is a power of two. Each edge e of the tree has an associated length ℓe, which is a positive number. The distance from the root to a given leaf is the sum of the lengths of all the edges on the path from the root to the leaf. The root generates a clock signal which is propagated along the edges to the leaves. We'll assume that the time it takes for the signal to reach a given leaf is proportional to the distance from the root to the leaf. Now, if all leaves do not have the same distance from the root, then the signal will not reach the leaves at the same time, and this is a big problem. We want the leaves to be completely synchronized, and all to receive the signal at the same time. To make this happen, we will have to increase the lengths of certain edges, so that all root-to-leaf paths have the same length (we're not able to shrink edge lengths). If we achieve this, then the tree (with its new edge lengths) will be said to have zero skew. Our goal is to achieve zero skew in a way that keeps the sum of all the edge lengths as small as possible. Give an algorithm that increases the lengths of certain edges so that the resulting tree has zero skew and the total edge length is as small as possible.

Example. Consider the tree in Figure 4.20, in which letters name the nodes and numbers indicate the edge lengths.

[Figure 4.20: An instance of the zero-skew problem.]

The unique optimal solution for this instance would be to take the three length-1 edges and increase each of their lengths to 2. The resulting tree has zero skew, and the total edge length is 12, the smallest possible.

25. Suppose we are given a set of points P = {p1, p2, ..., pn}, together with a distance function d on the set P; d is simply a function on pairs of points in P with the properties that d(pi, pj) = d(pj, pi) > 0 if i ≠ j, and that d(pi, pi) = 0 for each i. We define a hierarchical metric on P to be any distance function τ that can be constructed as follows. We build a rooted tree T with n leaves, and we associate with each node v of T (both leaves and internal nodes) a height hv. These heights must satisfy the properties that h(v) = 0 for each

leaf v, and if u is the parent of v in T, then h(u) ≥ h(v). We place each point in P at a distinct leaf in T. Now, for any pair of points pi and pj, their distance τ(pi, pj) is defined as follows. We determine the least common ancestor v in T of the leaves containing pi and pj, and define τ(pi, pj) = hv. We say that a hierarchical metric τ is consistent with our distance function d if, for all pairs i, j, we have τ(pi, pj) ≤ d(pi, pj). Give a polynomial-time algorithm that takes the distance function d and produces a hierarchical metric τ with the following properties:

(i) τ is consistent with d, and
(ii) if τ' is any other hierarchical metric consistent with d, then τ'(pi, pj) ≤ τ(pi, pj) for each pair of points pi and pj.

26. One of the first things you learn in calculus is how to minimize a differentiable function such as y = ax² + bx + c, where a > 0. The Minimum Spanning Tree Problem, on the other hand, is a minimization problem of a very different flavor: there are now just a finite number of possibilities for how the minimum might be achieved--rather than a continuum of possibilities--and we are interested in how to perform the computation without having to exhaust this (huge) finite number of possibilities. One can ask what happens when these two minimization issues are brought together, and the following question is an example of this. Suppose we have a connected graph G = (V, E). Each edge e now has a time-varying edge cost given by a function fe: R → R; thus, at time t, it has cost fe(t). We'll assume that all these functions are positive over their entire range. Observe that the set of edges constituting the minimum spanning tree of G may change over time. Also, of course, the cost of the minimum spanning tree of G becomes a function of the time t; we'll denote this function cG(t). A natural problem then becomes: find a value of t at which cG(t) is minimized. Suppose each function fe is a polynomial of degree 2: fe(t) = aet² + bet + ce, where ae > 0. Give an algorithm that takes the graph G and the values {(ae, be, ce) : e ∈ E} and returns a value of the time t at which the minimum spanning tree has minimum cost. Your algorithm should run in time polynomial in the number of nodes and edges of the graph G. You may assume that arithmetic operations on the numbers {(ae, be, ce)} can be done in constant time per operation.

27. In trying to understand the combinatorial structure of spanning trees, we can consider the space of all possible spanning trees of a given graph and study the properties of this space. This is a strategy that has been applied to many similar problems as well. Here is one way to do this. Let G be a connected graph, and T and T' two different spanning trees of G. We say that T and T' are neighbors if T contains exactly one edge that is not in T', and T' contains exactly one edge that is not in T. Now, from any graph G, we can build a (large) graph H as follows. The nodes of H are the spanning trees of G, and there is an edge between two nodes of H if the corresponding spanning trees are neighbors. Is it true that, for any connected graph G, the resulting graph H is connected? Give a proof that H is always connected, or provide an example (with explanation) of a connected graph G for which H is not connected.
28. Given a list of n natural numbers d1, d2, ..., dn, show how to decide in polynomial time whether there exists an undirected graph G = (V, E) whose node degrees are precisely the numbers d1, d2, ..., dn. (That is, if V = {v1, v2, ..., vn}, then the degree of vi should be exactly di.) G should not contain multiple edges between the same pair of nodes, or "loop" edges with both endpoints equal to the same node.

29. Suppose you're a consultant for the networking company CluNet, and they have the following problem. The network that they're currently working on is modeled by a connected graph G = (V, E) with n nodes. Each edge e is a fiber-optic cable that is owned by one of two companies--creatively named X and Y--and leased to CluNet. Their plan is to choose a spanning tree T of G and upgrade the links corresponding to the edges of T. Their business relations people have already concluded an agreement with companies X and Y stipulating a number k so that in the tree T that is chosen, k of the edges will be owned by X and n - k - 1 of the edges will be owned by Y. CluNet management now faces the following problem: It is not at all clear to them whether there even exists a spanning tree T meeting these conditions, or how to find one if it exists. So this is the problem they put to you: Give a polynomial-time algorithm that takes G, with each edge labeled X or Y, and either (i) returns a spanning tree with exactly k edges labeled X, or (ii) reports correctly that no such tree exists.

30. Suppose you are given a directed graph G = (V, E) in which each edge has a cost of either 0 or 1. Also suppose that G has a node r such that there is a path from r to every other node in G. You are also given an integer k.

Give a polynomial-time algorithm that either constructs an arborescence rooted at r of cost exactly k, or reports (correctly) that no such arborescence exists.

31. Let's go back to the original motivation for the Minimum Spanning Tree Problem. We are given a connected, undirected graph G = (V, E) with positive edge lengths {ℓe}, and we want to find a spanning subgraph of it. Now suppose we are willing to settle for a subgraph H = (V, F) that is "denser" than a tree, and we are interested in guaranteeing that, for each pair of vertices u, v ∈ V, the length of the shortest u-v path in H is not much longer than the length of the shortest u-v path in G. By the length of a path P here, we mean the sum of ℓe over all edges e in P. Here's a variant of Kruskal's Algorithm designed to produce such a subgraph.

First we sort all the edges in order of increasing length. (You may assume all edge lengths are distinct.) We then construct a subgraph H = (V, F) by considering each edge in order. When we come to edge e = (u, v), we add e to the subgraph H if there is currently no u-v path in H. (This is what Kruskal's Algorithm would do as well.) On the other hand, if there is a u-v path in H, we let duv denote the length of the shortest such path; again, length is with respect to the values {ℓe}. We add e to H if 3ℓe < duv. In other words, we add an edge even when u and v are already in the same connected component, provided that the addition of the edge reduces their shortest-path distance by a sufficient amount.

(a) Prove that for every pair of nodes u, v ∈ V, the length of the shortest u-v path in H is at most three times the length of the shortest u-v path in G.

(b) Despite its ability to approximately preserve shortest-path distances, the subgraph H produced by the algorithm cannot be too dense. Let f(n) denote the maximum number of edges that can possibly be produced as the output of this algorithm, over all n-node input graphs with edge lengths. Prove that lim_{n→∞} f(n)/n² = 0.
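For concreteness, here is a sketch of the construction described in this exercise. It is an illustration under assumed conventions--edges given as (length, u, v) tuples with distinct lengths, and Dijkstra's algorithm used to compute the current shortest u-v distance in H--and is not meant as a solution to parts (a) or (b).

    import heapq

    def spanner(n, edges):
        """Variant of Kruskal's Algorithm from the exercise: scan edges by
        increasing length and keep e = (u, v) unless H already has a u-v
        path of length less than 3 * len(e)."""
        adj = [[] for _ in range(n)]
        kept = []
        for w, u, v in sorted(edges):      # edges as (length, u, v)
            if dist(adj, u, v) > 3 * w:    # also covers "no u-v path" (infinite dist)
                adj[u].append((v, w))
                adj[v].append((u, w))
                kept.append((u, v, w))
        return kept

    def dist(adj, s, t):
        """Shortest s-t distance in the current subgraph H (Dijkstra)."""
        INF = float("inf")
        d = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            du, x = heapq.heappop(pq)
            if x == t:
                return du
            if du > d.get(x, INF):
                continue
            for y, w in adj[x]:
                nd = du + w
                if nd < d.get(y, INF):
                    d[y] = nd
                    heapq.heappush(pq, (nd, y))
        return INF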
32. Let G = (V, E) be a graph with n nodes in which each pair of nodes is joined by an edge. There is a positive weight wij on each edge (i, j); and we will assume these weights satisfy the triangle inequality wik ≤ wij + wjk. For a subset V' ⊆ V, we will use G[V'] to denote the subgraph (with edge weights) induced on the nodes in V'. We are given a set X ⊆ V of k terminals that must be connected by edges. We say that a Steiner tree on X is a set Z so that X ⊆ Z ⊆ V, together with a spanning subtree T of G[Z]. The weight of the Steiner tree is the weight of the tree T. Show that the problem of finding a minimum-weight Steiner tree on X can be solved in time O(n^{O(k)}).

33. In this problem we consider variants of the minimum-cost arborescence algorithm. Suppose you are given a directed graph G = (V, E) with a root r ∈ V and nonnegative costs on the edges.

(a) The algorithm discussed in Section 4.9 works as follows: we defined yv to be the minimum cost of an edge entering node v, and we modified the costs of all edges e entering node v to be c'e = ce - yv. Suppose we instead use the following modified cost: c''e = max(0, ce - 2yv). This new change is likely to turn more edges to 0 cost. Suppose now we find an arborescence T of 0 cost with respect to these modified costs. Prove that this T has cost at most twice the cost of the minimum-cost arborescence in the original graph.

(b) In the original algorithm, after modifying the costs we consider the subgraph of 0-cost edges, look for a directed cycle in this subgraph, and contract it (if one exists). Argue briefly that instead of looking for cycles, we can instead identify and contract strong components of this subgraph.

(c) Assume you do not find an arborescence of 0 cost. Contract all 0-cost strong components and recursively apply the same procedure on the resulting graph until an arborescence T is found. Prove that this T has cost at most twice the cost of the minimum-cost arborescence in the original graph.

Notes and Further Reading

Due to their conceptual cleanness and intuitive appeal, greedy algorithms have a long history and many applications throughout computer science. In this chapter we focused on cases in which greedy algorithms find the optimal solution. Greedy algorithms are also often used as simple heuristics even when they are not guaranteed to find the optimal solution; in Chapter 11 we will discuss greedy algorithms that find near-optimal approximate solutions. As discussed in Chapter 1, Interval Scheduling can be viewed as a special case of the Independent Set Problem on a graph that represents the overlaps among a collection of intervals. Graphs arising this way are called interval graphs, and they have been extensively studied; see, for example, the book by Golumbic (1980). Not just Independent Set but many hard computational
problems become much more tractable when restricted to the special case of interval graphs. Interval Scheduling and the problem of scheduling to minimize the maximum lateness are two of a range of basic scheduling problems for which a simple greedy algorithm can be shown to produce an optimal solution. A wealth of related problems can be found in the survey by Lawler, Lenstra, Rinnooy Kan, and Shmoys (1993). The optimal algorithm for caching and its analysis are due to Belady (1966). As we mentioned in the text, under real operating conditions caching algorithms must make eviction decisions in real time without knowledge of future requests. We will discuss such caching strategies in Chapter 13. The algorithm for shortest paths in a graph with nonnegative edge lengths is due to Dijkstra (1959). Surveys of approaches to the Minimum Spanning Tree Problem, together with historical background, can be found in the reviews by Graham and Hell (1985) and Nesetril (1997). The single-link algorithm is one of the most widely used approaches to the general problem of clustering; the books by Anderberg (1973), Duda, Hart, and Stork (2001), and Jain and Dubes (1981) survey a variety of clustering techniques. The algorithm for optimal prefix codes is due to Huffman (1952); the earlier approaches mentioned in the text appear in the books by Fano (1949) and Shannon and Weaver (1949). General overviews of the area of data compression can be found in the book by Bell, Cleary, and Witten (1990) and the survey by Lelewer and Hirschberg (1987). More generally, this topic belongs to the area of information theory, which is concerned with the representation and encoding of digital information. One of the founding works in this field is the book by Shannon and Weaver (1949), and the more recent textbook by Cover and Thomas (1991) provides detailed coverage of the subject.

The algorithm for finding minimum-cost arborescences is generally credited to Chu and Liu (1965) and to Edmonds (1967) independently. As discussed in the chapter, this multi-phase approach stretches our notion of what constitutes a greedy algorithm. It is also important from the perspective of linear programming, since in that context it can be viewed as a fundamental application of the pricing method, or the primal-dual technique, for designing algorithms. The book by Nemhauser and Wolsey (1988) develops these connections to linear programming. We will discuss this method in Chapter 11 in the context of approximation algorithms.

More generally, as we discussed at the outset of the chapter, it is hard to find a precise definition of what constitutes a greedy algorithm. In the search for such a definition, it is not even clear that one can apply the analogue of U.S. Supreme Court Justice Potter Stewart's famous test for obscenity--"I know it when I see it"--since one finds disagreements within the research community on what constitutes the boundary, even intuitively, between greedy and nongreedy algorithms. There has been research aimed at formalizing classes of greedy algorithms: the theory of matroids is one very influential example (Edmonds 1971; Lawler 2001); and the paper of Borodin, Nielsen, and Rackoff (2002) formalizes notions of greedy and "greedy-type" algorithms, as well as providing a comparison to other formal work on this question.

Notes on the Exercises

Exercise 24 is based on results of M. Edahiro, T. Chao, Y. Hsu, J. Ho, K. Boese, and A. Kahng; Exercise 31 is based on a result of Ingo Althofer, Gautam Das, David Dobkin, and Deborah Joseph.

Chapter 5

Divide and Conquer

Divide and conquer refers to a class of algorithmic techniques in which one breaks the input into several parts, solves the problem in each part recursively, and then combines the solutions to these subproblems into an overall solution. In many cases, it can be a simple and powerful method. Analyzing the running time of a divide and conquer algorithm generally involves solving a recurrence relation that bounds the running time recursively in terms of the running time on smaller instances. We begin the chapter with a general discussion of recurrence relations, illustrating how they arise in the analysis and describing methods for working out upper bounds from them. We then illustrate the use of divide and conquer with applications to a number of different domains: computing a distance function on different rankings of a set of objects; finding the closest pair of points in the plane; multiplying two integers; and smoothing a noisy signal. Divide and conquer will also come up in subsequent chapters, since it is a method that often works well when combined with other algorithm design techniques. For example, in Chapter 6 we will see it combined with dynamic programming to produce a space-efficient solution to a fundamental sequence comparison problem, and in Chapter 13 we will see it combined with randomization to yield a simple and efficient algorithm for computing the median of a set of numbers. One thing to note about many settings in which divide and conquer is applied, including these, is that the natural brute-force algorithm may already be polynomial time, and the divide and conquer strategy is serving to reduce the running time to a lower polynomial. This is in contrast to most of the problems in the previous chapters, for example, where brute force was exponential and the goal in designing a more sophisticated algorithm was to achieve any kind of polynomial running time. For example, we discussed in
Chapter 2 that the natural brute-force algorithm for finding the closest pair among n points in the plane would simply measure all Θ(n²) distances, for a (polynomial) running time of O(n²). Using divide and conquer, we will improve the running time to O(n log n). At a high level, then, the overall theme of this chapter is the same as what we've been seeing earlier: that improving on brute-force search is a fundamental conceptual hurdle in solving a problem efficiently, and the design of sophisticated algorithms can achieve this. The difference is simply that the distinction between brute-force search and an improved solution here will not always be the distinction between exponential and polynomial.

5.1 A First Recurrence: The Mergesort Algorithm

To motivate the general approach to analyzing divide-and-conquer algorithms, we begin with the Mergesort Algorithm. We discussed the Mergesort Algorithm briefly in Chapter 2, when we surveyed common running times for algorithms. Mergesort sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls--in the form of the two sorted halves--using the linear-time algorithm for merging sorted lists that we saw in Chapter 2.

To analyze the running time of Mergesort, we will abstract its behavior into the following template, which describes many common divide-and-conquer algorithms.

(†) Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombining.

In Mergesort, as in any algorithm that fits this style, we also need a base case for the recursion, typically having it "bottom out" on inputs of some constant size. In the case of Mergesort, we will assume that once the input has been reduced to size 2, we stop the recursion and sort the two elements by simply comparing them to each other.

Consider any algorithm that fits the pattern in (†), and let T(n) denote its worst-case running time on input instances of size n. Supposing that n is even, the algorithm spends O(n) time to divide the input into two pieces of size n/2 each; it then spends time T(n/2) to solve each one (since T(n/2) is the worst-case running time for an input of size n/2); and finally it spends O(n) time to combine the solutions from the two recursive calls. Thus the running time T(n) satisfies the following recurrence relation.

(5.1) For some constant c,

T(n) ≤ 2T(n/2) + cn

when n > 2, and

T(2) ≤ c.

The structure of (5.1) is typical of what recurrences will look like: there's an inequality or equation that bounds T(n) in terms of an expression involving T(k) for smaller values k; and there is a base case that generally says that T(n) is equal to a constant when n is a constant. Note that one can also write (5.1) more informally as T(n) ≤ 2T(n/2) + O(n), suppressing the constant c. However, it is generally useful to make c explicit when analyzing the recurrence.

To keep the exposition simpler, we will generally assume that parameters like n are even when needed. This is somewhat imprecise usage; without this assumption, the two recursive calls would be on problems of size ⌈n/2⌉ and ⌊n/2⌋, and the recurrence relation would say that

T(n) ≤ T(⌈n/2⌉) + T(⌊n/2⌋) + cn

for n > 2. Nevertheless, for all the recurrences we consider here (and for most that arise in practice), the asymptotic bounds are not affected by the decision to ignore all the floors and ceilings, and it makes the symbolic manipulation much cleaner.

Now (5.1) does not explicitly provide an asymptotic bound on the growth rate of the function T; rather, it specifies T(n) implicitly in terms of its values on smaller inputs. To obtain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well.

Recurrence solving is a task that has been incorporated into a number of standard computer algebra systems, and the solution to many standard recurrences can now be found by automated means. It is still useful, however, to understand the process of solving recurrences and to recognize which recurrences lead to good running times, since the design of an efficient divide-and-conquer algorithm is heavily intertwined with an understanding of how a recurrence relation determines a running time.
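For reference, here is a short implementation of Mergesort following the template (†); the merge step is the linear-time merging of two sorted lists from Chapter 2. This is a sketch--the function name and list-based style are choices of ours, not the book's own code.

    def mergesort(a):
        """Sort a list by the divide-and-conquer template (†)."""
        if len(a) <= 2:                         # base case: size at most 2
            return sorted(a)
        mid = len(a) // 2
        left, right = mergesort(a[:mid]), mergesort(a[mid:])
        merged, i, j = [], 0, 0                 # linear-time merge of the halves
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged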

Approaches to Solving Recurrences
There are two basic ways one can go about solving a recurrence, each of which we describe in more detail below.
The most intuitively natural way to search for a solution to a recurrence is to "unroll" the recursion, accounting for the running time across the first few levels, and identify a pattern that can be continued as the recursion expands. One then sums the running times over all levels of the recursion (i.e., until it "bottoms out" on subproblems of constant size) and thereby arrives at a total running time. A second way is to start with a guess for the solution, substitute it into the recurrence relation, and check that it works. Formally, one justifies this plugging-in using an argument by induction on n. There is a useful variant of this method in which one has a general form for the solution, but does not have exact values for all the parameters. By leaving these parameters unspecified in the substitution, one can often work them out as needed. We now discuss each of these approaches, using the recurrence in (5.1) as an example.

Unrolling the Mergesort Recurrence

Let's start with the first approach to solving the recurrence in (5.1). The basic argument is depicted in Figure 5.1.

o Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. At the next level, we have two problems, each of size n/2. Each of these takes time at most cn/2, for a total of at most cn, again plus the time in subsequent recursive calls. At the third level, we have four problems, each of size n/4, each taking time at most cn/4, for a total of at most cn.

o Identifying a pattern: What's going on in general? At level j of the recursion, the number of subproblems has doubled j times, so there are now a total of 2^j. Each has correspondingly shrunk in size by a factor of two j times, and so each has size n/2^j, and hence each takes time at most cn/2^j. Thus level j contributes a total of at most 2^j (cn/2^j) = cn to the total running time.

o Summing over all levels of recursion: We've found that the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. The number of times the input must be halved in order to reduce its size from n to 2 is log2 n. So, summing the cn work over log2 n levels of recursion, we get a total running time of O(n log n).

[Figure 5.1: Unrolling the recurrence T(n) ≤ 2T(n/2) + O(n). Level 0: cn total. Level 1: cn/2 + cn/2 = cn total. Level 2: 4(cn/4) = cn total.]

We summarize this in the following claim.

(5.2) Any function T(·) satisfying (5.1) is bounded by O(n log n), when n > 1.

Substituting a Solution into the Mergesort Recurrence

The argument establishing (5.2) can be used to determine that the function T(n) is bounded by O(n log n). If, on the other hand, we have a guess for the running time that we want to verify, we can do so by plugging it into the recurrence as follows.

Suppose we believe that T(n) ≤ cn log2 n for all n ≥ 2, and we want to check whether this is indeed true. This clearly holds for n = 2, since in this case cn log2 n = 2c, and (5.1) explicitly tells us that T(2) ≤ c. Now suppose, by induction, that T(m) ≤ cm log2 m for all values of m less than n, and we want to establish this for T(n). We do this by writing the recurrence for T(n) and plugging in the inequality T(n/2) ≤ c(n/2) log2(n/2). We then simplify the resulting expression by noticing that log2(n/2) = (log2 n) − 1. Here is the full calculation:

    T(n) ≤ 2T(n/2) + cn
         ≤ 2c(n/2) log2(n/2) + cn
         = cn [(log2 n) − 1] + cn
         = (cn log2 n) − cn + cn
         = cn log2 n.

This establishes the bound we want for T(n), assuming it holds for smaller values m < n, and thus it completes the induction argument.
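As a quick sanity check of this induction (not part of the original analysis), the following Python sketch defines T directly from the recurrence (5.1), reading the inequality as an equality with an assumed constant c = 1 and n a power of 2, and confirms the bound cn log2 n:

    import math

    def T(n, c=1):
        # Recurrence (5.1), read as an equality, with n a power of 2.
        if n <= 2:
            return c
        return 2 * T(n // 2, c) + c * n

    # The induction above shows T(n) <= c * n * log2(n) for n >= 2.
    for k in range(1, 12):
        n = 2 ** k
        assert T(n) <= n * math.log2(n)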


An Approach Using Partial Substitution There is a somewhat weaker kind of substitution one can do, in which one guesses the overall form of the solution without pinning down the exact values of all the constants and other parameters at the outset.

Specifically, suppose we believe that T(n) = O(n log n), but we're not sure of the constant inside the O(·) notation. We can use the substitution method even without being sure of this constant, as follows. We first write T(n) ≤ kn log_b n for some constant k and base b that we'll determine later. (Actually, the base and the constant we'll end up needing are related to each other, since we saw in Chapter 2 that one can change the base of the logarithm by simply changing the multiplicative constant in front.)

Now we'd like to know whether there is any choice of k and b that will work in an inductive argument. So we try out one level of the induction as follows:

    T(n) ≤ 2T(n/2) + cn ≤ 2k(n/2) log_b(n/2) + cn.

It's now very tempting to choose the base b = 2 for the logarithm, since we see that this will let us apply the simplification log2(n/2) = (log2 n) − 1. Proceeding with this choice, we have

    T(n) ≤ 2k(n/2) log2(n/2) + cn
         = 2k(n/2) [(log2 n) − 1] + cn
         = kn [(log2 n) − 1] + cn
         = (kn log2 n) − kn + cn.

Finally, we ask: Is there a choice of k that will cause this last expression to be bounded by kn log2 n? The answer is clearly yes; we just need to choose any k that is at least as large as c, and we get

    T(n) ≤ (kn log2 n) − kn + cn ≤ kn log2 n,

which completes the induction. Thus the substitution method can actually be useful in working out the exact constants when one has some guess of the general form of the solution.

5.2 Further Recurrence Relations
We've just worked out the solution to a recurrence relation, (5.1), that will come up in the design of several divide-and-conquer algorithms later in this chapter. As a way to explore this issue further, we now consider a class of recurrence relations that generalizes (5.1), and show how to solve the recurrences in this class. Other members of this class will arise in the design of algorithms both in this and in later chapters.

This more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time. This corresponds to the Mergesort recurrence (5.1) when q = 2 recursive calls are used, but other algorithms find it useful to spawn q > 2 recursive calls, or just a single (q = 1) recursive call. In fact, we will see the case q > 2 later in this chapter when we design algorithms for integer multiplication; and we will see a variant on the case q = 1 much later in the book, when we design a randomized algorithm for median finding in Chapter 13.

If T(n) denotes the running time of an algorithm designed in this style, then T(n) obeys the following recurrence relation, which directly generalizes (5.1) by replacing 2 with q:

(5.3) For some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c.

We now describe how to solve (5.3) by the methods we've seen above: unrolling, substitution, and partial substitution. We treat the cases q > 2 and q = 1 separately, since they are qualitatively different from each other, and different from the case q = 2 as well.

The Case of q > 2 Subproblems We begin by unrolling (5.3) in the case q > 2, following the style we used earlier for (5.1). We will see that the punch line ends up being quite different.

o Analyzing the first few levels: We show an example of this for the case q = 3 in Figure 5.2. At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. At the next level, we have q problems, each of size n/2. Each of these takes time at most cn/2, for a total of at most (q/2)cn, again plus the time in subsequent recursive calls. The next level yields q^2 problems of size n/4 each, for a total time of (q^2/4)cn. Since q > 2, we see that the total work per level is increasing as we proceed through the recursion.

o Identifying a pattern: At an arbitrary level j, we have q^j distinct instances, each of size n/2^j. Thus the total work performed at level j is q^j (cn/2^j) = (q/2)^j cn.


[Figure 5.2: Unrolling the recurrence T(n) ≤ 3T(n/2) + O(n). Level 0: cn total. Level 1: cn/2 + cn/2 + cn/2 = (3/2)cn total. Level 2: 9(cn/4) = (9/4)cn total.]

o Summing over all levels of recursion: As before, there are log2 n levels of recursion, and the total amount of work performed is the sum over all these:

    T(n) ≤ cn [1 + (q/2) + (q/2)^2 + ... + (q/2)^((log2 n) − 1)].

This is a geometric sum, consisting of powers of r = q/2. We can use the formula for a geometric sum when r > 1, which gives us

    T(n) ≤ cn ((r^(log2 n) − 1)/(r − 1)) ≤ cn (r^(log2 n)/(r − 1)).

Since we're aiming for an asymptotic upper bound, it is useful to figure out what's simply a constant; we can pull out the factor of r − 1 from the denominator, and write the last expression as

    T(n) ≤ (c/(r − 1)) n r^(log2 n).

Finally, we need to figure out what r^(log2 n) is. Here we use a very handy identity, which says that, for any a > 1 and b > 1, we have a^(log2 b) = b^(log2 a). Thus

    r^(log2 n) = n^(log2 r) = n^(log2 (q/2)) = n^((log2 q) − 1).

Thus we have

    T(n) ≤ (c/(r − 1)) n · n^((log2 q) − 1) = (c/(r − 1)) n^(log2 q) = O(n^(log2 q)).

We sum this up as follows.

(5.4) Any function T(·) satisfying (5.3) with q > 2 is bounded by O(n^(log2 q)).

So we find that the running time is more than linear, since log2 q > 1, but still polynomial in n. Plugging in specific values of q, the running time is O(n^(log2 3)) = O(n^1.59) when q = 3; and the running time is O(n^(log2 4)) = O(n^2) when q = 4. This increase in running time as q increases makes sense, of course, since the recursive calls generate more work for larger values of q.

Applying Partial Substitution The appearance of log2 q in the exponent followed naturally from our solution to (5.3), but it's not necessarily an expression one would have guessed at the outset. We now consider how an approach based on partial substitution into the recurrence yields a different way of discovering this exponent.

Suppose we guess that the solution to (5.3), when q > 2, has the form T(n) ≤ kn^d for some constants k > 0 and d > 1. This is quite a general guess, since we haven't even tried specifying the exponent d of the polynomial. Now let's try starting the inductive argument and seeing what constraints we need on k and d. Applying the inductive hypothesis to T(n/2), the recurrence T(n) ≤ qT(n/2) + cn expands to

    T(n) ≤ qk(n/2)^d + cn = (q/2^d) kn^d + cn.

This is remarkably close to something that works: if we choose d so that q/2^d = 1, then we have T(n) ≤ kn^d + cn, which is almost right except for the extra term cn. So let's deal with these two issues: first, how to choose d so we get q/2^d = 1; and second, how to get rid of the cn term. Choosing d is easy: we want 2^d = q, and so d = log2 q. Thus we see that the exponent log2 q appears very naturally once we decide to discover which value of d works when substituted into the recurrence.
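Before dealing with that extra cn term, (5.4) itself can be checked numerically. This small sketch (again with the assumed c = 1 and power-of-two sizes) iterates (5.3) for q = 3 and confirms that T(n) stays within a constant multiple of n^(log2 3):

    import math

    def T(n, q, c=1):
        # Recurrence (5.3), read as an equality, with n a power of 2.
        if n <= 2:
            return c
        return q * T(n // 2, q, c) + c * n

    # For q = 3, T(n)/n^(log2 3) approaches a constant (7/3 when c = 1).
    for k in range(2, 16):
        n = 2 ** k
        assert T(n, 3) <= 3 * n ** math.log2(3)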


But we still have to get rid of the cn term. To do this, we change the form of our guess for T(n) so as to explicitly subtract it off. Suppose we try the form T(n) ≤ kn^d − ℓn, where we've now decided that d = log2 q but we haven't fixed the constants k or ℓ. Applying the new formula to T(n/2), this expands to

    T(n) ≤ q (k(n/2)^d − ℓ(n/2)) + cn
         = (q/2^d) kn^d − (qℓ/2) n + cn
         = kn^d − (qℓ/2) n + cn
         = kn^d − (qℓ/2 − c) n.

This now works completely, if we simply choose ℓ so that (qℓ/2 − c) = ℓ: in other words, ℓ = 2c/(q − 2). This completes the inductive step for n. We also need to handle the base case n = 2, and this we do using the fact that the value of k has not yet been fixed: we choose k large enough so that the formula is a valid upper bound for the case n = 2.
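The choice ℓ = 2c/(q − 2) can be checked in the same style; in this sketch (assuming q = 3 and c = 1, so ℓ = 2), the constant k = 2 is large enough for the base case n = 2, and the subtracted form then holds at every power of 2:

    import math

    def T(n, q, c=1):
        # Recurrence (5.3), read as an equality, with n a power of 2.
        if n <= 2:
            return c
        return q * T(n // 2, q, c) + c * n

    q, c = 3, 1
    ell = 2 * c / (q - 2)   # the choice of ell derived above
    k = 2                   # any k that makes the base case n = 2 valid
    for e in range(1, 20):
        n = 2 ** e
        assert T(n, q, c) <= k * n ** math.log2(q) - ell * n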

The Case of One Subproblem
We now consider the case of q = 1 in (5.3), since this illustrates an outcome of yet another flavor. While we won't see a direct application of the recurrence for q = 1 in this chapter, a variation on it comes up in Chapter 13, as we mentioned earlier.

We begin by unrolling the recurrence to try constructing a solution.

o Analyzing the first few levels: We show the first few levels of the recursion in Figure 5.3. At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. The next level has one problem of size n/2, which contributes cn/2, and the level after that has one problem of size n/4, which contributes cn/4. So we see that, unlike the previous case, the total work per level when q = 1 is actually decreasing as we proceed through the recursion.

[Figure 5.3: Unrolling the recurrence T(n) ≤ T(n/2) + O(n). Level 0: cn total. Level 1: cn/2 total. Level 2: cn/4 total.]

o Identifying a pattern: At an arbitrary level j, we still have just one instance; it has size n/2^j and contributes cn/2^j to the running time.

o Summing over all levels of recursion: There are log2 n levels of recursion, and the total amount of work performed is the sum over all these:

    T(n) ≤ cn [1 + 1/2 + 1/4 + ... + (1/2)^((log2 n) − 1)].

This geometric sum is very easy to work out; even if we continued it to infinity, it would converge to 2. Thus we have T(n) ≤ 2cn = O(n).

We sum this up as follows.

(5.5) Any function T(·) satisfying (5.3) with q = 1 is bounded by O(n).

This is counterintuitive when you first see it. The algorithm is performing log n levels of recursion, but the overall running time is still linear in n. The point is that a geometric series with a decaying exponent is a powerful thing: fully half the work performed by the algorithm is being done at the top level of the recursion.

It is also useful to see how partial substitution into the recurrence works very well in this case. Suppose we guess, as before, that the form of the solution is T(n) ≤ kn^d. We now try to establish this by induction using (5.3), assuming that the solution holds for the smaller value n/2:

    T(n) ≤ T(n/2) + cn ≤ k(n/2)^d + cn = (k/2^d) n^d + cn.

If we now simply choose d = 1 and k = 2c, we have

    T(n) ≤ (k/2)n + cn = (k/2 + c)n = kn,

which completes the induction.
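A quick numerical check of (5.5), under the same assumptions as the earlier sketches (c = 1, n a power of 2):

    def T(n, c=1):
        # Recurrence (5.3) with q = 1, read as an equality, n a power of 2.
        if n <= 2:
            return c
        return T(n // 2, c) + c * n

    assert all(T(2 ** e) <= 2 * 2 ** e for e in range(1, 20))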


The Effect of the Parameter q It is worth reflecting briefly on the role of the parameter q in the class of recurrences T(n) ≤ qT(n/2) + O(n) defined by (5.3). When q = 1, the resulting running time is linear; when q = 2, it's O(n log n); and when q > 2, it's a polynomial bound with an exponent larger than 1 that grows with q. The reason for this range of different running times lies in where most of the work is spent in the recursion: when q = 1, the total running time is dominated by the top level, whereas when q > 2 it's dominated by the work done on constant-size subproblems at the bottom of the recursion. Viewed this way, we can appreciate that the recurrence for q = 2 really represents a "knife-edge": the amount of work done at each level is exactly the same, which is what yields the O(n log n) running time.

A Related Recurrence: T(n) ≤ 2T(n/2) + O(n^2) We conclude our discussion with one final recurrence relation; it is illustrative both as another application of a decaying geometric sum and as an interesting contrast with the recurrence (5.1) that characterized Mergesort. Moreover, we will see a close variant of it in Chapter 6, when we analyze a divide-and-conquer algorithm for solving the Sequence Alignment Problem using a small amount of working memory.

The recurrence is based on the following divide-and-conquer structure. Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining. For our purposes here, we note that this style of algorithm has a running time T(n) that satisfies the following recurrence.

(5.6) For some constant c, T(n) ≤ 2T(n/2) + cn^2 when n > 2, and T(2) ≤ c.

One's first reaction is to guess that the solution will be T(n) = O(n^2 log n), since it looks almost identical to (5.1) except that the amount of work per level is larger by a factor equal to the input size. In fact, this upper bound is correct (although it would need a more careful argument than what's in the previous sentence), but it will turn out that we can also show a stronger upper bound. We'll do this by unrolling the recurrence, following the standard template.

o Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn^2 plus the time spent in all subsequent recursive calls. At the next level, we have two problems, each of size n/2. Each of these takes time at most c(n/2)^2 = cn^2/4, for a total of at most cn^2/2, again plus the time in subsequent recursive calls. At the third level, we have four problems, each of size n/4, each taking time at most c(n/4)^2 = cn^2/16, for a total of at most cn^2/4. Already we see that something is different from our solution to the analogous recurrence (5.1); whereas the total amount of work per level remained the same in that case, here it's decreasing.

o Identifying a pattern: At an arbitrary level j of the recursion, there are 2^j subproblems, each of size n/2^j, and hence the total work at this level is bounded by 2^j c(n/2^j)^2 = cn^2/2^j.

o Summing over all levels of recursion: Having gotten this far in the calculation, we've arrived at almost exactly the same sum that we had for the case q = 1 in the previous recurrence. We have

    T(n) ≤ cn^2 [1 + 1/2 + 1/4 + ... + (1/2)^((log2 n) − 1)] ≤ 2cn^2 = O(n^2),

where the second inequality follows from the fact that we have a convergent geometric sum.

In retrospect, our initial guess of T(n) = O(n^2 log n), based on the analogy to (5.1), was an overestimate because of how quickly n^2 decreases as we replace it with (n/2)^2, (n/4)^2, (n/8)^2, and so forth in the unrolling of the recurrence. This means that we get a geometric sum, rather than one that grows by a fixed amount over all log n levels (as in the solution to (5.1)).
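The same style of check applies to (5.6); with the assumed c = 1 and power-of-two sizes, T(n) indeed stays below 2cn^2, matching the stronger bound rather than the initial O(n^2 log n) guess:

    def T(n, c=1):
        # Recurrence (5.6), read as an equality, with n a power of 2.
        if n <= 2:
            return c
        return 2 * T(n // 2, c) + c * n * n

    assert all(T(2 ** e) <= 2 * (2 ** e) ** 2 for e in range(1, 20))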

5.3 Counting Inversions
We’ve spent some time discussing approaches to solving a number of common recurrences. The remainder of the chapter will illustrate the application of divide-and-conquer to problems from a number of different domains; we will use what we’ve seen in the previous sections to bound the running times of these algorithms. We begin by showing how a variant of the Mergesort technique can be used to solve a problem that is not directly related to sorting numbers.


The Problem
We will consider a problem that arises in the analysis of rankings, which are becoming important to a number of current applications. For example, a number of sites on the Web make use of a technique known as collaborative filtering, in which they try to match your preferences (for books, movies, restaurants) with those of other people out on the Internet. Once the Web site has identified people with "similar" tastes to yours--based on a comparison

of how you and they rate various things, it can recommend new things that these other people have liked. Another application arises in meta-search tools on the Web, which execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return.

A core issue in applications like this is the problem of comparing two rankings. You rank a set of n movies, and then a collaborative filtering system consults its database to look for other people who had "similar" rankings. But what's a good way to measure, numerically, how similar two people's rankings are? Clearly an identical ranking is very similar, and a completely reversed ranking is very different; we want a measure that interpolates through the middle region.

Let's consider comparing your ranking and a stranger's ranking of the same set of n movies. A natural method would be to label the movies from 1 to n according to your ranking, then order these labels according to the stranger's ranking, and see how many pairs are "out of order." More concretely, we will consider the following problem. We are given a sequence of n numbers a1, ..., an; we will assume that all the numbers are distinct. We want to define a measure that tells us how far this list is from being in ascending order; the value of the measure should be 0 if a1 < a2 < ... < an, and should increase as the numbers become more scrambled.

A natural way to quantify this notion is by counting the number of inversions. We say that two indices i < j form an inversion if ai > aj, that is, if the two elements ai and aj are "out of order." We will seek to determine the number of inversions in the sequence a1, ..., an.

Just to pin down this definition, consider an example in which the sequence is 2, 4, 1, 3, 5. There are three inversions in this sequence: (2, 1), (4, 1), and (4, 3). There is also an appealing geometric way to visualize the inversions, pictured in Figure 5.4: we draw the sequence of input numbers in the order they're provided, and below that in ascending order; we then draw a line segment between each number in the top list and its copy in the lower list. Each crossing pair of line segments corresponds to a pair that is in the opposite order in the two lists; in other words, an inversion.

[Figure 5.4: Counting the number of inversions in the sequence 2, 4, 1, 3, 5. Each crossing pair of line segments corresponds to one pair that is in the opposite order in the input list and the ascending list; in other words, an inversion.]

Note how the number of inversions is a measure that smoothly interpolates between complete agreement (when the sequence is in ascending order, there are no inversions) and complete disagreement (if the sequence is in descending order, every pair forms an inversion, and so there are (n choose 2) of them).

Designing and Analyzing the Algorithm

What is the simplest algorithm to count inversions? Clearly, we could look at every pair of numbers (ai, aj) and determine whether they constitute an inversion; this would take O(n^2) time. We now show how to count the number of inversions much more quickly, in O(n log n) time. Note that since there can be a quadratic number of inversions, such an algorithm must be able to compute the total number without ever looking at each inversion individually.

The basic idea is to follow the strategy (†) defined in Section 5.1. We set m = ⌈n/2⌉ and divide the list into the two pieces a1, ..., am and a(m+1), ..., an. We first count the number of inversions in each of these two halves separately. Then we count the number of inversions (ai, aj), where the two numbers belong to different halves; the trick is that we must do this part in O(n) time, if we want to apply (5.2). Note that these first-half/second-half inversions have a particularly nice form: they are precisely the pairs (a, b), where a is in the first half, b is in the second half, and a > b.

To help with counting the number of inversions between the two halves, we will make the algorithm recursively sort the numbers in the two halves as well. Having the recursive step do a bit more work (sorting as well as counting inversions) will make the "combining" portion of the algorithm easier.

So the crucial routine in this process is Merge-and-Count. Suppose we have recursively sorted the first and second halves of the list and counted the inversions in each. We now have two sorted lists A and B, containing the first and second halves, respectively. We want to produce a single sorted list C from their union, while also counting the number of pairs (a, b) with a ∈ A, b ∈ B, and a > b. By our previous discussion, this is precisely what we will need for the "combining" step that computes the number of first-half/second-half inversions.

This is closely related to the simpler problem we discussed in Chapter 2, which formed the corresponding "combining" step for Mergesort: there we had two sorted lists A and B, and we wanted to merge them into a single sorted list in O(n) time. The difference here is that we want to do something extra: not only should we produce a single sorted list from A and B, but we should also count the number of "inverted pairs" (a, b) where a ∈ A, b ∈ B, and a > b. It turns out that we will be able to do this in very much the same style that we used for merging.

Our Merge-and-Count routine will walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a Current pointer into each list, showing our current position. Suppose that these pointers are currently at elements ai and bj. In one step, we compare the elements ai and bj being pointed to in each list, remove the smaller one from its list, and append it to the end of list C.

This takes care of merging. How do we also count the number of inversions? Because A and B are sorted, it is actually very easy to keep track of the number of inversions we encounter. Every time the element ai is appended to C, no new inversions are encountered, since ai is smaller than everything left in list B, and it comes before all of them. On the other hand, if bj is appended to list C, then it is smaller than all the remaining items in A, and it comes after all of them, so we increase our count of the number of inversions by the number of elements remaining in A. This is the crucial idea: in constant time, we have accounted for a potentially large number of inversions. See Figure 5.5 for an illustration of this process.

To summarize, we have the following algorithm.

    Merge-and-Count(A, B):
      Maintain a Current pointer into each list, initialized to point to the front elements
      Maintain a variable Count for the number of inversions, initialized to 0
      While both lists are nonempty:
        Let ai and bj be the elements pointed to by the Current pointers
        Append the smaller of these two to the output list
        If bj is the smaller element then
          Increment Count by the number of elements remaining in A
        Endif
        Advance the Current pointer in the list from which the smaller element was selected
      EndWhile
      Once one list is empty, append the remainder of the other list to the output
      Return Count and the merged list

[Figure 5.5: Merging two sorted lists while also counting the number of inversions between them.]

The running time of Merge-and-Count can be bounded by the analogue of the argument we used for the original merging algorithm at the heart of Mergesort: each iteration of the While loop takes constant time, and in each iteration we add some element to the output that will never be seen again. Thus the number of iterations can be at most the sum of the initial lengths of A and B, and so the total running time is O(n).

We use this Merge-and-Count routine in a recursive procedure that simultaneously sorts and counts the number of inversions in a list L.

    Sort-and-Count(L):
      If the list has one element then there are no inversions
      Else
        Divide the list into two halves:
          A contains the first ⌈n/2⌉ elements
          B contains the remaining ⌊n/2⌋ elements
        (rA, A) = Sort-and-Count(A)
        (rB, B) = Sort-and-Count(B)
        (r, L) = Merge-and-Count(A, B)
      Endif
      Return r = rA + rB + r, and the sorted list L

Since our Merge-and-Count procedure takes O(n) time, the running time T(n) of the full Sort-and-Count procedure satisfies the recurrence (5.1). By (5.2), we have

(5.7) The Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions; it runs in O(n log n) time for a list with n elements.
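Here is a direct Python rendering of the two routines above, included as an illustrative sketch (the function and variable names are our own):

    def merge_and_count(A, B):
        # Merge sorted lists A and B, counting pairs (a, b) with
        # a in A, b in B, and a > b.
        i = j = count = 0
        C = []
        while i < len(A) and j < len(B):
            if A[i] <= B[j]:
                C.append(A[i])
                i += 1
            else:
                # B[j] is smaller than every remaining element of A,
                # so it forms an inversion with each of them.
                count += len(A) - i
                C.append(B[j])
                j += 1
        C.extend(A[i:])
        C.extend(B[j:])
        return count, C

    def sort_and_count(L):
        # Returns (number of inversions in L, sorted copy of L).
        if len(L) <= 1:
            return 0, list(L)
        m = (len(L) + 1) // 2
        rA, A = sort_and_count(L[:m])
        rB, B = sort_and_count(L[m:])
        r, merged = merge_and_count(A, B)
        return rA + rB + r, merged

    # The example sequence from the text has three inversions.
    assert sort_and_count([2, 4, 1, 3, 5])[0] == 3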

5.4 Finding the Closest Pair of Points
We now describe another problem that can be solved by an algorithm in the style we've been discussing; but finding the right way to "merge" the solutions to the two subproblems it generates requires quite a bit of ingenuity.

The Problem

The problem we consider is very simple to state: Given n points in the plane, find the pair that is closest together.

The problem was considered by M. I. Shamos and D. Hoey in the early 1970s, as part of their project to work out efficient algorithms for basic computational primitives in geometry. These algorithms formed the foundations of the then-fledgling field of computational geometry, and they have found their way into areas such as graphics, computer vision, geographic information systems, and molecular modeling. And although the closest-pair problem is one of the most natural algorithmic problems in geometry, it is surprisingly hard to find an efficient algorithm for it. It is immediately clear that there is an O(n^2) solution: compute the distance between each pair of points and take the minimum. So Shamos and Hoey asked whether an algorithm asymptotically faster than quadratic could be found. It took quite a long time before they resolved this question, and the O(n log n) algorithm we give below is essentially the one they discovered. In fact, when we return to this problem in Chapter 13, we will see that it is possible to further improve the running time to O(n) using randomization.

Designing the Algorithm

We begin with a bit of notation. Let us denote the set of points by P = {p1, ..., pn}, where pi has coordinates (xi, yi); and for two points pi, pj ∈ P, we use d(pi, pj) to denote the standard Euclidean distance between them. Our goal is to find a pair of points pi, pj that minimizes d(pi, pj).

We will assume that no two points in P have the same x-coordinate or the same y-coordinate. This makes the discussion cleaner; and it's easy to eliminate this assumption either by initially applying a rotation to the points that makes it true, or by slightly extending the algorithm we develop here.

It's instructive to consider the one-dimensional version of this problem for a minute, since it is much simpler and the contrasts are revealing. How would we find the closest pair of points on a line? We'd first sort them, in O(n log n) time, and then we'd walk through the sorted list, computing the distance from each point to the one that comes after it. It is easy to see that one of these distances must be the minimum one.

In two dimensions, we could try sorting the points by their y-coordinate (or x-coordinate) and hoping that the two closest points were near one another in the order of this sorted list. But it is easy to construct examples in which they are very far apart, preventing us from adapting our one-dimensional approach.

Instead, our plan will be to apply the style of divide and conquer used in Mergesort: we find the closest pair among the points in the "left half" of P and the closest pair among the points in the "right half" of P; and then we use this information to get the overall solution in linear time. If we develop an algorithm with this structure, then the solution of our basic recurrence from (5.1) will give us an O(n log n) running time.

It is the last, "combining" phase of the algorithm that's tricky: the distances that have not been considered by either of our recursive calls are precisely those that occur between a point in the left half and a point in the right half; there are Ω(n^2) such distances, yet we need to find the smallest one in O(n) time after the recursive calls return. If we can do this, our solution will be complete: it will be the smallest of the values computed in the recursive calls and this minimum "left-to-right" distance.

Setting Up the Recursion Let's get a few easy things out of the way first. It will be very useful if every recursive call, on a set P' ⊆ P, begins with two lists: a list P'x in which all the points in P' have been sorted by increasing x-coordinate, and a list P'y in which all the points in P' have been sorted by increasing y-coordinate. We can ensure that this remains true throughout the algorithm as follows. First, before any of the recursion begins, we sort all the points in P by x-coordinate and again by y-coordinate, producing lists Px and Py. Attached to each entry in each list is a record of the position of that point in both lists.

The first level of recursion will work as follows, with all further levels working in a completely analogous way. We define Q to be the set of points in the first ⌈n/2⌉ positions of the list Px (the "left half") and R to be the set of points in the final ⌊n/2⌋ positions of the list Px (the "right half"). See Figure 5.6.

[Figure 5.6: The first level of recursion. The point set P is divided evenly into Q and R by the line L, and the closest pair is found on each side recursively.]

By a single pass through each of Px and Py, in O(n) time, we can create the following four lists: Qx, consisting of the points in Q sorted by increasing x-coordinate; Qy, consisting of the points in Q sorted by increasing y-coordinate; and analogous lists Rx and Ry. For each entry of each of these lists, as before, we record the position of the point in both lists it belongs to.

We now recursively determine a closest pair of points in Q (with access to the lists Qx and Qy). Suppose that q*0 and q*1 are (correctly) returned as a closest pair of points in Q. Similarly, we determine a closest pair of points in R, obtaining r*0 and r*1.

Combining the Solutions The general machinery of divide and conquer has gotten us this far, without our really having delved into the structure of the closest-pair problem. But it still leaves us with the problem that we saw looming originally: How do we use the solutions to the two subproblems as part of a linear-time "combining" operation?

Let δ be the minimum of d(q*0, q*1) and d(r*0, r*1). The real question is: Are there points q ∈ Q and r ∈ R for which d(q, r) < δ? If not, then we have already found the closest pair in one of our recursive calls. But if there are, then the closest such q and r form the closest pair in P.

Let x* denote the x-coordinate of the rightmost point in Q, and let L denote the vertical line described by the equation x = x*. This line L "separates" Q from R. Here is a simple fact.

(5.8) If there exists q ∈ Q and r ∈ R for which d(q, r) < δ, then each of q and r lies within a distance δ of L.

Proof. Suppose such q and r exist; we write q = (qx, qy) and r = (rx, ry). By the definition of x*, we know that qx ≤ x* ≤ rx. Then we have x* − qx ≤ rx − qx ≤ d(q, r) < δ and rx − x* ≤ rx − qx ≤ d(q, r) < δ, so each of q and r has an x-coordinate within δ of x* and hence lies within distance δ of the line L.

So if we want to find a close q and r, we can restrict our search to the narrow band consisting only of points in P within δ of L. Let S ⊆ P denote this set, and let Sy denote the list consisting of the points in S sorted by increasing y-coordinate. By a single pass through the list Py, we can construct Sy in O(n) time.

We can restate (5.8) as follows, in terms of the set S.

(5.9) There exist q ∈ Q and r ∈ R for which d(q, r) < δ if and only if there exist s, s' ∈ S for which d(s, s') < δ.

It's worth noticing at this point that S might in fact be the whole set P, in which case (5.8) and (5.9) really seem to buy us nothing. But this is actually far from true, as the following amazing fact shows.

(5.10) If s, s' ∈ S have the property that d(s, s') < δ, then s and s' are within 15 positions of each other in the sorted list Sy.

Proof. Consider the subset Z of the plane consisting of all points within distance δ of L. We partition Z into boxes: squares with horizontal and vertical sides of length δ/2. One row of Z will consist of four boxes whose horizontal sides have the same y-coordinates. This collection of boxes is depicted in Figure 5.7.

Suppose two points of S lie in the same box. Since all points in this box lie on the same side of L, these two points either both belong to Q or both belong to R. But any two points in the same box are within distance δ√2/2 < δ, which contradicts our definition of δ as the minimum distance between any pair of points in Q or in R. Thus each box contains at most one point of S.

Now suppose that s, s' ∈ S have the property that d(s, s') < δ, and that they are at least 16 positions apart in Sy. Assume without loss of generality that s has the smaller y-coordinate. Then, since there can be at most one point per box, there are at least three rows of Z lying between s and s'. But any two points in Z separated by at least three rows must be a distance of at least 3δ/2 apart, a contradiction.

[Figure 5.7: The portion of the plane close to the dividing line L, partitioned into boxes of side length δ/2. Each box can contain at most one input point.]

We note that the value of 15 can be reduced; but for our purposes at the moment, the important thing is that it is an absolute constant.

In view of (5.10), we can conclude the algorithm as follows. We make one pass through Sy, and for each s ∈ Sy, we compute its distance to each of the next 15 points in Sy. Statement (5.10) implies that in doing so, we will have computed the distance of each pair of points in S (if any) that are at distance less than δ from each other. So having done this, we can compare the smallest such distance to δ, and we can report one of two things: (i) the closest pair of points in S, if their distance is less than δ; or (ii) the (correct) conclusion that no pairs of points in S are within δ of each other. In case (i), this pair is the closest pair in P; in case (ii), the closest pair found by our recursive calls is the closest pair in P.

Note the resemblance between this procedure and the algorithm we rejected at the very beginning, which tried to make one pass through P in order of y-coordinate. The reason such an approach works now is due to the extra knowledge (the value of δ) we've gained from the recursive calls, and the special structure of the set S.

This concludes the description of the "combining" part of the algorithm, since by (5.9) we have now determined whether the minimum distance between a point in Q and a point in R is less than δ, and if so, we have found the closest such pair.
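Before the high-level summary below, here is a compact Python sketch of the whole algorithm (our own naming; distinct coordinates are assumed, as in the text):

    import math

    def closest_pair(points):
        # points: list of (x, y) tuples with distinct x- and y-coordinates.
        px = sorted(points)                       # sorted by x-coordinate
        py = sorted(points, key=lambda p: p[1])   # sorted by y-coordinate
        return rec(px, py)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def rec(px, py):
        n = len(px)
        if n <= 3:    # base case: measure all pairwise distances
            return min(((dist(p, q), p, q)
                        for i, p in enumerate(px) for q in px[i + 1:]),
                       key=lambda t: t[0])
        mid = (n + 1) // 2
        qx, rx = px[:mid], px[mid:]
        x_star = qx[-1][0]               # rightmost x-coordinate in Q
        in_q = set(qx)
        qy = [p for p in py if p in in_q]
        ry = [p for p in py if p not in in_q]
        best = min(rec(qx, qy), rec(rx, ry), key=lambda t: t[0])
        delta = best[0]
        sy = [p for p in py if abs(p[0] - x_star) < delta]   # the strip S
        for i, s in enumerate(sy):
            for t in sy[i + 1: i + 16]:  # the next 15 points suffice, by (5.10)
                d = dist(s, t)
                if d < delta:
                    delta, best = d, (d, s, t)
        return best

    # Returns (distance, p, q) for a closest pair.
    print(closest_pair([(0, 0), (3, 4), (1, 1), (5, 2), (2, 3)]))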

Summary of the Algorithm A high-level description of the algorithm is the following, using the notation we have developed above.

    Closest-Pair(P):
      Construct Px and Py  (O(n log n) time)
      (p*0, p*1) = Closest-Pair-Rec(Px, Py)

    Closest-Pair-Rec(Px, Py):
      If |P| ≤ 3 then
        find closest pair by measuring all pairwise distances
      Endif
      Construct Qx, Qy, Rx, Ry  (O(n) time)
      (q*0, q*1) = Closest-Pair-Rec(Qx, Qy)
      (r*0, r*1) = Closest-Pair-Rec(Rx, Ry)
      δ = min(d(q*0, q*1), d(r*0, r*1))
      x* = maximum x-coordinate of a point in set Q
      L = {(x, y) : x = x*}
      S = points in P within distance δ of L
      Construct Sy  (O(n) time)
      For each point s ∈ Sy, compute distance from s
        to each of next 15 points in Sy;
        let s, s' be the pair achieving the minimum
        of these distances  (O(n) time)
      If d(s, s') < δ then
        Return (s, s')
      Else if d(q*0, q*1) < d(r*0, r*1) then
        Return (q*0, q*1)
      Else
        Return (r*0, r*1)
      Endif

Analyzing the Algorithm We first prove that the algorithm produces a correct answer, using the facts we've established in the process of designing it.

(5.11) The algorithm correctly outputs a closest pair of points in P.

Proof. As we've noted, all the components of the proof have already been worked out, so here we just summarize how they fit together. We prove the correctness by induction on the size of P, the case of |P| ≤ 3 being clear. For a given P, the closest pair in the recursive calls is computed correctly by induction. By (5.10) and (5.9), the remainder of the algorithm correctly determines whether any pair of points in S is at distance less than δ, and if so returns the closest such pair. Now the closest pair in P either has both elements in one of Q or R, or it has one element in each. In the former case, the closest pair is correctly found by the recursive call; in the latter case, this pair is at distance less than δ, and it is correctly found by the remainder of the algorithm.

We now bound the running time as well, using (5.2).

(5.12) The running time of the algorithm is O(n log n).

Proof. The initial sorting of P by x- and y-coordinate takes time O(n log n). The running time of the remainder of the algorithm satisfies the recurrence (5.1), and hence is O(n log n) by (5.2).

5.5 Integer Multiplication
We now discuss a different application of divide and conquer, in which the "default" quadratic algorithm is improved by means of a different recurrence. The analysis of the faster algorithm will exploit one of the recurrences considered in Section 5.2, in which more than two recursive calls are spawned at each level.

The Problem

The problem we consider is an extremely basic one: the multiplication of two integers. In a sense, this problem is so basic that one may not initially think of it even as an algorithmic question. But, in fact, elementary schoolers are taught a concrete (and quite efficient) algorithm to multiply two n-digit numbers x and y.

In elementary school we always see this done in base-10, but it works exactly the same way in base-2 as well. You first compute a "partial product" by multiplying each digit of y separately by x, and then you add up all the partial products. (Figure 5.8 should help you recall this algorithm.)

[Figure 5.8: The elementary-school algorithm for multiplying two integers, in (a) decimal and (b) binary representation.]

Counting a single operation on a pair of bits as one primitive step in this computation, it takes O(n) time to compute each partial product, and O(n) time to combine it in with the running sum of all partial products so far. Since there are n partial products, this is a total running time of O(n^2).

If you haven't thought about this much since elementary school, there's something initially striking about the prospect of improving on this algorithm. Aren't all those partial products "necessary" in some way? But, in fact, it is possible to improve on O(n^2) time using a different, recursive way of performing the multiplication.

Let's assume we're in base-2 (it doesn't really matter), and start by writing x as x1 · 2^(n/2) + x0. In other words, x1 corresponds to the "high-order" n/2 bits, and x0 corresponds to the "low-order" n/2 bits. Similarly, we write y = y1 · 2^(n/2) + y0. Thus, we have

    xy = (x1 · 2^(n/2) + x0)(y1 · 2^(n/2) + y0)
       = x1y1 · 2^n + (x1y0 + x0y1) · 2^(n/2) + x0y0.   (5.1)

Equation (5.1) reduces the problem of solving a single n-bit instance (multiplying the two n-bit numbers x and y) to the problem of solving four n/2-bit instances (computing the products x1y1, x1y0, x0y1, and x0y0). So we have a first candidate for a divide-and-conquer solution: recursively compute the results for these four n/2-bit instances, and then combine them using Equation (5.1). The combining of the solution requires a constant number of additions of O(n)-bit numbers, so it takes time O(n); thus, the running time T(n) is bounded by the recurrence T(n) ≤ 4T(n/2) + cn for a constant c. Is this good enough to give us a subquadratic running time? We can work out the answer by observing that this is just the case q = 4 of the class of recurrences in (5.3). As we saw earlier in the chapter, the solution to this is T(n) ≤ O(n^(log2 q)) = O(n^2). So, in fact, our divide-and-conquer algorithm with four-way branching was just a complicated way to get back to quadratic time!
If we want to do better using a strategy that reduces the problem to instances on n/2 bits, we should try to get away with only three recursive calls. This will lead to the case q = 3 of (5.3), which we saw had the solution T(n) ≤ O(n^(log2 q)) = O(n^1.59).

Recall that our goal is to compute the expression x1y1 · 2^n + (x1y0 + x0y1) · 2^(n/2) + x0y0 in Equation (5.1). It turns out there is a simple trick that lets us determine all of the terms in this expression using just three recursive calls. The trick is to consider the result of the single multiplication (x1 + x0)(y1 + y0) = x1y1 + x1y0 + x0y1 + x0y0. This has the four products above added together, at the cost of a single recursive multiplication. If we now also determine x1y1 and x0y0 by recursion, then we get the outermost terms explicitly, and we get the middle term by subtracting x1y1 and x0y0 away from (x1 + x0)(y1 + y0).

Thus, in place of our four-way branching recursion, we now have a three-way branching one. Ignoring for now the issue that x1 + x0 and y1 + y0 may have n/2 + 1 bits (rather than just n/2), which turns out not to affect the asymptotic results, our algorithm is

    Recursive-Multiply(x, y):
      Write x = x1 · 2^(n/2) + x0 and y = y1 · 2^(n/2) + y0
      Compute x1 + x0 and y1 + y0
      p = Recursive-Multiply(x1 + x0, y1 + y0)
      x1y1 = Recursive-Multiply(x1, y1)
      x0y0 = Recursive-Multiply(x0, y0)
      Return x1y1 · 2^n + (p − x1y1 − x0y0) · 2^(n/2) + x0y0

Analyzing the Algorithm We can determine the running time of this algorithm as follows. Given two n-bit numbers, it performs a constant number of additions on O(n)-bit numbers, in addition to the three recursive calls; each of these recursive calls is on an instance of size n/2. Thus the running time T(n) satisfies T(n) ≤ 3T(n/2) + cn for a constant c. This is the case q = 3 of (5.3), and using the solution to that recurrence from earlier in the chapter, we have:

(5.13) The running time of Recursive-Multiply on two n-bit factors is O(n^(log2 3)) = O(n^1.59).
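Here is the same three-call recursion in Python, as an illustrative sketch (Python integers are arbitrary precision, so the point is the recursion pattern rather than speed; the base-case cutoff is our own choice):

    def recursive_multiply(x, y):
        # Three-way-branching multiplication of nonnegative integers.
        if x < 2 ** 8 or y < 2 ** 8:     # constant-size base case
            return x * y
        half = max(x.bit_length(), y.bit_length()) // 2
        x1, x0 = x >> half, x & ((1 << half) - 1)   # high- and low-order bits
        y1, y0 = y >> half, y & ((1 << half) - 1)
        p = recursive_multiply(x1 + x0, y1 + y0)
        x1y1 = recursive_multiply(x1, y1)
        x0y0 = recursive_multiply(x0, y0)
        # xy = x1y1 * 2^(2*half) + (x1y0 + x0y1) * 2^half + x0y0,
        # and p - x1y1 - x0y0 equals the middle term x1y0 + x0y1.
        return (x1y1 << (2 * half)) + ((p - x1y1 - x0y0) << half) + x0y0

    assert recursive_multiply(123456789, 987654321) == 123456789 * 987654321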

5.6 Convolutions and the Fast Fourier Transform
As a final topic in this chapter, we show how our basic recurrence from (5.1) is used in the design of the Fast Fourier Transform, an algorithm with a wide range of applications.

The Problem

Given two vectors a = (a_0, a_1, ..., a_(n−1)) and b = (b_0, b_1, ..., b_(n−1)), there are a number of common ways of combining them. For example, one can compute the sum, producing the vector a + b = (a_0 + b_0, a_1 + b_1, ..., a_(n−1) + b_(n−1)); or one can compute the inner product, producing the real number a · b = a_0 b_0 + a_1 b_1 + ... + a_(n−1) b_(n−1). (For reasons that will emerge shortly, it is useful to write vectors in this section with coordinates that are indexed starting from 0 rather than 1.)

A means of combining vectors that is very important in applications, even if it doesn't always show up in introductory linear algebra courses, is the convolution a * b. The convolution of two vectors of length n (as a and b are) is a vector with 2n − 1 coordinates, where coordinate k is equal to

    Σ_{(i,j): i+j=k, i,j<n} a_i b_j.

In other words,

    a * b = (a_0 b_0, a_0 b_1 + a_1 b_0, a_0 b_2 + a_1 b_1 + a_2 b_0, ..., a_(n−2) b_(n−1) + a_(n−1) b_(n−2), a_(n−1) b_(n−1)).

This definition is a bit hard to absorb when you first see it. Another way to think about the convolution is to picture an n × n table whose (i, j) entry is a_i b_j,

    a_0 b_0      a_0 b_1      ...  a_0 b_(n−2)      a_0 b_(n−1)
    a_1 b_0      a_1 b_1      ...  a_1 b_(n−2)      a_1 b_(n−1)
    ...
    a_(n−1) b_0  a_(n−1) b_1  ...  a_(n−1) b_(n−2)  a_(n−1) b_(n−1)

and then to compute the coordinates in the convolution vector by summing along the diagonals.

It's worth mentioning that, unlike the vector sum and inner product, the convolution can be easily generalized to vectors of different lengths, a = (a_0, a_1, ..., a_(m−1)) and b = (b_0, b_1, ..., b_(n−1)). In this more general case, we define a * b to be a vector with m + n − 1 coordinates, where coordinate k is equal to

    Σ_{(i,j): i+j=k, i<m, j<n} a_i b_j.

The table of products a_i b_j is now rectangular, but we still compute coordinates by summing along the diagonals. (From here on, we'll drop explicit mention of the conditions i < m and j < n in the summations for convolutions, since it will be clear from the context that we only compute the sum over terms that are defined.)

It's not just the definition of a convolution that is a bit hard to absorb at first; the motivation for the definition can also initially be a bit elusive. What are the circumstances where you'd want to compute the convolution of two vectors? In fact, the convolution comes up in a surprisingly wide variety of different contexts. To illustrate this, we mention the following examples here.

A first example (which also proves that the convolution is something that we all saw implicitly in high school) is polynomial multiplication. Any polynomial A(x) = a_0 + a_1 x + a_2 x^2 + ... + a_(m−1) x^(m−1) can be represented naturally by its vector of coefficients, a = (a_0, a_1, ..., a_(m−1)). Now, given two polynomials A(x) = a_0 + a_1 x + a_2 x^2 + ... + a_(m−1) x^(m−1) and B(x) = b_0 + b_1 x + b_2 x^2 + ... + b_(n−1) x^(n−1), consider the polynomial C(x) = A(x)B(x) that is equal to their product. In this polynomial C(x), the coefficient on the x^k term is equal to

    c_k = Σ_{(i,j): i+j=k} a_i b_j.

In other words, the coefficient vector c of C(x) is the convolution of the coefficient vectors of A(x) and B(x).

Arguably the most important application of convolutions in practice is for signal processing. This is a topic that could fill an entire course, so we'll just give a simple example here to suggest one way in which the convolution arises.

Suppose we have a vector a = (a_0, a_1, ..., a_(m−1)) which represents a sequence of measurements, such as a temperature or a stock price, sampled at m consecutive points in time. Sequences like this are often very noisy due to measurement error or random fluctuations, and so a common operation is to "smooth" the measurements by averaging each value a_i with a weighted sum of its neighbors within k steps to the left and right in the sequence, the weights decaying quickly as one moves away from a_i.

In Gaussian smoothing, for example, one replaces a_i with

    a'_i = (1/Z) Σ_{s=−k}^{k} a_(i+s) e^(−s^2),

for some "width" parameter k, and with Z chosen simply to normalize the weights in the average to add up to 1. (There are some issues with boundary conditions: what do we do when i − k < 0 or i + k > m? We could deal with these, for example, by discarding the first and last k entries from the smoothed signal, or by scaling them differently to make up for the missing terms.)

To see the connection with the convolution operation, we picture this smoothing operation as follows. We first define a "mask" w = (w_(−k), w_(−(k−1)), ..., w_(−1), w_0, w_1, ..., w_(k−1), w_k) consisting of the weights we want to use for averaging each point with its neighbors; for example, w = (1/Z)(e^(−k^2), e^(−(k−1)^2), ..., e^(−1), 1, e^(−1), ..., e^(−(k−1)^2), e^(−k^2)) in the Gaussian case above. We then iteratively position this mask so it is centered at each possible point in the sequence a; and for each positioning, we compute the weighted average. In other words, we replace a_i with a'_i = Σ_{s=−k}^{k} w_s a_(i+s).

This last expression is essentially a convolution; we just have to warp the notation a bit so that this becomes clear. Define b = (b_0, b_1, ..., b_(2k)) by setting b_ℓ = w_(k−ℓ). Then it's not hard to check that with this definition we have the smoothed value

    a'_i = Σ_{(j,ℓ): j+ℓ=i+k} a_j b_ℓ.

In other words, the smoothed sequence is just the convolution of the original signal and the reverse of the mask (with some meaningless coordinates at the beginning and end).

We mention one final application: the problem of combining histograms. Suppose we're studying a population of people, and we have the following two histograms: one shows the annual income of all the men in the population, and one shows the annual income of all the women. We'd now like to produce a new histogram, showing for each k the number of pairs (M, W) for which man M and woman W have a combined income of k.

This is precisely a convolution. We can write the first histogram as a vector a = (a_0, ..., a_(m−1)), to indicate that there are a_i men with annual income equal to i. We can similarly write the second histogram as a vector b = (b_0, ..., b_(n−1)). Now, let c_k denote the number of pairs (m, w) with combined income k; this is the number of ways of choosing a man with income i and a woman with income j, summed over all pairs (i, j) where i + j = k. In other words,

    c_k = Σ_{(i,j): i+j=k} a_i b_j,

so the combined histogram c = (c_0, ..., c_(m+n−2)) is simply the convolution of a and b. (Using terminology from probability that we will develop in Chapter 13, one can view this example as showing how convolution is the underlying means for computing the distribution of the sum of two independent random variables.)

Computing the Convolution Having now motivated the notion of convolution, let's discuss the problem of computing it efficiently. The definition of convolution, after all, gives us a perfectly valid way to compute it: for each k, we just calculate the sum Σ_{(i,j): i+j=k} a_i b_j and use this as the value of the kth coordinate. The trouble is that this direct way of computing the convolution involves calculating the product a_i b_j for every pair (i, j) (in the process of distributing over the sums in the different terms), and this is Θ(n^2) arithmetic operations. Spending O(n^2) time on computing the convolution seems natural, as the definition involves O(n^2) multiplications a_i b_j. However, it's not inherently clear that we have to spend quadratic time to compute a convolution, since the input and output both only have size O(n).
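The quadratic method just described takes only a few lines of Python; this sketch also checks it against the polynomial-multiplication example (the names are ours):

    def convolve_direct(a, b):
        # Coordinate k of a * b is the sum of a[i] * b[j] over i + j = k.
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    # (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
    assert convolve_direct([1, 2, 3], [4, 5]) == [4, 13, 22, 15]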

. where e’r~= -1 (and e2’-n = 1). C(x2) .. as shown in Figure 5. we are going to exploit this connection in the opposite direction.. and it is easy to identify them." with axes representing their real and imaginary parts... We will view them as the polynomials A(x) = ao + alx + a2x2 + ¯ . Recall that complex numbers can be viewed as lying in the "complex plane. and so we can then read off the desired answer directly from the coefficients of C(x).. we can treat them as functions of the variable x and multiply them as follows.. This seems to bring us back to quadratic time right away. rather than multiplying A and B symbolically.. But rather than use convolution as a primitive in polynomial multiplication. . x2n.. x2 . is iust one of these applications. since it simply involves the multiplication of O(n) numbers.. Here we take advantage of a fundamental fact about polynomials: any polynomial of degree d can be reconstructed from its values on any set of d + 1 or more points. we have to recover C from its values on x~.. But the situation doesn’t look as hopeful with steps (i) and (fii). and we’ll discuss the mechanics of performing interpolation in more detail later. c2n_2) is the vector of coefficients of C. and we’ll seek to compute their product C(x) = A(x)B(x) in O(n log rt) time. 1. x2 . First we choose 2n values xl.. The key idea that will make this all work is to find a set of 2n values x~. the polynomial equation xk = 1 has k distinct complex roots..1) satisfies the equation. quite surprisingly. .. (iii) Finally. It’s worth mentioning (although it’s not necessary for understanding the algorithm) that the use of the complex roots of unity is the basis for the name Fast Fourier Transform: the representation of a degree-d This approach to multiplying polynomials has some promising aspects and some problematic ones. First. evaluating the polynomials A and B on a single value takes S2 (n) operations... such that the work in evaluating A and B on all of them can be shared across different evaluations.!. as illustrated in the first example discussed previously. we’re going to need-to recal! a few facts about complex numbers and their role as solutions to polynomial equations. The FFT has a wide range of further applications in analyzing sequences of numerical values. and so it can be reconstructed from the values C(xl). ~ Designing and Analyzing the Algorithm To break through the quadratic time barrier for convolutions.. The Complex Roots of Unity At this point. Now. c~ .k = e2’-qi/k (for] = 0. Suppose we are given the vectors a = (ao. For our numbers x~ .. We now describe a method that computes the convolution of two vectors using only O(n log n) arithmetic operations. since (i) and each of these numbers is distinct. We refer to these numbers as the kth roots of unity..6 Convolutions and the Fast Fourier Transform 239 Could one design an algorithm that bypasses the quadratic-size definition of convolution and computes it in some smarter way? In fact.. A set for which this wil! turn out to work very well is the complex roots o[ unity.. O(n) arithmetic operations.238 Chapter 5 Divide and Conquer : 5. x2n that are intimately related in some way. The crux of this method is a powerful technique known as the Fast Fourier Transform (FFT). then we recall from our earlier discussion that c is exactly the convolution a ¯ b. In particular. For the moment.. for a positive integer k. al . this is possible.. so these are all the roots.. their product C has degree at most 2n . 2 . 
(ii) We can now compute C(xi) for each ] very easily: C(xj) is simply the product of the two numbers A(xj) and B(xj). If c = (Co. Now.. and our plan calls for performing 2n such evaluations.. We can picture these roots as a set of k equally spaced points lying on the unit circle in the complex plane.. we simply observe that since A and B each have degree at most n . the good news: step (ii) requires only Figure 5... Each of the complex numbers wj.2.. k . 2n. which we focus on here. x2 . ¯ an_~xn-~ and B(x) = bo + b~x + b2x2 ÷" "" bn-1xn-1.. bn_l).. We can write a complex number using polar coordinates with respect to this plane as re°i.9 for the case k = 8... 2 ..9 The 8th roots of unity in the complex plane. we will choose the (2n)th roots of unity. an_~) and b= (bo. we are going to exploit the connection between the convolution and the multiplication of two polynomials. x2n and evaluate A(xj) and B(xj) for each of j = !. C(x2n) that we computed in step (ii).. This is known as polynomial interpolation. x2n on which to evaluate A and B. computing convolutions quickly.

A Recursive Procedure for Polynomial Evaluation

We want to design an algorithm for evaluating A on each of the (2n)th roots of unity recursively, so as to take advantage of the familiar recurrence from (5.1)--namely, T(n) ≤ 2T(n/2) + O(n), where T(n) in this case denotes the number of operations required to evaluate a polynomial of degree n - 1 on all the (2n)th roots of unity. For simplicity in describing this algorithm, we will assume that n is a power of 2.

How does one break the evaluation of a polynomial into two equal-sized subproblems? A useful trick is to define two polynomials, A_even(x) and A_odd(x), that consist of the even and odd coefficients of A, respectively. That is,

A_even(x) = a_0 + a_2x + a_4x^2 + ... + a_{n-2}x^{(n-2)/2}, and
A_odd(x) = a_1 + a_3x + a_5x^2 + ... + a_{n-1}x^{(n-2)/2}.

Simple algebra shows us that

A(x) = A_even(x^2) + x A_odd(x^2),

and so this gives us a way to compute A(x) in a constant number of operations, given the evaluation of the two constituent polynomials that each have half the degree of A.

Now suppose that we evaluate each of A_even and A_odd on the nth roots of unity. This is exactly a version of the problem we face with A and the (2n)th roots of unity, except that the input is half as large: the degree is (n - 2)/2 rather than n - 1, and we have n roots of unity rather than 2n. Thus we can perform these evaluations in time T(n/2) for each of A_even and A_odd, for a total time of 2T(n/2).

We're now very close to having a recursive algorithm that obeys (5.1) and gives us the running time we want; we just have to produce the evaluations of A on the (2n)th roots of unity using O(n) additional operations. But this is easy. Consider one of these roots of unity ω_{j,2n} = e^{2πji/2n}. The quantity ω_{j,2n}^2 is equal to (e^{2πji/2n})^2 = e^{2πji/n}, and hence ω_{j,2n}^2 is an nth root of unity. So when we go to compute A(ω_{j,2n}), we get A(ω_{j,2n}) = A_even(ω_{j,2n}^2) + ω_{j,2n} A_odd(ω_{j,2n}^2), and both of the evaluations on the right-hand side have been performed in the recursive step. Thus we can determine A(ω_{j,2n}) in a constant number of operations, independent of j. Doing this for all 2n roots of unity is therefore O(n) additional operations after the two recursive calls, and so the bound T(n) on the number of operations indeed satisfies T(n) ≤ 2T(n/2) + O(n). We run the same procedure to evaluate the polynomial B on the (2n)th roots of unity as well, and this gives us the desired O(n log n) bound for step (i) of our algorithm outline.
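Here is one way the recursive evaluation could look in Python. This is a hedged sketch rather than code from the text (the name fft is ours): given a coefficient list whose length k is a power of 2, it returns the values of the polynomial at the kth roots of unity, following the even/odd split just described.

    import cmath

    def fft(coeffs):
        k = len(coeffs)           # assumed to be a power of 2
        if k == 1:
            return coeffs[:]
        even = fft(coeffs[0::2])  # A_even at the (k/2)th roots of unity
        odd = fft(coeffs[1::2])   # A_odd at the (k/2)th roots of unity
        values = [0] * k
        for j in range(k // 2):
            w = cmath.exp(2 * cmath.pi * 1j * j / k)  # w_{j,k}
            # A(w_{j,k}) = A_even(w_{j,k}^2) + w_{j,k} * A_odd(w_{j,k}^2)
            values[j] = even[j] + w * odd[j]
            # w_{j+k/2,k} = -w_{j,k}, and its square equals w_{j,k}^2
            values[j + k // 2] = even[j] - w * odd[j]
        return values

To evaluate A, which has n coefficients, on the (2n)th roots of unity, one would pad the coefficient list with n zeros and call fft(a + [0] * n).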
Polynomial Interpolation

We've now seen how to evaluate A and B on the set of all (2n)th roots of unity using O(n log n) operations and, as noted above, we can clearly compute the products C(ω_{j,2n}) in O(n) more operations. Thus, to conclude the algorithm for multiplying A and B, we need to execute step (iii) in our earlier outline using O(n log n) operations--namely, reconstructing C from its values on the (2n)th roots of unity. In describing this part of the algorithm, it's worth keeping track of the following top-level point: it turns out that the reconstruction of C can be achieved simply by defining an appropriate polynomial (the polynomial D below) and evaluating it at the (2n)th roots of unity. This is exactly what we've just seen how to do using O(n log n) operations, so we do it again here, spending an additional O(n log n) operations and concluding the algorithm.

Consider a polynomial C(x) = Σ_{s=0}^{2n-1} c_s x^s that we want to reconstruct from its values C(ω_{s,2n}) at the (2n)th roots of unity. Define a new polynomial D(x) = Σ_{s=0}^{2n-1} d_s x^s, where d_s = C(ω_{s,2n}). We now consider the values of D(x) at the (2n)th roots of unity:

D(ω_{j,2n}) = Σ_{s=0}^{2n-1} C(ω_{s,2n}) ω_{j,2n}^s
            = Σ_{s=0}^{2n-1} ( Σ_{t=0}^{2n-1} c_t ω_{s,2n}^t ) ω_{j,2n}^s
            = Σ_{t=0}^{2n-1} c_t ( Σ_{s=0}^{2n-1} e^{2πi(st+js)/2n} ) = Σ_{t=0}^{2n-1} c_t ( Σ_{s=0}^{2n-1} ω_{t+j,2n}^s ),

using the fact that ω_{s,2n} = (e^{2πi/2n})^s, and extending the notation so that ω_{s,2n} = (e^{2πi/2n})^s even when s ≥ 2n.

t = (x . and then the coefficients of C are the coordinates in the convolution vector c = a ¯ b that we were originally seeking. It is not hard to solve this recurrence by unrolling it.2n--! s is.2n_C((’°s. This is simply because eo is by definition a root of 2n--1 t x2n . with each entry holding a A[n] is unimodal: For some index p between 1 and n. it follows that co is (x-2~-I x~). and the level after that has one problem of size at most n/4. we would have T(n) < T(n/2) + c _< k log. throw away half the input. If one needs to compute something using only O(log n) operations. behind the O(log n) running time for binary search. the values in the array entries increase up to position p in A and then decrease the remainder of the way unt~ position n. And this wraps everything up: we reconstruct the polynomial C from its values on the (2n)th roots of unity.2n)" D(X) Z-~s=O 1 s We can do a]] the evaluations of the values D(eozn_s. independent of]. We can view this as a divide-and-conquer approach: for some constant c > 0.zn = I. So we get that y~.16) T(n) < T(n/2) + c when n > 2. "2n-1 we have v ~s=O ms-= O. and continue recursively on what’s left. and then fal! from there on. x-’2n-1 Csxs. a useful strategy that we discussed in Chapter 2 is to perform a constant amount of work. which takes time at most c plus the time spent in all subsequent recursive calls.242 Chapter 5 Divide and Conquer Solved Exercises 243 To analyze the last line.s=0 D(a~j. In summary.14) For any polynomial C(x) -. Solution Let’s start with a general discussion on how to achieve a nmning time of O(log n) and then come back to the specific problem here.-Z-. Analyzing the first few levels: M the first level of recursion. As in the chapter. we can compute the convolution of the original vectors a and b in O(n log rO time. Zn) in O(nlog. The next level has one problem of size at most n/2. and this happens if t + j is a multiple of 2n. where we don’t know k or b. for example. (S.2n) = 2nczn_~. Evaluating the polynomial D(x) at the (2n)th roots of unity thus gives us the coeffients of the polynomial C(x) in reverse order (multiplied by 2n each). by reading as few entries of A as possible. 2n)Xs" we have that c = ~D( 2n-s. as follows. We can also do this by partial substitution.(n/2) + c = k logb n 7 k logb 2 + c. performing at most c operations to finish the computation. which contributes c to the running time.2n = Z-. and corresponding polynomial (S. we have shown the following.) You’d like to find the "peak entry" p without having to read the entire array--in fact. Show how to find the entry p by reading at most O(log n) entries of A. Suppose ~ve guess that T(n) < k !ogb n. which contributes yet another c. we have a single problem of size n. . Thus the total running time is at most c times the number of levels of recursion. we perform at most c operations and then continue recursively on an input of size at mbst n/2. also a root of !-~t=0 Thus the only term of the last line’s outer sum that is not equal to 0 is for q such that wt+j.s=0 1 ~0 = X-. Identifying apattem: No matter how many levels we continue. then we have the recurrence (5. Summing over all levels of recursion: Each level of the recursion is contributing at most c operations. (So if you were to draw a plot with the array position j on the x-axis and the value of the entry A[j] on the y-axis. and T(2) < c. that K-’2n--1 1 = 2n.1)(~t=0 X ) and eo 7~ 1. Solved Exercises Solved Exercise 1 Suppose you are given an array A with n entries. For this value. 
Identifying a pattern: No matter how many levels we continue, each level will have just one problem: level j has a single problem of size at most n/2^j, which contributes c to the running time, independent of j.

Summing over all levels of recursion: Each level of the recursion is contributing at most c operations, and it takes log_2 n levels of recursion to reduce n to 2. Thus the total running time is at most c times the number of levels of recursion, which is at most c log_2 n = O(log n).

We can also do this by partial substitution. Suppose we guess that T(n) ≤ k log_b n, where we don't know k or b. Assuming that this holds for smaller values of n in an inductive argument, we would have T(n) ≤ T(n/2) + c ≤ k log_b(n/2) + c = k log_b n - k log_b 2 + c. The first term on the right is exactly what we want, so we just need to choose k and b to negate the added c at the end. This we can do by setting b = 2 and k = c, so that k log_b 2 = c log_2 2 = c. Hence we end up with the solution T(n) ≤ c log_2 n, which is exactly what we got by unrolling the recurrence.

Now let's get back to the problem at hand. If we wanted to set ourselves up to use (5.16), we could probe the midpoint of the array and try to determine whether the "peak entry" p lies before or after this midpoint. So suppose we look at the value A[n/2]. From this value alone, we can't tell whether p lies before or after n/2, since we need to know whether entry n/2 is sitting on an "up-slope" or on a "down-slope." So we also look at the values A[n/2 - 1] and A[n/2 + 1]. There are now three possibilities.

o If A[n/2 - 1] < A[n/2] < A[n/2 + 1], then entry n/2 must come strictly before p, and so we can continue recursively on entries n/2 + 1 through n.

o If A[n/2 - 1] > A[n/2] > A[n/2 + 1], then entry n/2 must come strictly after p, and so we can continue recursively on entries 1 through n/2 - 1.

o Finally, if A[n/2] is larger than both A[n/2 - 1] and A[n/2 + 1], we are done: the peak entry is in fact equal to n/2 in this case.

In all these cases, we perform at most three probes of the array A and reduce the problem to one of at most half the size. Thus we can apply (5.16) to conclude that the running time is O(log n).

Finally, we should mention that one can get an O(log n) running time, by essentially the same reasoning, in the more general case when each level of the recursion throws away any constant fraction of the input, transforming an instance of size n to one of size at most an, for some constant a < 1. It now takes at most log_{1/a} n levels of recursion to reduce n down to a constant size, and each level of recursion involves at most c operations.
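As a concrete rendering of this probing strategy, here is a minimal Python sketch (our own illustration; it streamlines the text's three probes per step down to two, comparing a midpoint entry with its right neighbor):

    def find_peak(A):
        # A is unimodal: values rise to a unique peak, then fall.
        lo, hi = 0, len(A) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if A[mid] < A[mid + 1]:
                lo = mid + 1   # mid is on the up-slope; the peak lies to the right
            else:
                hi = mid       # mid is on the down-slope, or is the peak itself
        return lo              # 0-based index of the peak entry p

Each iteration reads O(1) entries and halves the remaining interval, so the number of probes is O(log n), matching (5.16).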

Solved Exercise 2

Suppose you're consulting for a small computation-intensive investment company, and they have the following type of problem that they want to solve over and over. A typical instance of the problem is the following. They're doing a simulation in which they look at n consecutive days of a given stock, at some point in the past. Let's number the days i = 1, 2, ..., n; for each day i, they have a price p(i) per share for the stock on that day. (We'll assume for simplicity that the price was fixed during each day.) Suppose during this time period, they wanted to buy 1,000 shares on some day and sell all these shares on some (later) day. They want to know: When should they have bought and when should they have sold in order to have made as much money as possible? (If there was no way to make money during the n days, you should report this instead.)

For example, suppose n = 3, p(1) = 9, p(2) = 1, p(3) = 5. Then you should return "buy on 2, sell on 3" (buying on day 2 and selling on day 3 means they would have made $4 per share, the maximum possible for that period).

Clearly, there's a simple algorithm that takes time O(n^2): try all possible pairs of buy/sell days and see which makes them the most money. Your investment friends were hoping for something a little better. Show how to find the correct numbers i and j in time O(n log n).

Solution We've seen a number of instances in this chapter where a brute-force search over pairs of elements can be reduced to O(n log n) by divide and conquer. Since we're faced with a similar issue here, let's think about how we might apply a divide-and-conquer strategy.

A natural approach would be to consider the first n/2 days and the final n/2 days separately, solving the problem recursively on each of these two sets, and then figure out how to get an overall solution from this in O(n) time. This would give us the usual recurrence T(n) ≤ 2T(n/2) + O(n), and hence O(n log n) by (5.1).

Also, to make things easier, we'll make the usual assumption that n is a power of 2. This is no loss of generality: if n' is the next power of 2 greater than n, we can set p(i) = p(n) for all i between n and n'. In this way, we do not change the answer, and we at most double the size of the input (which will not affect the O() notation).

Now, let S be the set of days 1, ..., n/2, and S' be the set of days n/2 + 1, ..., n.
Our divide-and-conquer algorithm will be based on the following observation: either there is an optimal solution in which the investors are holding the stock at the end of day n/2, or there isn't. Now, if there isn't, then the optimal solution is the better of the optimal solutions on the sets S and S'. If there is an optimal solution in which they hold the stock at the end of day n/2, then the value of this solution is p(j) - p(i) where i ∈ S and j ∈ S'. But this value is maximized by simply choosing i ∈ S which minimizes p(i), and choosing j ∈ S' which maximizes p(j).

Thus our algorithm is to take the best of the following three possible solutions:

o The optimal solution on S.

o The optimal solution on S'.

o The maximum of p(j) - p(i), over i ∈ S and j ∈ S'.

The first two alternatives are computed in time T(n/2), each by recursion, and the third alternative is computed by finding the minimum in S and the maximum in S', which takes time O(n). Thus the running time T(n) satisfies T(n) ≤ 2T(n/2) + O(n), as desired.
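The divide-and-conquer solution just described might be rendered in Python as follows. This is a sketch under our own naming conventions (best_trade is not from the text); days are 0-indexed here, and a nonpositive returned profit means there was no way to make money.

    def best_trade(prices):
        # Returns (profit, buy_day, sell_day) for the given price list.
        n = len(prices)
        if n < 2:
            return (0, 0, 0)
        mid = n // 2
        left = best_trade(prices[:mid])          # optimal solution on S
        r = best_trade(prices[mid:])             # optimal solution on S'
        right = (r[0], r[1] + mid, r[2] + mid)   # shift back to absolute day indices
        # Best solution holding the stock across the midpoint: buy at the
        # minimum of the first half, sell at the maximum of the second half.
        i = min(range(mid), key=lambda t: prices[t])
        j = max(range(mid, n), key=lambda t: prices[t])
        cross = (prices[j] - prices[i], i, j)
        return max(left, right, cross, key=lambda t: t[0])

The combining step scans each half once, so the running time obeys T(n) ≤ 2T(n/2) + O(n) = O(n log n), as in the analysis above.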

We note that this is not the best running time achievable for this problem. In fact, one can find the optimal pair of days in O(n) time using dynamic programming, the topic of the next chapter; at the end of that chapter, we will pose this question as Exercise 7.

Exercises

1. You are interested in analyzing some hard-to-obtain data from two separate databases. Each database contains n numerical values--so there are 2n values total--and you may assume that no two values are the same. You'd like to determine the median of this set of 2n values, which we will define here to be the nth smallest value. However, the only way you can access these values is through queries to the databases. In a single query, you can specify a value k to one of the two databases, and the chosen database will return the kth smallest value that it contains. Since queries are expensive, you would like to compute the median using as few queries as possible. Give an algorithm that finds the median value using at most O(log n) queries.

2. Recall the problem of finding the number of inversions. As in the text, we are given a sequence of n numbers a_1, ..., a_n, which we assume are all distinct, and we define an inversion to be a pair i < j such that a_i > a_j. We motivated the problem of counting inversions as a good measure of how different two orderings are. However, one might feel that this measure is too sensitive. Let's call a pair a significant inversion if i < j and a_i > 2a_j. Give an O(n log n) algorithm to count the number of significant inversions between two orderings.

3. Suppose you're consulting for a bank that's concerned about fraud detection, and they come to you with the following problem. They have a collection of n bank cards that they've confiscated, suspecting them of being used in fraud. Each bank card is a small plastic object, containing a magnetic stripe with some encrypted data, and it corresponds to a unique account in the bank. Each account can have many bank cards corresponding to it, and we'll say that two bank cards are equivalent if they correspond to the same account. It's very difficult to read the account number off a bank card directly, but the bank has a high-tech "equivalence tester" that takes two bank cards and, after performing some computations, determines whether they are equivalent. Their question is the following: among the collection of n cards, is there a set of more than n/2 of them that are all equivalent to one another? Assume that the only feasible operations you can do with the cards are to pick two of them and plug them in to the equivalence tester. Show how to decide the answer to their question with only O(n log n) invocations of the equivalence tester.

4. You've been working with some physicists who need to study, as part of their experimental design, the interactions among large numbers of very small charged particles. Basically, their setup works as follows. They have an inert lattice structure, and they use this for placing charged particles at regular spacing along a straight line. Thus we can model their structure as consisting of the points {1, 2, 3, ..., n} on the real line; and at each of these points j, they have a particle with charge q_j. (Each charge can be either positive or negative.) They want to study the total force on each particle, by measuring it and then comparing it to a computational prediction. This computational part is where they need your help. The total net force on particle j, by Coulomb's Law, is equal to

    F_j = Σ_{i<j} C q_i q_j / (j - i)^2  -  Σ_{i>j} C q_i q_j / (i - j)^2

They've written the following simple program to compute F_j for all j:

    For j = 1, 2, ..., n
      Initialize F_j to 0
      For i = 1, 2, ..., n
        If i < j then
          Add C q_i q_j / (j - i)^2 to F_j
        Else if i > j then
          Add -C q_i q_j / (i - j)^2 to F_j
        Endif
      Endfor
      Output F_j
    Endfor


It's not hard to analyze the running time of this program: each invocation of the inner loop, over i, takes O(n) time, and this inner loop is invoked O(n) times total, so the overall running time is O(n^2). The trouble is, for the large values of n they're working with, the program takes several minutes to run. On the other hand, their experimental setup is optimized so that they can throw down n particles, perform the measurements, and be ready to handle n more particles within a few seconds. So they'd really like it if there were a way to compute all the forces F_j much more quickly, so as to keep up with the rate of the experiment. Help them out by designing an algorithm that computes all the forces F_j in O(n log n) time.

5. Hidden surface removal is a problem in computer graphics that scarcely needs an introduction: when Woody is standing in front of Buzz, you should be able to see Woody but not Buzz; when Buzz is standing in front of Woody . . . well, you get the idea. The magic of hidden surface removal is that you can often compute things faster than your intuition suggests. Here's a clean geometric example to illustrate a basic speed-up that can be achieved. You are given n nonvertical lines in the plane, labeled L_1, ..., L_n, with the ith line specified by the equation y = a_i x + b_i. We will make the assumption that no three of the lines all meet at a single point. We say line L_i is uppermost at a given x-coordinate x_0 if its y-coordinate at x_0 is greater than the y-coordinates of all the other lines at x_0: a_i x_0 + b_i > a_j x_0 + b_j for all j ≠ i. We say line L_i is visible if there is some x-coordinate at which it is uppermost--intuitively, some portion of it can be seen if you look down from "y = ∞." Give an algorithm that takes n lines as input and in O(n log n) time returns all of the ones that are visible. Figure 5.10 gives an example.
6. Consider an n-node complete binary tree T, where n = 2^d - 1 for some d. Each node v of T is labeled with a real number x_v. You may assume that the real numbers labeling the nodes are all distinct. A node v of T is a local minimum if the label x_v is less than the label x_w for all nodes w that are joined to v by an edge. You are given such a complete binary tree T, but the labeling is only specified in the following implicit way: for each node v, you can determine the value x_v by probing the node v. Show how to find a local minimum of T using only O(log n) probes to the nodes of T.


Figure 5.10 An instance of hidden surface removal with five lines (labeled 1-5 in the figure). All the lines except for 2 are visible.

7. Suppose now that you're given an n × n grid graph G. (An n × n grid graph is just the adjacency graph of an n × n chessboard. To be completely precise, it is a graph whose node set is the set of all ordered pairs of natural numbers (i, j), where 1 ≤ i ≤ n and 1 ≤ j ≤ n; the nodes (i, j) and (k, ℓ) are joined by an edge if and only if |i - k| + |j - ℓ| = 1.) We use some of the terminology of the previous question. Again, each node v is labeled by a real number x_v; you may assume that all these labels are distinct. Show how to find a local minimum of G using only O(n) probes to the nodes of G. (Note that G has n^2 nodes.)

Notes and Further Reading
The militaristic coinage "divide and conquer" was introduced somewhat after the technique itself. Knuth (1998) credits John von Neumann with one early explicit application of the approach, the development of the Mergesort Algorithm in 1945. Knuth (1997b) also provides further discussion of techniques for solving recurrences.

The algorithm for computing the closest pair of points in the plane is due to Michael Shamos, and is one of the earliest nontrivial algorithms in the field of computational geometry; the survey paper by Smid (1999) discusses a wide range of results on closest-point problems. A faster randomized algorithm for this problem will be discussed in Chapter 13. (Regarding the nonobviousness of the divide-and-conquer algorithm presented here, Smid also makes the interesting historical observation that researchers originally suspected quadratic time might be the best one could do for finding the closest pair of points in the plane.) More generally, the divide-and-conquer approach has proved very useful in computational geometry, and the books by Preparata and Shamos



(1985) and de Berg et al. (1997) give many further examples of this technique in the design of geometric algorithms.

The algorithm for multiplying two n-bit integers in subquadratic time is due to Karatsuba and Ofman (1962). Further background on asymptotically fast multiplication algorithms is given by Knuth (1997b). Of course, the number of bits in the input must be sufficiently large for any of these subquadratic methods to improve over the standard algorithm.

Press et al. (1988) provide further coverage of the Fast Fourier Transform, including background on its applications in signal processing and related areas.

Notes on the Exercises Exercise 7 is based on a result of Donna Llewellyn, Craig Tovey, and Michael Trick.

Chapter 6
Dynamic Programming

We began our study of algorithmic techniques with greedy algorithms, which in some sense form the most natural approach to algorithm design. Faced with a new computational problem, we've seen that it's not hard to propose multiple possible greedy algorithms; the challenge is then to determine whether any of these algorithms provides a correct solution to the problem in all cases. The problems we saw in Chapter 4 were all unified by the fact that, in the end, there really was a greedy algorithm that worked. Unfortunately, this is far from being true in general; for most of the problems that one encounters, the real difficulty is not in determining which of several greedy strategies is the right one, but in the fact that there is no natural greedy algorithm that works. For such problems, it is important to have other approaches at hand. Divide and conquer can sometimes serve as an alternative approach, but the versions of divide and conquer that we saw in the previous chapter are often not strong enough to reduce exponential brute-force search down to polynomial time. Rather, as we noted in Chapter 5, the applications there tended to reduce a running time that was unnecessarily large, but already polynomial, down to a faster running time.

We now turn to a more powerful and subtle design technique, dynamic programming. It will be easier to say exactly what characterizes dynamic programming after we've seen it in action, but the basic idea is drawn from the intuition behind divide and conquer and is essentially the opposite of the greedy strategy: one implicitly explores the space of all possible solutions, by carefully decomposing things into a series of subproblems, and then building up correct solutions to larger and larger subproblems. In a way, we can thus view dynamic programming as operating dangerously close to the edge of


brute-force search: although it’s systematically working through the exponentially large set of possible solutions to the problem, it does this without ever examining them all explicitly. It is because of this careful balancing act that dynamic programming can be a tricky technique to get used to; it typically takes a reasonable amount of practice before one is fully comfortable with it. With this in mind, we now turn to a first example of dynamic programming: the Weighted Interval Scheduling Problem that we defined back in Section 1.2. We are going to develop a dynamic programming algorithm for this problem in two stages: first as a recursive procedure that closely resembles brute-force search; and then, by reinterpreting this procedure, as an iterative algorithm that works by building up solutions to larger and larger subproblems.

6.1 Weighted Interval Scheduling: A Recursive Procedure

We have seen that a particular greedy algorithm produces an optimal solution to the Interval Scheduling Problem, where the goal is to accept as large a set of nonoverlapping intervals as possible. The Weighted Interval Scheduling Problem is a strictly more general version, in which each interval has a certain value (or weight), and we want to accept a set of maximum value.

Designing a Recursive Algorithm

Since the original Interval Scheduling Problem is simply the special case in which all values are equal to 1, we know already that most greedy algorithms will not solve this problem optimally. But even the algorithm that worked before (repeatedly choosing the interval that ends earliest) is no longer optimal in this more general setting, as the simple example in Figure 6.1 shows. Indeed, no natural greedy algorithm is known for this problem, which is what motivates our switch to dynamic programming. As discussed above, we will begin our introduction to dynamic programming with a recursive type of algorithm for this problem, and then in the next section we'll move to a more iterative method that is closer to the style we use in the rest of this chapter.

We use the notation from our discussion of Interval Scheduling in Section 1.2, with each request i specifying a start time s_i and a finish time f_i. Each interval i now also has a value, or weight, v_i. Two intervals are compatible if they do not overlap. The goal of our current problem is to select a subset S ⊆ {1, ..., n} of mutually compatible intervals, so as to maximize the sum of the values of the selected intervals, Σ_{i∈S} v_i.

Let's suppose that the requests are sorted in order of nondecreasing finish time: f_1 ≤ f_2 ≤ ... ≤ f_n. We'll say a request i comes before a request j if i < j. This will be the natural left-to-right order in which we'll consider intervals. To help in talking about this order, we define p(j), for an interval j, to be the largest index i < j such that intervals i and j are disjoint. In other words, i is the leftmost interval that ends before j begins. We define p(j) = 0 if no request i < j is disjoint from j. An example of the definition of p(j) is shown in Figure 6.2.

Now, given an instance of the Weighted Interval Scheduling Problem, let's consider an optimal solution O, ignoring for now that we have no idea what it is. Here's something completely obvious that we can say about O: either interval n (the last one) belongs to O, or it doesn't. Suppose we explore both sides of this dichotomy a little further. If n ∈ O, then clearly no interval indexed strictly between p(n) and n can belong to O, because by the definition of p(n), the intervals p(n) + 1, p(n) + 2, ..., n - 1 all overlap interval n. Moreover, if n ∈ O, then O must include an optimal solution to the problem consisting of requests {1, ..., p(n)}--for if it didn't, we could replace O's choice of requests from {1, ..., p(n)} with a better one, with no danger of overlapping request n.

Figure 6.1 A simple instance of weighted interval scheduling.

Figure 6.2 An instance of weighted interval scheduling with the functions p(j) defined for each interval j.


On the other hand, if n ∉ O, then O is simply equal to the optimal solution to the problem consisting of requests {1, ..., n - 1}. This is by completely analogous reasoning: we're assuming that O does not include request n; so if it does not include an optimal solution to the problem consisting of requests {1, ..., n - 1}, we could replace it with a better one.

All this suggests that finding the optimal solution on intervals {1, 2, ..., n} involves looking at the optimal solutions of smaller problems of the form {1, 2, ..., j}. Thus, for any value of j between 1 and n, let O_j denote the optimal solution to the problem consisting of requests {1, ..., j}, and let OPT(j) denote the value of this solution. (We define OPT(0) = 0, based on the convention that this is the optimum over an empty set of intervals.) The optimal solution we're seeking is precisely O_n, with value OPT(n). For the optimal solution O_j on {1, 2, ..., j}, our reasoning above (generalizing from the case in which j = n) says that either j ∈ O_j, in which case OPT(j) = v_j + OPT(p(j)), or j ∉ O_j, in which case OPT(j) = OPT(j - 1). Since these are precisely the two possible choices (j ∈ O_j or j ∉ O_j), we can further say that

(6.1) OPT(j) = max(v_j + OPT(p(j)), OPT(j - 1)).

And how do we decide whether j belongs to the optimal solution O_j? This too is easy: it belongs to the optimal solution if and only if the first of the options above is at least as good as the second; in other words,

(6.2) Request j belongs to an optimal solution on the set {1, 2, ..., j} if and only if v_j + OPT(p(j)) ≥ OPT(j - 1).

These facts form the first crucial component on which a dynamic programming solution is based: a recurrence equation that expresses the optimal solution (or its value) in terms of the optimal solutions to smaller subproblems. Despite the simple reasoning that led to this point, (6.1) is already a significant development. It directly gives us a recursive algorithm to compute OPT(n), assuming that we have already sorted the requests by finishing time and computed the values of p(j) for each j.

Compute-Opt(j)
  If j = 0 then
    Return 0
  Else
    Return max(v_j + Compute-Opt(p(j)), Compute-Opt(j - 1))
  Endif

The correctness of the algorithm follows directly by induction on j:

(6.3) Compute-Opt(j) correctly computes OPT(j) for each j = 1, 2, ..., n.

Proof. By definition OPT(0) = 0. Now, take some j > 0, and suppose by way of induction that Compute-Opt(i) correctly computes OPT(i) for all i < j. By the induction hypothesis, we know that Compute-Opt(p(j)) = OPT(p(j)) and Compute-Opt(j - 1) = OPT(j - 1); and hence from (6.1) it follows that

OPT(j) = max(v_j + Compute-Opt(p(j)), Compute-Opt(j - 1)) = Compute-Opt(j).

Unfortunately, if we really implemented the algorithm Compute-Opt as just written, it would take exponential time to run in the worst case. For example, see Figure 6.3 for the tree of calls issued for the instance of Figure 6.2: the tree widens very quickly due to the recursive branching. To take a more extreme example, on a nicely layered instance like the one in Figure 6.4, where p(j) = j - 2 for each j = 2, 3, ..., n, we see that Compute-Opt(j) generates separate recursive calls on problems of sizes j - 1 and j - 2. In other words, the total number of calls made to Compute-Opt on this instance will grow like the Fibonacci numbers, which increase exponentially. Thus we have not achieved a polynomial-time solution.

Figure 6.3 The tree of subproblems called by Compute-Opt on the problem instance of Figure 6.2. (The tree of subproblems grows very quickly.)



Figure 6.4 An instance of weighted interval scheduling on which the simple Compute-Opt recursion will take exponential time. The values of all intervals in this instance are 1.

Memoizing the Recursion

In fact, though, we're not so far from having a polynomial-time algorithm. A fundamental observation, which forms the second crucial component of a dynamic programming solution, is that our recursive algorithm Compute-Opt is really only solving n + 1 different subproblems: Compute-Opt(0), Compute-Opt(1), ..., Compute-Opt(n). The fact that it runs in exponential time as written is simply due to the spectacular redundancy in the number of times it issues each of these calls.

How could we eliminate all this redundancy? We could store the value of Compute-Opt in a globally accessible place the first time we compute it and then simply use this precomputed value in place of all future recursive calls. This technique of saving values that have already been computed is referred to as memoization.

We implement the above strategy in the more "intelligent" procedure M-Compute-Opt. This procedure will make use of an array M[0 ... n]; M[j] will start with the value "empty," but will hold the value of Compute-Opt(j) as soon as it is first determined. To determine OPT(n), we invoke M-Compute-Opt(n).

M-Compute-Opt(j)
  If j = 0 then
    Return 0
  Else if M[j] is not empty then
    Return M[j]
  Else
    Define M[j] = max(v_j + M-Compute-Opt(p(j)), M-Compute-Opt(j - 1))
    Return M[j]
  Endif

Analyzing the Memoized Version

Clearly, this looks very similar to our previous implementation of the algorithm; however, memoization has brought the running time way down.

(6.4) The running time of M-Compute-Opt(n) is O(n) (assuming the input intervals are sorted by their finish times).

Proof. The time spent in a single call to M-Compute-Opt is O(1), excluding the time spent in recursive calls it generates. So the running time is bounded by a constant times the number of calls ever issued to M-Compute-Opt. Since the implementation itself gives no explicit upper bound on this number of calls, we try to find a bound by looking for a good measure of "progress." The most useful progress measure here is the number of entries in M that are not "empty." Initially this number is 0; but each time the procedure invokes the recurrence, issuing two recursive calls to M-Compute-Opt, it fills in a new entry, and hence increases the number of filled-in entries by 1. Since M has only n + 1 entries, it follows that there can be at most O(n) calls to M-Compute-Opt, and hence the running time of M-Compute-Opt(n) is O(n), as desired.
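For concreteness, M-Compute-Opt might be transcribed into Python as follows--our own sketch, assuming the values v and the precomputed indices p(j) are supplied as 1-indexed lists (position 0 unused), with the intervals already sorted by finish time. Note that Python's default recursion limit would need to be raised for large n; the iterative formulation in the next section sidesteps this.

    def m_compute_opt(v, p):
        n = len(v) - 1
        M = [None] * (n + 1)   # M[j] caches OPT(j); "empty" is None
        M[0] = 0

        def opt(j):
            if M[j] is None:
                M[j] = max(v[j] + opt(p[j]), opt(j - 1))
            return M[j]

        opt(n)
        return M               # M[n] is the optimal value OPT(n)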

Computing a Solution in Addition to Its Value

So far we have simply computed the value of an optimal solution; presumably we want a full optimal set of intervals as well. It would be easy to extend M-Compute-Opt so as to keep track of an optimal solution in addition to its value: we could maintain an additional array S so that S[i] contains an optimal solution to the subproblem on {1, ..., i}. Naively enhancing the code to maintain the solutions in the array S, however, would blow up the running time by an additional factor of O(n): while a position in the M array can be updated in O(1) time, writing down a set in the S array takes O(n) time. We can avoid this O(n) blow-up by not explicitly maintaining S, but rather by recovering the optimal solution from values saved in the array M after the optimum value has been computed.

We know from (6.2) that j belongs to an optimal solution for the set of intervals {1, ..., j} if and only if v_j + OPT(p(j)) ≥ OPT(j - 1). Using this observation, we get the following simple procedure, which "traces back" through the array M to find the set of intervals in an optimal solution.


Find-Solution(j)
  If j = 0 then
    Output nothing
  Else
    If v_j + M[p(j)] ≥ M[j - 1] then
      Output j together with the result of Find-Solution(p(j))
    Else
      Output the result of Find-Solution(j - 1)
    Endif
  Endif
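In Python, the same traceback might be written iteratively, under the conventions of the earlier sketch (1-indexed lists v and p, and the filled-in array M):

    def find_solution(M, v, p):
        # By (6.2), j is in an optimal solution iff v[j] + M[p[j]] >= M[j-1].
        chosen, j = [], len(M) - 1
        while j > 0:
            if v[j] + M[p[j]] >= M[j - 1]:
                chosen.append(j)
                j = p[j]
            else:
                j -= 1
        return list(reversed(chosen))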

Since Find-Solution calls itself recursively only on strictly smaller values, it makes a total of O(n) recursive calls; and since it spends constant time per call, we have

(6.5) Given the array M of the optimal values of the subproblems, Find-Solution returns an optimal solution in O(n) time.

6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems

We now use the algorithm for the Weighted Interval Scheduling Problem developed in the previous section to summarize the basic principles of dynamic programming, and also to offer a different perspective that will be fundamental to the rest of the chapter: iterating over subproblems, rather than computing solutions recursively.

In the previous section, we developed a polynomial-time solution to the Weighted Interval Scheduling Problem by first designing an exponential-time recursive algorithm and then converting it (by memoization) to an efficient recursive algorithm that consulted a global array M of optimal solutions to subproblems. To really understand what is going on here, however, it helps to formulate an essentially equivalent version of the algorithm. It is this new formulation that most explicitly captures the essence of the dynamic programming technique, and it will serve as a general template for the algorithms we develop in later sections.

Designing the Algorithm

The key to the efficient algorithm is really the array M. It encodes the notion that we are using the value of optimal solutions to the subproblems on intervals {1, 2, ..., j} for each j, and it uses (6.1) to define the value of M[j] based on values that come earlier in the array. Once we have the array M, the problem is solved: M[n] contains the value of the optimal solution on the full instance, and Find-Solution can be used to trace back through M efficiently and return an optimal solution itself.

The point to realize, then, is that we can directly compute the entries in M by an iterative algorithm, rather than using memoized recursion. We just start with M[0] = 0 and keep incrementing j; each time we need to determine a value M[j], the answer is provided by (6.1). The algorithm looks as follows.

Iterative-Compute-Opt
  M[0] = 0
  For j = 1, 2, ..., n
    M[j] = max(v_j + M[p(j)], M[j - 1])
  Endfor

Analyzing the Algorithm

By exact analogy with the proof of (6.3), we can prove by induction on j that this algorithm writes OPT(j) in array entry M[j]; (6.1) provides the induction step. Also, as before, we can pass the filled-in array M to Find-Solution to get an optimal solution in addition to the value. Finally, the running time of Iterative-Compute-Opt is clearly O(n), since it explicitly runs for n iterations and spends constant time in each. An example of the execution of Iterative-Compute-Opt is depicted in Figure 6.5. In each iteration, the algorithm fills in one additional entry of the array M, by comparing the value of v_j + M[p(j)] to the value of M[j - 1].

A Basic Outline of Dynamic Programming

This, then, provides a second efficient algorithm to solve the Weighted Interval Scheduling Problem. The two approaches clearly have a great deal of conceptual overlap, since they both grow from the insight contained in the recurrence (6.1). For the remainder of the chapter, we will develop dynamic programming algorithms using the second type of approach--iterative building up of subproblems--because the algorithms are often simpler to express this way. But in each case that we consider, there is an equivalent way to formulate the algorithm as a memoized recursion.

Most crucially, the bulk of our discussion about the particular problem of selecting intervals can be cast more generally as a rough template for designing dynamic programming algorithms. To set about developing an algorithm based on dynamic programming, one needs a collection of subproblems derived from the original problem that satisfies a few basic properties.

(i) There are only a polynomial number of subproblems.

(ii) The solution to the original problem can be easily computed from the solutions to the subproblems. (For example, the original problem may actually be one of the subproblems.)

(iii) There is a natural ordering on subproblems from "smallest" to "largest," together with an easy-to-compute recurrence (as in (6.1) and (6.2)) that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems.

Naturally, these are informal guidelines. In particular, the notion of "smaller" in part (iii) will depend on the type of recurrence one has. We will see that it is sometimes easier to start the process of designing such an algorithm by formulating a set of subproblems that looks natural, and then figuring out a recurrence that links them together; but often (as happened in the case of weighted interval scheduling), it can be useful to first define a recurrence by reasoning about the structure of an optimal solution, and then determine which subproblems will be necessary to unwind the recurrence. This chicken-and-egg relationship between subproblems and recurrences is a subtle issue underlying dynamic programming. It's never clear that a collection of subproblems will be useful until one finds a recurrence linking them together; but it can be difficult to think about recurrences in the absence of the "smaller" subproblems that they build on. In subsequent sections, we will develop further practice in managing this design trade-off.
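Rendered in Python under the same 1-indexed conventions as the earlier sketches, the whole iterative algorithm is only a few lines, and its output array can be fed directly to the traceback procedure:

    def iterative_compute_opt(v, p):
        n = len(v) - 1
        M = [0] * (n + 1)
        for j in range(1, n + 1):
            M[j] = max(v[j] + M[p[j]], M[j - 1])   # recurrence (6.1)
        return M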


6.3 Segmented Least Squares: Multi-way Choices
We now discuss a different type of problem, which illustrates a slightly more complicated style of dynamic programming. In the previous section, we developed a recurrence based on a fundamentally binary choice: either the interval n belonged to an optimal solution or it didn't. In the problem we consider here, the recurrence will involve what might be called "multi-way choices": at each step, we have a polynomial number of possibilities to consider for the structure of the optimal solution. As we'll see, the dynamic programming approach adapts to this more general situation very naturally. As a separate issue, the problem developed in this section is also a nice illustration of how a clean algorithmic definition can formalize a notion that initially seems too fuzzy and nonintuitive to work with mathematically.

Figure 6.5 Part (b) shows the iterations of Iterative-Compute-Opt on the sample instance of Weighted Interval Scheduling depicted in part (a).

The Problem

Often when looking at scientific or statistical data, plotted on a two-dimensional set of axes, one tries to pass a "line of best fit" through the data, as in Figure 6.6.
This is a foundational problem in statistics and numerical analysis, formulated as follows. Suppose our data consists of a set P of n points in the plane, denoted (x_1, y_1), (x_2, y_2), ..., (x_n, y_n); and suppose x_1 < x_2 < ... < x_n. Given a line L defined by the equation y = ax + b, we say that the error of L with respect to P is the sum of its squared "distances" to the points in P:
Error(L, P) = Σ_{i=1}^{n} (y_i - ax_i - b)^2.

Figure 6.6 A "line of best fit."

Figure 6.7 A set of points that lie approximately on two lines.

Figure 6.8 A set of points that lie approximately on three lines.

A natural goal is then to find the line with minimum error; this turns out to have a nice closed-form solution that can be easily derived using calculus. Skipping the derivation here, we simply state the result: The line of minimum error is y = ax + b, where

a = ( n Σ_i x_i y_i - (Σ_i x_i)(Σ_i y_i) ) / ( n Σ_i x_i^2 - (Σ_i x_i)^2 )   and   b = ( Σ_i y_i - a Σ_i x_i ) / n.
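A direct transcription of these formulas into Python might look as follows. This is our own helper (the name best_fit_line is ours), and it assumes the points do not all share a single x-coordinate, so that the denominator is nonzero.

    def best_fit_line(points):
        # points: list of (x, y) pairs; returns (a, b, error) for the
        # minimum-error line y = ax + b through them.
        n = len(points)
        sx = sum(x for x, _ in points)
        sy = sum(y for _, y in points)
        sxx = sum(x * x for x, _ in points)
        sxy = sum(x * y for x, y in points)
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        b = (sy - a * sx) / n
        error = sum((y - a * x - b) ** 2 for x, y in points)
        return a, b, error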
Now, here's a kind of issue that these formulas weren't designed to cover. Often we have data that looks something like the picture in Figure 6.7. In this case, we'd like to make a statement like: "The points lie roughly on a sequence of two lines." How could we formalize this concept?

Essentially, any single line through the points in the figure would have a terrible error; but if we use two lines, we could achieve quite a small error. So we could try formulating a new problem as follows: Rather than seek a single line of best fit, we are allowed to pass an arbitrary set of lines through the points, and we seek a set of lines that minimizes the error. But this fails as a good problem formulation, because it has a trivial solution: if we're allowed to fit the points with an arbitrarily large set of lines, we could fit the points perfectly by having a different line pass through each pair of consecutive points in P.

At the other extreme, we could try "hard-coding" the number two into the problem; we could seek the best fit using at most two lines. But this too misses a crucial feature of our intuition: We didn't start out with a preconceived idea that the points lay approximately on two lines; we concluded that from looking at the picture. For example, most people would say that the points in Figure 6.8 lie approximately on three lines.

Thus, intuitively, we need a problem formulation that requires us to fit the points well, using as few lines as possible. We now formulate a problem--the Segmented Least Squares Problem--that captures these issues quite cleanly. The problem is a fundamental instance of an issue in data mining and statistics known as change detection: Given a sequence of data points, we want to identify a few points in the sequence at which a discrete change occurs (in this case, a change from one linear approximation to another).

Formulating the Problem As in the discussion above, we are given a set of points P = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, with x_1 < x_2 < ... < x_n. We will use p_i to denote the point (x_i, y_i). We must first partition P into some number of segments. Each segment is a subset of P that represents a contiguous set of x-coordinates; that is, it is a set of the form {p_i, p_{i+1}, ..., p_{j-1}, p_j} for some indices i ≤ j. Then, for each segment S in our partition of P, we compute the line minimizing the error with respect to the points in S, according to the formulas above.

The penalty of a partition is defined to be a sum of the following terms.

(i) The number of segments into which we partition P, times a fixed, given multiplier C > 0.

(ii) For each segment, the error value of the optimal line through that segment.

Our goal in the Segmented Least Squares Problem is to find a partition of minimum penalty. This minimization captures the trade-offs we discussed earlier. We are allowed to consider partitions into any number of segments; as we increase the number of segments, we reduce the penalty terms in part (ii) of the definition, but we increase the term in part (i). (The multiplier C is provided


with the input, and by tuning C, we can penalize the use of additional lines to a greater or lesser extent.) There are exponentially many possible partitions of P, and initially it is not clear that we should be able to find the optimal one efficiently. We now show how to use dynamic programming to find a partition of minimum penalty in time polynomial in n.

Designing the Algorithm

To begin with, we should recall the ingredients we need for a dynamic programming algorithm, as outlined at the end of Section 6.2. We want a polynomial number of subproblems, the solutions of which should yield a solution to the original problem; and we should be able to build up solutions to these subproblems using a recurrence. As with the Weighted Interval Scheduling Problem, it helps to think about some simple properties of the optimal solution. Note, however, that there is not really a direct analogy to weighted interval scheduling: there we were looking for a subset of n objects, whereas here we are seeking to partition n objects.

For segmented least squares, the following observation is very useful: The last point p_n belongs to a single segment in the optimal partition, and that segment begins at some earlier point p_i. This is the type of observation that can suggest the right set of subproblems: if we knew the identity of the last segment p_i, ..., p_n (see Figure 6.9), then we could remove those points from consideration and recursively solve the problem on the remaining points p_1, ..., p_{i-1}.

Suppose we let OPT(i) denote the optimum solution for the points p_1, ..., p_i, and we let e_{i,j} denote the minimum error of any line with respect to p_i, p_{i+1}, ..., p_j. (We will write OPT(0) = 0 as a boundary case.) Then our observation above says the following.

(6.6) If the last segment of the optimal partition is p_i, ..., p_n, then the value of the optimal solution is OPT(n) = e_{i,n} + C + OPT(i - 1).

Using the same observation for the subproblem consisting of the points p_1, ..., p_j, we see that to get OPT(j) we should find the best way to produce a final segment p_i, ..., p_j--paying the error plus an additive C for this segment--together with an optimal solution OPT(i - 1) for the remaining points. In other words, we have justified the following recurrence.

(6.7) For the subproblem on the points p_1, ..., p_j,
OPT(j) = min_{1≤i≤j} (e_{i,j} + C + OPT(i - 1)),
and the segment p_i, ..., p_j is used in an optimum solution for the subproblem if and only if the minimum is obtained using index i.

The hard part in designing the algorithm is now behind us. From here, we simply build up the solutions OPT(j) in order of increasing j.
Segmented-Least-Squares(n)
  Array M[0 ... n]
  Set M[0] = 0
  For all pairs i ≤ j
    Compute the least squares error e_{i,j} for the segment p_i, ..., p_j
  Endfor
  For j = 1, 2, ..., n
    Use the recurrence (6.7) to compute M[j]
  Endfor
  Return M[n]
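Putting the pieces together, here is a hedged Python sketch of the full algorithm. It reuses the best_fit_line helper sketched earlier, follows the O(n^3) accounting analyzed below, and also traces back through M to recover an optimal partition, as Find-Segments does.

    def segmented_least_squares(points, C):
        # points sorted by x-coordinate; C is the per-segment penalty.
        n = len(points)
        # e[i][j]: error of the best line through p_i, ..., p_j (1-indexed);
        # a one-point segment has error 0.
        e = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                e[i][j] = best_fit_line(points[i - 1 : j])[2]
        M = [0.0] * (n + 1)
        back = [0] * (n + 1)
        for j in range(1, n + 1):
            M[j], back[j] = min((e[i][j] + C + M[i - 1], i)
                                for i in range(1, j + 1))  # recurrence (6.7)
        segments, j = [], n
        while j > 0:                 # trace back through M
            segments.append((back[j], j))
            j = back[j] - 1
        return M[n], list(reversed(segments))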

Figure 6.9 A possible solution: a single line segment fits points p_i, p_{i+1}, ..., p_n, and then an optimal solution is found for the remaining points p_1, p_2, ..., p_{i-1}.

By analogy with the arguments for weighted interval scheduling, the correctness of this algorithm can be proved directly by induction, with (6.7) providing the induction step. And as in our algorithm for weighted interval scheduling, we can trace back through the array M to compute an optimum partition.


Find-Segments(j)
  If j = 0 then
    Output nothing
  Else
    Find an i that minimizes e_{i,j} + C + M[i - 1]
    Output the segment {p_i, ..., p_j} and the result of Find-Segments(i - 1)
  Endif

Analyzing the Algorithm

Finally, we consider the running time of Segmented-Least-Squares. First we need to compute the values of all the least-squares errors e_{i,j}. To perform a simple accounting of the running time for this, we note that there are O(n^2) pairs (i, j) for which this computation is needed; and for each pair (i, j), we can use the formula given at the beginning of this section to compute e_{i,j} in O(n) time. Thus the total running time to compute all e_{i,j} values is O(n^3).

Following this, the algorithm has n iterations, for values j = 1, ..., n. For each value of j, we have to determine the minimum in the recurrence (6.7) to fill in the array entry M[j]; this takes time O(n) for each j, for a total of O(n^2). Thus the running time is O(n^2) once all the e_{i,j} values have been determined.¹

¹ In this analysis, the running time is dominated by the O(n^3) needed to compute all e_{i,j} values. But, in fact, it is possible to compute all these values in O(n^2) time, which brings the running time of the full algorithm down to O(n^2). The idea, whose details we will leave as an exercise for the reader, is to first compute e_{i,j} for all pairs (i, j) where j - i = 1, then for all pairs where j - i = 2, then j - i = 3, and so forth. This way, when we get to a particular e_{i,j} value, we can use the ingredients of the calculation for e_{i,j-1} to determine e_{i,j} in constant time.

6.4 Subset Sums and Knapsacks: Adding a Variable

We're seeing more and more that issues in scheduling provide a rich source of practically motivated algorithmic problems. So far we've considered problems in which requests are specified by a given interval of time on a resource, as well as problems in which requests have a duration and a deadline but do not mandate a particular interval during which they need to be done. In this section, we consider a version of the second type of problem, with durations and deadlines, which is difficult to solve directly using the techniques we've seen so far. We will use dynamic programming to solve the problem, but with a twist: the "obvious" set of subproblems will turn out not to be enough, and so we end up creating a richer collection of subproblems. As we will see, this is done by adding a new variable to the recurrence underlying the dynamic program.

The Problem

In the scheduling problem we consider here, we have a single machine that can process jobs, and we have a set of requests {1, 2, ..., n}. We are only able to use this resource for the period between time 0 and time W, for some number W. Each request corresponds to a job that requires time w_i to process. If our goal is to process jobs so as to keep the machine as busy as possible up to the "cut-off" W, which jobs should we choose?

More formally, we are given n items {1, ..., n}, and each has a given nonnegative weight w_i (for i = 1, ..., n). We are also given a bound W. We would like to select a subset S of the items so that Σ_{i∈S} w_i ≤ W and, subject to this restriction, Σ_{i∈S} w_i is as large as possible. We will call this the Subset Sum Problem.

This problem is a natural special case of a more general problem called the Knapsack Problem, where each request i has both a value v_i and a weight w_i. The goal in this more general problem is to select a subset of maximum total value, subject to the restriction that its total weight not exceed W. Knapsack problems often show up as subproblems in other, more complex problems. The name knapsack refers to the problem of filling a knapsack of capacity W as full as possible (or packing in as much value as possible), using a subset of the items {1, ..., n}. We will use weight or time when referring to the quantities w_i and W.

Since this resembles other scheduling problems we've seen before, it's natural to ask whether a greedy algorithm can find the optimal solution. It appears that the answer is no--at least, no efficient greedy rule is known that always constructs an optimal solution. One natural greedy approach to try would be to sort the items by decreasing weight--or at least to do this for all items of weight at most W--and then start selecting items in this order as long as the total weight remains below W. But if W is a multiple of 2, and we have three items with weights {W/2 + 1, W/2, W/2}, then we see that this greedy algorithm will not produce the optimal solution. Alternately, we could sort by increasing weight and then do the same thing; but this fails on inputs like {1, W/2, W/2}.

The goal of this section is to show how to use dynamic programming to solve this problem. Recall the main principles of dynamic programming: We have to come up with a small number of subproblems so that each subproblem can be solved easily from "smaller" subproblems, and the solution to the original problem can be obtained easily once we know the solutions to all

268

Chapter 6 Dynamic Programming

6.4 Subset Sums and Knapsacks: Adding a Variable

269

the subproblems. The tricky issue here lies in figuring out a good set of subproblems.

Designing the Algorithm

A False Start One general strategy, which worked for us in the case of Weighted Interval Scheduling, is to consider subproblems involving only the first i requests. We start by trying this strategy here. We use the notation OPT(i), analogously to the notation used before, to denote the best possible solution using a subset of the requests {1, ..., i}. The key to our method for the Weighted Interval Scheduling Problem was to concentrate on an optimal solution O to our problem and consider two cases, depending on whether or not the last request n is accepted or rejected by this optimum solution. Just as in that case, we have the first part, which follows immediately from the definition of OPT(i):

o If n ∉ O, then OPT(n) = OPT(n - 1).

Next we have to consider the case in which n ∈ O. What we'd like here is a simple recursion, which tells us the best possible value we can get for solutions that contain the last request n. For Weighted Interval Scheduling this was easy, as we could simply delete each request that conflicted with request n. In the current problem, this is not so simple. Accepting request n does not immediately imply that we have to reject any other request. Instead, for any subset S ⊆ {1, ..., n - 1} of requests that we will accept, we have less available weight left: a weight of w_n is used on the accepted request n, and we only have W - w_n weight left for the set S of remaining requests that we accept. See Figure 6.10.

A Better Solution This suggests that we need more subproblems: To find out the value for OPT(n) we not only need the value of OPT(n - 1), but we also need to know the best solution we can get using a subset of the first n - 1 items and total allowed weight W - w_n. We are therefore going to use many more subproblems: one for each initial set {1, ..., i} of the items, and each possible value for the remaining available weight w. Assume that W is an integer, and all requests i = 1, ..., n have integer weights w_i. We will have a subproblem for each i = 0, 1, ..., n and each integer 0 ≤ w ≤ W. We will use OPT(i, w) to denote the value of the optimal solution using a subset of the items {1, ..., i} with maximum allowed weight w, that is,

OPT(i, w) = max_S Σ_{j∈S} w_j,

where the maximum is over subsets S ⊆ {1, ..., i} that satisfy Σ_{j∈S} w_j ≤ w. Using this new set of subproblems, we will be able to express the value OPT(i, w) as a simple expression in terms of values from smaller problems. Moreover, OPT(n, W) is the quantity we're looking for in the end. As before, let O denote an optimum solution for the original problem.

o If n ∉ O, then OPT(n, W) = OPT(n - 1, W), since we can simply ignore item n.

o If n ∈ O, then OPT(n, W) = w_n + OPT(n - 1, W - w_n), since we now seek to use the remaining capacity of W - w_n in an optimal way across items 1, 2, ..., n - 1.

When the nth item is too big, that is, W < w_n, then we must have OPT(n, W) = OPT(n - 1, W). Otherwise, we get the optimum solution allowing all n requests by taking the better of these two options. Using the same line of argument for the subproblem for items {1, ..., i}, and maximum allowed weight w, gives us the following recurrence.

(6.8) If w < w_i then OPT(i, w) = OPT(i - 1, w). Otherwise
OPT(i, w) = max(OPT(i - 1, w), w_i + OPT(i - 1, w - w_i)).

As before, we want to design an algorithm that builds up a table of all OPT(i, w) values while computing each of them at most once.
Subset-Sum(n, W)

Array M[0... n, 0... W]
W For i=1,2 .....
,I

n W

For w=0 .....

Use the recurrence (6.8) to compute M[i, w] End/or End/or
Figure 6.10 After item n is included in the solution, a weight of ran is used up and there is W - tun available weight left.
Return M[/Z, W]
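For concreteness, here is a short, runnable rendering of this table-filling in Python; the function name subset_sum and the list-of-lists layout of M are our own choices, not part of the text's pseudocode.

    def subset_sum(weights, W):
        """Return the maximum achievable total weight of a subset of
        weights whose sum is at most W, via the recurrence (6.8)."""
        n = len(weights)
        # M[i][w] = OPT(i, w); row 0 is all zeros (no items available).
        M = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            wi = weights[i - 1]      # weights is 0-indexed; item i is weights[i-1]
            for w in range(W + 1):
                if w < wi:
                    M[i][w] = M[i - 1][w]
                else:
                    M[i][w] = max(M[i - 1][w], wi + M[i - 1][w - wi])
        return M[n][W]

    # Example from the text: W = 6, items of weight 2, 2, 3; the optimum is 5.
    assert subset_sum([2, 2, 3], 6) == 5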


Figure 6.11 The two-dimensional table of OPT values. The leftmost column and bottom row are always 0. The entry for OPT(i, w) is computed from the two other entries OPT(i − 1, w) and OPT(i − 1, w − w_i), as indicated by the arrows.

Figure 6.12 The iterations of the algorithm on a sample instance of the Subset Sum Problem: knapsack size W = 6, items of weight w_1 = 2, w_2 = 2, w_3 = 3. The table is filled in row by row, for i = 1, 2, 3 in turn.

Analyzing the Algorithm

Recall the tabular picture we considered in Figure 6.5, associated with weighted interval scheduling, where we also showed the way in which the array M for that algorithm was iteratively filled in. For the algorithm we've just designed, we can use a similar representation, but we need a two-dimensional table, reflecting the two-dimensional array of subproblems that is being built up. Figure 6.11 shows the building up of subproblems in this case: the value M[i, w] is computed from the two other values M[i − 1, w] and M[i − 1, w − w_i].

As an example of this algorithm executing, consider an instance with weight limit W = 6, and n = 3 items of sizes w_1 = w_2 = 2 and w_3 = 3. We find that the optimal value OPT(3, 6) = 5 (which we get by using the third item and one of the first two items). Figure 6.12 illustrates the way the algorithm fills in the two-dimensional table of OPT values row by row.

Next we will worry about the running time of this algorithm. As before, in the case of weighted interval scheduling, we are building up a table of solutions M, and we compute each of the values M[i, w] in O(1) time using the previous values. Thus the running time is proportional to the number of entries in the table. Using (6.8) one can immediately prove by induction that the returned value M[n, W] is the optimal value for the input instance with items 1, ..., n and available weight W.

(6.9) The Subset-Sum(n, W) Algorithm correctly computes the optimal value of the problem, and runs in O(nW) time.

Note that this method is not as efficient as our dynamic program for the Weighted Interval Scheduling Problem. Indeed, its running time is not a polynomial function of n; rather, it is a polynomial function of n and W, the largest integer involved in defining the problem. We call such algorithms pseudo-polynomial. Pseudo-polynomial algorithms can be reasonably efficient when the numbers {w_i} involved in the input are reasonably small; however, they become less practical as these numbers grow large.

To recover an optimal set S of items, we can trace back through the array M by a procedure similar to those we developed in the previous sections.

(6.10) Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time.
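The traceback itself is not spelled out here; one standard way to do it in Python is sketched below, assuming subset_sum from the earlier sketch is modified to return the full table M rather than just M[n][W] (names, again, are our own).

    def recover_set(weights, W, M):
        """Trace back through M to recover an optimal subset S,
        returned as a list of 1-based item indices, in O(n) time."""
        S = []
        w = W
        for i in range(len(weights), 0, -1):
            # Item i was used exactly when OPT(i, w) differs from OPT(i-1, w).
            if M[i][w] != M[i - 1][w]:
                S.append(i)
                w -= weights[i - 1]
        return S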

Extension: The Knapsack Problem
The Knapsack Problem is a bit more complex than the scheduling problem we discussed earlier. Consider a situation in which each item i has a nonnegative weight w_i as before, and also a distinct value v_i. Our goal is now to find a


subset S of maximum value Σ_{i∈S} v_i, subject to the restriction that the total weight of the set should not exceed W: Σ_{i∈S} w_i ≤ W.

It is not hard to extend our dynamic programming algorithm to this more general problem. We use the analogous set of subproblems, OPT(i, w), to denote the value of the optimal solution using a subset of the items {1, ..., i} and maximum available weight w. We consider an optimal solution O, and identify two cases depending on whether or not n ∈ O.

- If n ∉ O, then OPT(n, W) = OPT(n − 1, W).
- If n ∈ O, then OPT(n, W) = v_n + OPT(n − 1, W − w_n).

Using this line of argument for the subproblems implies the following analogue of (6.8).

(6.11) If w < w_i then OPT(i, w) = OPT(i − 1, w). Otherwise
OPT(i, w) = max(OPT(i − 1, w), v_i + OPT(i − 1, w − w_i)).

Using this recurrence, we can write down a completely analogous dynamic programming algorithm, and this implies the following fact.

(6.12) The Knapsack Problem can be solved in O(nW) time.
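As a concrete version of this extension, here is a minimal Python sketch; only the inner maximum changes relative to the subset-sum code, with v_i in place of w_i in the "take item i" branch (the function and variable names are ours).

    def knapsack(weights, values, W):
        """Return the maximum total value of a subset of items whose
        total weight is at most W, via the recurrence (6.11)."""
        n = len(weights)
        M = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            wi, vi = weights[i - 1], values[i - 1]
            for w in range(W + 1):
                if w < wi:
                    M[i][w] = M[i - 1][w]
                else:
                    M[i][w] = max(M[i - 1][w], vi + M[i - 1][w - wi])
        return M[n][W]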

6.5 RNA Secondary Structure: Dynamic Programming over Intervals

In the Knapsack Problem, we were able to formulate a dynamic programming algorithm by adding a new variable. A different but very common way by which one ends up adding a variable to a dynamic program is through the following scenario. We start by thinking about the set of subproblems on {1, 2, ..., j}, for all choices of j, and find ourselves unable to come up with a natural recurrence. We then look at the larger set of subproblems on {i, i + 1, ..., j} for all choices of i and j (where i ≤ j), and find a natural recurrence relation on these subproblems. In this way, we have added the second variable i; the effect is to consider a subproblem for every contiguous interval in {1, 2, ..., n}.

There are a few canonical problems that fit this profile; those of you who have studied parsing algorithms for context-free grammars have probably seen at least one dynamic programming algorithm in this style. Here we focus on the problem of RNA secondary structure prediction, a fundamental issue in computational biology.

The Problem

As one learns in introductory biology classes, Watson and Crick posited that double-stranded DNA is "zipped" together by complementary base-pairing. Each strand of DNA can be viewed as a string of bases, where each base is drawn from the set {A, C, G, T}.2 The bases A and T pair with each other, and the bases C and G pair with each other; it is these A-T and C-G pairings that hold the two strands together. Now, single-stranded RNA molecules are key components in many of the processes that go on inside a cell, and they follow more or less the same structural principles. However, unlike double-stranded DNA, there is no "second strand" for the RNA to stick to; so it tends to loop back and form base pairs with itself, resulting in interesting shapes like the one depicted in Figure 6.13. The set of pairs (and resulting shape) formed by the RNA molecule through this process is called the secondary structure, and understanding the secondary structure is essential for understanding the behavior of the molecule.

Figure 6.13 An RNA secondary structure. Thick lines connect adjacent elements of the sequence; thin lines indicate pairs of elements that are matched.

2 Adenine, cytosine, guanine, and thymine, the four basic units of DNA.


For our purposes, a single-stranded RNA molecule can be viewed as a sequence of n symbols (bases) drawn from the alphabet {A, C, G, U}.3 Let B = b_1b_2...b_n be a single-stranded RNA molecule, where each b_i ∈ {A, C, G, U}. To a first approximation, one can model its secondary structure as follows. As usual, we require that A pairs with U, and C pairs with G; we also require that each base can pair with at most one other base; in other words, the set of base pairs forms a matching. It also turns out that secondary structures are (again, to a first approximation) "knot-free," which we will formalize as a kind of noncrossing condition below.

Thus, concretely, we say that a secondary structure on B is a set of pairs S = {(i, j)}, where i, j ∈ {1, 2, ..., n}, that satisfies the following conditions.
(i) (No sharp turns.) The ends of each pair in S are separated by at least four intervening bases; that is, if (i, j) ∈ S, then i < j − 4.
(ii) The elements of any pair in S consist of either {A, U} or {C, G} (in either order).
(iii) S is a matching: no base appears in more than one pair.
(iv) (The noncrossing condition.) If (i, j) and (k, l) are two pairs in S, then we cannot have i < k < j < l. (See Figure 6.14 for an illustration.)

Note that the RNA secondary structure in Figure 6.13 satisfies properties (i) through (iv). From a structural point of view, condition (i) arises simply because the RNA molecule cannot bend too sharply; and conditions (ii) and (iii) are the fundamental Watson-Crick rules of base-pairing. Condition (iv) is the striking one, since it's not obvious why it should hold in nature. But while there are sporadic exceptions to it in real molecules (via so-called pseudoknotting), it does turn out to be a good approximation to the spatial constraints on real secondary structures.

Now, out of all the secondary structures that are possible for a single RNA molecule, which are the ones that are likely to arise under physiological conditions? The usual hypothesis is that a single-stranded RNA molecule will form the secondary structure with the optimum total free energy. The correct model for the free energy of a secondary structure is a subject of much debate; but a first approximation here is to assume that the free energy of a secondary structure is proportional simply to the number of base pairs that it contains. Thus, having said all this, we can state the basic RNA secondary structure prediction problem very simply: We want an efficient algorithm that takes

Figure 6.14 Two views of an RNA secondary structure. In the second view, (b), the string has been "stretched" lengthwise, and edges connecting matched pairs appear as noncrossing "bubbles" over the string.

a single-stranded RNA molecule B = b_1b_2...b_n and determines a secondary structure S with the maximum possible number of base pairs.

Designing and Analyzing the Algorithm

A First Attempt at Dynamic Programming The natural first attempt to apply dynamic programming would presumably be based on the following subproblems: We say that OPT(j) is the maximum number of base pairs in a secondary structure on b_1b_2...b_j. By the no-sharp-turns condition above, we know that OPT(j) = 0 for j ≤ 5; and we know that OPT(n) is the solution we're looking for.

The trouble comes when we try writing down a recurrence that expresses OPT(j) in terms of the solutions to smaller subproblems. We can get partway there: in the optimal secondary structure on b_1b_2...b_j, it's the case that either
- j is not involved in a pair; or
- j pairs with t for some t < j − 4.

In the first case, we just need to consult our solution for OPT(j − 1). The second case is depicted in Figure 6.15(a); because of the noncrossing condition, we now know that no pair can have one end between 1 and t − 1 and the other end between t + 1 and j − 1. We've therefore effectively isolated two new subproblems: one on the bases b_1b_2...b_{t−1}, and the other on the bases b_{t+1}...b_{j−1}. The first is solved by OPT(t − 1), but the second is not on our list of subproblems, because it does not begin with b_1.

3 Note that the symbol T from the alphabet of DNA has been replaced by a U, but this is not important for us here.

This is the insight that makes us realize we need to add a variable: We need to be able to work with subproblems that do not begin with b_1; in other words, we need to consider subproblems on b_ib_{i+1}...b_j for all choices of i ≤ j.

Dynamic Programming over Intervals Once we make this decision, our previous reasoning leads straight to a successful recurrence. Let OPT(i, j) denote the maximum number of base pairs in a secondary structure on b_ib_{i+1}...b_j. The no-sharp-turns condition lets us initialize OPT(i, j) = 0 whenever i ≥ j − 4. (For notational convenience, we will also allow ourselves to refer to OPT(i, j) even when i > j; in this case, its value is 0.)

Now, in the optimal secondary structure on b_ib_{i+1}...b_j, we have the same alternatives as before:
- j is not involved in a pair; or
- j pairs with t for some t < j − 4.

In the first case, we have OPT(i, j) = OPT(i, j − 1). In the second case, we recur on the two subproblems OPT(i, t − 1) and OPT(t + 1, j − 1); as argued above, the noncrossing condition has isolated these two subproblems from each other. We have therefore justified the following recurrence.

(6.13) OPT(i, j) = max(OPT(i, j − 1), max_t(1 + OPT(i, t − 1) + OPT(t + 1, j − 1))), where the max is taken over t such that b_t and b_j are an allowable base pair (under conditions (i) and (ii) from the definition of a secondary structure).

Figure 6.15 Schematic views of the dynamic programming recurrence using (a) one variable, and (b) two variables. Including the pair (t, j) results in two independent subproblems.

Now we just have to make sure we understand the proper order in which to build up the solutions to the subproblems. The form of (6.13) reveals that we're always invoking the solution to subproblems on shorter intervals: those for which k = j − i is smaller. Thus things will work without any trouble if we build up the solutions in order of increasing interval length.

Initialize OPT(i, j) = 0 whenever i ≥ j − 4
For k = 5, 6, ..., n − 1
  For i = 1, 2, ..., n − k
    Set j = i + k
    Compute OPT(i, j) using the recurrence in (6.13)
  Endfor
Endfor
Return OPT(1, n)

As an example of this algorithm executing, we consider the input ACCGGUAGU, a subsequence of the sequence in Figure 6.14. As with the Knapsack Problem, we need two dimensions to depict the array M: one for the left endpoint of the interval being considered, and one for the right endpoint. In Figure 6.16, which shows the iterations of the algorithm on this instance, we only show entries corresponding to pairs [i, j] with i < j − 4, since these are the only ones that can possibly be nonzero.

Figure 6.16 The iterations of the algorithm on a sample instance of the RNA Secondary Structure Prediction Problem.

It is easy to bound the running time: there are O(n^2) subproblems to solve, and evaluating the recurrence in (6.13) takes time O(n) for each. Thus the running time is O(n^3).
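Here is a short Python sketch of this interval dynamic program, under the unit-per-pair energy model described above; the table layout and the ALLOWED set are our own illustrative choices, and indices are 0-based rather than the text's 1-based convention.

    # Allowed Watson-Crick pairs in RNA (A-U and C-G, in either order).
    ALLOWED = {frozenset("AU"), frozenset("CG")}

    def rna_secondary_structure(B):
        """Return the max number of base pairs in a secondary structure
        on B, via the recurrence (6.13)."""
        n = len(B)
        # OPT[i][j] = max pairs on B[i..j]; stays 0 whenever i >= j - 4.
        OPT = [[0] * n for _ in range(n)]
        for k in range(5, n):                 # interval length j - i
            for i in range(n - k):
                j = i + k
                best = OPT[i][j - 1]          # case: j not in a pair
                for t in range(i, j - 4):     # case: j pairs with t
                    if frozenset((B[t], B[j])) in ALLOWED:
                        left = OPT[i][t - 1] if t > i else 0
                        best = max(best, 1 + left + OPT[t + 1][j - 1])
                OPT[i][j] = best
        return OPT[0][n - 1] if n else 0

    # The text's sample instance; its optimal structure has 2 base pairs.
    assert rna_secondary_structure("ACCGGUAGU") == 2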

As always, we can recover the secondary structure itself (not just its value) by recording how the minima in (6.13) are achieved and tracing back through the computation.

6.6 Sequence Alignment

For the remainder of this chapter, we consider two further dynamic programming algorithms that each have a wide range of applications. In the next two sections we discuss sequence alignment, a fundamental problem that arises in comparing strings. Following this, we turn to the problem of computing shortest paths in graphs when edges have costs that may be negative.

The Problem

Dictionaries on the Web seem to get more and more useful: often it seems easier to pull up a bookmarked online dictionary than to get a physical dictionary down from the bookshelf. And many online dictionaries offer functions that you can't get from a printed one: if you're looking for a definition and type in a word it doesn't contain, say, ocurrance, it will come back and ask, "Perhaps you mean occurrence?" How does it do this? Did it truly know what you had in mind? Let's defer the second question to a different book and think a little about the first one. To decide what you probably meant, it would be natural to search the dictionary for the word most "similar" to the one you typed in. To do this, we have to answer the question: How should we define similarity between two words or strings?

Intuitively, we'd like to say that ocurrance and occurrence are similar because we can make the two words identical if we add a c to the first word and change the a to an e. Since neither of these changes seems so large, we conclude that the words are quite similar. To put it another way, we can nearly line up the two words letter by letter:

  o-currance
  occurrence

The hyphen (-) indicates a gap where we had to add a letter to the second word to get it to line up with the first. Moreover, our lining up is not perfect in that an e is lined up with an a.

We want a model in which similarity is determined roughly by the number of gaps and mismatches we incur when we line up the two words. Of course, there are many possible ways to line up the two words; for example, we could have written

  o-curr-ance
  occurre-nce

which involves three gaps and no mismatches. Which is better: one gap and one mismatch, or three gaps and no mismatches?

This discussion has been made easier because we know roughly what the correspondence ought to look like. When the two strings don't look like English words, for example abbbaabbbbaab and ababaaabbbbbab, it may take a little work to decide whether they can be lined up nicely or not:

  abbbaa--bbbbaab
  ababaaabbbbba-b

Dictionary interfaces and spell-checkers are not the most computationally intensive application for this type of problem. In fact, determining similarities among strings is one of the central computational problems facing molecular biologists today.

Strings arise very naturally in biology: an organism's genome, its full set of genetic material, is divided up into giant linear DNA molecules known as chromosomes, each of which serves conceptually as a one-dimensional chemical storage device. Indeed, it does not obscure reality very much to think of it as an enormous linear tape, containing a string over the alphabet {A, C, G, T}. The string of symbols encodes the instructions for building protein molecules; using a chemical mechanism for reading portions of the chromosome, a cell can construct proteins that in turn control its metabolism.

Why is similarity important in this picture? To a first approximation, the sequence of symbols in an organism's genome can be viewed as determining the properties of the organism. So suppose we have two strains of bacteria, X and Y, which are closely related evolutionarily. Suppose further that we've determined that a certain substring in the DNA of X codes for a certain kind of toxin. Then, if we discover a very "similar" substring in the DNA of Y, we might be able to hypothesize, before performing any experiments at all, that this portion of the DNA in Y codes for a similar kind of toxin. This use of computation to guide decisions about biological experiments is one of the hallmarks of the field of computational biology.

All this leaves us with the same question we asked initially, while typing badly spelled words into our online dictionary: How should we define the notion of similarity between two strings?

In the early 1970s, the two molecular biologists Needleman and Wunsch proposed a definition of similarity, which, basically unchanged, has become the standard definition in use today. Its position as a standard was reinforced by its simplicity and intuitive appeal, as well as through its independent discovery by several other researchers around the same time. Moreover, this definition of similarity came with an efficient dynamic programming algorithm to compute it. In this way, the paradigm of dynamic programming was independently discovered by biologists some twenty years after mathematicians and computer scientists first articulated it.

Suppose we are given two strings X and Y, where X consists of the sequence of symbols x_1x_2...x_m and Y consists of the sequence of symbols y_1y_2...y_n. Consider the sets {1, 2, ..., m} and {1, 2, ..., n} as representing the different positions in the strings X and Y, and consider a matching of these sets; recall that a matching is a set of ordered pairs with the property that each item occurs in at most one pair. We say that a matching M of these two sets is an alignment if there are no "crossing" pairs: if (i, j), (i', j') ∈ M and i < i', then j < j'. Intuitively, an alignment gives a way of lining up the two strings, by telling us which pairs of positions will be lined up with one another. Thus, for example,

  stop-
  -tops

corresponds to the alignment {(2, 1), (3, 2), (4, 3)}.

Our definition of similarity will be based on finding the optimal alignment between X and Y, according to the following criteria. Suppose M is a given alignment between X and Y.

- First, there is a parameter δ > 0 that defines a gap penalty. For each position of X or Y that is not matched in M (it is a gap), we incur a cost of δ.
- Second, for each pair of letters p, q in our alphabet, there is a mismatch cost of α_pq for lining up p with q. Thus, for each (i, j) ∈ M, we pay the appropriate mismatch cost α_{x_i y_j} for lining up x_i with y_j. One generally assumes that α_pp = 0 for each letter p (there is no mismatch cost to line up a letter with another copy of itself), although this will not be necessary in anything that follows.
- The cost of M is the sum of its gap and mismatch costs, and we seek an alignment of minimum cost.

The definition is motivated by the considerations we discussed above, and in particular by the notion of "lining up" two strings. The process of minimizing this cost is often referred to as sequence alignment in the biology literature. The quantities δ and {α_pq} are external parameters that must be plugged into software for sequence alignment; indeed, a lot of work goes into choosing the settings for these parameters. From our point of view, in designing an algorithm for sequence alignment, we will take them as given. To go back to our first example, notice how these parameters determine which alignment of ocurrance and occurrence we should prefer: the first is strictly better if and only if δ + α_ae < 3δ.

Designing the Algorithm

We now have a concrete numerical definition for the similarity between strings X and Y: it is the minimum cost of an alignment between X and Y. The lower this cost, the more similar we declare the strings to be. We now turn to the problem of computing this minimum cost, and an optimal alignment that yields it, for a given pair of strings X and Y.

One of the approaches we could try for this problem is dynamic programming, and we are motivated by the following basic dichotomy.

- In the optimal alignment M, either (m, n) ∈ M or (m, n) ∉ M. That is, either the last symbols in the two strings are matched to each other, or they aren't.

By itself, this fact would be too weak to provide us with a dynamic programming solution. Suppose, however, that we compound it with the following basic fact.

(6.14) Let M be any alignment of X and Y. If (m, n) ∉ M, then either the mth position of X or the nth position of Y is not matched in M.

Proof. Suppose by way of contradiction that (m, n) ∉ M, and there are numbers i < m and j < n so that (m, j) ∈ M and (i, n) ∈ M. But this contradicts our definition of alignment: we have (i, n), (m, j) ∈ M with i < m, but n > j, so the pairs (i, n) and (m, j) cross.

There is an equivalent way to write (6.14) that exposes three alternative possibilities, and leads directly to the formulation of a recurrence.

(6.15) In an optimal alignment M, at least one of the following is true:
(i) (m, n) ∈ M; or
(ii) the mth position of X is not matched; or
(iii) the nth position of Y is not matched.

Now, let OPT(i, j) denote the minimum cost of an alignment between x_1x_2...x_i and y_1y_2...y_j. If case (i) of (6.15) holds, we pay α_{x_m y_n} and then align x_1x_2...x_{m−1} as well as possible with y_1y_2...y_{n−1}; we get OPT(m, n) = α_{x_m y_n} + OPT(m − 1, n − 1). If case (ii) holds, we pay a gap cost of δ since the mth position of X is not matched, and then we align x_1x_2...x_{m−1} as well as possible with y_1y_2...y_n; in this way, we get OPT(m, n) = δ + OPT(m − 1, n).

Similarly, if case (iii) holds, we get OPT(m, n) = δ + OPT(m, n − 1). Using the same argument for the subproblem of finding the minimum-cost alignment between x_1x_2...x_i and y_1y_2...y_j, we get the following fact.

(6.16) The minimum alignment costs satisfy the following recurrence for i ≥ 1 and j ≥ 1:
OPT(i, j) = min[α_{x_i y_j} + OPT(i − 1, j − 1), δ + OPT(i − 1, j), δ + OPT(i, j − 1)].
Moreover, (i, j) is in an optimal alignment M for this subproblem if and only if the minimum is achieved by the first of these values.

For purposes of initialization, we note that OPT(i, 0) = OPT(0, i) = iδ for all i, since the only way to line up an i-letter word with a 0-letter word is to use i gaps. We have maneuvered ourselves into a position where the dynamic programming algorithm has become clear: We build up the values of OPT(i, j) using the recurrence in (6.16). There are only O(mn) subproblems, and OPT(m, n) is the value we are seeking. We now specify the algorithm to compute the value of the optimal alignment.

Alignment(X, Y)
  Array A[0...m, 0...n]
  Initialize A[i, 0] = iδ for each i
  Initialize A[0, j] = jδ for each j
  For j = 1, ..., n
    For i = 1, ..., m
      Use the recurrence (6.16) to compute A[i, j]
    Endfor
  Endfor
  Return A[m, n]

Analyzing the Algorithm

The correctness of the algorithm follows directly from (6.16). The running time is O(mn), since the array A has O(mn) entries, and at worst we spend constant time on each. As in previous dynamic programming algorithms, we can trace back through the array A, using the second part of fact (6.16), to construct the alignment itself.

There is an appealing pictorial way in which people think about this sequence alignment algorithm. Suppose we build a two-dimensional m × n grid graph G_XY, with the rows labeled by symbols in the string X, the columns labeled by symbols in Y, and directed edges as in Figure 6.17. We number the rows from 0 to m and the columns from 0 to n; we denote the node in the ith row and the jth column by the label (i, j). We put costs on the edges of G_XY: the cost of each horizontal and vertical edge is δ, and the cost of the diagonal edge from (i − 1, j − 1) to (i, j) is α_{x_i y_j}.

Figure 6.17 A graph-based picture of sequence alignment.

The purpose of this picture now emerges: the recurrence in (6.16) for OPT(i, j) is precisely the recurrence one gets for the minimum-cost path in G_XY from (0, 0) to (i, j). Thus we can show

(6.17) Let f(i, j) denote the minimum cost of a path from (0, 0) to (i, j) in G_XY. Then for all i, j, we have f(i, j) = OPT(i, j).

Proof. We can easily prove this by induction on i + j. When i + j = 0, we have i = j = 0, and indeed f(i, j) = OPT(i, j) = 0. Now consider arbitrary values of i and j, and suppose the statement is true for all pairs (i', j') with i' + j' < i + j. The last edge on the shortest path to (i, j) is either from (i − 1, j − 1), (i − 1, j), or (i, j − 1). Thus we have

f(i, j) = min[α_{x_i y_j} + f(i − 1, j − 1), δ + f(i − 1, j), δ + f(i, j − 1)]
        = min[α_{x_i y_j} + OPT(i − 1, j − 1), δ + OPT(i − 1, j), δ + OPT(i, j − 1)]
        = OPT(i, j),

where we pass from the first line to the second using the induction hypothesis, and we pass from the second to the third using (6.16).
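A compact Python rendering of the Alignment procedure follows; the cost parameters delta and alpha (a function of two characters) are passed in explicitly, since the text treats them as external parameters, and all names are our own.

    def alignment_cost(X, Y, delta, alpha):
        """Return the minimum alignment cost between X and Y,
        using the recurrence (6.16)."""
        m, n = len(X), len(Y)
        A = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            A[i][0] = i * delta      # aligning a prefix of X with nothing
        for j in range(n + 1):
            A[0][j] = j * delta
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                A[i][j] = min(alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],
                              delta + A[i - 1][j],
                              delta + A[i][j - 1])
        return A[m][n]

    # With gap cost 1 and uniform mismatch cost 1, this is classical edit
    # distance; the example from the text costs 2 (one gap, one mismatch).
    assert alignment_cost("ocurrance", "occurrence", 1,
                          lambda p, q: 0 if p == q else 1) == 2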

Thus the value of the optimal alignment is the length of the shortest path in G_XY from (0, 0) to (m, n). (We'll call any path in G_XY from (0, 0) to (m, n) a corner-to-corner path.) Moreover, the diagonal edges used in a shortest path correspond precisely to the pairs used in a minimum-cost alignment. These connections to the Shortest-Path Problem in the graph G_XY do not directly yield an improvement in the running time for the sequence alignment problem; however, they help one's intuition for the problem and have been useful in suggesting algorithms for more complex variations on sequence alignment.

As an example, Figure 6.18 shows the value of the shortest path from (0, 0) to each node (i, j) for the problem of aligning the words mean and name. For the purpose of this example, we assume that δ = 2; matching a vowel with a different vowel, or a consonant with a different consonant, costs 1; while matching a vowel and a consonant with each other costs 3. For each cell in the table (representing the corresponding node), the arrow indicates the last step of the shortest path leading to that node; in other words, the way in which the minimum is achieved in (6.16). Thus, by following arrows backward from node (4, 4), we can trace back to construct the alignment.

Figure 6.18 The OPT values for the problem of aligning the words mean to name.

6.7 Sequence Alignment in Linear Space via Divide and Conquer

In the previous section, we showed how to compute the optimal alignment between two strings X and Y of lengths m and n, respectively. Building up the two-dimensional m-by-n array of optimal solutions to subproblems, OPT(·, ·), turned out to be equivalent to constructing a graph G_XY with mn nodes laid out in a grid and looking for the cheapest path between opposite corners. In either of these ways of formulating the dynamic programming algorithm, the running time is O(mn), because it takes constant time to determine the value in each of the mn cells of the array OPT; and the space requirement is O(mn) as well, since it was dominated by the cost of storing the array (or the graph G_XY).

The Problem

The question we ask in this section is: Should we be happy with O(mn) as a space bound? If our application is to compare English words, or even English sentences, it is quite reasonable. In biological applications of sequence alignment, however, one often compares very long strings against one another; and in these cases, the Θ(mn) space requirement can potentially be a more severe problem than the Θ(mn) time requirement. Suppose, for example, that we are comparing two strings of 100,000 symbols each. Depending on the underlying processor, the prospect of performing roughly 10 billion primitive operations might be less cause for worry than the prospect of working with a single 10-gigabyte array.

In this section we describe a very clever enhancement of the sequence alignment algorithm that makes it work in O(mn) time using only O(m + n) space. In other words, we can bring the space requirement down to linear while blowing up the running time by at most an additional constant factor. For ease of presentation, we'll describe various steps in terms of paths in the graph G_XY, with the natural equivalence back to the sequence alignment problem. Thus, when we seek the pairs in an optimal alignment, we can equivalently ask for the edges in a shortest corner-to-corner path in G_XY. The algorithm itself will be a nice application of divide-and-conquer ideas. The crux of the technique is the observation that, if we divide the problem into several recursive calls, then the space needed for the computation can be reused from one call to the next. The way in which this idea is used, however, is fairly subtle.

Designing the Algorithm

We first show that if we only care about the value of the optimal alignment, and not the alignment itself, it is easy to get away with linear space. The crucial observation is that to fill in an entry of the array A, the recurrence in (6.16) only needs information from the current column of A and the previous column of A. Thus we will "collapse" the array A to an m × 2 array B: as the algorithm iterates through values of j, entries of the form B[i, 0] will hold the "previous" column's value A[i, j − 1], while entries of the form B[i, 1] will hold the "current" column's value A[i, j].

Space-Efficient-Alignment(X, Y)
  Array B[0...m, 0...1]
  Initialize B[i, 0] = iδ for each i (just as in column 0 of A)
  For j = 1, ..., n
    B[0, 1] = jδ (since this corresponds to entry A[0, j])
    For i = 1, ..., m
      B[i, 1] = min[α_{x_i y_j} + B[i − 1, 0], δ + B[i − 1, 1], δ + B[i, 0]]
    Endfor
    Move column 1 of B to column 0 to make room for the next iteration:
    Update B[i, 0] = B[i, 1] for each i
  Endfor
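In Python this column-collapsing might look as follows; we keep two one-dimensional lists rather than an m × 2 array, which is the same idea (the names are ours).

    def space_efficient_alignment(X, Y, delta, alpha):
        """Return the final column of costs: a list whose entry i is
        OPT(i, n), computed with O(m) working space."""
        m = len(X)
        prev = [i * delta for i in range(m + 1)]    # column j - 1
        for j in range(1, len(Y) + 1):
            cur = [j * delta] + [0.0] * m           # entry A[0, j]
            for i in range(1, m + 1):
                cur[i] = min(alpha(X[i - 1], Y[j - 1]) + prev[i - 1],
                             delta + cur[i - 1],
                             delta + prev[i])
            prev = cur                              # "move column 1 to column 0"
        return prev      # prev[m] is the optimal alignment value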

It is easy to verify that when this algorithm completes, the array entry B[i, 1] holds the value of OPT(i, n) for i = 0, 1, ..., m. Moreover, it uses O(mn) time and O(m) space. The problem is: where is the alignment itself? We haven't left enough information around to be able to run a procedure like Find-Alignment. Since B at the end of the algorithm only contains the last two columns of the original dynamic programming array A, if we were to try tracing back to get the path, we'd run out of information after just these two columns. We could imagine getting around this difficulty by trying to "predict" what the alignment is going to be in the process of running our space-efficient procedure. In particular, as we compute the values in the jth column of the (now implicit) array A, we could try hypothesizing that a certain entry has a very small value, and hence that the alignment that passes through this entry is a promising candidate to be the optimal one. But this promising alignment might run into big problems later on, and a different alignment that currently looks much less attractive could turn out to be the optimal one.

There is, in fact, a solution to this problem; we will be able to recover the alignment itself using O(m + n) space, but it requires a genuinely new idea. The insight is based on employing the divide-and-conquer technique that we've seen earlier in the book. We begin with a simple alternative way to implement the basic dynamic programming solution.

A Backward Formulation of the Dynamic Program Recall that we use f(i, j) to denote the length of the shortest path from (0, 0) to (i, j) in the graph G_XY. (As we showed in the initial sequence alignment algorithm, f(i, j) has the same value as OPT(i, j).) Now let's define g(i, j) to be the length of the shortest path from (i, j) to (m, n) in G_XY. The function g provides an equally natural dynamic programming approach to sequence alignment, except that we build it up in reverse: we start with g(m, n) = 0, and the answer we want is g(0, 0). By strict analogy with (6.16), we have the following recurrence for g.

(6.18) For i < m and j < n we have
g(i, j) = min[α_{x_{i+1} y_{j+1}} + g(i + 1, j + 1), δ + g(i, j + 1), δ + g(i + 1, j)].

This is just the recurrence one obtains by taking the graph G_XY, "rotating" it so that the node (m, n) is in the lower left corner, and using the previous approach. Using this picture, we can also work out the full dynamic programming algorithm to build up the values of g, backward starting from (m, n). Similarly, there is a space-efficient version of this backward dynamic programming algorithm, analogous to Space-Efficient-Alignment, which computes the value of the optimal alignment using only O(m + n) space. We will refer to this backward version, naturally enough, as Backward-Space-Efficient-Alignment.

Combining the Forward and Backward Formulations So now we have symmetric algorithms which build up the values of the functions f and g. The idea will be to use these two algorithms in concert to find the optimal alignment. First, here are two basic facts summarizing some relationships between the functions f and g.

(6.19) The length of the shortest corner-to-corner path in G_XY that passes through (i, j) is f(i, j) + g(i, j).

Proof. Let L_ij denote the length of the shortest corner-to-corner path in G_XY that passes through (i, j). Clearly, any such path must get from (0, 0) to (i, j) and then from (i, j) to (m, n). Thus its length is at least f(i, j) + g(i, j), and so we have L_ij ≥ f(i, j) + g(i, j). On the other hand, consider the corner-to-corner path that consists of a minimum-length path from (0, 0) to (i, j), followed by a minimum-length path from (i, j) to (m, n). This path has length f(i, j) + g(i, j), and so we have L_ij ≤ f(i, j) + g(i, j). It follows that L_ij = f(i, j) + g(i, j).

(6.20) Let k be any number in {0, ..., n}, and let q be an index that minimizes the quantity f(q, k) + g(q, k). Then there is a corner-to-corner path of minimum length that passes through the node (q, k).

Proof. Let L* denote the length of the shortest corner-to-corner path in G_XY. The shortest corner-to-corner path must use some node in the kth column of G_XY; let's suppose it is node (p, k). Thus by (6.19), L* = f(p, k) + g(p, k) ≥ min_q [f(q, k) + g(q, k)]. Now consider the index q that achieves the minimum in the right-hand side of this expression; we have L* ≥ f(q, k) + g(q, k). By (6.19) again, the shortest corner-to-corner path using the node (q, k) has length f(q, k) + g(q, k), and since L* is the minimum length of any corner-to-corner path, we have L* ≤ f(q, k) + g(q, k). It follows that L* = f(q, k) + g(q, k). Thus the shortest corner-to-corner path using the node (q, k) has length L*, and this proves (6.20).

Using (6.20) and our space-efficient algorithms to compute the value of the optimal alignment, we will proceed as follows. We divide G_XY along its center column and compute the value of f(i, n/2) and g(i, n/2) for each value of i, using our two space-efficient algorithms. We can then determine the minimum value of f(i, n/2) + g(i, n/2), and conclude via (6.20) that there is a shortest corner-to-corner path passing through the node (i, n/2). Given this, we can search for the shortest path recursively in the portion of G_XY between (0, 0) and (i, n/2) and in the portion between (i, n/2) and (m, n). The crucial point is that we apply these recursive calls sequentially and reuse the working space from one call to the next. Thus, since we only work on one recursive call at a time, the total space usage is O(m + n). The key question we have to resolve is whether the running time of this algorithm remains O(mn).

In running the algorithm, we maintain a globally accessible list P which will hold nodes on the shortest corner-to-corner path as they are discovered. Initially, P is empty. P need only have m + n entries, since no corner-to-corner path can use more than this many edges. We also use the following notation: X[i : j], for 1 ≤ i ≤ j ≤ m, denotes the substring of X consisting of x_i x_{i+1}...x_j; and we define Y[i : j] analogously. We will assume for simplicity that n is a power of 2; this assumption makes the discussion much cleaner, although it can be easily avoided.

Divide-and-Conquer-Alignment(X, Y)
  Let m be the number of symbols in X
  Let n be the number of symbols in Y
  If m ≤ 2 or n ≤ 2 then
    Compute optimal alignment using Alignment(X, Y)
  Call Space-Efficient-Alignment(X, Y[1 : n/2])
  Call Backward-Space-Efficient-Alignment(X, Y[n/2 + 1 : n])
  Let q be the index minimizing f(q, n/2) + g(q, n/2)
  Add (q, n/2) to global list P
  Divide-and-Conquer-Alignment(X[1 : q], Y[1 : n/2])
  Divide-and-Conquer-Alignment(X[q + 1 : m], Y[n/2 + 1 : n])
  Return P

As an example of the first level of recursion, consider Figure 6.19. If the minimizing index q turns out to be 1, we get the two subproblems pictured.

Figure 6.19 The first level of recurrence for the space-efficient Divide-and-Conquer-Alignment. The two boxed regions indicate the input to the two recursive calls.
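The following Python sketch mirrors this divide-and-conquer scheme (often attributed to Hirschberg), reusing space_efficient_alignment from above. Rather than recording grid nodes, it collects the matched index pairs of an optimal alignment, and the base case falls back to a small quadratic-space traceback; all names are our own, and this is a sketch under those assumptions rather than a tuned implementation.

    def base_case_alignment(X, Y, delta, alpha):
        """Full DP with traceback; fine for the tiny base cases."""
        m, n = len(X), len(Y)
        A = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1): A[i][0] = i * delta
        for j in range(n + 1): A[0][j] = j * delta
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                A[i][j] = min(alpha(X[i-1], Y[j-1]) + A[i-1][j-1],
                              delta + A[i-1][j], delta + A[i][j-1])
        pairs, i, j = [], m, n
        while i > 0 and j > 0:
            if A[i][j] == alpha(X[i-1], Y[j-1]) + A[i-1][j-1]:
                pairs.append((i - 1, j - 1)); i -= 1; j -= 1
            elif A[i][j] == delta + A[i-1][j]:
                i -= 1
            else:
                j -= 1
        return pairs

    def hirschberg_pairs(X, Y, delta, alpha, i0=0, j0=0, P=None):
        """Collect matched pairs (0-based global indices) in O(m+n) space."""
        if P is None:
            P = []
        m, n = len(X), len(Y)
        if m <= 2 or n <= 2:
            P.extend((i0 + i, j0 + j)
                     for i, j in base_case_alignment(X, Y, delta, alpha))
            return P
        half = n // 2
        # f(i, half): costs of aligning X[:i] with the left half of Y.
        f_col = space_efficient_alignment(X, Y[:half], delta, alpha)
        # g(i, half): align X[i:] with the right half, via reversed strings.
        rev = space_efficient_alignment(X[::-1], Y[half:][::-1], delta, alpha)
        g_col = [rev[m - i] for i in range(m + 1)]
        q = min(range(m + 1), key=lambda i: f_col[i] + g_col[i])  # by (6.20)
        hirschberg_pairs(X[:q], Y[:half], delta, alpha, i0, j0, P)
        hirschberg_pairs(X[q:], Y[half:], delta, alpha, i0 + q, j0 + half, P)
        return P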

Analyzing the Algorithm

The previous arguments already establish that the algorithm returns the correct answer and that it uses O(m + n) space. Thus, we need only verify the following fact.

(6.21) The running time of Divide-and-Conquer-Alignment on strings of length m and n is O(mn).

Proof. Let T(m, n) denote the maximum running time of the algorithm on strings of length m and n. The algorithm performs O(mn) work to build up the arrays B and B'; it then runs recursively on strings of size q and n/2, and on strings of size m − q and n/2. Thus, for some constant c, and some choice of index q, we have

T(m, n) ≤ cmn + T(q, n/2) + T(m − q, n/2)
T(m, 2) ≤ cm
T(2, n) ≤ cn.

This recurrence is more complex than the ones we've seen in our earlier applications of divide-and-conquer in Chapter 5. First of all, the running time is a function of two variables (m and n) rather than just one; also, the division into subproblems is not necessarily an "even split," but instead depends on the value q that is found through the earlier work done by the algorithm. So how should we go about solving such a recurrence? One way is to try guessing the form by considering a special case of the recurrence, and then using partial substitution to fill out the details of this guess.

Specifically, suppose that we were in a case in which m = n, and in which the split point q were exactly in the middle. In this (admittedly restrictive) special case, we could write the function T(·) in terms of the single variable n, set q = n/2 (since we're assuming a perfect bisection), and have T(n) ≤ 2T(n/2) + cn². This is a useful expression, since it's something that we solved in our earlier discussion of recurrences at the outset of Chapter 5: this recurrence implies T(n) = O(n²). So when m = n and we get an even split, the running time grows like the square of n.

Motivated by this, we move back to the fully general recurrence for the problem at hand and guess that T(m, n) grows like the product of m and n. Specifically, we'll guess that T(m, n) ≤ kmn for some constant k, and see if we can prove this by induction. To start with the base cases m ≤ 2 and n ≤ 2, we see that these hold as long as k ≥ c/2. Now, assuming T(m', n') ≤ km'n' holds for pairs (m', n') with a smaller product, we have

T(m, n) ≤ cmn + T(q, n/2) + T(m − q, n/2)
        ≤ cmn + kqn/2 + k(m − q)n/2
        = cmn + kqn/2 + kmn/2 − kqn/2
        = (c + k/2)mn.

Thus the inductive step will work if we choose k = 2c, and this completes the proof.

6.8 Shortest Paths in a Graph

For the final three sections, we focus on the problem of finding shortest paths in a graph, together with some closely related issues.

The Problem

Let G = (V, E) be a directed graph. Assume that each edge (i, j) ∈ E has an associated weight c_ij. The weights can be used to model a number of different things; we will picture here the interpretation in which the weight c_ij represents a cost for going directly from node i to node j in the graph.

Earlier we discussed Dijkstra's Algorithm for finding shortest paths in graphs with positive edge costs. Here we consider the more complex problem in which we seek shortest paths when costs may be negative. Among the motivations for studying this problem, here are two that particularly stand out. First, negative costs turn out to be crucial for modeling a number of phenomena with shortest paths. For example, the nodes may represent agents in a financial setting, and c_ij represents the cost of a transaction in which we buy from agent i and then immediately sell to agent j. In this case, a path would represent a succession of transactions, and edges with negative costs would represent transactions that result in profits. Second, the algorithm that we develop for dealing with edges of negative cost turns out, in certain crucial ways, to be more flexible and decentralized than Dijkstra's Algorithm. As a consequence, it has important applications for the design of distributed routing algorithms that determine the most efficient path in a communication network.

In this section and the next two, we will consider the following two related problems.

- Given a graph G with weights, as described above, decide if G has a negative cycle, that is, a directed cycle C such that Σ_{ij∈C} c_ij < 0.
- If the graph has no negative cycles, find a path P from an origin node s to a destination node t with minimum total cost: Σ_{ij∈P} c_ij should be as small as possible for any s-t path. This is generally called both the Minimum-Cost Path Problem and the Shortest-Path Problem.

It makes sense to consider the minimum-cost s-t path problem under the assumption that there are no negative cycles. As illustrated by Figure 6.20, if there is a negative cycle C, a path P_s from s to the cycle, and another path P_t from the cycle to t, then we can build an s-t path of arbitrarily negative cost: we first use P_s to get to the negative cycle C, then we go around C as many times as we want, and then we use P_t to get from C to the destination t.

Figure 6.20 In this graph, one can find s-t paths of arbitrarily negative cost (by going around the cycle C many times).

In terms of our financial motivation above, a negative cycle corresponds to a profitable sequence of transactions that takes us back to our starting point: we buy from i_1, sell to i_2, buy from i_2, sell to i_3, and so forth, finally arriving back at i_1 with a net profit. Thus negative cycles in such a network can be viewed as good arbitrage opportunities.

Designing and Analyzing the Algorithm

A Few False Starts Let's begin by recalling Dijkstra's Algorithm for the Shortest-Path Problem when there are no negative costs. That method computes a shortest path from the origin s to every other node v in the graph, essentially using a greedy algorithm. The basic idea is to maintain a set S with the property that the shortest path from s to each node in S is known. We start with S = {s}, since we know the shortest path from s to s has cost 0 when there are no negative edges, and we add elements greedily to this set S. As our first greedy step, we consider the minimum-cost edge leaving node s, that is, min_{i∈V} c_si. Let v be a node on which this minimum is obtained. A key observation underlying Dijkstra's Algorithm is that the shortest path from s to v is the single-edge path {s, v}. Thus we can immediately add the node v to the set S. The path {s, v} is clearly the shortest to v if there are no negative edge costs: any other path from s to v would have to start on an edge out of s that is at least as expensive as edge (s, v).

The above observation is no longer true if we can have negative edge costs. As suggested by the example in Figure 6.21(a), a path that starts on an expensive edge, but then compensates with subsequent edges of negative cost, can be cheaper than a path that starts on a cheap edge. This suggests that the Dijkstra-style greedy approach will not work here.

Another natural idea is to first modify the costs c_ij by adding some large constant M to each; that is, we let c'_ij = c_ij + M for each edge (i, j) ∈ E. If the constant M is large enough, then all modified costs are nonnegative, and we can use Dijkstra's Algorithm to find the minimum-cost path subject to costs c'. However, this approach fails to find the correct minimum-cost paths with respect to the original costs c. The problem here is that changing the costs from c to c' changes the minimum-cost path. For example (as in Figure 6.21(b)), if a path P consisting of three edges is only slightly cheaper than another path P' that has two edges, then after the change in costs, P' will be cheaper, since we only add 2M to the cost of P' while adding 3M to the cost of P.

Figure 6.21 (a) With negative edge costs, Dijkstra's Algorithm can give the wrong answer for the Shortest-Path Problem. (b) Adding 3 to the cost of each edge will make all edges nonnegative, but it will change the identity of the shortest s-t path.

A Dynamic Programming Approach We will try to use dynamic programming to solve the problem of finding a shortest path from s to t when there are negative edge costs but no negative cycles. We could try an idea that has worked for us so far: subproblem i could be to find a shortest path using only the first i nodes. This idea does not immediately work, but it can be made to work with some effort. Here, however, we will discuss a simpler and more efficient solution, the Bellman-Ford Algorithm. The development of dynamic programming as a general algorithmic technique is often credited to the work of Bellman in the 1950s, and the Bellman-Ford Shortest-Path Algorithm was one of the first applications.

The dynamic programming solution we develop will be based on the following crucial observation.

(6.22) If G has no negative cycles, then there is a shortest path from s to t that is simple (i.e., does not repeat nodes), and hence has at most n − 1 edges.

Proof. Since every cycle has nonnegative cost, the shortest path P from s to t with the fewest number of edges does not repeat any vertex v. For if P did repeat a vertex v, we could remove the portion of P between consecutive visits to v, resulting in a path of no greater cost and fewer edges.

Let's use OPT(i, v) to denote the minimum cost of a v-t path using at most i edges. By (6.22), our original problem is to compute OPT(n − 1, s). (We could instead design an algorithm whose subproblems correspond to the minimum cost of an s-v path using at most i edges. This would form a more natural parallel with Dijkstra's Algorithm, but it would not be as natural in the context of the routing protocols we discuss later.)

We now need a simple way to express OPT(i, v) using smaller subproblems. We will see that the most natural approach involves the consideration of many different options; this is another example of the principle of "multiway choices" that we saw in the algorithm for the Segmented Least Squares Problem. Let's fix an optimal path P representing OPT(i, v), as depicted in Figure 6.22.

- If the path P uses at most i − 1 edges, then OPT(i, v) = OPT(i − 1, v).
- If the path P uses i edges, and the first edge is (v, w), then OPT(i, v) = c_vw + OPT(i − 1, w).

Figure 6.22 The minimum-cost path P from v to t using at most i edges.

This leads to the following recursive formula.

(6.23) If i > 0 then
OPT(i, v) = min(OPT(i − 1, v), min_{w∈V} (OPT(i − 1, w) + c_vw)).

Using this recurrence, we get the following dynamic programming algorithm to compute the value OPT(n − 1, s).

Shortest-Path(G, s, t)
  n = number of nodes in G
  Array M[0...n − 1, V]
  Define M[0, t] = 0 and M[0, v] = ∞ for all other v ∈ V
  For i = 1, ..., n − 1
    For v ∈ V in any order
      Compute M[i, v] using the recurrence (6.23)
    Endfor
  Endfor
  Return M[n − 1, s]

The correctness of the method follows directly by induction from (6.23). We can bound the running time as follows. The table M has n² entries, and each entry can take O(n) time to compute, as there are at most n nodes w ∈ V we have to consider.

(6.24) The Shortest-Path method correctly computes the minimum cost of an s-t path in any graph that has no negative cycles, and runs in O(n³) time.

Given the table M containing the optimal values of the subproblems, the shortest path using at most i edges can be obtained in O(in) time, by tracing back through smaller subproblems.

As an example, consider the graph in Figure 6.23(a), where the goal is to find a shortest path from each node to t. The table in Figure 6.23(b) shows the array M, with entries corresponding to the values M[i, v]. Thus a single row in the table corresponds to the shortest path from a particular node to t, as we allow the path to use an increasing number of edges. For example, the shortest path from node d to t is updated four times, as it changes from d-t, to d-a-t, to d-a-b-e-t, and finally to d-a-b-e-c-t.

Figure 6.23 For the directed graph in (a), the Shortest-Path Algorithm constructs the dynamic programming table in (b).

Extensions: Some Basic Improvements to the Algorithm

An Improved Running-Time Analysis We can actually provide a better running-time analysis for the case in which the graph G does not have too many edges. A directed graph with n nodes can have close to n² edges, since there could potentially be an edge between each pair of nodes, but many graphs are much sparser than this. When we work with a graph for which the number of edges m is significantly less than n², we've already seen in a number of cases earlier in the book that it can be useful to write the running time in terms of both m and n.

If we are a little more careful in the analysis of the method above, we can improve the running-time bound to O(mn) without significantly changing the algorithm itself. Consider the computation of the array entry M[i, v]; by (6.23), we have M[i, v] = min(M[i − 1, v], min_{w∈V} (M[i − 1, w] + c_vw)). We assumed it could take up to O(n) time to compute this minimum, since there are n possible nodes w. But, of course, we need only compute this minimum over all nodes w for which v has an edge to w; let us use n_v to denote this number. Then it takes time O(n_v) to compute the array entry M[i, v]. We have to compute an entry for every node v and every index 0 ≤ i ≤ n − 1, so this gives a running-time bound of O(n Σ_{v∈V} n_v).

In Chapter 3, we performed exactly this kind of analysis for other graph algorithms, and used (3.9) from that chapter to bound the expression Σ_{v∈V} n_v for undirected graphs. Here we are dealing with directed graphs, and n_v denotes the number of edges leaving v. In a sense, it is even easier to work out the value of Σ_{v∈V} n_v for the directed case: each edge leaves exactly one of the nodes in V, and so each edge is counted exactly once by this expression. Thus we have Σ_{v∈V} n_v = m. Plugging this into our expression O(n Σ_{v∈V} n_v), we get a running-time bound of O(mn).

(6.25) The Shortest-Path method can be implemented in O(mn) time.
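A direct Python transcription of the O(mn) version follows; the graph is given as a flat list of directed edges (a representation we choose for brevity), and, anticipating the space improvement described next, it already keeps a single rolled-up array M rather than the full table.

    import math

    def bellman_ford_to_t(n, edges, t):
        """edges: list of (v, w, cost) triples for directed edges v -> w.
        Returns M with M[v] = min cost of a v-t path (math.inf if none).
        Assumes nodes are 0..n-1 and there are no negative cycles."""
        M = [math.inf] * n
        M[t] = 0.0
        for _ in range(n - 1):            # paths need at most n-1 edges (6.22)
            for v, w, cost in edges:      # relax each edge once per round
                if M[w] + cost < M[v]:
                    M[v] = M[w] + cost
        return M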

In the space-efficient version of the algorithm, we maintain a single array M that indexes over the nodes, and in each iteration we perform the update

M[v] = min(M[v], min_w (cvw + M[w])),

where the inner minimum is over the nodes w to which v has an edge. Since we are only storing an M array that indexes over the nodes, this requires only O(n) working memory. The index i no longer appears in the array; its role will now simply be as a counter. We can then use (6.22) as before to show that we are done after n - 1 iterations, and we have the following fact.

(6.26) Throughout the algorithm M[v] is the length of some path from v to t, and after i rounds of updates the value M[v] is no larger than the length of the shortest path from v to t using at most i edges.

Note that the path whose length is M[v] after i iterations can have substantially more edges than i. For example, if the graph is a single path from s to t, and we perform updates in the reverse of the order the edges appear on the path, then we get the final shortest-path values in just one iteration. This does not always happen, but it would be nice to be able to use this fact opportunistically to speed up the algorithm on instances where it does happen. In order to do this, we need a stopping signal in the algorithm: something that tells us it's safe to terminate before iteration n - 1 is reached. Such a stopping signal is a simple consequence of the following observation: If we ever execute a complete iteration i in which no M[v] value changes, then no M[v] value will ever change again, since future iterations will begin with exactly the same set of array entries. Thus it is safe to stop the algorithm. Note that it is not enough for a particular M[v] value to remain the same; in order to safely terminate, we need for all these values to remain the same for a single iteration.

Finding the Shortest Paths

One issue to be concerned about is whether this space-efficient version of the algorithm saves enough information to recover the shortest paths themselves. In the case of the Sequence Alignment Problem in the previous section, we had to resort to a tricky divide-and-conquer method to recover the solution from a similar space-efficient implementation. Here, however, we will be able to recover the shortest paths much more easily. To do this, we will enhance the code by having each node v maintain the first node (after itself) on its path to the destination t; we will denote this first node by first[v]. To maintain first[v], we update its value whenever the distance M[v] is updated: whenever the value of M[v] is reset to the minimum min_w (cvw + M[w]), we set first[v] to the node w that attains this minimum. Note that in this more space-efficient version of Bellman-Ford, storing the first values still requires only O(n) working memory.

Now let P denote the directed "pointer graph" whose nodes are V, and whose edges are {(v, first[v])}. The main observation is the following.

(6.27) If the pointer graph P contains a cycle C, then this cycle must have negative cost.

Proof. Notice that if first[v] = w at any time, then we must have M[v] >= cvw + M[w]. Indeed, the left- and right-hand sides are equal after the update that sets first[v] equal to w; and since M[w] may decrease, this equation may turn into an inequality. Let v1, v2, ..., vk be the nodes along the cycle C in the pointer graph, and assume that (vk, v1) is the last edge to have been added. Now, consider the values right before this last update. At this time we have M[vi] >= cvivi+1 + M[vi+1] for all i = 1, ..., k - 1, and we also have M[vk] > cvkv1 + M[v1], since we are about to update M[vk] and change first[vk] to v1. Adding all these inequalities, the M[vi] values cancel, and we get 0 > cv1v2 + ... + cvk-1vk + cvkv1: a negative cycle, as claimed. []

Now note that if G has no negative cycles, then (6.27) implies that the pointer graph P will never have a cycle. For a node v, consider the path we get by following the edges in P, from v to first[v] = v1, to first[v1] = v2, and so forth. Since the pointer graph has no cycles, and the sink t is the only node that has no outgoing edge, this path must lead to t. We claim that when the algorithm terminates, this is in fact a shortest path in G from v to t.

(6.28) Suppose G has no negative cycles, and consider the pointer graph P at the termination of the algorithm. For each node v, the path in P from v to t is a shortest v-t path in G.

Proof. Consider a node v and let w = first[v]. Since the algorithm terminated, we must have M[v] = cvw + M[w]: we have M[v] >= cvw + M[w] by the observation in the proof of (6.27), and M[v] <= cvw + M[w] since no further update was possible. The value M[t] = 0, and hence the length of the path traced out by the pointer graph is exactly M[v], which we know is the shortest-path distance. Thus this is in fact a shortest path in G from v to t. []
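To make the preceding discussion concrete, here is a minimal sketch of this space-efficient version in Python. The function name and the dictionary-of-lists graph representation are our own choices for illustration, not part of the text's pseudocode.

    # Space-efficient Bellman-Ford toward a sink t, with first[] pointers
    # and the early-termination rule described above. Assumes no negative
    # cycles in the graph.
    INF = float("inf")

    def bellman_ford_to_sink(nodes, edges, t):
        """edges: dict mapping v -> list of (w, cost) pairs for edges (v, w).
        Returns (M, first): M[v] is the shortest-path distance from v to t,
        and first[v] is the next hop on a shortest v-t path."""
        M = {v: INF for v in nodes}
        first = {v: None for v in nodes}
        M[t] = 0
        for _ in range(len(nodes) - 1):   # at most n - 1 rounds are needed
            changed = False
            for v in nodes:
                for (w, cost) in edges.get(v, []):
                    if M[w] + cost < M[v]:
                        M[v] = M[w] + cost
                        first[v] = w      # record the minimizing neighbor
                        changed = True
            if not changed:               # stopping signal: a full round with
                break                     # no change means we can stop early
        return M, first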

6.9 Shortest Paths and Distance Vector Protocols

One important application of the Shortest-Path Problem is for routers in a communication network to determine the most efficient path to a destination. We represent the network using a graph in which the nodes correspond to routers, and there is an edge between v and w if the two routers are connected by a direct communication link. We define a cost cvw representing the delay on the link (v, w); the Shortest-Path Problem with these costs is to determine the path with minimum delay from a source node s to a destination t.

Delays are naturally nonnegative, so one could use Dijkstra's Algorithm to compute the shortest path. However, Dijkstra's shortest-path computation requires global knowledge of the network: it needs to maintain a set S of nodes for which shortest paths have been determined, and make a global decision about which node to add next to S. While routers can be made to run a protocol in the background that gathers enough global information to implement such an algorithm, it is often cleaner and more flexible to use algorithms that require only local knowledge of neighboring nodes.

If we think about it, the Bellman-Ford Algorithm discussed in the previous section has just such a "local" property. Suppose we let each node v maintain its value M[v]; then to update this value, v needs only obtain the value M[w] from each neighbor w, and compute min_w (cvw + M[w]) based on the information obtained. We now discuss an improvement to the Bellman-Ford Algorithm that makes it better suited for routers and, at the same time, a faster algorithm in practice.

Our current implementation of the Bellman-Ford Algorithm can be thought of as a pull-based algorithm. In each iteration i, each node v has to contact each neighbor w, and "pull" the new value M[w] from it. If a node w has not changed its value, then there is no need for v to get the value again; however, v has no way of knowing this fact, and so it must execute the pull anyway. This wastefulness suggests a symmetric push-based implementation, where values are only transmitted when they change. Specifically, each node w whose distance value M[w] changes in an iteration informs all its neighbors of the new value in the next iteration; this allows them to update their values accordingly. If M[w] has not changed, then the neighbors of w already have the current value, and there is no need to "push" it to them again. This leads to savings in the running time, as not all values need to be pushed in each iteration. We also may terminate the algorithm early, if no value changes during an iteration. Here is a concrete description of the push-based implementation.

Push-Based-Shortest-Path(G, s, t)
  n = number of nodes in G
  Array M[V]
  Initialize M[t] = 0 and M[v] = infinity for all other v in V
  For i = 1, ..., n - 1
    For w in V in any order
      If M[w] has been updated in the previous iteration then
        For all edges (v, w) in any order
          M[v] = min(M[v], cvw + M[w])
          If this changes the value of M[v], then first[v] = w
        Endfor
      Endif
    Endfor
    If no value changed in this iteration, then end the algorithm
  Endfor
  Return M[s]

In this algorithm, nodes are sent updates of their neighbors' distance values in rounds, and each node sends out an update in each iteration in which it has changed. However, if the nodes correspond to routers in a network, then we do not expect everything to run in lockstep like this; some routers may report updates much more quickly than others, and a router with an update to report may sometimes experience a delay before contacting its neighbors. Thus the routers will end up executing an asynchronous version of the algorithm: each time a node w experiences an update to its M[w] value, it becomes "active" and eventually notifies its neighbors of the new value. If we were to watch the behavior of all routers interleaved, it would look as follows.

Asynchronous-Shortest-Path(G, s, t)
  n = number of nodes in G
  Array M[V]
  Initialize M[t] = 0 and M[v] = infinity for all other v in V
  Declare t to be active and all other nodes inactive
  While there exists an active node
    Choose an active node w
    For all edges (v, w) in any order
      M[v] = min(M[v], cvw + M[w])
      If this changes the value of M[v], then
        first[v] = w
        v becomes active
      Endif
    Endfor
    w becomes inactive
  EndWhile

One can show that even this version of the algorithm, with essentially no coordination in the ordering of updates, will converge to the correct values of the shortest-path distances to t, assuming only that each time a node becomes active, it eventually contacts its neighbors.
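Here is a runnable approximation (ours) of the asynchronous protocol, using a FIFO queue of active nodes in place of real message passing; the names in_edges and async_shortest_paths are our own.

    from collections import deque

    INF = float("inf")

    def async_shortest_paths(nodes, in_edges, t):
        """in_edges: dict mapping w -> list of (v, cost) pairs, one per edge
        (v, w). Returns (M, first) with shortest-path distances to t, assuming
        the costs admit no negative cycle."""
        M = {v: INF for v in nodes}
        first = {v: None for v in nodes}
        M[t] = 0
        active = deque([t])               # t starts out active
        while active:
            w = active.popleft()
            for (v, cost) in in_edges.get(w, []):
                if M[w] + cost < M[v]:    # w "pushes" its new value to v
                    M[v] = M[w] + cost
                    first[v] = w
                    if v not in active:   # v becomes active
                        active.append(v)
        return M, first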

The algorithm we have developed here uses a single destination t, and all nodes v in V compute their shortest path to t. More generally, we are presumably interested in finding distances and shortest paths between all pairs of nodes in a graph. To obtain such distances, we effectively use n separate computations, one for each destination. Such an algorithm is referred to as a distance vector protocol, since each node maintains a vector of distances to every other node in the network.

Problems with the Distance Vector Protocol

One of the major problems with the distributed implementation of Bellman-Ford on routers (the protocol we have been discussing above) is that it's derived from an initial dynamic programming algorithm that assumes edge costs will remain constant during the execution of the algorithm. Thus far we've been designing algorithms with the tacit understanding that a program executing the algorithm will be running on a single computer (or a centrally managed set of computers), processing some specified input. In this context, it's a rather benign assumption to require that the input not change while the program is actually running. Once we start thinking about routers in a network, however, this assumption becomes troublesome. Edge costs may change for all sorts of reasons: links can become congested and experience slow-downs, or a link (v, w) may even fail, in which case the cost cvw effectively increases to infinity.

Here's an indication of what can go wrong with our shortest-path algorithm when this happens. If an edge (v, w) is deleted (say the link goes down), it is natural for node v to react as follows: it should check whether its shortest path to some node t used the edge (v, w), and, if so, it should increase the distance using other neighbors. Notice that this increase in distance from v can now trigger increases at v's neighbors, if they were relying on a path through v, and these changes can cascade through the network. How does node v react? Unfortunately, it does not have a global map of the network; it only knows the shortest-path distances of each of its neighbors to t. Thus it does not know that the deletion of (v, w) may have eliminated all paths to t.

Consider the extremely simple example in Figure 6.24, in which the original graph has three edges (s, v), (v, s), and (v, t), each of cost 1. Now suppose the edge (v, t) in Figure 6.24 is deleted. Node v sees that M[s] = 2, and so it updates M[v] = cvs + M[s] = 3, assuming that it will use its cost-1 edge to s, followed by the supposed cost-2 path from s to t. Seeing this change, node s will update M[s] = csv + M[v] = 4, based on its cost-1 edge to v, followed by the supposed cost-3 path from v to t. Nodes s and v will continue updating their distance to t until one of them finds an alternate route; in the case, as here, that the network is truly disconnected, these updates will continue indefinitely, a behavior known as the problem of counting to infinity. The deletion of (v, t) has eliminated all paths from s and v to t, but the distributed Bellman-Ford Algorithm will begin "counting to infinity."

Figure 6.24 The deleted edge causes an unbounded sequence of updates by s and v.
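As a small illustration (our own, not from the text), the following few lines simulate the example of Figure 6.24 after the edge (v, t) disappears, showing the estimates of s and v growing without bound:

    # Counting to infinity on the three-node example: after (v, t) is gone,
    # s and v take turns "improving" their estimates through each other.
    def count_to_infinity(rounds=6):
        M = {"s": 2, "v": 1, "t": 0}           # distances before the deletion
        cost = {("s", "v"): 1, ("v", "s"): 1}  # edge (v, t) has been deleted
        for _ in range(rounds):
            M["v"] = cost[("v", "s")] + M["s"]   # v reroutes through s
            M["s"] = cost[("s", "v")] + M["v"]   # s reroutes through v
            print(M["s"], M["v"])

    count_to_infinity()   # prints 4 3, 6 5, 8 7, ... growing forever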
To avoid this problem and related difficulties arising from the limited amount of information available to nodes in the Bellman-Ford Algorithm, the designers of network routing schemes have tended to move from distance vector protocols to more expressive path vector protocols, in which each node stores not just the distance and first hop of its path to a destination, but some representation of the entire path. Given knowledge of the paths, nodes can avoid updating their paths to use edges they know to be deleted; at the same time, they require significantly more storage to keep track of the full paths. In the history of the Internet, there has been a shift from distance vector protocols to path vector protocols; currently, the path vector approach is used in the Border Gateway Protocol (BGP) in the Internet core.

6.10 Negative Cycles in a Graph

So far in our consideration of the Bellman-Ford Algorithm, we have assumed that the underlying graph has negative edge costs but no negative cycles. We now consider the more general case of a graph that may contain negative cycles.

The Problem

There are two natural questions we will consider. How do we decide if a graph contains a negative cycle? How do we actually find a negative cycle in a graph that contains one? The algorithm developed for finding negative cycles will also lead to an improved practical implementation of the Bellman-Ford Algorithm from the previous sections.

Before we develop the details of this, let's compare the problem of finding a negative cycle that can reach a given t with the seemingly more natural problem of finding a negative cycle anywhere in the graph, regardless of its position relative to a sink. It turns out that the ideas we've seen so far will allow us to find negative cycles that have a path reaching a sink t.

It turns out that if we develop a solution to the first problem, we'll be able to obtain a solution to the second problem as well, in the following way. Suppose we start with a graph G, add a new node t to it, and connect each other node v in the graph to node t via an edge of cost 0, as shown in Figure 6.25. Let us call the new "augmented graph" G'.

(6.29) The augmented graph G' has a negative cycle C such that there is a path from C to the sink t if and only if the original graph has a negative cycle.

Proof. Assume G has a negative cycle. Then this cycle C clearly has an edge to t in G', since all nodes have an edge to t. (Any negative cycle in G will be able to reach t.) Now suppose G' has a negative cycle with a path to t. Since no edge leaves t in G', this cycle cannot contain t. Since G' is the same as G aside from the node t, it follows that this cycle is also a negative cycle of G. []

Figure 6.25 The augmented graph.

So it is really enough to solve the problem of deciding whether G has a negative cycle that has a path to a given sink node t, and we do this now.

Designing and Analyzing the Algorithm

To get started thinking about the algorithm, we begin by adopting the original version of the Bellman-Ford Algorithm, which was less efficient in its use of space. We first extend the definitions of OPT(i, v) from the Bellman-Ford Algorithm, defining them for values i >= n. With the presence of a negative cycle in the graph, (6.22) no longer applies, and indeed the shortest path may get shorter and shorter as we go around a negative cycle. In fact, for any node v on a negative cycle that has a path to t, we have the following.

(6.30) If node v can reach node t and is contained in a negative cycle, then lim over i of OPT(i, v) is minus infinity.

If the graph has no negative cycles, then (6.22) implies the following statement.

(6.31) If there are no negative cycles in G, then OPT(i, v) = OPT(n - 1, v) for all nodes v and all i >= n.

But for how large an i do we have to compute the values OPT(i, v) before concluding that the graph has no negative cycles? For example, a node v may satisfy the equation OPT(n, v) = OPT(n - 1, v) and yet still lie on a negative cycle. (Do you see why?) However, it turns out that we will be in good shape if this equation holds for all nodes.

(6.32) There is no negative cycle with a path to t if and only if OPT(n, v) = OPT(n - 1, v) for all nodes v.

Proof. Statement (6.31) has already proved the forward direction. For the other direction, we use an argument employed earlier for reasoning about when it's safe to stop the Bellman-Ford Algorithm early. Specifically, suppose OPT(n, v) = OPT(n - 1, v) for all nodes v. The values of OPT(n + 1, v) can be computed from OPT(n, v); but all these values are the same as the corresponding OPT(n - 1, v), and so OPT(n + 1, v) = OPT(n, v). Extending this reasoning to future iterations, we see that none of the values will ever change again. It follows that we will have OPT(i, v) = OPT(n - 1, v) for all nodes v and all i >= n, and so, by (6.30), there is no negative cycle with a path to t. []

Note that, by the same argument, there is no negative cycle with a path to t if and only if there is some value of i <= n at which OPT(i, v) = OPT(i - 1, v) for all nodes v. Using (6.32), we can compute values of OPT(i, v) for nodes of G and for values of i up to n; this gives an O(mn) method to decide if G has a negative cycle that can reach t.
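The decision procedure of (6.29)-(6.32) can be sketched as follows; the Python encoding, including the list-of-triples edge format, is our own illustration:

    # Negative-cycle test: augment G with a sink t reachable from every node
    # at cost 0, run n full rounds of Bellman-Ford, and report a cycle iff
    # the nth round still changes some value, per (6.32).
    INF = float("inf")

    def has_negative_cycle(nodes, edges):
        """edges: list of (v, w, cost) triples. Returns True iff the graph
        contains a negative cycle."""
        t = object()                      # fresh augmented sink node
        aug_nodes = list(nodes) + [t]
        aug_edges = list(edges) + [(v, t, 0) for v in nodes]
        M = {v: INF for v in aug_nodes}
        M[t] = 0
        n = len(aug_nodes)
        for _ in range(n):                # n full rounds over the edges
            changed = False
            for (v, w, cost) in aug_edges:
                if M[w] + cost < M[v]:
                    M[v] = M[w] + cost
                    changed = True
            if not changed:
                return False              # values converged: no cycle
        return changed                    # a change in round n betrays a cycle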

So far we have determined whether or not the graph has a negative cycle with a path from the cycle to t, but we have not actually found the cycle. To find a negative cycle, we consider a node v such that OPT(n, v) is not equal to OPT(n - 1, v): for this node, a path P from v to t of cost OPT(n, v) must use exactly n edges. We find this minimum-cost path P from v to t by tracing back through the subproblems.

(6.33) If G has n nodes and OPT(n, v) is not equal to OPT(n - 1, v), then a path P from v to t of cost OPT(n, v) contains a cycle C, and C has negative cost.

Proof. First observe that the path P must have n edges, as OPT(n, v) is not equal to OPT(n - 1, v), and so every path using n - 1 edges has cost greater than that of the path P. In a graph with n nodes, a path consisting of n edges must repeat a node somewhere, so P must contain a cycle C. Let w be a node that occurs on P more than once, and let C be the cycle on P between two consecutive occurrences of node w. If C were not a negative cycle, then deleting C from P would give us a v-t path with fewer than n edges and no greater cost. This contradicts our assumption that OPT(n, v) is not equal to OPT(n - 1, v); hence C must be a negative cycle, and we are done. []

(6.34) The algorithm above finds a negative cycle in G, if such a cycle exists, and runs in O(mn) time.

Extensions: Improved Shortest Paths and Negative Cycle Detection Algorithms

At the end of Section 6.8 we discussed a space-efficient implementation of the Bellman-Ford Algorithm for graphs with no negative cycles. Here we implement the detection of negative cycles in a comparably space-efficient way, which does not require an O(n) blow-up in the running time. In addition to the savings in space, this will also lead to a considerable speedup in practice, even for graphs with no negative cycles. The implementation will be based on the same pointer graph P derived from the "first edges" (v, first[v]) that we used for the space-efficient implementation in Section 6.8. By (6.27), we know that if the pointer graph ever has a cycle, then the cycle has negative cost, and we are done. But if G has a negative cycle, does this guarantee that the pointer graph will ever have a cycle? Furthermore, how much extra computation time do we need for periodically checking whether P has a cycle?

Earlier we saw that if a graph G has no negative cycles, the algorithm can be stopped early if in some iteration the shortest path values M[v] remain the same for all nodes v. Instant negative cycle detection will be an analogous early termination rule for graphs that have negative cycles. An additional advantage of such "instant" cycle detection is that we will not have to wait for n iterations to see that the graph has a negative cycle: we can terminate as soon as a negative cycle is found.

Ideally, we would like to determine whether a cycle is created in the pointer graph P every time we add a new edge (v, w), that is, every time first[v] is set to w. The most natural way to check whether adding edge (v, w) creates a cycle in P is to follow the current path from w to the terminal t, in time proportional to the length of this path. If we encounter v along this path, then a cycle has been formed, and hence, if such a cycle exists, we find it. Consider, for example, the two sample cases in Figure 6.26, where in both (a) and (b) the pointer first[v] is being updated from u to w: in (a) the update does not create a cycle, whereas in (b) it does. However, if we trace out the sequence of pointers from v like this, then we could spend as much as O(n) time following the path to t and still not find a cycle.

Figure 6.26 Changing the pointer graph P when first[v] is updated from u to w. In (b), the update creates a (negative) cycle, whereas in (a) it does not.

Another way to test whether the addition of (v, w) creates a cycle is to consider all nodes in the subtree directed toward v. If w is in this subtree, then (v, w) forms a cycle; otherwise it does not. (Again, if such a cycle exists, this finds it.) To be able to find all nodes in the subtree directed toward v, we need to have each node maintain a list of all other nodes whose selected edges point to it. Given these pointers, we can find the subtree in time proportional to the size of the subtree pointing to v. There is a further observation we can use: we have just updated v's distance, and we know that before the new edge (v, w) was added, the pointer graph was a directed tree. Notice that the current distance value M[x] for all nodes x in the subtree directed toward v was derived from node v's old value, and hence we know that the distance values of all these nodes will be updated again. We'll mark each of these nodes x as "dormant," delete the edge (x, first[x]) from the pointer graph, and not use x for future updates until its distance value changes.
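A minimal sketch (ours) of the first, path-following test: when an update is about to set first[v] = w, walk the current pointers from w toward t, and report a cycle if the walk reaches v. It assumes, as the algorithm maintains, that P is acyclic before the update.

    def creates_cycle(first, v, w, t):
        """first: dict of current pointers (node -> next hop, or None).
        Returns True iff setting first[v] = w would create a cycle in P.
        Assumes P is currently acyclic, so the walk must terminate."""
        x = w
        while x is not None and x != t:
            if x == v:
                return True               # walking from w leads back to v
            x = first.get(x)
        return False                      # reached t (or a dangling pointer)

If the check returns True, walking from w again and recording nodes until v reappears recovers the cycle itself, which by (6.27) has negative cost.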

This can save a lot of future work in updates, but what is the effect on the worst-case running time? We can spend as much as O(n) extra time marking nodes dormant after every update in distances. However, a node can be marked dormant only if a pointer had been defined for it at some point in the past, so the time spent on marking nodes dormant is at most as much as the time the algorithm spends updating distances.

Now consider the time the algorithm spends on operations other than marking nodes dormant. Recall that the algorithm is divided into iterations, where iteration i + 1 processes nodes whose distance has been updated in iteration i. For the original version of the algorithm, we showed in (6.26) that after i iterations, the value M[v] is no larger than the value of the shortest path from v to t using at most i edges. However, with many nodes dormant in each iteration, this may not be true anymore. For example, if the shortest path from v to t using at most i edges starts on edge e = (v, w), and w is dormant in this iteration, then we may not update the distance value M[v], and so it stays at a value higher than the length of the path through the edge (v, w). This seems like a problem; however, in this case, the path through edge (v, w) is not actually the shortest path, so M[v] will have a chance to get updated later to an even smaller value. (Note that nodes w where M[w] is the actual shortest-path distance cannot be dormant, as the value M[w] will be updated in the next iteration for all dormant nodes.) So instead of the simpler property that held for M[v] in the original versions of the algorithm, we now have the following claim.

(6.35) Throughout the algorithm M[v] is the length of some simple path from v to t; the path has at least i edges if the distance value M[v] is updated in iteration i; and after i iterations, the value M[v] is the length of the shortest path for all nodes v where there is a shortest v-t path using at most i edges.

Proof. The first pointers maintain a tree of paths to t, which implies that all paths used to update the distance values are simple. The fact that updates in iteration i are caused by paths with at least i edges is easy to show by induction on i. Similarly, we use induction to show that after iteration i the value M[v] is the distance on all nodes v where the shortest path from v to t uses at most i edges. []

Using this claim, we can see that the worst-case running time of the algorithm is still bounded by O(mn): ignoring the time spent on marking nodes dormant, each iteration is implemented in O(m) time, and, as simple paths can have at most n - 1 edges, there can be at most n - 1 iterations that update values in the array M without finding a negative cycle. Finally, as argued above, the time spent marking nodes dormant is bounded by the time the algorithm spends on updates. We summarize the discussion with the following claim about the worst-case performance of the algorithm.

(6.36) The improved algorithm outlined above finds a negative cycle in G if such a cycle exists. It terminates immediately if the pointer graph P of first[v] pointers contains a cycle C, or if there is an iteration in which no update occurs to any distance value M[v]. The algorithm uses O(n) space, has at most n iterations, and runs in O(mn) time in the worst case.

In fact, this new version is in practice the fastest implementation of the algorithm, even for graphs that do not have negative cycles, or even negative-cost edges.

Solved Exercises

Solved Exercise 1

Suppose you are managing the construction of billboards on the Stephen Daedalus Memorial Highway, a heavily traveled stretch of road that runs west-east for M miles. The possible sites for billboards are given by numbers x1, x2, ..., xn, each in the interval [0, M] (specifying their position along the highway, measured in miles from its western end). If you place a billboard at location xi, you receive a revenue of ri > 0.

Regulations imposed by the county's Highway Department require that no two of the billboards be within less than or equal to 5 miles of each other. You'd like to place billboards at a subset of the sites so as to maximize your total revenue, subject to this restriction.

Example. Suppose M = 20, n = 4, {x1, x2, x3, x4} = {6, 7, 12, 14}, and {r1, r2, r3, r4} = {5, 6, 5, 1}. Then the optimal solution would be to place billboards at x1 and x3, for a total revenue of 10.

Give an algorithm that takes an instance of this problem as input and returns the maximum total revenue that can be obtained from any valid subset of sites. The running time of the algorithm should be polynomial in n.

Solution We can naturally apply dynamic programming to this problem if we reason as follows.

Consider an optimal solution for a given input instance. In this solution, we either place a billboard at site xn or not. If we don't, the optimal solution on sites x1, ..., xn is really the same as the optimal solution on sites x1, ..., xn-1. If we do, then we should eliminate xn and all other sites that are within 5 miles of it, and find an optimal solution on what's left. The same reasoning applies when we're looking at the problem defined by just the first j sites, x1, ..., xj: we either include xj in the optimal solution or we don't, with the same consequences.

Let's define some notation to help express this. For a site xj, we let e(j) denote the easternmost site xi that is more than 5 miles from xj. Since sites are numbered west to east, this means that the sites x1, ..., xe(j) are still valid options once we've chosen to place a billboard at xj, but the sites xe(j)+1, ..., xj-1 are not.

Now, our reasoning above justifies the following recurrence. If we let OPT(j) denote the revenue from the optimal subset of sites among x1, ..., xj, then we have

OPT(j) = max(rj + OPT(e(j)), OPT(j - 1)).

We now have most of the ingredients we need for a dynamic programming algorithm. First, we have a set of n subproblems, consisting of the first j sites for j = 0, 1, ..., n. Second, we have a recurrence that lets us build up the solutions to subproblems, given by OPT(j) = max(rj + OPT(e(j)), OPT(j - 1)). To turn this into an algorithm, we just need to define an array M that will store the OPT values and throw a loop around the recurrence that builds up the values M[j] in order of increasing j.

Initialize M[0] = 0 and M[1] = r1
For j = 2, 3, ..., n:
  Compute M[j] using the recurrence
Endfor
Return M[n]

As with all the dynamic programming algorithms we've seen in this chapter, an optimal set of billboards can be found by tracing back through the values in array M.

Given the values e(j) for all j, the running time of the algorithm is O(n), since each iteration of the loop takes constant time. We can also compute all e(j) values in O(n) time, as follows. For each site location xi, we define x'i = xi - 5. We then merge the sorted list x1, ..., xn with the sorted list x'1, ..., x'n in linear time, as we saw how to do in Chapter 2. We now scan through this merged list; when we get to the entry x'j, we know that anything from this point onward to xj cannot be chosen together with xj (since it's within 5 miles), and so we simply define e(j) to be the largest value of i for which we've seen xi in our scan.

Here's a final observation on this problem. Clearly, the solution looks very much like that of the Weighted Interval Scheduling Problem, and there's a fundamental reason for that. In fact, our billboard placement problem can be directly encoded as an instance of Weighted Interval Scheduling, as follows. For each site location xi, we define an interval with endpoints [xi - 5, xi] and weight ri. Given any nonoverlapping set of intervals, the corresponding set of sites has the property that no two lie within 5 miles of each other. Conversely, given any such set of sites (no two within 5 miles), the intervals associated with them will be nonoverlapping. Thus the collections of nonoverlapping intervals correspond precisely to the set of valid billboard placements, and so dropping the set of intervals we've just defined (with their weights) into an algorithm for Weighted Interval Scheduling will yield the desired solution.
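As a sanity check on the recurrence, here is a short Python sketch (ours). It computes each e(j) with a binary search rather than the linear-time merge-and-scan described above, which is simpler to write down though not asymptotically better overall:

    from bisect import bisect_left

    # Billboard recurrence OPT(j) = max(r_j + OPT(e(j)), OPT(j-1));
    # sites x must be sorted west to east.
    def max_revenue(x, r):
        n = len(x)
        M = [0] * (n + 1)                 # M[j] = OPT over the first j sites
        for j in range(1, n + 1):
            # e(j): number of sites strictly more than 5 miles west of x[j-1]
            e = bisect_left(x, x[j - 1] - 5)
            M[j] = max(r[j - 1] + M[e], M[j - 1])
        return M[n]

    # The example above: sites {6, 7, 12, 14} with revenues {5, 6, 5, 1}
    print(max_revenue([6, 7, 12, 14], [5, 6, 5, 1]))   # prints 10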

Solved Exercise 2

Through some friends of friends, you end up on a consulting visit to the cutting-edge biotech firm Clones 'R' Us (CRU). At first you're not sure how your algorithmic background will be of any help to them, but you soon find yourself called upon to help two identical-looking software engineers tackle a perplexing problem.

The problem they are currently working on is based on the concatenation of sequences of genetic material. If X and Y are each strings over a fixed alphabet S, then XY denotes the string obtained by concatenating them: writing X followed by Y. CRU has identified a target sequence A of genetic material, consisting of m symbols, and they want to produce a sequence that is as similar to A as possible. For this purpose, they have a library L consisting of k (shorter) sequences, each of length at most n. They can cheaply produce any sequence consisting of copies of the strings in L concatenated together (with repetitions allowed). Thus we say that a concatenation over L is any sequence of the form B1B2...Bl, where each Bi belongs to the set L. (Again, repetitions are allowed, so Bi and Bj could be the same string in L, for different values of i and j.)

The problem is to find a concatenation over L for which the sequence alignment cost is as small as possible. (For the purpose of computing the sequence alignment cost, you may assume that you are given a gap cost d and a mismatch cost apq for each pair p, q in S.) Give a polynomial-time algorithm for this problem.

Solution This problem is vaguely reminiscent of Segmented Least Squares: we have a long sequence of "data" (the string A) that we want to "fit" with shorter segments (the strings in L).

If we wanted to pursue this analogy, we could search for a solution as follows. Let B = B1B2...Bl denote a concatenation over L that aligns as well as possible with the given string A. (That is, B is an optimal solution to the input instance.) Consider an optimal alignment M of A with B, let t be the first position in A that is matched with some symbol in Bl, and let Al denote the substring of A from position t to the end. (See Figure 6.27 for an illustration of this with l = 3.) Now, the point is that in this optimal alignment M, the substring Al is optimally aligned with Bl; indeed, if there were a way to better align Al with Bl, we could substitute it for the portion of M that aligns Al with Bl and obtain a better overall alignment of A with B.

Figure 6.27 In the optimal concatenation of strings to align with A, there is a final string (B3 in the figure) that aligns with a substring of A (A3 in the figure) that extends from some position t to the end.

This tells us that we can look at the optimal solution as follows. There's some final piece of A that is aligned with one of the strings in L, and for this piece all we're doing is finding the string in L that aligns with it as well as possible. Having found this optimal alignment for the final piece, we can break it off and continue to find the optimal solution for the remainder of A.

Thinking about the problem this way doesn't tell us exactly how to proceed; we don't know how long the final piece is supposed to be, or which string in L it should be aligned with. But this is the kind of thing we can search over in a dynamic programming algorithm. Essentially, we're in about the same spot we were in with the Segmented Least Squares Problem: there we knew that we had to break off some final subsequence of the input points, fit them as well as possible with one line, and then iterate on the remaining input points.

So let's set up things to make the search for the final piece possible. First, let A[x : y] denote the substring of A consisting of its symbols from position x to position y, inclusive. Let c(x, y) denote the cost of the optimal alignment of A[x : y] with any string in L. (That is, we search over each string in L and find the one that aligns best with A[x : y].) Let OPT(j) denote the alignment cost of the optimal solution on the string A[1 : j].

The argument above says that an optimal solution on A[1 : j] consists of identifying a final "segment boundary" t <= j, finding the optimal alignment of A[t : j] with a single string in L, and iterating on A[1 : t - 1]. The cost of this alignment of A[t : j] is just c(t, j), and the cost of aligning with what's left is just OPT(t - 1). This suggests that our subproblems fit together very nicely, and it justifies the following recurrence.

(6.37) OPT(j) = min over 1 <= t <= j of [c(t, j) + OPT(t - 1)] for j >= 1, and OPT(0) = 0.

The full algorithm consists of first computing the quantities c(t, j), for all pairs t <= j, and then building up the values OPT(j) in order of increasing j. We hold these values in an array M.

Set M[0] = 0
For all pairs 1 <= t <= j <= m:
  Compute the cost c(t, j) as follows:
    For each string B in L:
      Compute the optimal alignment of B with A[t : j]
    Endfor
    Choose the B that achieves the best alignment, and use this alignment cost as c(t, j)
Endfor
For j = 1, 2, ..., m:
  Use the recurrence (6.37) to compute M[j]
Endfor
Return M[m]

As usual, having computed the optimal cost, we can get a concatenation that achieves it by tracing back over the array of OPT values.

Let's consider the running time of this algorithm. First, there are O(m^2) values c(t, j) that need to be computed. For each, we try each of the k strings B in L, and compute the optimal alignment of B with A[t : j] in time O(n(j - t)) = O(mn); so a single value c(t, j) takes O(kmn) time, and the total time to compute all c(t, j) values is O(km^3 n). This dominates the time to compute all OPT values: computing OPT(j) uses the recurrence in (6.37), and this takes O(m) time to compute the minimum. Summing this over all choices of j = 1, 2, ..., m, we get O(m^2) time for this portion of the algorithm.
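A compact Python sketch (ours) of the recurrence (6.37). The helper align_cost is a standard sequence-alignment dynamic program with gap cost delta and a mismatch cost function alpha; the default unit costs and all function names here are our own assumptions for illustration:

    INF = float("inf")

    def align_cost(B, S, delta=1, alpha=lambda p, q: 0 if p == q else 1):
        # Standard alignment (edit-distance-style) DP between strings B and S.
        nb, ns = len(B), len(S)
        D = [[0] * (ns + 1) for _ in range(nb + 1)]
        for i in range(nb + 1):
            D[i][0] = i * delta
        for j in range(ns + 1):
            D[0][j] = j * delta
        for i in range(1, nb + 1):
            for j in range(1, ns + 1):
                D[i][j] = min(D[i-1][j-1] + alpha(B[i-1], S[j-1]),
                              D[i-1][j] + delta, D[i][j-1] + delta)
        return D[nb][ns]

    def best_concatenation_cost(A, L):
        # M[j] = OPT(j); c(t, j) is computed on the fly inside the loops.
        m = len(A)
        M = [0] + [INF] * m
        for j in range(1, m + 1):
            for t in range(1, j + 1):
                c_tj = min(align_cost(B, A[t-1:j]) for B in L)
                M[j] = min(M[j], c_tj + M[t - 1])
        return M[m]

    print(best_concatenation_cost("abab", ["ab"]))   # prints 0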

Exercises

1. Let G = (V, E) be an undirected graph with n nodes. Recall that a subset of the nodes is called an independent set if no two of them are joined by an edge. Finding large independent sets is difficult in general; but here we'll see that it can be done efficiently if the graph is "simple" enough.

Call a graph G = (V, E) a path if its nodes can be written as v1, v2, ..., vn, with an edge between vi and vj if and only if the numbers i and j differ by exactly 1. With each node vi, we associate a positive integer weight wi. Consider, for example, the five-node path drawn in Figure 6.28. The weights are the numbers drawn inside the nodes.

Figure 6.28 A path with weights on the nodes. The maximum weight of an independent set is 14.

The goal in this question is to solve the following problem: Find an independent set in a path G whose total weight is as large as possible.

(a) Give an example to show that the following algorithm does not always find an independent set of maximum total weight.

  The "heaviest-first" greedy algorithm:
    Start with S equal to the empty set
    While some node remains in G
      Pick a node vi of maximum weight
      Add vi to S
      Delete vi and its neighbors from G
    Endwhile
    Return S

(b) Give an example to show that the following algorithm also does not always find an independent set of maximum total weight.

    Let S1 be the set of all vi where i is an odd number
    Let S2 be the set of all vi where i is an even number
    (Note that S1 and S2 are both independent sets)
    Determine which of S1 or S2 has greater total weight, and return this one

(c) Give an algorithm that takes an n-node path G with weights and returns an independent set of maximum total weight. The running time should be polynomial in n, independent of the values of the weights.
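For comparison with the flawed algorithms in (a) and (b), here is a sketch (ours, with made-up example weights whose optimum happens to be 14, like the figure's) of the kind of linear-time dynamic program that part (c) asks for, based on the recurrence OPT(i) = max(OPT(i - 1), wi + OPT(i - 2)):

    # Max-weight independent set on a path: either skip node i, or take it
    # together with the best solution on the first i - 2 nodes.
    def max_weight_independent_set(w):
        prev2, prev1 = 0, 0          # OPT(i-2), OPT(i-1)
        for weight in w:
            prev2, prev1 = prev1, max(prev1, prev2 + weight)
        return prev1

    print(max_weight_independent_set([1, 8, 6, 3, 6]))   # prints 14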

(Again. While there is an edge out of the node w Choose the edge (w. the length of a path is the number of edges in the path.29 The correct answer for this ordered graph is 3: The longest path from v~ to Vn uses the three edges (v. That is.. every directed edge has the form (vi. you’ll incur an operating cost of Si if you run the business out of SF. ff you run the business out of one city in month i. for every n . ~n and hn and returns the value of an optimal plan. i = 1. For iterations i = I to n If hi+I > ~i + gi+l then Output "Choose no job in week i" Output "Choose a high-stress job in week i+I" Continue with iteration i+ 2 Else Output "Choose a low-stress job in week i" Continue with iteration i+ 1 Endif Figure 6. That is. ~) for which ] is as small as possible Set m = ~ Increase i by 1 end while Return i as the length of the longest path To avoid problems with overflowing array bounds.) However.. say what the correct answer is and also what the algorithm above finds.. Q. we difine hi = ~i = 0 when i > n. We say that G is Let G = (V. Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at vI and ends at vn. vi) with i < j. US). by giving an example of an ordered graph on which it does not return the correct answer. Your clients are distributed between the East Coast and the West Coast. two associates. by giving an instance on which it does not return the correct answer. The length of a path is the number of edges in it. v4). The goal in this question is to solve the following problem (see Figure 6. (It depends on the distribution of client demands for that month. Given a sequence of n months. and then out of the other city in month i + 1. say what the correct answer is and also what the above algorithm finds.(v2...) Suppose you’re running a lightweight consulting business--just you.. and (U4. plus a moving cost of M for each time you switch cities.1. . then you incur a. an ordered graph if it has the following properties.314 Chapter 6 Dynamic Programming Exercises 315 (a) Show that the following algorithm does not correctly solve Ms problem.. Given an ordered graph G. Set ~u ---. you can either run your business from an office in New York (NY) or from an office in San Francisco (SF). 2 . there is at least one edge of the fo. .UI Set L=O Each month. vi).rm (vi. The plan can begin in either city.29 for an example). The cost of a plan is the sum of the operating costs for each of the n months. vn. node vi. you’ll incur an operating cost of hri ff you run the business out of NY. In your example.. ~) Give an efficient algorithm that takes values for Q. a plan is a sequence of n locations-each one equal to either NY or SF--such that the ith location indicates the city in which you will be based in the ith month. v2). and this leads to the following question. E) be a directed graph with nodes v~ .fixed moving cost of M to switch base offices. In month i. (ii) Each node except Vn has at least one edge leaving it. and some rented equipment. find the length of the longest path that begins at vt and ends at vn... In your example. (i) Each edge goes from a node with a lower index to a node with a higher index. (a) Show that the following algorithm does not correctly solve this problem.

Example. Suppose n = 4, M = 10, and the operating costs are given by the following table.

        Month 1   Month 2   Month 3   Month 4
  NY       1         3         20        30
  SF      50        20          2         4

Then the plan of minimum cost would be the sequence of locations [NY, NY, SF, SF], with a total cost of 1 + 3 + 2 + 4 + 10 = 20, where the final term of 10 arises because you change locations once.

(a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

  For i = 1 to n
    If Ni < Si then
      Output "NY in Month i"
    Else
      Output "SF in Month i"
  Endfor

In your example, say what the correct answer is and also what the algorithm above finds.

(b) Give an example of an instance in which every optimal plan must move (i.e., change locations) at least three times. Provide a brief explanation, saying why your example has this property.

(c) Give an efficient algorithm that takes values for n, M, and sequences of operating costs N1, ..., Nn and S1, ..., Sn, and returns the cost of an optimal plan.

5. As some of you know well, and others of you may be interested to learn, a number of languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem: inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like "meetateight" and deciding that the best segmentation is "meet at eight" (and not "me et at eight," or "meet ate ight," or any of a huge number of even less plausible alternatives). How could we automate this process?

A simple approach that is at least reasonably effective is to find a segmentation that simply maximizes the cumulative "quality" of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters x = x1x2...xk, will return a number quality(x). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive, while quality("ght") would be negative.)
each block corresponds to a word in the segmentation. by giving an instance on which it does not return the correct answer. this broader problem is like searching for a segmentation that also can be parsed well according to a grammar for the underlying language.. like this. If English were written without spaces.e. change locations) at least three times. xk.Sn. will return a number quality(x). software that works with text written in these languages must address the word segmentation problem--inferring li_kely boundaries between consecutive words in the .

I thought I would sail about a little and see the watery part of the world. ¯ You have at your disposal an electromagnetic pulse (EMP). In the 4th second. you know this sequence Xl. ¯ So specifically. x2 . Some years ago. x2 . We will assume we have a fixed-width font and ignore issues of punctuation or hyphenation..) ¯ We will also assume that the EMP starts off completely drained. Given the data on robot arrivals x~. (After t~s use. Example. then we should have ~(Q+I) +Ck_<L.... p(j) . never mind how long precisely. A formatting of W consists of a partition of the words in W into lines. The residents of the underground city of Zion defend themselves through a combination of kung fu.) In the solved exercise. . 1) = 1 robot. The problem. it will be completely drained. it’s possible to do better than this. wz . and turn it into text whose right margin is as "even" as possible. which can destroy some of the robots as they arrive. Recently they have become interested in automated methods that can help fend off attacks by swarms of robots. xn. in fact. so ff it is used for the first time in the jth second. The d~ference between the left-hand side and the right-hand side will be called the slack of the line--that is. and efficient algorithms. then it is capable of destroying up to f(]) robots.. ¯ i Xi f(O 1 1 1 23 10 10 2 4 4 1 8 The best solution would be to activate the EMP in the 3rd and the 4tu seconds. Give an efficient algorithm to find a partition of a set of words W into valid lines. if we want to maximize the profit per [ A swarm of robots arrives over the course of n seconds. xi robots arrive. I thought I would sail about a little and see the watery part of the world. Here’s what one of these robot attacks looks like.p(i)? (If there is no way to make money during the n days. Based on remote sensing data. For each day i. word except the last.. Show how to find the optimal numbers i and j in time O(n). To make this precise.. there should be a space after each wg are assigned to one line. then it will destroy rrfin(xk. choose the points in time at which you’re going to activate the EMP so as to destroy as many robots as possible.. L We will call an assignment of words to a line valid if it satisfies this inequality.) so that ifj seconds have passed since the EMP was last used. W = {wl.... there is a function f(. running time for the following problem. ff it is used in the kth second.. Suppose n = 4. we should conclude this instead. and the values of xi and f(i) are given by the following table. x~ in advance. and it has beenj seconds since it was previously used. we need to figure out what it means for-the right margins to be "even. and so if wj. and given the recharging function f(. numbered i = 1. we gave an algorithm with O(n log n) ~//7.) We’d like to know: How should we choose a day i on which to buy the stock and a later day j > i on which to sell it. But.. and so it destroys min(10. (We’ll assume for simplicity that the price was fixed during each day. wj+l . and it destroys min(1. and nothing particular to interest me on shore. In the words assigned to a single line.. heavy artillery. the EMP has gotten to charge for 3 seconds.. Exercises 319 share.318 Chapter 6 Dynamic Programming having little or no money in my pu~se. so that the sum of the squares of the slacks of all lines including the last line) is minkn~zed. the EMP has only gotten to charge for 1 second since its last use. wn}.. As a solved exercise in Chapter 5. then it is capable of destroying up to f(]) robots. n. 
To make this precise enough for us to start thinking about how to write a pretty-printer for text, we need to figure out what it means for the right margins to be "even." So suppose our text consists of a sequence of words, W = {w1, w2, ..., wn}, where wi consists of ci characters. We have a maximum line length of L. We will assume we have a fixed-width font and ignore issues of punctuation or hyphenation.

A formatting of W consists of a partition of the words in W into lines. In the words assigned to a single line, there should be a space after each word except the last; and so if wj, wj+1, ..., wk are assigned to one line, then we should have

  (sum from i = j to k - 1 of (ci + 1)) + ck <= L.

We will call an assignment of words to a line valid if it satisfies this inequality. The difference between the left-hand side and the right-hand side will be called the slack of the line: that is, the number of spaces left at the right margin.

Give an efficient algorithm to find a partition of a set of words W into valid lines, so that the sum of the squares of the slacks of all lines (including the last line) is minimized.

7. As a solved exercise in Chapter 5, we gave an algorithm with O(n log n) running time for the following problem. We're looking at the price of a given stock over n consecutive days, numbered i = 1, 2, ..., n. For each day i, we have a price p(i) per share for the stock on that day. (We'll assume for simplicity that the price was fixed during each day.) We'd like to know: How should we choose a day i on which to buy the stock and a later day j > i on which to sell it, if we want to maximize the profit per share, p(j) - p(i)? (If there is no way to make money during the n days, we should conclude this instead.)

In the solved exercise, we showed how to find the optimal pair of days i and j in time O(n log n). But, in fact, it's possible to do better than this. Show how to find the optimal numbers i and j in time O(n).

8. The residents of the underground city of Zion defend themselves through a combination of kung fu, heavy artillery, and efficient algorithms. Recently they have become interested in automated methods that can help fend off attacks by swarms of robots. Here's what one of these robot attacks looks like.

- A swarm of robots arrives over the course of n seconds; in the ith second, xi robots arrive. Based on remote sensing data, you know this sequence x1, x2, ..., xn in advance.

- You have at your disposal an electromagnetic pulse (EMP), which can destroy some of the robots as they arrive; the EMP's power depends on how long it's been allowed to charge up. To make this precise, there is a function f(.) so that if j seconds have passed since the EMP was last used, then it is capable of destroying up to f(j) robots.

- So specifically, if it is used in the kth second, and it has been j seconds since it was previously used, then it will destroy min(xk, f(j)) robots. (After this use, it will be completely drained.)

- We will also assume that the EMP starts off completely drained, so if it is used for the first time in the jth second, then it is capable of destroying up to f(j) robots.

The problem. Given the data on robot arrivals x1, x2, ..., xn, and given the recharging function f(.), choose the points in time at which you're going to activate the EMP so as to destroy as many robots as possible.

Example. Suppose n = 4, and the values of xi and f(i) are given by the following table.

          i = 1   i = 2   i = 3   i = 4
  xi        1      10      10       1
  f(i)      1       2       4       8

The best solution would be to activate the EMP in the 3rd and the 4th seconds. In the 3rd second, the EMP has gotten to charge for 3 seconds, and so it destroys min(10, 4) = 4 robots. In the 4th second, the EMP has only gotten to charge for 1 second since its last use, and it destroys min(1, 1) = 1 robot. This is a total of 5.

(a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

  Schedule-EMP(x1, ..., xn)
    Let j be the smallest number for which f(j) >= xn
    (If no such j exists, then set j = n)
    Activate the EMP in the nth second
    If n - j >= 1 then
      Continue recursively on the input x1, ..., xn-j
      (i.e., invoke Schedule-EMP(x1, ..., xn-j))

In your example, say what the correct answer is and also what the algorithm above finds.

(b) Give an efficient algorithm that takes the data on robot arrivals x1, x2, ..., xn, and the recharging function f(.), and returns the maximum number of robots that can be destroyed by a sequence of EMP activations.

9. You're helping to run a high-performance computing system capable of processing several terabytes of data per day. For each of n days, you're presented with a quantity of data; on day i, you're presented with xi terabytes. For each terabyte you process, you receive a fixed revenue, but any unprocessed data becomes unavailable at the end of the day (i.e., you can't work on it in any future day).

You can't always process everything each day because you're constrained by the capabilities of your computing system, which can only process a fixed number of terabytes in a given day. In fact, it's running some one-of-a-kind software that, while very sophisticated, is not totally reliable, and so the amount of data you can process goes down with each day that passes since the most recent reboot of the system. On the first day after a reboot, you can process s1 terabytes, on the second day after a reboot, you can process s2 terabytes, and so on, up to sn; we assume s1 > s2 > s3 > ... > sn > 0. (Of course, on day i you can only process up to xi terabytes, regardless of how fast your system is.) To get the system back to peak performance, you can choose to reboot it; but on any day you choose to reboot the system, you can't process any data at all.

The problem. Given the amounts of available data x1, x2, ..., xn for the next n days, and given the profile of your system as expressed by s1, s2, ..., sn (and starting from a freshly rebooted system on day 1), choose the days on which you're going to reboot so as to maximize the total amount of data you process.

Example. Suppose n = 4, and the values of xi and si are given by the following table.

       Day 1   Day 2   Day 3   Day 4
  x     10       1       7       7
  s      8       4       2       1

The best solution would be to reboot on day 2 only; this way, you process 8 terabytes on day 1, then 0 on day 2, then 7 on day 3, then 4 on day 4, for a total of 19. (Note that if you didn't reboot at all, you'd process 8 + 1 + 2 + 1 = 12; and other rebooting strategies give you less than 19 as well.)
(a) Give an example of an instance with the following properties.
  - There is a "surplus" of data, in the sense that xi > s1 for every i.
  - The optimal solution reboots the system at least twice.
In addition to the example, you should say what the optimal solution is. You do not need to provide a proof that it is optimal.

(b) Give an efficient algorithm that takes values for x1, x2, ..., xn and s1, s2, ..., sn and returns the total number of terabytes processed by an optimal solution.


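One possible shape for the dynamic program in part (b), sketched in Python (ours; the function and parameter names are made up for illustration): index subproblems by the current day and the number of days since the last reboot.

    from functools import lru_cache

    # OPT(i, j) = best total from day i onward, when the machine is on its
    # (j+1)-th day since the last reboot: either run today (processing
    # min(x[i], s[j]) terabytes) or reboot (processing nothing today).
    def max_terabytes(x, s):
        n = len(x)

        @lru_cache(maxsize=None)
        def opt(i, j):
            if i == n:
                return 0
            run = min(x[i], s[j]) + opt(i + 1, min(j + 1, n - 1))
            reboot = opt(i + 1, 0)        # reboot: capacity resets to s[0]
            return max(run, reboot)

        return opt(0, 0)

    print(max_terabytes([10, 1, 7, 7], [8, 4, 2, 1]))   # prints 19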

10. You're trying to run a large computing job in which you need to simulate a physical system for as many discrete steps as you can. The lab you're working in has two large supercomputers (which we'll call A and B), which are capable of processing this job. However, you're not one of the high-priority users of these supercomputers, so at any given point in time, you're only able to use as many spare cycles as these machines have available.
Here's the problem you face. Your job can only run on one of the machines in any given minute. Over each of the next n minutes, you have a "profile" of how much processing power is available on each machine. In minute i, you would be able to run ai > 0 steps of the simulation if your job is on machine A, and bi > 0 steps of the simulation if your job is on machine B. You also have the ability to move your job from one machine to the other; but doing this costs you a minute of time in which no processing is done on your job. So, given a sequence of n minutes, a plan is specified by a choice of A, B, or "move" for each minute, with the property that choices A and B cannot appear in consecutive minutes.


For example, if your job is on machine A in minute i, and you want to switch to machine B, then your choice for minute i + 1 must be move, and then your choice for minute i + 2 can be B. The value of a plan is the total number of steps that you manage to execute over the n minutes: so it's the sum of ai over all minutes in which the job is on A, plus the sum of bi over all minutes in which the job is on B.

The problem. Given values a1, a2, ..., an and b1, b2, ..., bn, find a plan of maximum value. (Such a strategy will be called optimal.) Note that your plan can start with either of the machines A or B in minute 1.

Example. Suppose n = 4, and the values of ai and bi are given by the following table.
       Minute 1   Minute 2   Minute 3   Minute 4
  A       10          1          1          10
  B        5          1         20          20

Then the plan of maximum value would be to choose A for minute 1, then move for minute 2, and then B for minutes 3 and 4. The value of this plan would be 10 + 0 + 20 + 20 = 50.

(a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

  In minute 1, choose the machine achieving the larger of a1, b1
  Set i = 2
  While i < n
    What was the choice in minute i - 1?
    If A:
      If bi+1 > ai + ai+1 then
        Choose move in minute i and B in minute i + 1
        Proceed to iteration i + 2
      Else
        Choose A in minute i
        Proceed to iteration i + 1
      Endif
    If B: behave as above, with the roles of A and B reversed
  EndWhile

In your example, say what the correct answer is and also what the algorithm above finds.

(b) Give an efficient algorithm that takes values for a1, a2, ..., an and b1, b2, ..., bn and returns the value of an optimal plan.

11. Suppose you're consulting for a company that manufactures PC equipment

and ships it to distributors all over the country. For each of the next n weeks, they have a projected supply si of equipment (measured in pounds), which has to be shipped by an air freight carrier. Each week's supply can be carried by one of two air freight companies, A or B. Company A charges a fixed rate r per pound (so it costs r * si to ship a week's supply si).
Company B makes contracts for a fixed amount c per week, independent of the weight. However, contracts with company B must be made in blocks of four consecutive weeks at a time. A schedule, for the PC company, is a choice of air freight company (A or B) for each of the n weeks, with the restriction that company B, whenever it is chosen, must be chosen for blocks of four contiguous weeks at a time. The cost of the schedule is the total amount paid to companies A and B, according to the description above. Give a polynomial-time algorithm that takes a sequence of supply values s1, s2, ..., sn and returns a schedule of minimum cost. Example. Suppose r = 1, c = 10, and the sequence of values is
11, 9, 9, 12, 12, 12, 12, 9, 9, 11.

Then the optimal schedule would be to choose company A for the first three weeks, then company B for a block of four consecutive weeks, and then company A for the final three weeks.



12. Suppose we want to replicate a file over a collection of n servers, labeled S1, S2, ..., Sn. To place a copy of the file at server Si results in a placement cost of ci, for an integer ci > 0. Now, if a user requests the file from server Si, and no copy of the file is present at Si, then the servers Si+1, Si+2, Si+3, ... are searched in order until a copy of the file is finally found, say at server Sj, where j > i. This results in an access cost of j - i. (Note that the lower-indexed servers Si-1, Si-2, ... are not consulted in this search.) The access cost is 0 if Si holds a copy of the file. We will require that a copy of the file be placed at server Sn, so that all such searches will terminate, at the latest, at Sn.

324

Chapter 6 Dynamic Programming We’d like to place copies of the fries at the servers so as to minimize the sum of placement and access costs. Formally, we say that a configuration is a choice, for each server Si with i = 1, 2 ..... n - 1, of whether to place a copy of the file at Si or not. (Recall that a copy is always placed at Sn.) The total cost of a configuration is the sum of all placement costs for servers with a copy of the file, plus the sum of all access costs associated with all n servers. Give a p olynomial-time algorithm to find a configuration of minimum total cost. two opposing concerns in maintaining such a path: we want paths that are short, but we also do not want to have to change the path frequently as the network structure changes. (That is, we’d like a single path to continue working, if possible, even as the network gains and loses edges.) Here is a way we might model this problem. Suppose we have a set of mobile nodes v, and at a particular point in time there is a set E0 of edges among these nodes. As the nodes move, the set of edges changes from E0 to E~, then to E2, then to E3, and so on, to an edge set Eb. Fir i = 0, 1, 2 ..... b, let G~ denote the graph (V, E~). So if we were to watch the structure of the network on the nodes V as a "time lapse," it would look precisely like the sequence of graphs Go, G~, G2 ..... Gb_~, G~. We will assume that each of these graphs G~ is connected. Now consider two particular nodes s, t ~ V. For an s-t path P in one of the graphs Gi, we define the length of P to be simply the number of edges in P, and we denote this g(P). Our goal is to produce a sequence of paths P0, P~ ..... P~ so that for each i, Pg is an s-t path in G~. We want the paths to be relatively short. We also do not want there to be too many changes--points at which the identity of the path switches. Formally, we define changes(Po, P~ ..... P~) to be the number of indices i (0 < i < b - 1) for which Pi # P~+I" Fix a constant K > 0. We define the cost of the sequence of paths
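The cost of a configuration can be computed directly from the definitions; here is a small evaluator for experimenting, under our own representation (0-based server indices, with the set of indices holding a copy; names are ours):

    def configuration_cost(c, placed):
        """Total placement plus access cost of a configuration.

        c[i] is the placement cost of server S_(i+1); placed is a set of
        0-based indices holding a copy (index n-1, i.e., S_n, is required).
        """
        n = len(c)
        assert n - 1 in placed, "a copy must always be placed at S_n"
        total = sum(c[i] for i in placed)      # placement costs
        for i in range(n):                     # access cost from each server
            j = i
            while j not in placed:             # scan S_i, S_(i+1), ... rightward
                j += 1
            total += j - i                     # 0 if S_i itself holds a copy
        return total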
13. The problem of searching for cycles in graphs arises naturally in financial trading applications. Consider a firm that trades shares in n different companies. For each pair i ≠ j, they maintain a trade ratio r_ij, meaning that one share of i trades for r_ij shares of j. Here we allow the ratio r to be fractional; that is, r_ij = 2/3 means that you can trade three shares of i to get two shares of j.

A trading cycle for a sequence of shares i_1, i_2, ..., i_k consists of successively trading shares in company i_1 for shares in company i_2, then shares in company i_2 for shares in company i_3, and so on, finally trading shares in i_k back to shares in company i_1. After such a sequence of trades, one ends up with shares in the same company i_1 that one starts with. Trading around a cycle is usually a bad idea, as you tend to end up with fewer shares than you started with. But occasionally, for short periods of time, there are opportunities to increase shares. We will call such a cycle an opportunity cycle, if trading along the cycle increases the number of shares. This happens exactly if the product of the ratios along the cycle is above 1. In analyzing the state of the market, a firm engaged in trading would like to know if there are any opportunity cycles.

Give a polynomial-time algorithm that finds such an opportunity cycle, if one exists.
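The observation that an opportunity cycle is exactly one whose ratio product exceeds 1 can be made additive: with edge weights −log r_ij, a product above 1 becomes a negative total weight. The sketch below illustrates this reduction only (the function name, matrix representation, Bellman-Ford-style detection, and the numerical tolerance are our own choices; recovering the cycle itself would additionally require predecessor bookkeeping):

    import math

    def has_opportunity_cycle(rate):
        """rate[i][j] > 0 is the trade ratio from company i to company j.

        Returns True iff some trading cycle has ratio product > 1, i.e.,
        iff the graph with weights -log(rate) has a negative cycle.
        """
        n = len(rate)
        edges = [(i, j, -math.log(rate[i][j]))
                 for i in range(n) for j in range(n) if i != j]
        dist = [0.0] * n            # distance 0 from a virtual super-source
        for _ in range(n - 1):
            for i, j, w in edges:
                if dist[i] + w < dist[j]:
                    dist[j] = dist[i] + w
        # One further successful relaxation signals a negative cycle.
        return any(dist[i] + w < dist[j] - 1e-12 for i, j, w in edges)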

14. A large collection of mobile wireless devices can naturally form a network in which the devices are the nodes, and two devices x and y are connected by an edge if they are able to directly communicate with each other (e.g., by a short-range radio link). Such a network of wireless devices is a highly dynamic object, in which edges can appear and disappear over time as the devices move around. For instance, an edge (x, y) might disappear as x and y move far apart from each other and lose the ability to communicate directly.

In a network that changes over time, it is natural to look for efficient ways of maintaining a path between certain designated nodes. There are two opposing concerns in maintaining such a path: we want paths that are short, but we also do not want to have to change the path frequently as the network structure changes. (That is, we'd like a single path to continue working, if possible, even as the network gains and loses edges.)

Here is a way we might model this problem. Suppose we have a set of mobile nodes V, and at a particular point in time there is a set E_0 of edges among these nodes. As the nodes move, the set of edges changes from E_0 to E_1, then to E_2, then to E_3, and so on, to an edge set E_b. For i = 0, 1, 2, ..., b, let G_i denote the graph (V, E_i). So if we were to watch the structure of the network on the nodes V as a "time lapse," it would look precisely like the sequence of graphs G_0, G_1, G_2, ..., G_{b-1}, G_b. We will assume that each of these graphs G_i is connected.

Now consider two particular nodes s, t ∈ V. For an s-t path P in one of the graphs G_i, we define the length of P to be simply the number of edges in P, and we denote this ℓ(P). Our goal is to produce a sequence of paths P_0, P_1, ..., P_b so that for each i, P_i is an s-t path in G_i. We want the paths to be relatively short. We also do not want there to be too many changes--points at which the identity of the path switches. Formally, we define changes(P_0, P_1, ..., P_b) to be the number of indices i (0 ≤ i ≤ b - 1) for which P_i ≠ P_{i+1}.

Fix a constant K > 0. We define the cost of the sequence of paths P_0, P_1, ..., P_b to be

    cost(P_0, P_1, ..., P_b) = Σ_{i=0}^{b} ℓ(P_i) + K · changes(P_0, P_1, ..., P_b).

(a) Suppose it is possible to choose a single path P that is an s-t path in each of the graphs G_0, G_1, ..., G_b. Give a polynomial-time algorithm to find the shortest such path.

(b) Give a polynomial-time algorithm to find a sequence of paths P_0, P_1, ..., P_b of minimum cost, where P_i is an s-t path in G_i for i = 0, 1, ..., b.
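The cost function above is easy to evaluate directly; a small helper for experimenting with candidate path sequences (the representation of a path as a node list, and the function name, are our own choices):

    def sequence_cost(paths, K):
        """cost(P_0, ..., P_b) = total path length + K * number of changes.

        paths[i] is the node list of the s-t path P_i used in graph G_i;
        its length l(P_i) is its number of edges, len(paths[i]) - 1.
        """
        length_total = sum(len(p) - 1 for p in paths)
        changes = sum(1 for i in range(len(paths) - 1)
                      if paths[i] != paths[i + 1])
        return length_total + K * changes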

15. On most clear days, a group of your friends in the Astronomy Department gets together to plan out the astronomical events they're going to try observing that night. We'll make the following assumptions about the events.

- There are n events, which for simplicity we'll assume occur in sequence separated by exactly one minute each. Thus event j occurs at minute j; if they don't observe this event at exactly minute j, then they miss out on it.

- The sky is mapped according to a one-dimensional coordinate system (measured in degrees from some central baseline); event j will be taking place at coordinate d_j, for some integer value d_j. The telescope starts at coordinate 0 at minute 0.
- The last event, n, is much more important than the others; so it is required that they observe event n.

The Astronomy Department operates a large telescope that can be used for viewing these events. Because it is such a complex instrument, it can only move at a rate of one degree per minute. Thus they do not expect to be able to observe all n events; they just want to observe as many as possible, limited by the operation of the telescope and the requirement that event n must be observed.

We say that a subset S of the events is viewable if it is possible to observe each event j ∈ S at its appointed time j, and the telescope has adequate time (moving at its maximum of one degree per minute) to move between consecutive events in S.

The problem. Given the coordinates of each of the n events, find a viewable subset of maximum size, subject to the requirement that it should contain event n. Such a solution will be called optimal.

Example. Suppose the one-dimensional coordinates of the events are as shown here.

    Event:       1    2    3    4    5    6    7    8    9
    Coordinate:  1   -4   -1    4    5   -4    6    7   -2

Then the optimal solution is to observe events 1, 3, 6, 9. Note that the telescope has time to move from one event in this set to the next, even moving at one degree per minute.

(a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

    Mark all events j with |d_n - d_j| > n - j as illegal (as observing
      them would prevent you from observing event n)
    Mark all other events as legal
    Initialize current position to coordinate 0 at minute 0
    While not at end of event sequence
      Find the earliest legal event j that can be reached without
        exceeding the maximum movement rate of the telescope
      Add j to the set S
      Update current position to be coordinate d_j at minute j
    Endwhile
    Output the set S

In your example, say what the correct answer is and also what the algorithm above finds.

(b) Give an efficient algorithm that takes values for the coordinates d_1, d_2, ..., d_n of the events and returns the size of an optimal solution.
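For testing candidate counterexamples in part (a), here is a direct Python transcription of the algorithm above (a sketch; the 0-based list indexing shifted against the exercise's 1-based minutes, and the function name, are our own choices):

    def greedy_viewable(d):
        """Run the (incorrect) greedy algorithm; d[j-1] is the coordinate
        of event j. Returns the set S of 1-based event numbers it builds."""
        n = len(d)
        legal = [abs(d[n - 1] - d[j]) <= n - (j + 1) for j in range(n)]
        S, pos, minute = [], 0, 0
        for j in range(n):              # scan events in order of time
            if legal[j] and abs(d[j] - pos) <= (j + 1) - minute:
                S.append(j + 1)
                pos, minute = d[j], j + 1
        return S

On the example, greedy_viewable([1, -4, -1, 4, 5, -4, 6, 7, -2]) returns [1, 3, 6, 9], the optimum — consistent with the exercise asking for a different instance.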

16. There are many sunny days in Ithaca, New York; but this year, as it happens, the spring ROTC picnic at Cornell has fallen on a rainy day. The ranking officer decides to postpone the picnic and must notify everyone by phone. Here is the mechanism she uses to do this.

Each ROTC person on campus except the ranking officer reports to a unique superior officer. Thus the reporting hierarchy can be described by a tree T, rooted at the ranking officer, in which each other node v has a parent node u equal to his or her superior officer. Conversely, we will call v a direct subordinate of u. See Figure 6.30, in which A is the ranking officer, B and D are the direct subordinates of A, and C is the direct subordinate of B.

Figure 6.30 A hierarchy with four people. The fastest broadcast scheme is for A to call B in the first round. In the second round, A calls D and B calls C. If A were to call D first, then C could not learn the news until the third round; so A should call B before D.

To notify everyone of the postponement, the ranking officer first calls each of her direct subordinates, one at a time. As soon as each subordinate gets the phone call, he or she must notify each of his or her direct subordinates, one at a time. The process continues this way until everyone has been notified. Note that each person in this process can only call direct subordinates on the phone; for example, in Figure 6.30, A would not be allowed to call C.

We can picture this process as being divided into rounds. In one round, each person who has already learned of the postponement can call one of his or her direct subordinates on the phone. The number of rounds it takes for everyone to be notified depends on the sequence in which each person calls their direct subordinates. For example, in Figure 6.30, it will take only two rounds if A starts by calling B, but it will take three rounds if A starts by calling D.

Give an efficient algorithm that determines the minimum number of rounds needed for everyone to be notified, and outputs a sequence of phone calls that achieves this minimum number of rounds.
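The round structure can be simulated directly for any fixed calling order; a small sketch under our own representation (a dictionary mapping each person to the ordered list of direct subordinates they will call; names are ours):

    def last_round(order, root):
        """Round by which everyone has heard, if root learns at round 0
        and each person calls their subordinates in the given order."""
        def done(v, t):                       # person v learns in round t
            latest = t
            for k, w in enumerate(order.get(v, ()), start=1):
                latest = max(latest, done(w, t + k))   # k-th call lands at t+k
            return latest
        return done(root, 0)

For the hierarchy of Figure 6.30, last_round({'A': ['B', 'D'], 'B': ['C']}, 'A') is 2, while last_round({'A': ['D', 'B'], 'B': ['C']}, 'A') is 3, matching the discussion above.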

17. Your friends have been studying the closing prices of tech stocks, looking for interesting patterns. They've defined something called a rising trend, as follows.

They have the closing price for a given stock recorded for n days in succession; let these prices be P[1], P[2], ..., P[n]. A rising trend in these prices is a subsequence of the prices P[i_1], P[i_2], ..., P[i_k], for days i_1 < i_2 < ... < i_k, so that

- i_1 = 1, and
- P[i_j] < P[i_{j+1}] for each j = 1, 2, ..., k - 1.

Thus a rising trend is a subsequence of the days--beginning on the first day and not necessarily contiguous--so that the price strictly increases over the days in this subsequence.

They are interested in finding the longest rising trend in a given sequence of prices.

Example. Suppose n = 7, and the sequence of prices is

    10, 1, 2, 11, 3, 4, 12.

Then the longest rising trend is given by the prices on days 1, 4, and 7. Note that days 2, 3, 5, and 6 consist of increasing prices; but because this subsequence does not begin on day 1, it does not fit the definition of a rising trend.

(a) Show that the following algorithm does not correctly return the length of the longest rising trend, by giving an instance on which it fails to return the correct answer.

    Define i = 1
    L = 1
    For j = 2 to n
      If P[j] > P[i] then
        Set i = j
        Add 1 to L
      Endif
    Endfor

In your example, give the actual length of the longest rising trend, and say what the algorithm above returns.

(b) Give an efficient algorithm that takes a sequence of prices P[1], ..., P[n] and returns the length of the longest rising trend.
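Again for testing candidate counterexamples, a direct transcription of the algorithm in part (a) (0-based list indexing and the function name are our own choices):

    def greedy_trend_length(P):
        """Length computed by the (incorrect) algorithm from part (a)."""
        i, L = 0, 1                    # 0-based index of the book's day i = 1
        for j in range(1, len(P)):
            if P[j] > P[i]:
                i, L = j, L + 1
        return L

On the example, greedy_trend_length([10, 1, 2, 11, 3, 4, 12]) returns 3, the correct answer for this particular input — so part (a) requires a different instance.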

18. Consider the sequence alignment problem over a four-letter alphabet {z_1, z_2, z_3, z_4}, with a given gap cost and given mismatch costs. Assume that each of these parameters is a positive integer.

Suppose you are given two strings A = a_1 a_2 ... a_m and B = b_1 b_2 ... b_n and a proposed alignment between them. Give an O(mn) algorithm to decide whether this alignment is the unique minimum-cost alignment between A and B.

19. You're consulting for a group of people (who would prefer not to be mentioned here by name) whose jobs consist of monitoring and analyzing electronic signals coming from ships in coastal Atlantic waters. They want a fast algorithm for a basic primitive that arises frequently: "untangling" a superposition of two known signals. Specifically, they're picturing a situation in which each of two ships is emitting a short sequence of 0s and 1s over and over, and they want to make sure that the signal they're hearing is simply an interleaving of these two emissions, with nothing extra added in.

This describes the whole problem; we can make it a little more explicit as follows. Given a string x consisting of 0s and 1s, we write x^k to denote k copies of x concatenated together. We say that a string x' is a repetition of x if it is a prefix of x^k for some number k. So x' = 10110110110 is a repetition of x = 101.

We say that a string s is an interleaving of x and y if its symbols can be partitioned into two (not necessarily contiguous) subsequences s' and s'', so that s' is a repetition of x and s'' is a repetition of y. (So each symbol in s must belong to exactly one of s' or s''.) For example, if x = 101 and y = 00, then s = 100010101 is an interleaving of x and y, since characters 1, 2, 5, 7, 8, 9 form 101101--a repetition of x--and the remaining characters 3, 4, 6 form 000--a repetition of y.

In terms of our application, x and y are the repeating sequences from the two ships, and s is the signal we're listening to: We want to make sure s "unravels" into simple repetitions of x and y. Give an efficient algorithm that takes strings s, x, and y and decides if s is an interleaving of x and y.
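The notion of repetition has a convenient positional characterization that is worth making explicit: w is a repetition of x exactly when each character of w matches the character of x at the same position modulo |x|. A small helper illustrating this (the function name is ours):

    def is_repetition(w, x):
        """w is a repetition of x iff w is a prefix of x^k for some k,
        i.e., w[i] == x[i mod len(x)] for every position i."""
        return all(w[i] == x[i % len(x)] for i in range(len(w)))

For instance, is_repetition("10110110110", "101") is True, matching the example in the text.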

20. Suppose it's nearing the end of the semester and you're taking n courses, each with a final project that still has to be done. Each project will be graded on the following scale: It will be assigned an integer number on a scale of 1 to g > 1, higher numbers being better grades. Your goal, of course, is to maximize your average grade on the n projects.

You have a total of H > n hours in which to work on the n projects cumulatively, and you want to decide how to divide up this time. For simplicity, assume H is a positive integer, and you'll spend an integer number of hours on each project. To figure out how best to divide up this time, you've come up with a set of functions {f_i : i = 1, 2, ..., n} (rough estimates, of course) for each of your n courses; if you spend h ≤ H hours on the project for course i, you'll get a grade of f_i(h). (You may assume that the functions f_i are nondecreasing: if h ≤ h', then f_i(h) ≤ f_i(h').)

So the problem is: Given these functions {f_i}, decide how many hours to spend on each project (in integer values only) so that your average grade, as computed according to the f_i, is as large as possible. In order to be efficient, the running time of your algorithm should be polynomial in n, g, and H; none of these quantities should appear as an exponent in your running time.
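One possible dynamic program for this problem, offered only as a sketch under our own representation (f[i] is a table of the values f_i(0), ..., f_i(H); the function name is ours, and this is not necessarily the intended solution):

    def best_total_grade(f, H):
        """Best achievable sum of grades using at most H hours in total;
        divide by n for the best average. Runs in O(n * H^2) time."""
        best = [0] * (H + 1)    # zero projects: grade sum 0 for any budget
        for table in f:
            # best[h] becomes the optimum over how many hours k to give
            # this project, leaving h - k hours for the earlier ones.
            best = [max(table[k] + best[h - k] for k in range(h + 1))
                    for h in range(H + 1)]
        return best[H]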

21. Some time back, you helped a group of friends who were doing simulations for a computation-intensive investment company, and they've come back to you with a new problem. They're looking at n consecutive days of a given stock, at some point in the past. The days are numbered i = 1, 2, ..., n; for each day i, they have a price p(i) per share for the stock on that day.

For certain (possibly large) values of k, they want to study what they call k-shot strategies. A k-shot strategy is a collection of m pairs of days (b_1, s_1), ..., (b_m, s_m), where 0 ≤ m ≤ k and 1 ≤ b_1 < s_1 < b_2 < s_2 < ... < b_m < s_m ≤ n. We view these as a set of up to k nonoverlapping intervals, during each of which the investors buy 1,000 shares of the stock (on day b_i) and then sell it (on day s_i). The return of a given k-shot strategy is simply the profit obtained from the m buy-sell transactions, namely, 1,000 Σ_{i=1}^{m} (p(s_i) - p(b_i)).

The investors want to assess the value of k-shot strategies by running simulations on their n-day trace of the stock price. Your goal is to design an efficient algorithm that determines, given the sequence of prices, the k-shot strategy with the maximum possible return. Since k may be relatively large in these simulations, your running time should be polynomial in both n and k; it should not contain k in the exponent.

22. To assess how "well-connected" two nodes in a directed graph are, one can not only look at the length of the shortest path between them, but can also count the number of shortest paths. This turns out to be a problem that can be solved efficiently, subject to some restrictions on the edge costs. Suppose we are given a directed graph G = (V, E), with costs on the edges; the costs may be positive or negative, but every cycle in the graph has strictly positive cost. We are also given two nodes v, w ∈ V. Give an efficient algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.)
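For Exercise 22, one possible line of attack, sketched under our own representation (edge triples and 0-based nodes; it is offered only as an illustration, not as the unique intended solution): Bellman-Ford computes d(u), the minimum cost from u to w; the "tight" edges with d(u) = c + d(x) form a DAG, since a cycle of tight edges would have total cost 0, contradicting the strictly-positive-cycle assumption; and paths in a DAG can be counted by memoized recursion.

    from functools import lru_cache

    def count_shortest_paths(n, edges, v, w):
        """Count minimum-cost v-w paths; edges is a list of (u, x, cost)."""
        INF = float("inf")
        d = [INF] * n
        d[w] = 0
        for _ in range(n - 1):              # Bellman-Ford toward the sink w
            for u, x, c in edges:
                if d[x] + c < d[u]:
                    d[u] = d[x] + c
        tight = {}                          # adjacency lists of the tight DAG
        for u, x, c in edges:
            if d[x] < INF and d[u] == c + d[x]:
                tight.setdefault(u, []).append(x)

        @lru_cache(maxsize=None)
        def ways(u):                        # number of shortest u-w paths
            if u == w:
                return 1                    # positive cycles: only the empty path
            return sum(ways(x) for x in tight.get(u, []))

        return ways(v) if d[v] < INF else 0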


23. Suppose you are given a directed graph G = (V, E) with costs c_e on the edges e ∈ E and a sink t (costs may be negative). Assume that you also have finite values d(v) for v ∈ V. Someone claims that, for each node v ∈ V, the quantity d(v) is the cost of the minimum-cost path from node v to the sink t.

(a) Give a linear-time algorithm (time O(m) if the graph has m edges) that verifies whether this claim is correct.

(b) Assume that the distances are correct, and d(v) is finite for all v ∈ V. Now you need to compute distances to a different sink t'. Give an O(m log n) algorithm for computing distances d'(v) for all nodes v ∈ V to the sink node t'. (Hint: It is useful to consider a new cost function defined as follows: for edge e = (v, w), let c'_e = c_e - d(v) + d(w). Is there a relation between costs of paths for the two different costs c and c'?)
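To see why the hinted cost function is worth considering (this elaboration is ours, not the book's), note that the modified costs telescope along any path v_0, v_1, ..., v_k:

    Σ_i c'_(v_i, v_{i+1}) = Σ_i ( c_(v_i, v_{i+1}) - d(v_i) + d(v_{i+1}) )
                          = ( Σ_i c_(v_i, v_{i+1}) ) - d(v_0) + d(v_k),

so the c'-cost of a path differs from its c-cost by an amount depending only on its endpoints. Moreover, if the d-values are correct, then d(v) ≤ c_e + d(w) for every edge e = (v, w), so every c'_e ≥ 0.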
24. Gerrymandering is the practice of carving up electoral districts in very careful ways so as to lead to outcomes that favor a particular political party. Recent court challenges to the practice have argued that through this calculated redistricting, large numbers of voters are being effectively (and intentionally) disenfranchised. Computers, it turns out, have been implicated as the source of some of the "villainy" in the news coverage on this topic: Thanks to powerful software, gerrymandering has changed from an activity carried out by a bunch of people with maps, pencil, and paper into the industrial-strength process that it is today.

Why is gerrymandering a computational problem? There are database issues involved in tracking voter demographics down to the level of individual streets and houses; and there are algorithmic issues involved in grouping voters into districts. Let's think a bit about what these latter issues look like.

Suppose we have a set of n precincts P_1, P_2, ..., P_n, each containing m registered voters. We're supposed to divide these precincts into two districts, each consisting of n/2 of the precincts. Now, for each precinct, we have information on how many voters are registered to each of two political parties. (Suppose, for simplicity, that every voter is registered to one of these two.) We'll say that the set of precincts is susceptible to gerrymandering if it is possible to perform the division into two districts in such a way that the same party holds a majority in both districts.

Give an algorithm to determine whether a given set of precincts is susceptible to gerrymandering; the running time of your algorithm should be polynomial in n and m.

Example. Suppose we have n = 4 precincts, and the following information on registered voters.

    Precinct                         1     2     3     4
    Number registered for party A   55    43    60    47
    Number registered for party B   45    57    40    53

This set of precincts is susceptible since, if we grouped precincts 1 and 4 into one district, and precincts 2 and 3 into the other, then party A would have a majority in both districts. (Presumably, the "we" who are doing the grouping here are members of party A.) This example is a quick illustration of the basic unfairness in gerrymandering: Although party A holds only a slim majority in the overall population (205 to 195), it ends up with a majority in not one but both districts.
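The definition of susceptibility can be checked by brute force on small examples (exponential in n, so this is emphatically not the polynomial algorithm the exercise asks for; the function name and representation are ours):

    from itertools import combinations

    def susceptible(a, m):
        """a[i] is the party-A registration count in precinct i, out of
        m voters each. Returns True iff some half/half division gives one
        party a strict majority in both districts."""
        n = len(a)
        need = (n // 2) * m / 2          # strict-majority threshold per district
        total_a = sum(a)
        for district in combinations(range(n), n // 2):
            a1 = sum(a[i] for i in district)
            a2 = total_a - a1
            b1 = (n // 2) * m - a1       # party B's counts in each district
            b2 = (n // 2) * m - a2
            if (a1 > need and a2 > need) or (b1 > need and b2 > need):
                return True
        return False

For the example, susceptible([55, 43, 60, 47], 100) is True, via the district {1, 4} against {2, 3}.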
25. Consider the problem faced by a stockbroker trying to sell a large number of shares of stock in a company whose stock price has been steadily falling in value. It is always hard to predict the right moment to sell stock, but owning a lot of shares in a single company adds an extra complication: the mere act of selling many shares in a single day will have an adverse effect on the price.

Since future market prices, and the effect of large sales on these prices, are very hard to predict, brokerage firms use models of the market to help them make such decisions. In this problem, we will consider the following simple model. Suppose we need to sell x shares of stock in a company, and suppose that we have an accurate model of the market: it predicts that the stock price will take the values p_1, p_2, ..., p_n over the next n days. Moreover, there is a function f(·) that predicts the effect of large sales: if we sell y shares on a single day, it will permanently decrease the price by f(y) from that day onward. So, if we sell y_1 shares on day 1, we obtain a price per share of p_1 - f(y_1), for a total income of y_1 · (p_1 - f(y_1)). Having sold y_1 shares on day 1, we can then sell y_2 shares on day 2 for a price per share of p_2 - f(y_1) - f(y_2); this yields an additional income of y_2 · (p_2 - f(y_1) - f(y_2)). This process continues over all n days. (Note, as in our calculation for day 2, that the decreases from earlier days are absorbed into the prices for all later days.)

Design an efficient algorithm that takes the prices p_1, ..., p_n and the function f(·) (written as a list of values f(1), f(2), ..., f(x)) and determines the best way to sell x shares by day n. In other words, find natural numbers y_1, y_2, ..., y_n so that x = y_1 + ... + y_n, and selling y_i shares on day i for i = 1, 2, ..., n maximizes the total income achievable. You should assume that the share value p_i is monotone decreasing, and f(·) is monotone increasing; that is, selling a larger number of shares causes a larger drop in the price. Your algorithm's running time can have a polynomial dependence on n (the number of days), x (the number of shares), and p_1 (the peak price of the stock).

Example. Consider the case when n = 3; the prices for the three days are 90, 80, 40; and f(y) = 1 for y ≤ 40,000 and f(y) = 20 for y > 40,000. Assume you start with x = 100,000 shares. Selling all of them on day 1 would yield a price of 70 per share, for a total income of 7,000,000. On the other hand, selling 40,000 shares on day 1 yields a price of 89 per share, and selling the remaining 60,000 shares on day 2 results in a price of 59 per share, for a total income of 7,100,000.
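The price-impact model is easy to misread, so here is a small income evaluator for experimenting with selling plans (a sketch; the function name, 0-based days, and our reading that a day with no sale contributes no f-drop are our own choices):

    def total_income(p, f, y):
        """Income from selling y[i] shares on day i: the effective price
        on day i is p[i] minus f(y[j]) for every earlier or same day j
        on which shares were actually sold."""
        drop, income = 0, 0
        for pi, yi in zip(p, y):
            if yi > 0:
                drop += f(yi)            # the decrease is permanent
            income += yi * (pi - drop)
        return income

With p = [90, 80, 40] and f as in the example, total_income(p, f, [100000, 0, 0]) gives 7,000,000 and total_income(p, f, [40000, 60000, 0]) gives 7,100,000, matching the text.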
26. Consider the following inventory problem. You are running a company that sells some large product (let's assume you sell trucks), and predictions tell you the quantity of sales to expect over the next n months. Let d_i denote the number of sales you expect in month i. We'll assume that all sales happen at the beginning of the month, and trucks that are not sold are stored until the beginning of the next month. You can store at most S trucks, and it costs C to store a single truck for a month. You receive shipments of trucks by placing orders for them, and there is a fixed ordering fee of K each time you place an order (regardless of the number of trucks you order). You start out with no trucks. The problem is to design an algorithm that decides how to place orders so that you satisfy all the demands {d_i}, and minimize the costs. In summary:

- There are two parts to the cost: (1) storage--it costs C for every truck on hand that is not needed that month; (2) ordering fees--it costs K for every order placed.
- In each month you need enough trucks to satisfy the demand d_i, but the number left over after satisfying the demand for the month should not exceed the inventory limit S.

Give an algorithm that solves this problem in time that is polynomial in n and S.

27. The owners of an independently operated gas station are faced with the following situation. They have a large underground tank in which they store gas; the tank can hold up to L gallons at one time. Ordering gas is quite expensive, so they want to order relatively rarely. For each order, they need to pay a fixed price P for delivery in addition to the cost of the gas ordered. However, it costs c to store a gallon of gas for an extra day, so ordering too much ahead increases the storage cost.

They are planning to close for a week in the winter, and they want their tank to be empty by the time they close. Luckily, based on years of experience, they have accurate projections for how much gas they will need each day until this point in time. Assume that there are n days left until they close, and they need g_i gallons of gas for each of the days i = 1, ..., n. Assume that the tank is empty at the end of day 0. Give an algorithm to decide on which days they should place orders, and how much to order so as to minimize their total cost.


28. Recall the scheduling problem from Section 4.2 in which we sought to minimize the maximum lateness. There are n jobs, each with a deadline d_i and a required processing time t_i, and all jobs are available to be scheduled starting at time s. For a job i to be done, it needs to be assigned a period from s_i ≥ s to f_i = s_i + t_i, and different jobs should be assigned nonoverlapping intervals. As usual, such an assignment of times will be called a schedule.

In this problem, we consider the same setup, but want to optimize a different objective. In particular, we consider the case in which each job must either be done by its deadline or not at all. We'll say that a subset J of the jobs is schedulable if there is a schedule for the jobs in J so that each of them finishes by its deadline. Your problem is to select a schedulable subset of maximum possible size and give a schedule for this subset that allows each job to finish by its deadline.

(a) Prove that there is an optimal solution J (i.e., a schedulable set of maximum size) in which the jobs in J are scheduled in increasing order of their deadlines.

(b) Assume that all deadlines d_i and required times t_i are integers. Give an algorithm to find an optimal solution. Your algorithm should run in time polynomial in the number of jobs n, and the maximum deadline D = max_i d_i.

29. Let G = (V, E) be a graph with n nodes in which each pair of nodes is joined by an edge. There is a positive weight w_ij on each edge (i, j); and we will assume these weights satisfy the triangle inequality w_ik ≤ w_ij + w_jk. For a subset V' ⊆ V, we will use G[V'] to denote the subgraph (with edge weights) induced on the nodes in V'.

We are given a set X ⊆ V of k terminals that must be connected by edges. We say that a Steiner tree on X is a set Z so that X ⊆ Z ⊆ V, together with a spanning subtree T of G[Z]. The weight of the Steiner tree is the weight of the tree T.

Show that there is a function f(·) and a polynomial function p(·) so that the problem of finding a minimum-weight Steiner tree on X can be solved in time O(f(k) · p(n)).

Notes and Further Reading

Richard Bellman is credited with pioneering the systematic study of dynamic programming (Bellman 1957); the algorithm in this chapter for segmented least squares is based on Bellman's work from this early period (Bellman 1961). Dynamic programming has since grown into a technique that is widely used across computer science, operations research, control theory, and a number of other areas. Much of the recent work on this topic has been concerned with stochastic dynamic programming: Whereas our problem formulations tended to tacitly assume that all input is known at the outset, many problems in scheduling, production and inventory planning, and other domains involve uncertainty, and dynamic programming algorithms for these problems encode this uncertainty using a probabilistic formulation. The book by Ross (1983) provides an introduction to stochastic dynamic programming.

Many extensions and variations of the Knapsack Problem have been studied in the area of combinatorial optimization. As we discussed in the chapter, the pseudo-polynomial bound arising from dynamic programming can become prohibitive when the input numbers get large; in these cases, dynamic programming is often combined with other heuristics to solve large instances of Knapsack Problems in practice. The book by Martello and Toth (1990) is devoted to computational approaches to versions of the Knapsack Problem.

Dynamic programming emerged as a basic technique in computational biology in the early 1970s, in a flurry of activity on the problem of sequence comparison. Sankoff (2000) gives an interesting historical account of the early work in this period. The books by Waterman (1995) and Gusfield (1997) provide extensive coverage of sequence alignment algorithms (as well as many related algorithms in computational biology); Mathews and Zuker (2004) discuss further approaches to the problem of RNA secondary structure prediction. The space-efficient algorithm for sequence alignment is due to Hirschberg (1975).

The algorithm for the Shortest-Path Problem described in this chapter is based originally on the work of Bellman (1958) and Ford (1956).

Many optimizations, motivated both by theoretical and experimental considerations, have been added to this basic approach to shortest paths; a Web site maintained by Andrew Goldberg contains state-of-the-art code that he has developed for this problem (among a number of others), based on work by Cherkassky, Goldberg, and Radzik (1994). The applications of shortest-path methods to Internet routing, and the trade-offs among the different algorithms for networking applications, are covered in books by Bertsekas and Gallager (1992), Keshav (1997), and Stewart (1998).

Notes on the Exercises Exercise 5 is based on discussions with Lillian Lee; Exercise 6 is based on a result of Donald Knuth; Exercise 25 is based on results of Dimitris Bertsimas and Andrew Lo; and Exercise 29 is based on a result of S. Dreyfus and R. Wagner.

Chapter 7 Network Flow

In this chapter, we focus on a rich set of algorithmic problems that grow, in a sense, out of one of the original problems we formulated at the beginning of the course: Bipartite Matching.

Recall the set-up of the Bipartite Matching Problem. A bipartite graph G = (V, E) is an undirected graph whose node set can be partitioned as V = X ∪ Y, with the property that every edge e ∈ E has one end in X and the other end in Y. We often draw bipartite graphs as in Figure 7.1, with the nodes in X in a column on the left, the nodes in Y in a column on the right, and each edge crossing from the left column to the right column.

Now, we've already seen the notion of a matching at several points in the course: We've used the term to describe collections of pairs over a set, with the property that no element of the set appears in more than one pair. (Think of men (X) matched to women (Y) in the Stable Matching Problem, or characters in the Sequence Alignment Problem.) In the case of a graph, the edges constitute pairs of nodes, and we consequently say that a matching in a graph G = (V, E) is a set of edges M ⊆ E with the property that each node appears in at most one edge of M. A set of edges M is a perfect matching if every node appears in exactly one edge of M.

Matchings in bipartite graphs can model situations in which objects are being assigned to other objects. We have seen a number of such situations in our earlier discussions of graphs and bipartite graphs. One natural example arises when the nodes in X represent jobs, the nodes in Y represent machines, and an edge (x_i, y_j) indicates that machine y_j is capable of processing job x_i. A perfect matching is, then, a way of assigning each job to a machine that can process it, with the property that each machine is assigned exactly one job. Bipartite graphs can represent many other relations that arise between two distinct sets of objects, such as the relation between customers and stores, or houses and nearby fire stations.
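The two definitions just given are easy to check mechanically; a small illustration in Python (the function name and edge-pair representation are our own choices, not part of the text):

    def is_matching(edges, M, perfect=False):
        """M is a matching if no node appears in more than one edge of M;
        it is perfect if every node of the graph appears exactly once."""
        seen = set()
        for u, v in M:
            if u in seen or v in seen:
                return False            # some node would be matched twice
            seen.update((u, v))
        if perfect:
            nodes = {w for e in edges for w in e}
            return seen == nodes        # every node matched exactly once
        return True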

One of the oldest problems in combinatorial algorithms is that of determining the size of the largest matching in a bipartite graph G. (As a special case, note that G has a perfect matching if and only if |X| = |Y| and it has a matching of size |X|.) This problem turns out to be solvable by an algorithm that runs in polynomial time, but the development of this algorithm needs ideas fundamentally different from the techniques that we've seen so far.

Rather than developing the algorithm directly, we begin by formulating a general class of problems--network flow problems--that includes the Bipartite Matching Problem as a special case. We then develop a polynomial-time algorithm for a general problem, the Maximum-Flow Problem, and show how this provides an efficient algorithm for Bipartite Matching as well. While the initial motivation for network flow problems comes from the issue of traffic in a network, we will see that they have applications in a surprisingly diverse set of areas and lead to efficient algorithms not just for Bipartite Matching, but for a host of other problems as well.

7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm

The Problem One often uses graphs to model transportation networks--networks whose edges carry some sort of traffic and whose nodes act as "switches" passing traffic between different edges. Consider, for example, a highway system in which the edges are highways and the nodes are interchanges; or a computer network in which the edges are links that can carry packets and the nodes are switches; or a fluid network in which edges are pipes that carry liquid, and the nodes are junctures where pipes are plugged together. Network models of this type have several ingredients: capacities on the edges, indicating how much they can carry; source nodes in the graph, which generate traffic; sink (or destination) nodes in the graph, which can "absorb" traffic as it arrives; and finally, the traffic itself, which is transmitted across the edges.

Flow Networks We'll be considering graphs of this form, and we refer to the traffic as flow--an abstract entity that is generated at source nodes, transmitted across edges, and absorbed at sink nodes. (Our notion of flow models traffic as it goes through the network at a steady rate; we do not model bursty traffic, where the flow fluctuates over time.) Formally, we'll say that a flow network is a directed graph G = (V, E) with the following features.

- Associated with each edge e is a capacity, which is a nonnegative number that we denote c_e.
- There is a single source node s ∈ V.
- There is a single sink node t ∈ V.

Nodes other than s and t will be called internal nodes.

We will make two assumptions about the flow networks we deal with: first, that no edge enters the source s and no edge leaves the sink t; second, that there is at least one edge incident to each node; and third, that all capacities are integers. These assumptions make things cleaner to think about, and while they eliminate a few pathologies, they preserve essentially all the issues we want to think about. Figure 7.2 illustrates a flow network with four nodes and five edges, and capacity values given next to each edge.

Figure 7.2 A flow network, with source s and sink t; numbers next to the edges are the capacities.

Defining Flow Next we define what it means for our network to carry traffic, or flow. We say that an s-t flow is a function f that maps each edge e to a nonnegative real number, f : E → R+; the value f(e) intuitively represents the amount of flow carried by edge e. A flow f must satisfy the following two properties.

(i) (Capacity conditions) For each e ∈ E, we have 0 ≤ f(e) ≤ c_e.
(ii) (Conservation conditions) For each node v other than s and t, we have

    Σ_{e into v} f(e) = Σ_{e out of v} f(e).

Here Σ_{e into v} f(e) sums the flow value f(e) over all edges entering node v, while Σ_{e out of v} f(e) is the sum of flow values over all edges leaving node v. Thus the flow on an edge cannot exceed the capacity of the edge, and for every node other than the source and the sink, the amount of flow entering must equal the amount of flow leaving. The source has no entering edges (by our assumption), but it is allowed to have flow going out; in other words, it can generate flow. Symmetrically, the sink is allowed to have flow coming in, even though it has no edges leaving it.

The value of a flow f, denoted v(f), is defined to be the amount of flow generated at the source:

    v(f) = Σ_{e out of s} f(e).

To make the notation more compact, we define f^out(v) = Σ_{e out of v} f(e) and f^in(v) = Σ_{e into v} f(e). We can extend this to sets of vertices: if S ⊆ V, we define f^out(S) = Σ_{e out of S} f(e) and f^in(S) = Σ_{e into S} f(e). In this terminology, v(f) = f^out(s).

The Maximum-Flow Problem Given a flow network, a natural goal is to arrange the traffic so as to make as efficient use as possible of the available capacity. Thus the basic algorithmic problem we will consider is the following: Given a flow network, find a flow of maximum possible value.
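The two defining conditions of a flow transcribe directly into code; here is a sketch under our own representation (parallel lists of edge pairs, capacities, and flow values; the function names are ours):

    def is_valid_flow(edges, cap, f, s, t):
        """Check conditions (i) and (ii) for an s-t flow.

        edges is a list of (u, v) pairs; cap[i] and f[i] belong to edges[i].
        """
        # (i) Capacity conditions: 0 <= f(e) <= c_e on every edge.
        if any(not (0 <= f[i] <= cap[i]) for i in range(len(edges))):
            return False
        # (ii) Conservation conditions at every internal node.
        net = {}
        for i, (u, v) in enumerate(edges):
            net[u] = net.get(u, 0) - f[i]     # flow leaving u
            net[v] = net.get(v, 0) + f[i]     # flow entering v
        return all(net.get(v, 0) == 0 for v in net if v not in (s, t))

    def flow_value(edges, f, s):
        """v(f): total flow on the edges leaving the source s."""
        return sum(f[i] for i, (u, _) in enumerate(edges) if u == s)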

Co) Pushing 20 units of flow along the path s. since it is possible to construct a flow of value 30. t)} and increase the flow on each of these edges to 20. our algorithm wil! also compute the minimum cut. this now results in too much flow coming into v. and a flow f on G. t. which provides a systematic way to search for forwardbackward operations such as this. u. think about it.4 for the residual graph of the flow on Figure 7. Now.f(e) "leftover" units of capacity on which we could try pushing flow forward. For each edge e = (u.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm 341 define fout (S) = ~e out of S f(e) and fin(s) ---. t. in Figure 7. ~ Designing the Algorithm Suppose we wanted to find a maximum flow in a network.3 (a) The network of Hgure 7. and its value is 30. The Maximum-Flow Problem Given a flow network. the value of our flow is 20. How should we go about doing this~.3(c).) The node set of Gf is the same as that of G. Suppose we start with zero flow: f(e) = 0 for al! e. where the dark edges are carrying flow before the operation. (u. Then. In this way. Essentially. This is a more general way of pushing flow: We can push forward on edges with leftover capacity. We will see that the problem of finding cuts of minimum capacity in a flow network turns out to be as valuable. finally. from the point of view of applications. A and B. So we "undo" 10 units of flow on (u. It takes some testing out to decide that an approach such as dynamic programming doesn’t seem to work--at least. . We now define the residual graph. As we think about designing algorithms for this problem. we push 10 units of flow along (u. t becomes fin(v) = f°ut(v). Thus. The maximum-flow algori_thm that we develop here will be intertwined with a proof that the maximum-flow value equals the minimum capacity of any such division. What we need is a more general way of pushing flow from s to t. it’s useful to consider how the structure of the flow network places upper bounds on the maximum value of an s-t flow. The Residual Graph Given a flow network G. Clearly this respects the capacity and conservation conditions. and leave f(e) = 0 for the other two. (c) The new kind of augmenting path using the edge (u. v.and we can ask: Is this the maximum possible for the graph in the figure? If we . v) of G on which f(e) < ce. As a bonus. restoring the conservation condition at u. t). so that s ~ A and t ~ B. v). up to the limits imposed by the edge capacities. so that in a situation such as this.2. v) backward.3. The problem is that we’re now stuck--there is no s-t patii on which we can directly push flow without exceeding some capacity--and yet we do not have a maximum flow. the conservation condition for nodes v ~ s. we might choose the path consisting of the edges {(s. and we can write v(f) = f°Ut(s). we also increase it on an edge leaving the node. we’d like to perform the following operation denoted by a dotted line in Figure 7. we see that the answer is no. called the minimum cut. We now have a valid flow. and we can push backward on edges that are already caning flow. we could go back and think about simple greedy approaches. the problem is tha~ its value is 0. and the dashed edges form the new kind of augmentation. we still respect the capacity conditions--since we only set the flow as high as the edge capacities would allow--and the conservation conditions-since when we increase flow on an edge entering an internal node.~e into $ f(e).3. 
Thus the basic algorithmic problem we will consider is the following: Given a flow network. to see where they break down. In this terminology. there are ce. ~). 10 10 2O (a) (b) Figure 7. See Figure 7. v). as that of finding a maximum flow. We push 10 units of flow along (s. This suggests that each such "cut" of the graph puts a bound on the maximum possible flow value. (~. this restores the conservation condition at v but results in too little flow leaving u. and thereby use up some of the edge capacity from A to B. (~ee Figure 7. there is no algorithm known for the Maximum-Flow Problem that could really be viewed as naturally belonging to the dynamic programming paradigm. In the absence of other ideas.3 after pushing 20 units of flow along the path s.54O Chapter 7 Network Flow 7. any flow that goes from s to t must cross from A into B at some point. We now try to increase the value of f by "pushing" flow along a path from s to t. v. a natural goa! is to arrange the traffic so as to make as efficient use as possible of the available capacity. intuitively. we have a way to increase the value of the current flow. Here is a basic "obstacle" to the existence of large flows: Suppose we divide the nodes of the graph into two sets. u. to divert it in a different direction. find a flow of maximum possible value. So. we define the residual graph Gf of G with respect to f as follows. u).

u)~P If e= (u.f(e)) = ce. thus we have 0 <_ f(e) <_ f’(e) = f(e) + bottleneck(P. Let u be such a node. (7. Let P be a simple s-t path in that is. If (u.f(e). to reflect the importance of augment. P) is a new flow [’ in G. u)) decrease f(e) in G by b Endif Endfor Return([) 343 2( 20 10 20 I0 ~0 20 It was purely to be able to perform this operation that we defined the residual graph.4 (a) The graph G with the path s.f(e) = O. but its direction is reversed. u) in Gf. and again the capacity condition holds. u. and whether the edge of P that exits u is a forward or backward edge. We will call edges included this way forward edges.1) f ’ is a flow in G. Thus. so the capacity condition holds. to be the minimum residual capacity of any edge on P. with respect to the flow f.bottleneck(P. The dotted line is the new augmenting path. t used to push the first 20 units of flow. we need to check the capacity conditions only on these edges. P). note that bottleneck(P. by pushing flow backward. and if (u. f) is no larger than the residual capacity of (u. v) is a forward edge. u. (a) (b) Figure 7. We need to check the conservation condition at each internal node that lies on the path P. We now define the following operation augment(f. v. (c) The residual graph after pushing an additional !0 units of flow along the new augmenting path s. u. we specific!lly avoided decreasing the flow on e below 0. let (u. u) is a backward edge arising from edge e = (u. Augmenting Paths in a Residual Graph Now we want to make precise the way in which we push flow from s to t in Gf. We must verify the capacity and conservation conditions. o For each edge e = (u. then its residual capacity is ce . t. P does not visit any node more than once.f(e). we specifically avoided increasing the flow on e above ce. and we leave them to the reader. u) is a backward edge arising from edge e = (u. with a capacity of f(e). u) ~ E. each of these cases is easily worked out. However. u) is a backward edge. with the residual capacity next to each edge. Note that each edge e in G can give rise to one or two edges in Gf: If 0 < f(e) < ce it results in both a forward edge and a backward edge being included in Gf. Since f’ differs from f only on edges of P. there are f(e) units of flow that we can "undo" if we want to. P) Let b = bottleneck(P. f) <_ f(e) + (ce . sO we include the edge e’ = (v. we can verify that the change in the amount of flow entering v is the same as the change in the amount of flow exiting u.7. v) of G on which f(e) > 0. obtained by increasing and decreasing the flow values on edges of P. u) ~ E. The result of augment(i:. Note that e’ has the same ends as e. More concretely. We define bottleneck(P. v). We will sometimes refer to the capacity of an edge in the residual graph as a residual capacity. This completes the definition of the residual graph Gf. Let us first verify that [’ is indeed a flow. Technically. then its residual capacity is f(e). to help distinguish it from the capacity of the corresponding edge in the original flow network G. Informally. u) is a forward edge. the capacity condition continues to hold because if e = (u. so must f’. augment(f. If e = (u. So we include the edge e = (u. u) in Gf. one often refers to any s-t path in the residual graph as an augmenting path. f) >_ f(e) . so we have ce >_ f(e) >_ f’(e) = f(e) . N . u) be an edge of P. f) For each edge (u. there are four cases to check. Thus Gf has at most twice as many edges as G. 
u) is a forw~rd~edge increase f(e) in G by b.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm 342 Chapter 7 Network Flow Else ((u. (b) The residual graph of the resulting flow [. then Proof. and let e= (u. we will call edges included this way backward edges. depending on whether the edge of P that enters v is a forward or backward edge. since f satisfied the conservation condition at u. which yields a new flow f’ in G. with a capacity of ce .

but it’s handy for us as a finite.4 for a run of the algorithm. Since G has no edges entering s. The residual graph Gf has at most 2m edges. Proof. We have assumed that all nodes have at least one incident edge. we can now prove termination. that all capacities in the flow network G are integers. Endwhile Keturn [ Proof.3) Let f be a flow in G. m We can use this property to prove that the Ford-Fulkerson Algorithm terminates. Proof. The FordFulkerson Algorithm is really quite simple. simply stated bound.2).4) that the algorithm terminates in at most C iterations of the Wh±le loop. Max-Flow Initially [(e)=0 for all e in G While there is an s-t path in the residual graph Let P be a simple s-t path in G[ f’ = augment(f. hence m > n/2. (7. Thus the flow f’ will have integer values. We’ll call this the Ford-Fulkerson Algorithm. Now suppose it is true after ] iterations. and so we can use O(m + n) = O(m) to simplify the bounds. since all residual capacities in Gf are integers. as above. so by (7. As at previous points in the book we wil! look for a measure of progress that will imply termination. We increase the flow on this edge by bottleneck(P. Then the Ford-Fulkerson Algorithm can be implemented to run in O(mC) time. Since it starts with the value 0. m We need one more observation to prove termination: We need to be able to bound the maximum possible flow value.) Using statement (7.2) At every intermediate stage of the Ford-Fulkerson Algorithm. we will have two linked lists for each node v. and let P be a simple s-t path in G[. and since bottleneck(P. we can use breadth-first search or depth-first search. the edge e must be a forward edge. Therefore the value of f’ exceeds the value of f by bottleneck(P. The first edge e of P must be an edge out of s in the residual graph Gf. f).(f). The answers to both of these questions turn out to be fairly subtle. We will maintain Gf using an adjacency list representation. the Wh±le loop in the Ford-Fulkerson Algorithm can run for at most C iterations.unt Of work involved in one iteration when the current flow is [. as above. To find an s-t path in G[. and hence so wil! the capacities of the new residual graph.7. due to the capacity condition on the edges leaving s. {7.4) Suppose. P) Update [ to be f’ Update the residual graph G[ to be G[. We noted above that no flow in G can have value greater than C. . (C may be a huge. Theft v(f’) : v(f) + bottleneck(P. Then the Ford-Fulkerson Algorithm terminates in at most C iterations of the While loop. the value bottleneck(P. the value of the flow would be y~. See Figure 7. Here’s one upper bound: If all the edges out of s could be completely saturated with flow. f). What is not at all clear is w. We know from (7._we have 9(f’) > . Let n denote the number of nodes in G. the flow values {f(e)} and the residual capacities in G[ are integers. hethe! its central ~h±le loop terminates. it does not visit s again. it increases by at least 1 in each iteration. f).3). ¯ Next we consider the running time of the Ford-Fulkerson Algorithm. Now. {7. by (7.3). the value of the flow maintained by the Ford-Fulkerson Algorithm increases in each iteration. that all capacities in the flow network G are integers. and one containing the edges leaving v. Proof. after the two researchers who developed it in 1956. (7. We therefore consider the amo. First we show that the flow value strictly increases when we apply an augmentation. 
f) > 0.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm 344 Chapter 7 Network Flow 345 This augmentation operation captures the type of forward and backward pushing of flow that we discussed earlier. and m denote the number of edges in G. and whether the flow returned is a maximum flow. since each edge of G gives rise to at most two edges in the residual graph. overestimate of the maximum value of a flow in G. and cannot go higher than C. f) for the augmenting path found in iteration j + 1 will be an integer. ~ Analyzing the Algorithm: Termination and Running Time First we consider some properties that the algorithm maintains by induction on the number of iterations of the ~hile loop. and since the path is simple. relying on our assumption that all capacities are integers. The statement is clearly true before any iterations of the Vhile loop. Thus we have v(f) < C for all s-t flows f.eoutofsCe. Let’s now consider the following algorithm to compute an s-t flow in G. Let C denote this sum.B) Suppose. Then. and we do not change the flow on any other edge incident to s. one containing the edges entering v.

Note that if (A. To make progress toward this goa!. we will not only learn a lot about the algorithm. although the proof requires a little manipulation of sums. an activity that will occupy this whole section. and (A. Finally. and fin(A) = 0 as there are no edges entering the source by assumption. In view of this. If e has only its head in A. This says that we could have originally defined the value of a flow equally wel! in terms of the sink t: It is fEn(t). If an edge e has both ends in A. We make this precise via a sequence of facts. The procedure augment (f.I edges.346 Chapter 7 Network Flow 7. If we set A = V . This statement is actually much stronger than a simple upper bound. we can exactly measure the flow value: It is the total amount that leaves A. B): . with a "+’. B) any s’t cut. O(m + n) is the same as O(m). By our assumption the sink t has no leavj. as the source s has no entering edges.’ng edges. since the only term in this sum that is nonzero is the one in which v is set to s. since all the flow must cross from A to B somewhere. (7. Thus we have f°Ut(A) = fin(B) and fin(A) = f°Ut(B).6) is the following upper bound.f°Ut(B) = fin(t) . which we wil! denote c(A. if e has neither end in A.{t} and B = {t} in (7. Proof. By definition u(f) = f°Ut(s). so we have f°ut(t) = O. then f(e) also appears just once in the sum. so that s s A and t ~ B. Sometimes this bound is useful. then f(e) appears just once in the sum. In the process.1: the way in which the structure of the flow network places upper bounds on the maximum value of an s-t flow. as expressed by our intuition above. (7.6} Let f be any S-t flow. but also find that analyzing the algorithm provide~ us with considerable insight into the Maximum-Flow Problem itself. As in our discussion in Section 7. A very useful consequence of (7. Formally. is simply the sum of the capacities of all edges out of A: c(A. then f(e) doesn’t appear in the sum at all. and hence these two terms cancel out. then f°Ut(A) = f°Ut(s). Similarly. as the path P has at most n . If e has only its tail in A.fin(v) = v~A e out of A f!~ Analyzing the Algorithm: Flows and Cuts Our next goal is to show that the flow that is returned by the Ford-F~kerson Algorithm has the maximum possible value of any flow in G. B) = ~e out of ACe" Cuts turn out to provide very natural upper bounds on the values of flows. We now use the notion of a cut to develop a much more genera! means of placing upper bounds on the maximum-flow value. we have the statement of (7.B) any s-t cut.B) any s-t cut. then the edges into B are precisely the edges out of A. B). So the statement for this set A = {s} is exactly the definition of the flow value u (f).fin(u)). This makes sense intuitively. and (A.6) in the following way. with a "-".. Then V(f) f(e) . It says that by watching the amount of flow f sends across a cut. we have ~ f°ut(v) . but sometimes it is very weak. [] A somewhat more efficient version of the algorithm would maintain the linked lists of edges in the residual graph Gf as part of the augment procedure that changes the flow f via augmentation.B.6). Thus v(f) = ~(f°ut(v) . we say that an s-t cut is a partition (A. A and .fin(v) = 0 for all such nodes. (7. u~A 7.8) Let f be any s’t flow. If A = {s}. any such division places an upper bound on the maximum possible flow value. minus the amount that "swirls back" into A. We have already seen one upper bound: the value u(f) of any s-t-flow f is at most C = ~e out of s Ce. so that s ~ A and t s B. 
we return to an issue that we raised in Section 7. e into A Putting together these two equations.~ f(e) = f°Ut(A) . we have v(f) = fin(B) .2 Maximum Flows and Minimum Cuts in a Network 347 which run in O(m + n) time. Since every node v in A other than s is internal. we know that f°ut(v) . then f(e) appears once in the sum with a "+" and once with a "-". so we can write v(f) = f°Ut(s) . B) of the vertex set V. B). the amount of flow axfiving at the sink. B) is a cut. Given the new flow f’. just by comparing the definitions for these two expressions. by our assumption that ra >_ n/2. Consider dividing the nodes of the graph into two sets.f°ut(t). P) takes time O(n). and (A.fin(s). the edges out of B are precisely the edges into A.2 Maximum Flows and Minimum Cuts in a Network We now continue with the analysis of the Ford-Fulkerson Algorithm.7) Let f be any s-t flow. Then v(f) < c(A. By assumption we have fin(s) = 0. Let’s try to rewrite the sum on the right as follows. we construct the correct forward and backward edges in G[.fin(A). So we can rephrase (7. we can build the new residual graph in O(rn) time: For each edge e of G.7). The capacity of a cut (A. Then v(f) =fin(B) -f°Ut(B).1.

fin(A*) = 2 f(e).: Next. we would obtain an s-v path in Gf. while all edges into A* are completely unused. The Ford-Fulkerson Algorithm terminates when the flow f has no s-t path in the residual graph Gf. and (A*. contradicting our assumption that v s B*. We want to show that ~ has the maximum possible value of any flow in G. However.B*) in G for which v(f) = c(A*.8) says is that the value of every flow is upper-bounded by the capacity of every cut. [] In a sense. and that (A*.6) to reach the desired conclusion: v(f) = f°Ut(A*) . ) ~ f(e) out of A e out of A = c(A. and since v’ ~ A*. v) is an edge in G for which u ~ A* and v ~ B*. we pass from the first to the second since fro(A) >_ 0.. Now suppose that e’ = (u’. ~ four (A) fin (A) Residual graph wi(U. This turns out to be the only property needed for proving its maximality. /~ Analyzing the Algorithm: Max-Flow Equals Min-Cut Let ~ denote the flow that is returned by the Ford-Fulkerson Algorithm. we would obtain an s-u’ path in Gf. In other words. [] . To this end. let A* denote the set of all nodes v in G for which there is an s-v path in @. thus we must now identify such a cut. if we exhibit any s-t cut in G of some value c*. since its right-hand side is independent of any particular flow f. Let B* denote. u’) in the residual graph Gf.e into A* f(e) 0 e out of A* =c(A*. So all edges out of A* are completely saturated with flow. n(U’. v’) is an edge in G for which u’ ~ B* and v’ ~ A*. there is an s-v’ path in Gf. v) is samrated~ th flow. . We claim that f(e’) = 0. B*) has the minimum capacity of any s-t cut. appending e to this path. there is an s-u path in Gf. Moreover.B*). e’ would give rise to a backward edge e" = (v’.. For if not. and since u ~ A*. suppose that e = (u. f has the maximum value of any flow in G..2 Maximum Flows and Minimum Cuts in a Network 348 Chapter 7 Network Flow 349 Proof.5.A*. B*) has the minimum capacity of any s-t cut in G.6). Here the first line is simply (7. (7. The statement claims the existence of a cut satisfying a certain desirable property.5 The (A*.9). (7. Conversely. B*). Consequently. contradicting our assumption that u’ ~ B*. We claim that f(e) = Ce. appending e" to this path.. Figure 7. t ~ A* by the assumption that there is no s-t path in the residual . It is clearly a partition of V. since it is only an inequality rather than an equality. We can now use (7.8) looks weaker than (7. The source s belongs to A* since there is always a path from s to s. B*). v’) carries~ o flow. we know immediately by (7.9) If f is an s-t-flow such that there is no s-t path in the residual graph then there is an s-t cut (A*. What (7. B*) cut in the proof of (7.7. as shown in Figure 7. This immediately establishes that ~ has the maximum value of any flow.~e set of all other nodes: B* = V . e would be a forward edge in the residual graph Gf. For if not. B). it will be extremely useful for us. we know immediately by (7. and we pass from the third to the fourth by applying the capacity conditions to each term of the sum. Proof. B*) is indeed an s-t cut.8) that there cannot be an s-t cut in G of value less than v*. First we establish that (A*.6).8) that there cannot be an s-t flow in G of value greater than c*. and we do this by the method discussed above: We exhibit an s-t cut (A% B*) for which v(~) = c(A*. if we exhibit any s-t flow in G of some value v*.

Given that the Ford-Fulkerson Algorithm terminates when there is no s-t path in the residual graph, (7.9) immediately implies its optimality.

(7.10) The flow f* returned by the Ford-Fulkerson Algorithm is a maximum flow.

We also observe that our algorithm can easily be extended to compute a minimum s-t cut (A*, B*), as follows.

(7.11) Given a flow f of maximum value, we can compute an s-t cut of minimum capacity in O(m) time.

Proof. We simply follow the construction in the proof of (7.9). We construct the residual graph G_f, and perform breadth-first search or depth-first search to determine the set A* of all nodes that s can reach. We then define B* = V - A*, and return the cut (A*, B*). []

Note that there can be many minimum-capacity cuts in a graph G; the procedure in the proof of (7.11) is simply finding a particular one of these cuts, starting from a maximum flow f.

As a bonus, we have obtained the following striking fact through the analysis of the algorithm.

(7.12) In every flow network, there is a flow f and a cut (A, B) so that v(f) = c(A, B).

The point is that f in (7.12) must be a maximum s-t flow; for if there were a flow f' of greater value, the value of f' would exceed the capacity of (A, B), and this would contradict (7.8). Similarly, it follows that (A, B) in (7.12) is a minimum cut (no other cut can have smaller capacity), for if there were a cut (A', B') of smaller capacity, its capacity would be less than the value of f, and this again would contradict (7.8). Due to these implications, (7.12) is often called the Max-Flow Min-Cut Theorem, and is phrased as follows.

(7.13) In every flow network, the maximum value of an s-t flow is equal to the minimum capacity of an s-t cut.

Note how, in retrospect, we can see why the two types of residual edges, forward and backward, are crucial: they are exactly what is needed to analyze the two terms in the expression from (7.6).

Further Analysis: Integer-Valued Flows

Among the many corollaries emerging from our analysis of the Ford-Fulkerson Algorithm, here is another extremely important one. Since the algorithm maintains an integer-valued flow at all times, we conclude with an integer-valued maximum flow. Thus we have

(7.14) If all capacities in the flow network are integers, then there is a maximum flow f for which every flow value f(e) is an integer.

Note that (7.14) does not claim that every maximum flow is integer-valued, only that some maximum flow has this property. Curiously, although (7.14) makes no reference to the Ford-Fulkerson Algorithm, our algorithmic approach here provides what is probably the easiest way to prove it.

Real Numbers as Capacities?

Finally, before moving on, we can ask how crucial our assumption of integer capacities was (ignoring (7.4), (7.5), and (7.14), which clearly needed it). First we notice that allowing capacities to be rational numbers does not make the situation any more general, since we can determine the least common multiple of the denominators of all the capacities, and multiply through by this value to obtain an equivalent problem with integer capacities.

But what if we have real numbers as capacities? Where in the proof did we rely on the capacities being integers? In fact, we relied on it quite crucially: we used (7.2) to establish, in (7.4), that the value of the flow increased by at least 1 in every step. With real numbers as capacities, we should be concerned that the value of our flow keeps increasing, but in increments that become arbitrarily smaller and smaller; and hence we have no guarantee that the number of iterations of the loop is finite. And this turns out to be an extremely real worry, for the following reason: with pathological choices for the augmenting path, the Ford-Fulkerson Algorithm with real-valued capacities can run forever.

However, one can still prove that the Max-Flow Min-Cut Theorem (7.12) is true even if the capacities may be real numbers. Note that (7.9) assumed only that the flow f has no s-t path in its residual graph G_f, in order to conclude that there is an s-t cut of equal value. So one can prove (7.12) in the case of real-valued capacities by simply establishing that for every flow network, there exists a maximum flow: for any flow f of maximum value, the residual graph has no s-t path, since otherwise there would be a way to increase the value of the flow, and then (7.9) applies.

Of course, the capacities in any practical application of network flow would be integers or rational numbers. However, the problem of pathological choices for the augmenting paths can manifest itself even with integer capacities: it can make the Ford-Fulkerson Algorithm take a gigantic number of iterations.

7.3 Choosing Good Augmenting Paths

In the previous section, we saw that any way of choosing an augmenting path increases the value of the flow, and this led to a bound of C on the number of augmentations, where C = Sum_{e out of s} c_e. When C is not very large, this can be a reasonable bound; however, it is very weak when C is large.

To get a sense for how bad this bound can be, consider the example graph in Figure 7.6, and assume the capacities are as follows: the edges (s, u), (s, v), (u, t), and (v, t) have capacity 100, and the edge (u, v) has capacity 1. It is easy to see that the maximum flow has value 200: it has f(e) = 100 for the edges (s, u), (s, v), (u, t), and (v, t), and value 0 on the edge (u, v). This flow can be obtained by a sequence of two augmentations, using the path of nodes s, u, t and the path s, v, t. But consider how bad the Ford-Fulkerson Algorithm can be with pathological choices for the augmenting paths. Suppose we start with the augmenting path P1 of nodes s, u, v, t in this order (as shown in Figure 7.6). This path has bottleneck(P1, f) = 1. After this augmentation, we have f(e) = 1 on the edge e = (u, v), so the reverse edge is in the residual graph. For the next augmenting path, we choose the path P2 of the nodes s, v, u, t in this order. In this second augmentation, we get bottleneck(P2, f) = 1 as well. After this second augmentation, we have f(e) = 0 for the edge e = (u, v), so the edge is again in the residual graph. Suppose we alternate between choosing P1 and P2 for augmentation. In this case, each augmentation will have 1 as the bottleneck capacity, and it will take 200 augmentations to get the desired flow of value 200. This is exactly the bound we proved in (7.4), since C = 200 in this example.

Figure 7.6 Parts (a) through (d) depict four iterations of the Ford-Fulkerson Algorithm using a bad choice of augmenting paths: the augmentations alternate between the path P1 through the nodes s, u, v, t in order and the path P2 through the nodes s, v, u, t in order.
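The alternation just described is easy to simulate directly. The sketch below is ours, not the book's (the dictionary representation and helper names are assumed): it hard-codes the two paths and counts the augmentations.

    # Sketch (not from the text): the pathological run on the Figure 7.6 graph.
    cap = {('s','u'): 100, ('s','v'): 100, ('u','t'): 100,
           ('v','t'): 100, ('u','v'): 1}
    flow = {e: 0 for e in cap}

    def bottleneck_p1():
        # P1 = s -> u -> v -> t uses only forward edges
        return min(cap[('s','u')] - flow[('s','u')],
                   cap[('u','v')] - flow[('u','v')],
                   cap[('v','t')] - flow[('v','t')])

    def bottleneck_p2():
        # P2 = s -> v -> u -> t uses the backward residual edge of (u, v),
        # whose residual capacity equals the flow currently on (u, v)
        return min(cap[('s','v')] - flow[('s','v')],
                   flow[('u','v')],
                   cap[('u','t')] - flow[('u','t')])

    augmentations = 0
    while True:
        b = bottleneck_p1()
        if b > 0:
            for e in [('s','u'), ('u','v'), ('v','t')]:
                flow[e] += b
        else:
            b = bottleneck_p2()
            if b == 0:
                break
            flow[('s','v')] += b
            flow[('u','t')] += b
            flow[('u','v')] -= b      # pushing flow back on (u, v)
        augmentations += 1

    print(augmentations)              # prints 200; two good choices suffice

Every bottleneck is forced to be 1 by the unit-capacity edge (u, v), so the loop runs 200 times before the flow of value 200 is reached.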
Designing a Faster Flow Algorithm

The goal of this section is to show that with a better choice of paths, we can improve this bound significantly. A large amount of work has been devoted to finding good ways of choosing augmenting paths in the Maximum-Flow Problem so as to minimize the number of iterations. We focus here on one of the most natural approaches and will mention other approaches at the end of the section. Recall that augmentation increases the value of the flow by the bottleneck capacity of the selected path; so if we choose paths with large bottleneck capacity, we will be making a lot of progress. A natural idea is to select the path that has the largest bottleneck capacity. Having to find such paths can slow down each individual iteration by quite a bit, however. We will avoid this slowdown by not worrying about selecting the path that has exactly the largest bottleneck capacity. Instead, we will maintain a so-called scaling parameter Delta, and we will look for paths that have bottleneck capacity of at least Delta.

Let G_f(Delta) be the subset of the residual graph consisting only of edges with residual capacity of at least Delta. We will work with values of Delta that are powers of 2. The algorithm is as follows.

Scaling Max-Flow
  Initially f(e) = 0 for all e in G
  Initially set Delta to be the largest power of 2 that is no larger
    than the maximum capacity out of s: Delta <= max_{e out of s} c_e
  While Delta >= 1
    While there is an s-t path in the graph G_f(Delta)
      Let P be a simple s-t path in G_f(Delta)
      f' = augment(f, P)
      Update f to be f' and update G_f(Delta)
    Endwhile
    Delta = Delta/2
  Endwhile
  Return f
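For concreteness, here is a compact Python rendering of the Scaling Max-Flow Algorithm. It is a sketch under our own assumptions (a residual-capacity dictionary, no pair of antiparallel edges in the input, integer capacities), with augmenting paths found by breadth-first search restricted to edges of residual capacity at least delta.

    # Sketch (assumed representation, not the book's): Scaling Max-Flow.
    from collections import deque

    def scaling_max_flow(nodes, cap, s, t):
        res = dict(cap)                       # residual capacities
        for (u, v) in cap:
            res.setdefault((v, u), 0)         # backward residual edges
        adj = {x: [] for x in nodes}
        for (u, v) in res:
            adj[u].append(v)

        def find_path(delta):
            """BFS for an s-t path using only edges of residual cap >= delta."""
            parent = {s: None}
            q = deque([s])
            while q:
                u = q.popleft()
                if u == t:
                    break
                for v in adj[u]:
                    if v not in parent and res[(u, v)] >= delta:
                        parent[v] = u
                        q.append(v)
            if t not in parent:
                return None
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            return path

        delta = 1                             # largest power of 2 <= max cap out of s
        max_out = max((cap[e] for e in cap if e[0] == s), default=0)
        while 2 * delta <= max_out:
            delta *= 2
        value = 0
        while delta >= 1:                     # the delta-scaling phases
            path = find_path(delta)
            while path is not None:
                b = min(res[e] for e in path) # bottleneck, at least delta
                for (u, v) in path:
                    res[(u, v)] -= b
                    res[(v, u)] += b
                value += b
                path = find_path(delta)
            delta //= 2
        return value

On the Figure 7.6 example, scaling_max_flow({'s','u','v','t'}, cap, 's', 't') returns 200 after a handful of augmentations, since the first phases use only the high-capacity edges.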

Analyzing the Algorithm

First observe that the new Scaling Max-Flow Algorithm is really just an implementation of the original Ford-Fulkerson Algorithm. The new loops, the value Delta, and the restricted residual graph G_f(Delta) are only used to guide the selection of residual path, with the goal of using edges with large residual capacity for as long as possible. Hence all the properties that we proved about the original Max-Flow Algorithm are also true for this new version.

(7.15) If the capacities are integer-valued, then throughout the Scaling Max-Flow Algorithm the flow and the residual capacities remain integer-valued. This implies that when Delta = 1, G_f(Delta) is the same as G_f, and hence when the algorithm terminates the flow f is of maximum value.

Next we consider the running time. We call an iteration of the outside While loop, with a fixed value of Delta, the Delta-scaling phase. It is easy to give an upper bound on the number of different Delta-scaling phases, in terms of the value C = Sum_{e out of s} c_e that we also used in the previous section. The initial value of Delta is at most C, it drops by factors of 2, and it never gets below 1. Thus,

(7.16) The number of iterations of the outer While loop is at most 1 + ceil(log2 C).

The harder part is to bound the number of augmentations done in each scaling phase. The idea here is that we are using paths that augment the flow by a lot, and so there should be relatively few augmentations. During the Delta-scaling phase, we only use edges with residual capacity of at least Delta. Using (7.3), we have

(7.17) During the Delta-scaling phase, each augmentation increases the flow value by at least Delta.

The key insight is that at the end of the Delta-scaling phase, the flow f cannot be too far from the maximum possible value.

(7.18) Let f be the flow at the end of the Delta-scaling phase. There is an s-t cut (A, B) in G for which c(A, B) <= v(f) + m Delta, where m is the number of edges in the graph G. Consequently, the maximum flow in the network has value at most v(f) + m Delta.

Proof. This proof is analogous to our proof of (7.9), which established that the flow returned by the original Max-Flow Algorithm is of maximum value. As in that proof, we must identify a cut (A, B) with the desired property. Let A denote the set of all nodes v in G for which there is an s-v path in G_f(Delta), and let B denote the set of all other nodes: B = V - A. We can see that (A, B) is indeed an s-t cut, as otherwise the phase would not have ended.

Now consider an edge e = (u, v) in G for which u is in A and v is in B. We claim that c_e < f(e) + Delta. For if this were not the case, then e would be a forward edge in the graph G_f(Delta), and since u is in A, there is an s-u path in G_f(Delta); appending e to this path, we would obtain an s-v path in G_f(Delta), contradicting our assumption that v is in B. Similarly, we claim that for any edge e' = (u', v') in G for which u' is in B and v' is in A, we have f(e') < Delta. Indeed, if f(e') >= Delta, then e' would give rise to a backward edge e'' = (v', u') in the graph G_f(Delta), and since v' is in A, there is an s-v' path in G_f(Delta); appending e'' to this path, we would obtain an s-u' path in G_f(Delta), contradicting our assumption that u' is in B.

So all edges e out of A are almost saturated (they satisfy c_e < f(e) + Delta) and all edges into A are almost empty (they satisfy f(e) < Delta). We can now use (7.6) to reach the desired conclusion:

v(f) = Sum_{e out of A} f(e) - Sum_{e into A} f(e)
     >= Sum_{e out of A} (c_e - Delta) - Sum_{e into A} Delta
     = Sum_{e out of A} c_e - Sum_{e out of A} Delta - Sum_{e into A} Delta
     >= c(A, B) - m Delta.

Here the first inequality follows from our bounds on the flow values of edges across the cut, and the second inequality follows from the simple fact that the graph contains only m edges total.

The maximum-flow value is bounded by the capacity of any cut by (7.8). We use the cut (A, B) to obtain the bound claimed in the second statement. []

(7.19) The number of augmentations in a scaling phase is at most 2m.

Proof. The statement is clearly true in the first scaling phase: we can use each of the edges out of s for at most one augmentation in that phase. Now consider a later scaling phase Delta, and let f_p be the flow at the end of the previous scaling phase. In that phase, we used Delta' = 2 Delta as our parameter. By (7.18), the maximum flow f* is at most v(f*) <= v(f_p) + m Delta' = v(f_p) + 2m Delta. In the Delta-scaling phase, each augmentation increases the flow by at least Delta, and hence there can be at most 2m augmentations. []

An augmentation takes O(m) time, including the time required to set up the graph and find the appropriate path. We have at most 1 + ceil(log2 C) scaling phases and at most 2m augmentations in each scaling phase. Thus we have the following result.

(7.20) The Scaling Max-Flow Algorithm in a graph with m edges and integer capacities finds a maximum flow in at most 2m(1 + ceil(log2 C)) augmentations. It can be implemented to run in at most O(m^2 log2 C) time.

When C is large, this time bound is much better than the O(mC) bound that applied to an arbitrary implementation of the Ford-Fulkerson Algorithm. In our example at the beginning of this section, we had capacities of size 100, but we could just as well have used capacities of size 2^100; in this case, the generic Ford-Fulkerson Algorithm could take time proportional to 2^100, while the scaling algorithm will take time proportional to log2(2^100) = 100. One way to view this distinction is as follows: the generic Ford-Fulkerson Algorithm requires time proportional to the magnitude of the capacities, while the scaling algorithm only requires time proportional to the number of bits needed to specify the capacities in the input to the problem. As a result, the scaling algorithm is running in time polynomial in the size of the input (i.e., the number of edges and the numerical representation of the capacities), and so it meets our traditional goal of achieving a polynomial-time algorithm. Bad implementations of the Ford-Fulkerson Algorithm, which can require close to C iterations, do not meet this standard of polynomiality. (Recall that in Section 6.4 we used the term pseudo-polynomial to describe such algorithms, which are polynomial in the magnitudes of the input numbers but not in the number of bits needed to represent them.)

Extensions: Strongly Polynomial Algorithms

Could we ask for something qualitatively better than what the scaling algorithm guarantees? Here is one thing we could hope for: our example graph (Figure 7.6) had four nodes and five edges, so it would be nice to use a number of iterations that is polynomial in the numbers 4 and 5, completely independently of the values of the capacities. Such an algorithm, which is polynomial in |V| and |E| only, and works with numbers having a polynomial number of bits, is called a strongly polynomial algorithm. In fact, there is a simple and natural implementation of the Ford-Fulkerson Algorithm that leads to such a strongly polynomial bound: each iteration chooses the augmenting path with the fewest number of edges. Dinitz, and independently Edmonds and Karp, proved that with this choice the algorithm terminates in at most O(mn) iterations; these were the first polynomial algorithms for the Maximum-Flow Problem. There has since been a huge amount of work devoted to improving the running times of maximum-flow algorithms. There are currently algorithms that achieve running times of O(mn log n), O(n^3), and O(min(n^{2/3}, m^{1/2}) m log n log U), where the last bound assumes that all capacities are integral and at most U. In the next section, we'll discuss a strongly polynomial maximum-flow algorithm based on a different principle.

7.4 The Preflow-Push Maximum-Flow Algorithm

From the very beginning, our discussion of the Maximum-Flow Problem has been centered around the idea of an augmenting path in the residual graph. However, there are some very powerful techniques for maximum flow that are not explicitly based on augmenting paths. In this section we study one such technique, the Preflow-Push Algorithm.

Designing the Algorithm

Algorithms based on augmenting paths maintain a flow f, and use the augment procedure to increase the value of the flow. By way of contrast, the Preflow-Push Algorithm will, in essence, increase the flow on an edge-by-edge basis. Changing the flow on a single edge will typically violate the conservation condition, and so the algorithm will have to maintain something less well behaved than a flow, something that does not obey conservation, as it operates.

Preflows. We say that an s-t preflow (preflow, for short) is a function f that maps each edge e to a nonnegative real number, f : E -> R+. A preflow f must satisfy the capacity conditions:

(i) For each e in E, we have 0 <= f(e) <= c_e.

In place of the conservation conditions, we require only inequalities: each node other than s must have at least as much flow entering as leaving.

(ii) For each node v other than the source s, we have Sum_{e into v} f(e) >= Sum_{e out of v} f(e).

We will call the difference

e_f(v) = Sum_{e into v} f(e) - Sum_{e out of v} f(e)

the excess of the preflow at node v. Notice that a preflow where all nodes other than s and t have zero excess is a flow, and the value of the flow is exactly e_f(t) = -e_f(s). We can still define the concept of a residual graph G_f for a preflow f, just as we did for a flow. The algorithm will "push" flow along edges of the residual graph (using both forward and backward edges).

Preflows and Labelings. The Preflow-Push Algorithm will maintain a preflow and work on converting the preflow into a flow. The algorithm is based on the physical intuition that flow naturally finds its way "downhill." The "heights" for this intuition will be labels h(v) for each node v that the algorithm will define and maintain. We will push flow from nodes with higher labels to those with lower labels, following the intuition that fluid flows downhill. To make this precise, a labeling is a function h : V -> Z>=0 from the nodes to the nonnegative integers. We will also refer to the labels as heights of the nodes. We will say that a labeling h and an s-t preflow f are compatible if

(i) (Source and sink conditions) h(t) = 0 and h(s) = n.

(ii) (Steepness conditions) For all edges (v, w) in E_f in the residual graph, we have h(v) <= h(w) + 1.

In other words, no edge in the residual graph can be too "steep": its tail can be at most one unit above its head in height. Intuitively, the height difference n between the source and the sink is meant to ensure that the flow starts high enough to flow from s toward the sink t, while the steepness condition will help by making the descent of the flow gradual enough to make it to the sink.

Figure 7.7 A residual graph and a compatible labeling. No edge in the residual graph may be too steep. The source node s must have h(s) = n and is not drawn in the figure.

The key property of a compatible preflow and labeling is that there can be no s-t path in the residual graph.

(7.21) If s-t preflow f is compatible with a labeling h, then there is no s-t path in the residual graph G_f.

Proof. We prove the statement by contradiction. Let P be a simple s-t path in the residual graph G_f. Assume that the nodes along P are s, v_1, ..., v_k = t. By definition of a labeling compatible with preflow f, we have that h(s) = n. The edge (s, v_1) is in the residual graph, and hence h(v_1) >= h(s) - 1 = n - 1. Using induction on i and the steepness condition for the edge (v_{i-1}, v_i), we get that for all nodes v_i in path P the height is at least h(v_i) >= n - i. Notice that the last node of the path is v_k = t; hence we get that h(t) >= n - k. However, h(t) = 0 by definition, and k < n as the path P is simple. This contradiction proves the claim. []

Recall from (7.9) that if there is no s-t path in the residual graph G_f of a flow f, then the flow has maximum value. This implies the following corollary.

(7.22) If s-t flow f is compatible with a labeling h, then f is a flow of maximum value.

Note that (7.21) applies to preflows, while (7.22) is more restrictive in that it applies only to flows. Thus the Preflow-Push Algorithm will maintain a preflow f and a labeling h compatible with f, and it will work on modifying f and h so as to move f toward being a flow. Once f actually becomes a flow, we can invoke (7.22) to conclude that it is a maximum flow. In light of this, we can view the Preflow-Push Algorithm as being in a way orthogonal to the Ford-Fulkerson Algorithm. The Ford-Fulkerson Algorithm maintains a feasible flow while changing it gradually toward optimality. The Preflow-Push Algorithm, on the other hand, maintains a condition that would imply the optimality of a preflow f, if it were to be a feasible flow, and the algorithm gradually transforms the preflow f into a flow.
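The compatibility conditions are mechanical enough to check in a few lines. The sketch below is ours (an assumed dictionary representation, not the book's): it tests the source and sink conditions and the steepness condition on every residual edge.

    # Sketch (not from the text): verify that preflow f and labeling h
    # are compatible in the sense just defined.
    def is_compatible(nodes, cap, f, h, s, t):
        n = len(nodes)
        if h[t] != 0 or h[s] != n:
            return False                     # source and sink conditions
        for (u, v), ce in cap.items():
            if f[(u, v)] < ce and h[u] > h[v] + 1:
                return False                 # forward residual edge too steep
            if f[(u, v)] > 0 and h[v] > h[u] + 1:
                return False                 # backward residual edge too steep
        return True

Note how the two tests mirror the two kinds of residual edges: a forward edge exists when f(e) < c_e, and a backward edge exists when f(e) > 0.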

To start the algorithm, we will need to define an initial preflow f and labeling h that are compatible. We will use h(v) = 0 for all v != s, and h(s) = n, as our initial labeling. To make a preflow f compatible with this labeling, we need to make sure that no edges leaving s are in the residual graph (as these edges do not satisfy the steepness condition). To this end, we define the initial preflow as f(e) = c_e for all edges e = (s, v) leaving the source, and f(e) = 0 for all other edges.

(7.23) The initial preflow f and labeling h are compatible.

Pushing and Relabeling. Next we will discuss the steps the algorithm makes toward turning the preflow f into a feasible flow. Consider any node v that has excess, that is, e_f(v) > 0. If there is any edge e in the residual graph G_f that leaves v and goes to a node w at a lower height (note that h(w) is at most 1 less than h(v) due to the steepness condition), then we can modify f by pushing some of the excess flow from v to w. We will call this a push operation.

push(f, h, v, w)
  Applicable if e_f(v) > 0, h(w) < h(v), and (v, w) in E_f
  If e = (v, w) is a forward edge then
    let delta = min(e_f(v), c_e - f(e)) and increase f(e) by delta
  If (v, w) is a backward edge then
    let e = (w, v), delta = min(e_f(v), f(e)) and decrease f(e) by delta
  Return(f, h)

If we cannot push the excess of v along any edge leaving v, then we will need to raise v's height. We will call this a relabel operation.

relabel(f, h, v)
  Applicable if e_f(v) > 0, and
    for all edges (v, w) in E_f we have h(w) >= h(v)
  Increase h(v) by 1
  Return(f, h)

The Full Preflow-Push Algorithm. So, in summary, the Preflow-Push Algorithm is as follows.

Preflow-Push
  Initially h(v) = 0 for all v != s, and h(s) = n, and
    f(e) = c_e for all e = (s, v) and f(e) = 0 for all other edges
  While there is a node v != t with excess e_f(v) > 0
    Let v be a node with excess
    If there is w such that push(f, h, v, w) can be applied then
      push(f, h, v, w)
    Else
      relabel(f, h, v)
  Endwhile
  Return(f)

Analyzing the Algorithm

As usual, this algorithm is somewhat underspecified. For an implementation of the algorithm, we will have to specify which node with excess to choose, and how to efficiently select an edge on which to push. (We'll discuss later how to implement it reasonably efficiently.) However, it is clear that each iteration of this algorithm can be implemented in polynomial time.

If the algorithm terminates (something that is far from obvious based on its description), then every node other than s and t has zero excess, and hence the preflow f is in fact a flow. Further, it is not hard to see that the preflow f and the labeling h are compatible throughout the algorithm. It then follows from (7.22) that f would be a maximum flow at termination. We summarize a few simple observations about the algorithm.

(7.24) Throughout the Preflow-Push Algorithm: (i) the labels are nonnegative integers; (ii) f is a preflow, and if the capacities are integral, then the preflow f is integral; and (iii) the preflow f and the labeling h are compatible. If the algorithm returns a preflow f, then f is a flow of maximum value.

Proof. By (7.23) the initial preflow f and labeling h are compatible. We will show using induction on the number of push and relabel operations that f and h satisfy the properties of the statement. The push operation modifies the preflow f, but the bounds on delta guarantee that the f returned still satisfies the capacity constraints, and that excesses all remain nonnegative; so f is a preflow. To see that the preflow f and the labeling h remain compatible, note that push(f, h, v, w) can add one edge to the residual graph, the reverse edge (w, v), and this edge does satisfy the steepness condition, since h(w) < h(v). The relabel operation increases the label of v, and hence increases the steepness of all edges leaving v. However, it only applies when no edge leaving v in the residual graph goes downward, that is, when h(w) >= h(v) for all residual edges (v, w); so after the increase we still have h(v) <= h(w) + 1, and hence the preflow f and the labeling h are compatible after relabeling.

If the algorithm terminates, then all nodes other than s and t have zero excess, so f is a flow by definition; and since the preflow f and the labeling h remain compatible throughout the algorithm, (7.22) implies that f is a flow of maximum value. []
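Before turning to the analysis, it may help to see the generic algorithm as working code. The following is a minimal sketch under our own assumptions (a residual-capacity dictionary, arbitrary node selection); it deliberately favors clarity over the efficient data structures discussed later in this section, and it returns the value of the resulting maximum flow as the excess accumulated at t.

    # Sketch (not from the text): the generic Preflow-Push Algorithm.
    def preflow_push(nodes, cap, s, t):
        n = len(nodes)
        res = {}                              # residual capacities
        for (u, v), ce in cap.items():
            res[(u, v)] = res.get((u, v), 0) + ce
            res.setdefault((v, u), 0)
        h = {v: 0 for v in nodes}
        h[s] = n                              # initial labeling
        excess = {v: 0 for v in nodes}
        for (u, v), ce in cap.items():        # initial preflow: saturate
            if u == s:                        # every edge out of s
                res[(u, v)] -= ce
                res[(v, u)] += ce
                excess[v] += ce
                excess[s] -= ce

        def active():
            return next((v for v in nodes
                         if v not in (s, t) and excess[v] > 0), None)

        v = active()
        while v is not None:
            pushed = False
            for w in nodes:
                r = res.get((v, w), 0)
                if r > 0 and h[w] < h[v]:     # push along a downhill edge
                    delta = min(excess[v], r)
                    res[(v, w)] -= delta
                    res[(w, v)] += delta
                    excess[v] -= delta
                    excess[w] += delta
                    pushed = True
                    break
            if not pushed:
                h[v] += 1                     # relabel: no downhill edge exists
            v = active()
        return excess[t]                      # value of the flow returned

The facts proved next (bounds on labels, saturating pushes, and nonsaturating pushes) are exactly what guarantees that this loop terminates.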

Next we will consider the number of push and relabel operations. First we will prove a limit on the relabel operations, and this will help prove a limit on the maximum number of push operations possible. The algorithm never changes the label of s (as the source never has positive excess). Each other node v starts with h(v) = 0, and its label increases by 1 every time it changes. So we simply need to give a limit on how high a label can get. We only consider a node v for relabel when v has excess. The only source of flow in the network is the source s; hence, intuitively, the excess at v must have originated at s. The following consequence of this fact will be key to bounding the labels.

(7.25) Let f be a preflow. If the node v has excess, then there is a path in G_f from v to the source s.

Proof. Let A denote all the nodes w such that there is a path from w to s in the residual graph G_f, and let B = V - A. We need to show that all nodes with excess are in A.

Notice that s is in A. Further, no edge e = (x, y) leaving A can have positive flow, as an edge with f(e) > 0 would give rise to a reverse edge (y, x) in the residual graph, and then y would have been in A.

Now consider the sum of excesses in the set B, and recall that each node in B has nonnegative excess, as s is not in B:

0 <= Sum_{v in B} e_f(v) = Sum_{v in B} (f^in(v) - f^out(v)).

Let's rewrite the sum on the right as follows. If an edge e has both ends in B, then f(e) appears once in the sum with a "+" and once with a "-", and hence these two terms cancel out. If e has only its head in B, then e leaves A, and we saw above that all edges leaving A have f(e) = 0. If e has only its tail in B, then f(e) appears just once in the sum, with a "-". So we get

0 <= Sum_{v in B} e_f(v) = -f^out(B).

Since flows are nonnegative, we see that the sum of the excesses in B is zero; since each individual excess in B is nonnegative, they must therefore all be 0. []

Now we are ready to prove that the labels do not change too much. Recall that n denotes the number of nodes in V.

(7.26) Throughout the algorithm, all nodes have h(v) <= 2n - 1.

Proof. The initial labels h(t) = 0 and h(s) = n do not change during the algorithm. Consider some other node v != s, t. The algorithm changes v's label only when applying the relabel operation, so let f and h be the preflow and labeling returned by a relabel(f, h, v) operation. By (7.25) there is a path P in the residual graph G_f from v to s. Let |P| denote the number of edges in P, and note that |P| <= n - 1. The steepness condition implies that the heights of the nodes can decrease by at most 1 along each edge in P, and hence h(v) - h(s) <= |P|, which proves the statement. []

Labels are monotone increasing throughout the algorithm, so this statement immediately implies a limit on the number of relabeling operations.

(7.27) Throughout the algorithm, each node is relabeled at most 2n - 1 times, and the total number of relabeling operations is less than 2n^2.

Next we will bound the number of push operations. We will distinguish two kinds of push operations. A push(f, h, v, w) operation is saturating if either e = (v, w) is a forward edge in E_f and delta = c_e - f(e), or (v, w) is a backward edge with e = (w, v) and delta = f(e). In other words, the push is saturating if, after the push, the edge (v, w) is no longer in the residual graph. All other push operations will be referred to as nonsaturating.

(7.28) Throughout the algorithm, the number of saturating push operations is at most 2nm.

Proof. Consider an edge (v, w) in the residual graph. After a saturating push(f, h, v, w) operation, we have h(v) = h(w) + 1, and the edge (v, w) is no longer in the residual graph G_f, as shown in Figure 7.8. Before we can push again along this edge, first we have to push from w to v to make the edge (v, w) reappear in the residual graph. However, in order to push from w to v, we first need for w's label to increase by at least 2 (so that w is above v). The label of w can increase by 2 at most n - 1 times, so a saturating push from v to w can occur at most n times. Each edge e in E can give rise to two edges in the residual graph, so overall we can have at most 2nm saturating pushes. []

The hardest part of the analysis is proving a bound on the number of nonsaturating pushes, and this also will be the bottleneck for the theoretical bound on the running time.

(7.29) Throughout the algorithm, the number of nonsaturating push operations is at most 4mn^2.

Proof. For this proof, we will use a so-called potential function method. For a preflow f and a compatible labeling h, we define

Phi(f, h) = Sum over nodes v with e_f(v) > 0 of h(v)

to be the sum of the heights of all nodes with positive excess. (Phi is often called a potential since it resembles the "potential energy" of all nodes with positive excess.)

In the initial preflow and labeling, all nodes with positive excess are at height 0, so Phi(f, h) = 0, and Phi(f, h) remains nonnegative throughout the algorithm. A nonsaturating push(f, h, v, w) operation decreases Phi(f, h) by at least 1, since after the push the node v will have no excess, and w, the only node that gets new excess from the operation, is at a height 1 less than v.

However, Phi(f, h) can increase due to saturating pushes and relabels. A saturating push(f, h, v, w) operation does not change labels, but it can increase Phi(f, h), since the node w may suddenly acquire positive excess after the push. This would increase Phi(f, h) by the height of w, which is at most 2n - 1. A relabel operation increases Phi(f, h) by exactly 1. There are at most 2n^2 relabel operations, so the total increase in Phi(f, h) due to relabel operations is at most 2n^2; and there are at most 2nm saturating push operations, so the total increase in Phi(f, h) due to push operations is at most 2mn(2n - 1). So, between the two causes, Phi(f, h) can increase by at most 4mn^2 during the algorithm.

But since Phi remains nonnegative throughout, and it decreases by at least 1 on each nonsaturating push operation, it follows that there can be at most 4mn^2 nonsaturating push operations. []

Extensions: An Improved Version of the Algorithm

There has been a lot of work devoted to choosing node selection rules for the Preflow-Push Algorithm to improve the worst-case running time. Here we consider a simple rule that leads to an improved O(n^3) bound on the number of nonsaturating push operations.

(7.30) If at each step we choose the node with excess at maximum height, then the number of nonsaturating push operations throughout the algorithm is at most 4n^3.

Proof. Consider the maximum height H = max over nodes v with e_f(v) > 0 of h(v) of any node with excess as the algorithm proceeds. The analysis will use this maximum height H in place of the potential function Phi in the previous O(n^2 m) bound.

This maximum height H can only increase due to relabeling (as flow is always pushed to nodes at lower height), and so the total increase in H throughout the algorithm is at most 2n^2 by (7.26). H starts out 0 and remains nonnegative, so the number of times H changes is at most 4n^2.

Now consider the behavior of the algorithm over a phase of time in which H remains constant. We claim that each node can have at most one nonsaturating push operation during this phase. Indeed, during this phase, flow is being pushed from nodes at height H to nodes at height H - 1; and after a nonsaturating push operation from v, the node v has no excess, so it must receive flow from a node at height H + 1 before we can push from it again, which cannot happen while H remains constant.

Since there are at most n nonsaturating push operations between each change to H, and H changes at most 4n^2 times, the total number of nonsaturating push operations is at most 4n^3. []

As a follow-up to (7.30), it is interesting to note that experimentally the computational bottleneck of the method is the number of relabeling operations, and a better experimental running time is obtained by variants that work on increasing labels faster than one by one. This is a point that we pursue further in some of the exercises.

Figure 7.8 After a saturating push(f, h, v, w), the height of v exceeds the height of w by 1. The height of node w has to increase by 2 before it can push flow back to node v.

Implementing the Preflow-Push Algorithm

Finally, we need to briefly discuss how to implement this algorithm efficiently. Maintaining a few simple data structures will allow us to effectively implement

the operations of the algorithm in constant time each, and overall to implement the algorithm in time O(mn) plus the number of nonsaturating push operations. Hence the generic algorithm will run in O(mn^2) time, while the version that always selects the node at maximum height will run in O(n^3) time.

First, we will use the adjacency list representation of the graph: for each node v, we will maintain all possible edges leaving v in the residual graph (both forward and backward edges) in a linked list, and with each edge we keep its capacity and flow value. Note that this way we have two copies of each edge in our data structure: a forward and a backward copy. These two copies will have pointers to each other, so that updates done at one copy can be carried over to the other one in O(1) time.

We will select edges leaving a node v for push operations in the order they appear on node v's list. To facilitate this selection, we will maintain a pointer current(v) for each node v to the last edge on the list that has been considered for a push operation. So, if node v no longer has excess after a nonsaturating push operation out of node v, the pointer current(v) will stay at this edge, and we will use the same edge for the next push operation out of v. After a saturating push operation out of node v, we advance current(v) to the next edge on the list.

The key observation is that, after advancing the pointer current(v) from an edge (v, w), we will not want to apply push to this edge again until we relabel v.

(7.31) After the current(v) pointer is advanced from an edge (v, w), we cannot apply push to this edge until v gets relabeled.

Proof. At the moment current(v) is advanced from the edge (v, w), there is some reason push cannot be applied to this edge. Either h(w) >= h(v), or the edge is not in the residual graph. In the first case, we clearly need to relabel v before applying a push on this edge. In the latter case, one needs to apply push to the reverse edge (w, v) to make (v, w) reenter the residual graph. However, when we apply push to edge (w, v), then w is above v, and so v needs to be relabeled before one can push flow from v to w again. []

After relabeling node v, we reset current(v) to the first edge on the list and start considering edges again in the order they appear on v's list.

(7.32) When the current(v) pointer reaches the end of the edge list for v, the relabel operation can be applied to node v.

Since edges do not have to be considered again for push before relabeling, we get the following bound on the total time the algorithm spends on finding the right edge on which to push flow. Consider a node v. We know that v can be relabeled at most 2n times throughout the algorithm. If node v has d_v adjacent edges, then by (7.32) we spend O(d_v) time on advancing the current(v) pointer between consecutive relabelings of v. Thus the total time spent on advancing the current pointers throughout the algorithm is O(Sum_{v in V} n d_v) = O(mn).

Now assume we have selected a node v, and we need to select an edge (v, w) on which to apply push(f, h, v, w) (or apply relabel(f, h, v) if no such w exists). Both push and relabel operations can be implemented in O(1) time, once the operation has been selected. Thus we only have to select a new node after a push when the current node v no longer has positive excess. We can maintain all nodes with excess on a simple list, and so we will be able to select a node with excess in constant time.

One has to be a bit more careful to be able to select a node with maximum height H in constant time. In order to do this, we will maintain a linked list of all nodes with excess at every possible height. Note that whenever a node v gets relabeled, or continues to have positive excess after a push, it remains a node with maximum height H. Thus we only need to find a new node of maximum height when a nonsaturating push out of v exhausts v's excess; in that case, the new node at maximum height will also be at height H or, if no node at height H has excess, at height H - 1, since the previous push operation out of v pushed flow to a node at height H - 1.

(7.33) The running time of the Preflow-Push Algorithm, implemented using the above data structures, is O(mn) plus O(1) for each nonsaturating push operation. In particular, the generic Preflow-Push Algorithm runs in O(n^2 m) time, while the version where we always select the node at maximum height runs in O(n^3) time.
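The current-pointer discipline can be captured in a small skeleton. The class below is purely illustrative and ours (the names and the can_push callback are assumptions, not the book's API): it shows the pointer advancing monotonically between relabels and resetting on relabel, exactly as (7.31) and (7.32) license.

    # Illustrative sketch (not from the text) of the current(v) pointer.
    class NodeArcs:
        def __init__(self, incident_edges):
            self.edges = incident_edges   # fixed list of candidate edges (v, w)
            self.current = 0              # pointer into self.edges

        def next_push_candidate(self, can_push):
            """can_push(edge) tests residual capacity and height conditions.
            Advance past unusable edges; return a usable edge or None."""
            while self.current < len(self.edges):
                e = self.edges[self.current]
                if can_push(e):
                    return e
                self.current += 1         # safe until the next relabel, by (7.31)
            return None                   # end of list: relabel is applicable, by (7.32)

        def relabel_reset(self):
            self.current = 0              # reconsider edges from the start

The accounting in the text then charges each pointer advance against a relabeling of v, giving the O(mn) total.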

7.5 A First Application: The Bipartite Matching Problem

Having developed a set of powerful algorithms for the Maximum-Flow Problem, we now turn to the task of developing applications of maximum flows and minimum cuts in graphs. We begin with two very basic applications. First, in this section, we discuss the Bipartite Matching Problem mentioned at the beginning of this chapter. In the next section, we discuss the more general Disjoint Paths Problem.

The Problem

One of our original goals in developing the Maximum-Flow Problem was to be able to solve the Bipartite Matching Problem. Recall that a bipartite graph G = (V, E) is an undirected graph whose node set can be partitioned as V = X u Y, with the property that every edge e in E has one end in X and the other end in Y. A matching M in G is a subset of the edges M, a subset of E, such that each node appears in at most one edge in M. The Bipartite Matching Problem is that of finding a matching in G of largest possible size.

Designing the Algorithm

The graph defining a matching problem is undirected, while flow networks are directed; but it is actually not difficult to use an algorithm for the Maximum-Flow Problem to find a maximum matching. Beginning with the graph G in an instance of the Bipartite Matching Problem, we construct a flow network G' as shown in Figure 7.9. First we direct all edges in G from X to Y. We then add a node s, and an edge (s, x) from s to each node in X. We add a node t, and an edge (y, t) from each node in Y to t. Finally, we give each edge in G' a capacity of 1.

We now compute a maximum s-t flow in this network G'. We will discover that the value of this maximum flow is equal to the size of the maximum matching in G. Moreover, our analysis will show how one can use the flow itself to recover the matching.

Figure 7.9 (a) A bipartite graph. (b) The corresponding flow network, with all capacities equal to 1.

Analyzing the Algorithm

The analysis is based on showing that integer-valued flows in G' encode matchings in G in a fairly transparent fashion. First, suppose there is a matching in G consisting of k edges (x_{i1}, y_{i1}), ..., (x_{ik}, y_{ik}). Then consider the flow f that sends one unit along each path of the form s, x_{ij}, y_{ij}, t; that is, f(e) = 1 for each edge on one of these paths. One can verify easily that the capacity and conservation conditions are indeed met, and that f is an s-t flow of value k.

Conversely, suppose there is a flow f' in G' of value k. By the integrality theorem for maximum flows (7.14), we know there is an integer-valued flow f of value k; and since all capacities are 1, this means that f(e) is equal to either 0 or 1 for each edge e. Now, consider the set M' of edges of the form (x, y) on which the flow value is 1. Here are three simple facts about the set M'.

(7.34) M' contains k edges.

Proof. To prove this, consider the cut (A, B) in G' with A = {s} u X. The value of the flow is the total flow leaving A, minus the total flow entering A. The first of these terms is simply the cardinality of M', since these are the edges leaving A that carry flow, and each carries exactly one unit of flow. The second of these terms is 0, since there are no edges entering A. Thus, M' contains k edges. []

(7.35) Each node in X is the tail of at most one edge in M'.

Proof. To prove this, suppose x in X were the tail of at least two edges in M'. Since our flow is integer-valued, this means that at least two units of flow leave from x. By conservation of flow, at least two units of flow would have to come into x; but this is not possible, since only a single edge of capacity 1 enters x. Thus x is the tail of at most one edge in M'. []

By the same reasoning, we can show

(7.36) Each node in Y is the head of at most one edge in M'.

Combining these facts, we see that if we view M' as a set of edges in the original bipartite graph G, we get a matching of size k. In summary, we have proved the following fact.

(7.37) The size of the maximum matching in G is equal to the value of the maximum flow in G'; and the edges in such a matching in G are the edges that carry flow from X to Y in G'.
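The reduction is short enough to write out in full. The sketch below is ours, not the book's: it builds G' as a residual-capacity dictionary (assuming the strings 'source' and 'sink' are not node names), runs unit-capacity Ford-Fulkerson augmentations, and reads the matching off the saturated X-to-Y edges, as (7.37) describes.

    # Sketch (not from the text): maximum bipartite matching via max flow.
    from collections import deque

    def bipartite_matching(X, Y, edges):
        """edges: iterable of (x, y) pairs with x in X and y in Y."""
        s, t = 'source', 'sink'          # assumed fresh node names
        res = {}                         # residual capacities of G'
        for x in X:
            res[(s, x)], res[(x, s)] = 1, 0
        for y in Y:
            res[(y, t)], res[(t, y)] = 1, 0
        for (x, y) in edges:
            res[(x, y)], res[(y, x)] = 1, 0
        adj = {}
        for (u, v) in res:
            adj.setdefault(u, []).append(v)

        def augment():                   # one Ford-Fulkerson iteration
            parent = {s: None}
            q = deque([s])
            while q:
                u = q.popleft()
                for v in adj.get(u, []):
                    if v not in parent and res[(u, v)] > 0:
                        parent[v] = u
                        q.append(v)
            if t not in parent:
                return False
            v = t                        # push one unit along the path found
            while parent[v] is not None:
                u = parent[v]
                res[(u, v)] -= 1
                res[(v, u)] += 1
                v = u
            return True

        while augment():
            pass
        # saturated X-to-Y edges (residual capacity 0) carry flow 1: matched
        return [(x, y) for (x, y) in edges if res[(x, y)] == 0]

For example, bipartite_matching(['x1','x2'], ['y1','y2'], [('x1','y1'), ('x2','y1'), ('x2','y2')]) returns [('x1', 'y1'), ('x2', 'y2')]: the second augmentation routes x2 away from the contested node y1.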

Bounding the Running Time

Now let's consider how quickly we can compute a maximum matching in G. Let n = |X| = |Y|, and let m be the number of edges of G. We'll tacitly assume that there is at least one edge incident to each node in the original problem, and hence m >= n/2. The time to compute a maximum matching is dominated by the time to compute an integer-valued maximum flow in G'. For this flow problem, we have that C = Sum_{e out of s} c_e = |X| = n, as s has an edge of capacity 1 to each node of X. Thus, by using the O(mC) bound in (7.5), we get the following.

(7.38) The Ford-Fulkerson Algorithm can be used to find a maximum matching in a bipartite graph in O(mn) time.

It's interesting that if we were to use the "better" bounds of O(m^2 log2 C) or O(n^3) that we developed in the previous sections, we'd get the inferior running times of O(m^2 log n) or O(n^3) for this problem. There is nothing contradictory in this. These bounds were designed to be good for all instances, even when C is very large relative to m and n. But C = n for the Bipartite Matching Problem, and so the cost of this extra sophistication is not needed.

It is worthwhile to consider what the augmenting paths mean in the network G'. Consider the matching M consisting of the edges (x2, y2), (x3, y3), and (x5, y5) in the bipartite graph in Figure 7.10. This matching is not maximum, so f is not a maximum s-t flow, and hence there is an augmenting path in the residual graph G'_f. One such augmenting path is marked in Figure 7.10(b). Note that the edges (x2, y2) and (x3, y3) are used backward, and all other edges are used forward. All augmenting paths must alternate between edges used backward and forward, as all edges of the graph G' go from X to Y. Augmenting paths are therefore also called alternating paths in the context of finding a maximum matching. The effect of this augmentation is to take the edges used backward out of the matching, and replace them with the edges going forward. Because the augmenting path goes from s to t, there is one more forward edge than backward edge; thus the size of the matching increases by one.

Figure 7.10 (a) A bipartite graph, with a matching M. (b) The augmenting path in the corresponding residual graph. (c) The matching obtained by the augmentation.

Extensions: The Structure of Bipartite Graphs with No Perfect Matching

Algorithmically, we've seen how to find perfect matchings: we use the algorithm above to find a maximum matching and then check to see if this matching is perfect. But let's ask a slightly less algorithmic question. Not all bipartite graphs have perfect matchings. What does a bipartite graph without a perfect matching look like? Is there an easy way to see that a bipartite graph does not have a perfect matching, or at least an easy way to convince someone the graph has no perfect matching, after we run the algorithm? More concretely, it would be nice if the algorithm, upon concluding that there is no perfect matching, could produce a short "certificate" of this fact. The certificate could allow someone to be quickly convinced that there is no perfect matching, without having to look over a trace of the entire execution of the algorithm.

One way to understand the idea of such a certificate is as follows. We can decide if the graph G has a perfect matching by checking if the maximum flow in the related graph G' has value at least n. By the Max-Flow Min-Cut Theorem, there will be an s-t cut of capacity less than n if the maximum-flow value in G' is less than n. So, in a way, a cut with capacity less than n provides such a certificate. However, we want a certificate that has a natural meaning in terms of the original graph G.

What might such a certificate look like? For example, if there are nodes x1, x2 in X that have only one incident edge each, and the other end of each edge is the same node y, then clearly the graph has no perfect matching: both x1 and x2 would need to get matched to the same node y. More generally, consider a subset of nodes A, a subset of X, and let Gamma(A), a subset of Y, denote the set of all nodes that are adjacent to nodes in A.

If the graph has a perfect matching, then each node in A has to be matched to a different node in Gamma(A), so Gamma(A) has to be at least as large as A. This gives us the following fact.

(7.39) If a bipartite graph G = (V, E) with two sides X and Y has a perfect matching, then for all A, a subset of X, we must have |Gamma(A)| >= |A|.

This statement suggests a type of certificate demonstrating that a graph does not have a perfect matching: a set A, a subset of X, such that |Gamma(A)| < |A|. But is the converse of (7.39) also true? Is it the case that whenever there is no perfect matching, there is a set A like this that proves it? The answer turns out to be yes, provided we add the obvious condition that |X| = |Y| (without which there could certainly not be a perfect matching). This statement is known in the literature as Hall's Theorem, though versions of it were discovered independently by a number of different people, perhaps first by Konig, in the early 1900s. The proof of the statement also provides a way to find such a subset A in polynomial time.

(7.40) Assume that the bipartite graph G = (V, E) has two sides X and Y such that |X| = |Y|. Then the graph G either has a perfect matching or there is a subset A of X such that |Gamma(A)| < |A|. A perfect matching or an appropriate subset A can be found in O(mn) time.

Proof. We will use the same graph G' as in (7.37). Assume that |X| = |Y| = n. By (7.37) the graph G has a perfect matching if and only if the value of the maximum flow in G' is n.

We need to show that if the value of the maximum flow is less than n, then there is a subset A such that |Gamma(A)| < |A|, as claimed in the statement. By the Max-Flow Min-Cut Theorem (7.12), if the maximum-flow value is less than n, then there is a cut (A', B') with capacity less than n in G'. Now the set A' contains s, and may contain nodes from both X and Y, as shown in Figure 7.11. We claim that the set A = X intersected with A' has the claimed property. This will prove both parts of the statement, as we've seen in (7.11) that a minimum cut (A', B') can also be found by running the Ford-Fulkerson Algorithm.

First we claim that one can modify the minimum cut (A', B') so as to ensure that Gamma(A) is contained in A', where A = X intersected with A' as before. To do this, consider a node y in Gamma(A) that belongs to B', as shown in Figure 7.11(a). We claim that by moving y from B' to A', we do not increase the capacity of the cut. For what happens when we move y from B' to A'? The edge (y, t) now crosses the cut, increasing the capacity by one. But previously there was at least one edge (x, y) with x in A, since y is in Gamma(A); all edges from A to y used to cross the cut, and don't anymore. Thus, overall, the capacity of the cut cannot increase. (Note that we don't have to be concerned about nodes x in X that are not in A. The two ends of the edge (x, y) will be on different sides of the cut, but this edge does not add to the capacity of the cut, as it goes from B' to A'.)

Next consider the capacity of this minimum cut (A', B') that has Gamma(A) contained in A', as shown in Figure 7.11(b). Since all neighbors of A belong to A', we see that the only edges out of A' are either edges that leave the source s or that enter the sink t. Thus the capacity of the cut is exactly c(A', B') = |X intersect B'| + |Y intersect A'|. Notice that |X intersect B'| = n - |A|, and |Y intersect A'| >= |Gamma(A)|. Now the assumption that c(A', B') < n implies that

n - |A| + |Gamma(A)| <= |X intersect B'| + |Y intersect A'| = c(A', B') < n.

Comparing the first and the last terms, we get the claimed inequality |A| > |Gamma(A)|. []

Figure 7.11 (a) A minimum cut in the proof of (7.40); the edges crossing the cut are dark, and node y can be moved to the s-side of the cut. (b) The same cut after moving node y to the A' side.
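The certificate promised by (7.40) can be extracted directly from the residual graph. The sketch below is ours and builds on the bipartite_matching routine above (so the same representational assumptions apply); it is valid when the maximum flow falls short of n, and it uses the fact that for the reachability-based minimum cut, every neighbor of A already lies on the source side.

    # Sketch (not from the text): recover a set A with |Gamma(A)| < |A|,
    # assuming the maximum flow in G' has value less than n = |X|.
    from collections import deque

    def hall_certificate(X, Y, edges, res, s='source'):
        """res: final residual capacities from bipartite_matching above."""
        adj = {}
        for (u, v) in res:
            adj.setdefault(u, []).append(v)
        reach = {s}
        q = deque([s])
        while q:                          # source side A' of a min cut, as in (7.11)
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in reach and res[(u, v)] > 0:
                    reach.add(v)
                    q.append(v)
        A = [x for x in X if x in reach]              # A = X intersect A'
        Gamma = {y for (x, y) in edges if x in A}     # neighbors of A
        return A, Gamma

On the earlier example with y2 isolated, say X = ['x1','x2'], Y = ['y1','y2'], and edges [('x1','y1'), ('x2','y1')], the maximum matching has size 1, and the function returns A = ['x1', 'x2'] with Gamma = {'y1'}, so |Gamma(A)| = 1 < 2 = |A|.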

we get the v edge-disjoint paths as claimed.Chapter 7 Network Flow 7. with its two distinguished nodes s and t. ~ Designing the Algorithm Both the directed and the undirected versions of the problem can be solved very naturally using flows. despite the fact that the Maximum-Flow Problem was defined for a directed graph. E) with two distinguished nodes s. This new flow f’ has value v . we will extend the Disjoint Paths Problem to undirected graphs. . We now proceed to prove this converse statement. (The edges in the figure all carry one unit of flow. not only give us the maximum number of edge-disjoint paths. ~. Thus computing a maximum flow in G will If the first case happens--we find a path P from s to t--then we’ll use this path as one of our u paths. then the value of the maximum s-t flow in G is at least k.1.42) If f is a 0-1 valued flow of value u. and f(e’) = 0 on all other edges. which. confirming that this approach using flow indeed gives us the correct answer. Applying the induction hypothesis for f’. By (7. We’ll see that. Otherwise. and then there must be an edge (v. If u = 0. but it has fewer edges that carry flow. then there exist k edge-disioint s-t paths. but also node-disjoint (of course.~ The Problem In defining this problem precisely. Let’s start with the directed problem. and the flow is integer-valued. along with path P. we will make precise this intuitive correspondence between units of flow traveling along paths. From this more dynamic view of flows. we can make progress in a different way. Consider the cycle C of edges visited between the first and second appearances of v. respectively.I edge-disjoint paths. which has just reached a node v for the second time.41) as well: If there is a flow of value k. If we continue in this way. (7. and so forth. E). Given the graph G = (V. To prove this~ we will consider a flow of value at least k.41) If there are k edge-disjoint paths in a directed graph G from s to t. We obtain a new flow f’ from f by decreasing the flow values on the edges along C to 0. we get v . we define a flow network in which s and t are the source and sink. Since all edges have a capacity bound of 1. and it has fewer edges that carry flow. since it will immediately establish the optimality of the flow-based algorithm to find disjoint paths. Applying the induction hypothesis for f’. t ~ V. If P reaches a node v for the second time.41) is the heart of the analysis. and construct k edge-disioint paths. it can naturally be used also to handle related problems on undirected graphs. u) that carries one unit of flow.6 Disjoint Paths in Directed and Undirected Graphs 375 the sink. we will arrive at something called the s-t Disjoint Paths Problem.14). though multiple paths may go through some of the same nodes.. and with a capacity of 1 on each edge.) In this case. We say that a set of paths is edge-disjoint if their edge sets are disjoint. Thus we just need to show the following. or we will reach a node v for the second time. The Undirected Edge-Disjoint Paths Problem is to find the maximum number of edge-disioint s-t paths in an undirected graph G. each edge that carries flow under f has exactly one unit of flow on it. v) that carries one unit of flow. Now suppose there are k edge-disioint s-t paths. w) that carries one unit of flow. other than at nodes s and t) will be considered in the exercises to this chapter. Suppose we could show the converse to (7. the Directed Edge-Disjoint Paths Problem is to find the maximum number of edge-disjoint s-t paths in G. 
Proof. Second. but the paths as well. that is. one of two things will eventually happen: Either we will reach t. form the u paths claimed. First.. The related question of finding paths that are nbt only edge-disioint. and the dashed edges indicate the path traversed sofar. We now "trace out" a path of edges that must also carry flow: Since (s. there is nothing to prove. Given a directed graph G = (V. ~ Analyzing the Algorithm Proving the converse direction of (7. there must be an edge (s. we wil! deal with two issues.12. then we have a situation like the one pictured in Figure 7. we know that there is a maximum flow f with integer flow values. Our analysis will also provide a way to extract k edge-disioint paths from an integer-valued flow sending k units from s to t. then the set of edges with flow value f(e) = 1 contains a set of u edge-disjoint paths. We can make each of these paths carry one unit of flow: We set the flow to be f(e) = 1 for each edge e on any of the paths. . We prove this by induction on the number of edges in f that carry flow. This new flow f’ has value v. u) carries a unit of flow. Let f’ be the flow obtained by decreasing the flow values on the edges along P to 0. no two paths share an edge. (7. and the notion of flow we’ve studied so far. Then we could simply compute a maximum s-t flow in G and declare (correctly) this to be the maximum number of edge-disioint s-t paths. it follows by conservation that there is some edge (u. and this defines a feasible flow of value k.
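The trace-and-erase argument in the proof of (7.42) translates almost line for line into code. The sketch below is ours (the representation of a 0-1 flow as a set of unit-flow edges is assumed, along with the standard condition that no flow enters s): it traces from s, deletes a cycle whenever a node repeats, and emits a path whenever it reaches t.

    # Sketch (not from the text): path decomposition of a 0-1 flow.
    def decompose_paths(flow_edges, s, t):
        out = {}                              # node -> remaining flow-carrying edges
        for (u, v) in flow_edges:
            out.setdefault(u, []).append(v)
        paths = []
        while out.get(s):                     # some unit of flow still leaves s
            path = [s]
            pos = {s: 0}                      # node -> index on the current path
            while path[-1] != t:
                u = path[-1]
                v = out[u].pop()              # conservation guarantees an out-edge
                if v in pos:                  # closed a cycle: its edges are
                    i = pos[v]                # already consumed, so just erase it
                    for w in path[i + 1:]:
                        del pos[w]
                    path = path[:i + 1]
                else:
                    pos[v] = len(path)
                    path.append(v)
            paths.append(path)
        return paths

    # Example: a flow of value 2 decomposes into two edge-disjoint paths.
    # decompose_paths({('s','u'), ('u','v'), ('v','t'), ('s','w'), ('w','t')},
    #                 's', 't')  ->  [s,u,v,t] and [s,w,t], in some order

Each inner iteration consumes one edge, so the whole decomposition runs in time linear in the number of flow-carrying edges per extracted path, matching the O(mn) bound discussed next.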

(7.43) There are k edge-disjoint paths in a directed graph G from s to t if and only if the value of the maximum s-t flow in G is at least k.

Notice how the proof of (7.42) provides an actual procedure for constructing the k paths, given an integer-valued maximum flow in G. This procedure is sometimes referred to as a path decomposition of the flow, since it "decomposes" the flow into a constituent set of paths. Hence we have shown that our flow-based algorithm finds the maximum number of edge-disjoint s-t paths, and also gives us a way to construct the actual paths.

Bounding the Running Time — For this flow problem, C = Σ_{e out of s} c_e ≤ |V| = n, as there are at most |V| edges out of s, each of which has capacity 1. Thus, by using the O(mC) bound in (7.5), we get an integer maximum flow in O(mn) time.

The path decomposition procedure in the proof of (7.42), which produces the paths themselves, can also be made to run in O(mn) time. To see this, note that, with a little care, the procedure can produce a single path from s to t using at most constant work per edge in the graph, and hence in O(m) time. Since there can be at most n − 1 edge-disjoint paths from s to t (each must use a different edge out of s), it therefore takes time O(mn) to produce all the paths. In summary, we have shown

(7.44) The Ford-Fulkerson Algorithm can be used to find a maximum set of edge-disjoint s-t paths in a directed graph G in O(mn) time.

A Version of the Max-Flow Min-Cut Theorem for Disjoint Paths — The Max-Flow Min-Cut Theorem (7.13) can be used to give the following characterization of the maximum number of edge-disjoint s-t paths. We say that a set F ⊆ E of edges separates s from t if, after the edges F are removed from the graph G, no s-t paths remain.

(7.45) In every directed graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t.

Proof. If the removal of a set F ⊆ E of edges separates s from t, then each s-t path must use at least one edge from F, and hence the number of edge-disjoint s-t paths is at most |F|.

To prove the other direction, we use the Max-Flow Min-Cut Theorem (7.13). By (7.43), the maximum number of edge-disjoint paths is the value ν of the maximum s-t flow. Now (7.13) states that there is an s-t cut (A, B) with capacity ν. Let F be the set of edges that go from A to B. Each edge has capacity 1, so |F| = ν, and, by the definition of an s-t cut, removing these ν edges from G separates s from t. ■

This result, then, can be viewed as the natural special case of the Max-Flow Min-Cut Theorem in which all edge capacities are equal to 1. In fact, this special case was proved by Menger in 1927, much before the full Max-Flow Min-Cut Theorem was formulated and proved; for this reason, (7.45) is often called Menger's Theorem. If we think about it, the proof of Hall's Theorem (7.40) for bipartite matchings involves a reduction to a graph with unit-capacity edges, so it can be proved using Menger's Theorem rather than the general Max-Flow Min-Cut Theorem. In other words, Hall's Theorem is really a special case of Menger's Theorem, which in turn is a special case of the Max-Flow Min-Cut Theorem. And the history follows this progression, since the results were discovered in this order, a few decades apart.

Indeed, in an interesting retrospective written in 1981, Menger relates the story of how he first explained his theorem to König, one of the independent discoverers of Hall's Theorem. You might think that König, having thought a lot about these problems, would have immediately grasped why Menger's generalization of his theorem was true, and perhaps even considered it obvious. But, in fact, the opposite happened: König didn't believe it could be right and stayed up all night searching for a counterexample. The next day, exhausted, he sought out Menger and asked him for the proof.

Extensions: Disjoint Paths in Undirected Graphs — Finally, we consider the disjoint paths problem in an undirected graph G. Despite the fact that our graph G is now undirected, we can use the maximum-flow algorithm to obtain edge-disjoint paths in G. The idea is quite simple: we replace each undirected edge (u, v) in G by the two directed edges e = (u, v) and e' = (v, u), and in this way create a directed version G' of G. (We may delete the edges into s and out of t, since they are not useful.) Now we want to use the Ford-Fulkerson Algorithm in the resulting directed graph. However, there is an important issue to deal with first. Notice that two paths P1 and P2 may be edge-disjoint in the directed graph and yet share an edge in the undirected graph G: this happens if P1 uses the directed edge (u, v) while P2 uses the edge (v, u). However, it is not hard to see that there always exists a maximum flow in any network that uses at most one out of each pair of oppositely directed edges.

(7.46) In any flow network, there is a maximum flow f where for all pairs of opposite directed edges e = (u, v) and e' = (v, u), either f(e) = 0 or f(e') = 0. If the capacities of the flow network are integral, then there also is such an integral maximum flow.

Proof. We consider any maximum flow f and modify it to satisfy the claimed condition. Suppose e = (u, v) and e' = (v, u) are opposite directed edges with f(e) ≠ 0 and f(e') ≠ 0. Let δ be the smaller of these two values, and modify f by decreasing the flow value on both e and e' by δ. The resulting flow f' is feasible, has the same value as f, and its value on one of e and e' is 0. ■
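In code, both the conversion to the directed version G' and the clean-up step in the proof of (7.46) take only a few lines. A minimal sketch, with names of our own choosing:

    def directed_version(undirected_edges, s, t):
        """Replace each undirected edge {u, v} by (u, v) and (v, u),
        dropping edges into s and out of t, which no s-t flow needs."""
        directed = set()
        for u, v in undirected_edges:
            for a, b in ((u, v), (v, u)):
                if b != s and a != t:
                    directed.add((a, b))
        return directed

    def cancel_opposite_flow(f):
        """Reduce each pair of oppositely directed flow values by the
        smaller of the two, as in the proof of (7.46); the value of the
        flow is unchanged."""
        for (u, v) in list(f):
            if f.get((u, v), 0) and f.get((v, u), 0):
                delta = min(f[(u, v)], f[(v, u)])
                f[(u, v)] -= delta
                f[(v, u)] -= delta
        return f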

Now we can use the Ford-Fulkerson Algorithm and the path decomposition procedure from (7.42) to obtain edge-disjoint paths in the undirected graph G.

(7.47) There are k edge-disjoint paths in an undirected graph G from s to t if and only if the maximum value of an s-t flow in the directed version G' of G is at least k. Furthermore, the Ford-Fulkerson Algorithm can be used to find a maximum set of disjoint s-t paths in an undirected graph G in O(mn) time.

The undirected analogue of (7.45) is also true, since in any s-t cut, at most one of the two oppositely directed edges can cross from the s-side to the t-side (for if one crosses, then the other must go from the t-side to the s-side).

(7.48) In every undirected graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t.

7.7 Extensions to the Maximum-Flow Problem

Much of the power of the Maximum-Flow Problem has essentially nothing to do with the fact that it models traffic in a network. Rather, it lies in the fact that many problems with a nontrivial combinatorial search component can be solved in polynomial time because they can be reduced to the problem of finding a maximum flow or a minimum cut in a directed graph. Bipartite Matching is a natural first application in this vein; in the coming sections, we investigate a range of further applications. To begin with, we stay with the picture of flow as an abstract kind of "traffic," and look for more general conditions we might impose on this traffic. These more general conditions will turn out to be useful for some of our further applications. In particular, we focus on two generalizations of maximum flow. We will see that both can be reduced to the basic Maximum-Flow Problem.

The Problem: Circulations with Demands — One simplifying aspect of our initial formulation of the Maximum-Flow Problem is that we had only a single source s and a single sink t. Now suppose that there can be a set S of sources generating flow and a set T of sinks that can absorb flow. As before, there is an integer capacity on each edge.

With multiple sources and sinks, it is a bit unclear how to decide which source or sink to favor in a maximization problem. So, instead of maximizing the flow value, we consider a problem where sources have fixed supply values and sinks have fixed demand values, and our goal is to ship flow from nodes with available supply to those with given demands. Imagine, for example, that the network represents a system of highways or railway lines in which we want to ship products from factories (which have supply) to retail outlets (which have demand). In this type of problem, we will not be seeking to maximize a particular value; rather, we simply want to satisfy all the demand using the available supply.

Thus we are given a flow network G = (V, E) with capacities on the edges. Associated with each node v ∈ V is a demand d_v. If d_v > 0, this indicates that v has a demand of d_v for flow; the node is a sink, and it wishes to receive d_v units more flow than it sends out. If d_v < 0, this indicates that v has a supply of −d_v; the node is a source, and it wishes to send out −d_v units more flow than it receives. If d_v = 0, the node v is neither a source nor a sink. We will assume that all capacities and demands are integers.

We use S to denote the set of all nodes with negative demand and T to denote the set of all nodes with positive demand. Although a node v in S wants to send out more flow than it receives, it is fine for it to have flow entering on incoming edges; this flow should just be more than compensated by the flow that leaves v on outgoing edges. The same applies (in the opposite direction) to the nodes in T.

In this setting, we say that a circulation with demands {d_v} is a function f that assigns a nonnegative real number to each edge and satisfies the following two conditions.

(i) (Capacity conditions) For each e ∈ E, we have 0 ≤ f(e) ≤ c_e.
(ii) (Demand conditions) For each v ∈ V, we have f^in(v) − f^out(v) = d_v.

Now, instead of considering a maximization problem, we are concerned with a feasibility problem: we want to know whether there exists a circulation that meets conditions (i) and (ii).

To begin with, here is a simple condition that must hold in order for a feasible circulation to exist: the total supply must equal the total demand.

(7.49) If there exists a feasible circulation with demands {d_v}, then Σ_v d_v = 0.

Proof. Suppose there exists a feasible circulation f. Then Σ_v d_v = Σ_v [f^in(v) − f^out(v)]. In this latter expression, the value f(e) of each edge e = (u, v) is counted exactly twice: once in f^out(u) and once in f^in(v). These two terms cancel out, and since this holds for all values f(e), the overall sum is 0. ■

Thanks to (7.49), we know that Σ_{v: d_v > 0} d_v = −Σ_{v: d_v < 0} d_v. Let D denote this common value.

Designing and Analyzing an Algorithm for Circulations — It turns out that we can reduce the problem of finding a feasible circulation with demands {d_v} to the problem of finding a maximum s-t flow in a different network, as shown in Figure 7.14. The reduction looks very much like the one we used for Bipartite Matching: we attach a "super-source" s* to each node in S and a "super-sink" t* to each node in T. Intuitively, s* "supplies" the sources with their extra flow, and t* "siphons" the extra flow out of the sinks.

Figure 7.14 Reducing the Circulation Problem to the Maximum-Flow Problem: the super-source s* supplies each source with flow, and the super-sink t* siphons flow out of the sinks.

More specifically, we create a graph G' from G by adding new nodes s* and t*. For each node v ∈ T—that is, each node v with d_v > 0—we add an edge (v, t*) with capacity d_v. For each node u ∈ S—that is, each node u with d_u < 0—we add an edge (s*, u) with capacity −d_u. We carry the remaining structure of G over to G' unchanged. In this graph G', we seek a maximum s*-t* flow.

For example, consider the instance in Figure 7.13(a). Two of the nodes are sources, with demands −3 and −3, and two of the nodes are sinks, with demands 2 and 4. The flow values in the figure constitute a feasible circulation, indicating how all demands can be satisfied while respecting the capacities. Part (b) of Figure 7.13 shows the result of applying the reduction to this instance.

Figure 7.13 (a) An instance of the Circulation Problem together with a solution: numbers inside the nodes are demands; numbers labeling the edges are capacities and flow values, with the flow values inside boxes. (b) The result of reducing this instance to an equivalent instance of the Maximum-Flow Problem.

Note that there cannot be an s*-t* flow in G' of value greater than D, since the cut (A, B) with A = {s*} has capacity only D. Now, if there is a feasible circulation f with demands {d_v} in G, then by sending a flow value of −d_u on each edge (s*, u) and a flow value of d_v on each edge (v, t*), we obtain an s*-t* flow in G' of value D, and so this is a maximum flow. Conversely, suppose there is a (maximum) s*-t* flow in G' of value D. It must be that every edge out of s*, and every edge into t*, is completely saturated with flow. Thus, if we delete these edges, we obtain a circulation f in G with f^in(v) − f^out(v) = d_v for each node v. We have proved the following.

(7.50) There is a feasible circulation with demands {d_v} in G if and only if the maximum s*-t* flow in G' has value D. If all capacities and demands in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued.
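The reduction is mechanical enough to write down directly. Here is a minimal sketch, with names of our own choosing; it takes as a parameter some routine max_flow_value—for instance, an implementation of the Ford-Fulkerson Algorithm from earlier in the chapter—and tests feasibility exactly as in (7.49) and (7.50).

    def feasible_circulation_exists(nodes, capacity, demand, max_flow_value):
        """capacity maps edges (u, v) to c_e; demand maps v to d_v
        (negative values are supplies). Returns True iff a feasible
        circulation with demands {d_v} exists."""
        if sum(demand.get(v, 0) for v in nodes) != 0:
            return False                    # by (7.49), no circulation exists
        cap = dict(capacity)
        s_star, t_star = '_s*', '_t*'       # fresh super-source and super-sink
        D = 0
        for v in nodes:
            d = demand.get(v, 0)
            if d < 0:
                cap[(s_star, v)] = -d       # s* supplies each source
            elif d > 0:
                cap[(v, t_star)] = d        # t* siphons flow out of each sink
                D += d
        all_nodes = list(nodes) + [s_star, t_star]
        return max_flow_value(all_nodes, cap, s_star, t_star) == D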

At the end of Section 7.5, we used the Max-Flow Min-Cut Theorem to derive the characterization (7.40) of bipartite graphs that do not have perfect matchings. We can give an analogous characterization for graphs that do not have a feasible circulation. The characterization uses the notion of a cut, adapted to the present setting: in the context of circulation problems with demands, a cut (A, B) is any partition of the node set V into two sets, with no restriction on which side of the partition the sources and sinks fall. We include the characterization here without a proof.

(7.51) The graph G has a feasible circulation with demands {d_v} if and only if, for all cuts (A, B), Σ_{v ∈ B} d_v ≤ c(A, B).

It is important to note that our network has only a single "kind" of flow. Although the flow is supplied from multiple sources and absorbed at multiple sinks, we cannot place restrictions on which source will supply the flow to which sink; we have to let our algorithm decide this. A harder problem is the Multicommodity Flow Problem, in which sink t_i must be supplied with flow that originated at source s_i, for each i. We will discuss this issue further in Chapter 11.

The Problem: Circulations with Demands and Lower Bounds — Finally, let us generalize the previous problem a little. In many applications, we not only want to satisfy demands at various nodes; we also want to force the flow to make use of certain edges. This can be enforced by placing lower bounds on edges, as well as the usual upper bounds imposed by edge capacities.

Consider a flow network G = (V, E) with a capacity c_e and a lower bound ℓ_e on each edge e; we will assume 0 ≤ ℓ_e ≤ c_e for each e. As before, each node v also has a demand d_v, which can be either positive or negative, and we assume that all demands, capacities, and lower bounds are integers. The given quantities have the same meaning as before, and a lower bound ℓ_e means that the flow value on e must be at least ℓ_e. Thus a circulation in our flow network must satisfy the following two conditions.

(i) (Capacity conditions) For each e ∈ E, we have ℓ_e ≤ f(e) ≤ c_e.
(ii) (Demand conditions) For every v ∈ V, we have f^in(v) − f^out(v) = d_v.

As before, we wish to decide whether there exists a feasible circulation—one that satisfies these conditions.

Designing and Analyzing an Algorithm with Lower Bounds — Our strategy will be to reduce this to the problem of finding a circulation with demands but no lower bounds; we have seen that that problem, in turn, can be reduced to the standard Maximum-Flow Problem, so we can use our algorithm for it. The idea is as follows. We know that on each edge e we need to send at least ℓ_e units of flow. So suppose we define an initial circulation f_0 simply by f_0(e) = ℓ_e. Then f_0 satisfies all the capacity conditions (both lower and upper bounds), but it presumably does not satisfy all the demand conditions. In particular,

f_0^in(v) − f_0^out(v) = Σ_{e into v} ℓ_e − Σ_{e out of v} ℓ_e.

Let us denote this quantity by L_v. If L_v = d_v, then we have satisfied the demand condition at v; but if not, we need to superimpose a circulation f_1 on top of f_0 that clears the remaining "imbalance" at v. So we need f_1^in(v) − f_1^out(v) = d_v − L_v. And how much capacity do we have to work with? Having already sent ℓ_e units of flow on each edge e, we have c_e − ℓ_e more units available.

These considerations directly motivate the following construction. Let the graph G' have the same nodes and edges as G, with capacities and demands but no lower bounds: the capacity of edge e will be c_e − ℓ_e, and the demand of node v will be d_v − L_v.

For example, consider the instance in Figure 7.15(a). This is the same as the instance we saw in Figure 7.13, except that one of the edges now has a lower bound of 2. In part (b) of the figure, we eliminate this lower bound by sending two units of flow across the edge, which reduces the upper bound on the edge and changes the demands at its two ends. In the process, it becomes clear that there is no feasible circulation: after applying the construction, there is a node with a demand of −5 and a total of only four units of capacity on its outgoing edges.

We now claim that our general construction produces an equivalent instance with demands but no lower bounds, so we can use our algorithm for this latter problem.
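The transformation touches each edge once, so it is easy to implement. A minimal sketch (names ours); the output instance can be handed to the circulation routine sketched earlier.

    def eliminate_lower_bounds(nodes, capacity, lower, demand):
        """Return an equivalent circulation instance with no lower bounds:
        capacities c_e - l_e and demands d_v - L_v."""
        new_cap = {}
        L = {v: 0 for v in nodes}
        for (u, v), c in capacity.items():
            l = lower.get((u, v), 0)
            new_cap[(u, v)] = c - l         # capacity left after sending l_e
            L[v] += l                       # l_e enters v ...
            L[u] -= l                       # ... and leaves u
        new_dem = {v: demand.get(v, 0) - L[v] for v in nodes}
        return new_cap, new_dem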

(7.52) There is a feasible circulation in G if and only if there is a feasible circulation in G'. If all demands, capacities, and lower bounds in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued.

Proof. First suppose there is a circulation f' in G'. Define a circulation f in G by f(e) = f'(e) + ℓ_e. Then f satisfies the capacity conditions in G, and

f^in(v) − f^out(v) = Σ_{e into v} (ℓ_e + f'(e)) − Σ_{e out of v} (ℓ_e + f'(e)) = L_v + (d_v − L_v) = d_v,

so it satisfies the demand conditions in G as well.

Conversely, suppose there is a circulation f in G, and define a circulation f' in G' by f'(e) = f(e) − ℓ_e. Then f' satisfies the capacity conditions in G', and

(f')^in(v) − (f')^out(v) = Σ_{e into v} (f(e) − ℓ_e) − Σ_{e out of v} (f(e) − ℓ_e) = d_v − L_v,

so it satisfies the demand conditions in G' as well. ■

Figure 7.15 Eliminating a lower bound from an edge. (a) An instance of the Circulation Problem with lower bounds: numbers inside the nodes are demands, and numbers labeling the edges are capacities; one edge also has a lower bound of 2. (b) The result of reducing this instance to an equivalent instance of the Circulation Problem without lower bounds.

7.8 Survey Design

Many problems that arise in applications can, in fact, be solved efficiently by a reduction to Maximum Flow, but it is often difficult to discover when such a reduction is possible. In the next few sections, we give several paradigmatic examples of such problems. The goal is to indicate what such reductions tend to look like and to illustrate some of the most common uses of flows and cuts in the design of efficient combinatorial algorithms. One point that will emerge is the following: sometimes the solution one wants involves the computation of a maximum flow, and sometimes it involves the computation of a minimum cut; both flows and cuts are very useful algorithmic tools.

We begin with a basic application that we call survey design, a simple version of a task faced by many companies wanting to measure customer satisfaction. More generally, the problem illustrates how the construction used to solve the Bipartite Matching Problem arises naturally in any setting where we want to carefully balance decisions across a set of options—in this case, designing questionnaires by balancing relevant questions across a population of consumers.

The Problem — A major issue in the burgeoning field of data mining is the study of consumer preference patterns. Consider a company that sells k products and has a database containing the purchase histories of a large number of customers. (Those of you with "Shopper's Club" cards may be able to guess how this data gets collected.) The company wishes to conduct a survey, sending customized questionnaires to a particular group of n of its customers, to try to determine which products people like overall.

Here are the guidelines for designing the survey. Each customer will receive questions about a certain subset of the products, and a customer can only be asked about products that he or she has purchased. To make each questionnaire informative, but not so long as to discourage participation, each customer i should be asked about a number of products between c_i and c'_i. Finally, to collect sufficient data about each product, there must be between p_j and p'_j distinct customers asked about each product j.

More formally, the input to the Survey Design Problem consists of a bipartite graph G whose nodes are the customers and the products, with an edge between customer i and product j if he or she has ever purchased product j. Further, for each customer i = 1, ..., n, we have limits c_i ≤ c'_i on the number of products he or she can be asked about, and for each product j = 1, ..., k, we have limits p_j ≤ p'_j on the number of distinct customers that have to be asked about it. The problem is to decide whether there is a way to design a questionnaire for each customer so as to satisfy all these conditions.

Designing the Algorithm — We solve this problem by reducing it to a circulation problem on a flow network G' with demands and lower bounds, as shown in Figure 7.16. To obtain G' from G, we orient the edges of G from customers to products; add nodes s and t, with edges (s, i) for each customer i = 1, ..., n and edges (j, t) for each product j = 1, ..., k; and finally add an edge (t, s). A circulation in this network will correspond to a way of asking questions. The flow on the edge (s, i) is the number of products included on the questionnaire for customer i, so this edge has a capacity of c'_i and a lower bound of c_i. The flow on the edge (j, t) is the number of customers who are asked about product j, so this edge has a capacity of p'_j and a lower bound of p_j. Each edge (i, j) going from a customer to a product he or she bought has capacity 1 and lower bound 0. The flow carried by the edge (t, s) corresponds to the overall number of questions asked; we can give this edge a capacity of Σ_i c'_i and a lower bound of Σ_i c_i. All nodes have demand 0. Our algorithm is simply to construct this network G' and check whether it has a feasible circulation.

Figure 7.16 The Survey Design Problem can be reduced to the problem of finding a feasible circulation: flow passes from customers (with capacity bounds indicating how many questions they can be asked) to products (with capacity bounds indicating how many customers should be asked about each product).

We now formulate a claim that establishes the correctness of this algorithm.

(7.53) The graph G' just constructed has a feasible circulation if and only if there is a feasible way to design the survey.

Proof. The construction above immediately suggests a way to turn a survey design into the corresponding flow: the edge (i, j) carries one unit of flow if customer i is asked about product j in the survey, and no flow otherwise; the flow on the edge (s, i) is the number of questions asked of customer i; the flow on the edge (j, t) is the number of customers asked about product j; and the flow on the edge (t, s) is the overall number of questions asked. If the survey satisfies the design rules, then this flow satisfies the capacities and lower bounds, and there is flow conservation at every node.

Conversely, if the Circulation Problem is feasible, then by (7.52) there is a feasible circulation that is integer-valued, and such an integer-valued circulation naturally corresponds to a feasible survey design: customer i is surveyed about product j if and only if the edge (i, j) carries a unit of flow. ■
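Since all the numbers in the construction come straight from the input, building G' is a direct translation. A minimal sketch, with an encoding of our own choosing (tuples tag the two sides of the bipartition); the result can be tested with any feasible-circulation routine, all node demands being 0.

    def survey_network(purchases, c_lo, c_hi, p_lo, p_hi):
        """purchases: iterable of (customer i, product j) pairs.
        Returns (capacity, lower) dictionaries describing G'."""
        capacity, lower = {}, {}
        for i, j in purchases:                        # i may be asked about j
            capacity[(('cust', i), ('prod', j))] = 1  # at most once; lower bound 0
        for i in c_hi:                                # questionnaire length limits
            capacity[('s', ('cust', i))] = c_hi[i]
            lower[('s', ('cust', i))] = c_lo[i]
        for j in p_hi:                                # customers asked per product
            capacity[(('prod', j), 't')] = p_hi[j]
            lower[(('prod', j), 't')] = p_lo[j]
        capacity[('t', 's')] = sum(c_hi.values())     # total questions asked
        lower[('t', 's')] = sum(c_lo.values())
        return capacity, lower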
7.9 Airline Scheduling

The computational problems faced by the nation's large airline carriers are almost too complex to even imagine. They have to produce schedules for thousands of routes each day that are efficient in terms of equipment usage, crew allocation, customer satisfaction, and a host of other factors—all in the face of unpredictable issues like weather and breakdowns. It's not surprising that they're among the largest consumers of high-powered algorithmic techniques.

Covering these computational problems in any realistic level of detail would take us much too far afield. Instead, we'll discuss a "toy" problem that captures, in a very clean way, some of the resource allocation issues that arise in such a context. And, as is common in this book, the toy problem will be much more useful for our purposes than the "real" problem, for the solution to the toy problem involves a very general technique that can be applied in a wide range of situations.

The Problem — Suppose you're in charge of managing a fleet of airplanes, and you'd like to create a flight schedule for them. Here's a very simple model for this. Your market research has identified a set of m particular flight segments that would be very lucrative if you could serve them; flight segment j is specified by four parameters: its origin airport, its destination airport, its departure time, and its arrival time. Figure 7.17(a) shows a simple example, consisting of six flight segments you'd like to serve with your planes over the course of a single day:

(1) Boston (depart 6 A.M.) - Washington DC (arrive 7 A.M.)
(2) Philadelphia (depart 7 A.M.) - Pittsburgh (arrive 8 A.M.)
(3) Washington DC (depart 8 A.M.) - Los Angeles (arrive 11 A.M.)
(4) Philadelphia (depart 11 A.M.) - San Francisco (arrive 2 P.M.)
(5) San Francisco (depart 2:15 P.M.) - Seattle (arrive 3:15 P.M.)
(6) Las Vegas (depart 5 P.M.) - Seattle (arrive 6 P.M.)

Note that each segment includes the times you want the flight to serve as well as the airports.

It is possible to use a single plane for a flight segment i, and then later for a flight segment j, provided that (a) the destination of i is the same as the origin of j, and there's enough time to perform maintenance on the plane in between; or (b) you can add a flight segment in between that gets the plane from the destination of i to the origin of j with adequate time in between.

For example, assuming an hour for intermediate maintenance time, you could use a single plane for flights (1), (3), and (6) by having the plane sit in Washington, DC, between flights (1) and (3), and then inserting the flight Los Angeles (depart 12 noon) - Las Vegas (arrive 1 P.M.) in between flights (3) and (6).

Figure 7.17 (a) A small instance of our simple Airline Scheduling Problem. (b) An expanded graph showing which flights are reachable from which others.

Formulating the Problem — We can model this situation in a very general way as follows, abstracting away from specific rules about maintenance times and intermediate flight segments: we simply say that flight j is reachable from flight i if it is possible to use the same plane for flight i, and then later for flight j as well. Under our specific rules (a) and (b) above, we can easily determine for each pair i, j whether flight j is reachable from flight i. (Of course, one can easily imagine more complex rules for reachability: the length of maintenance time needed in (a) might depend on the airport, for example, or in (b) we might require that the inserted flight segment be sufficiently profitable on its own.) But the point is that we can handle any set of rules with our definition: the input to the problem will include not just the flight segments, but also a specification of the pairs (i, j) for which a later flight j is reachable from an earlier flight i. These pairs can form an arbitrary directed acyclic graph.

The goal in this problem is to determine whether it's possible to serve all m flights on your original list using at most k planes total. In order to do this, you need to find a way of efficiently reusing planes for multiple flights.

For example, let's go back to the instance in Figure 7.17 and assume we have k = 2 planes. If we use one of the planes for flights (1), (3), and (6) as proposed above, we wouldn't be able to serve all of flights (2), (4), and (5) with the other (since there wouldn't be enough maintenance time in San Francisco between flights (4) and (5)). However, there is a way to serve all six flights using two planes, via a different solution: one plane serves flights (1), (3), and (5) (splicing in an LAX-SFO flight), while the other serves (2), (4), and (6) (splicing in PIT-PHL and SFO-LAS).

Designing the Algorithm — We now discuss an efficient algorithm that can solve arbitrary instances of the Airline Scheduling Problem, based on network flow. We will see that flow techniques adapt very naturally to this problem. The solution is based on the following idea: units of flow will correspond to airplanes. We will have an edge for each flight, with upper and lower capacity bounds of 1 on this edge, to require that exactly one unit of flow crosses it; in other words, each flight must be served by one of the planes. If (u_i, v_i) is the edge representing flight i, and (u_j, v_j) is the edge representing flight j, and flight j is reachable from flight i, then we will have an edge from v_i to u_j with capacity 1; in this way, a unit of flow can traverse (u_i, v_i) and then move directly on to (u_j, v_j). Such a construction of edges is shown in Figure 7.17(b). We extend this to a flow network by including a source and sink; we now give the full construction in detail.

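Before the formal construction, here is one concrete implementation of a reachability rule—rule (a) alone, with a fixed maintenance window. The rule and the numbers are only an illustration of ours; the construction below accepts any reachability DAG, however it was computed.

    def reachable(i, j, turnaround=60):
        """Flights are tuples (origin, dest, depart, arrive), with times
        in minutes; True if the same plane can fly i and then j."""
        return i[1] == j[0] and i[3] + turnaround <= j[2]

    bos_dca = ('BOS', 'DCA', 6 * 60, 7 * 60)      # flight (1)
    dca_lax = ('DCA', 'LAX', 8 * 60, 11 * 60)     # flight (3)
    print(reachable(bos_dca, dca_lax))            # True: one plane suffices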
More formally, the node set of the underlying graph G is defined as follows.

o For each flight i, the graph G will have the two nodes u_i and v_i.
o G will also have a distinct source node s and sink node t.

The edge set of G is defined as follows.

o For each i, there is an edge (u_i, v_i) with a lower bound of 1 and a capacity of 1. (Each flight on the list must be served.)
o For each i and j such that flight j is reachable from flight i, there is an edge (v_i, u_j) with a lower bound of 0 and a capacity of 1. (The same plane can perform flights i and j.)
o For each i, there is an edge (s, u_i) with a lower bound of 0 and a capacity of 1. (Any plane can begin the day with flight i.)
o For each j, there is an edge (v_j, t) with a lower bound of 0 and a capacity of 1. (Any plane can end the day with flight j.)
o There is an edge (s, t) with lower bound 0 and capacity k. (If we have extra planes, we don't need to use them for any of the flights.)

Finally, the node s will have a demand of −k, and the node t will have a demand of k. All other nodes will have a demand of 0. Our algorithm is to construct the network G and search for a feasible circulation in it. We now prove the correctness of this algorithm.

Analyzing the Algorithm

(7.54) There is a way to perform all flights using at most k planes if and only if there is a feasible circulation in the network G.

Proof. First, suppose there is a way to perform all flights using k' ≤ k planes. The set of flights performed by each individual plane defines a path P in the network G, and we send one unit of flow on each such path. To satisfy the full demands at s and t, we send k − k' units of flow on the edge (s, t). The resulting circulation satisfies all demand, capacity, and lower bound conditions.

Conversely, consider a feasible circulation in the network G. By (7.52), we know that there is a feasible circulation with integer flow values. Suppose that k' units of flow are sent on edges other than (s, t). Since all other edges have a capacity bound of 1, and the circulation is integer-valued, each such edge that carries flow has exactly one unit of flow on it. We now convert this circulation to a schedule using the same kind of construction we saw in the proof of (7.42), where we converted a flow to a collection of paths; in fact, the situation is easier here, since the graph has no cycles. Consider an edge (s, u_i) that carries one unit of flow. It follows by conservation that (u_i, v_i) carries one unit of flow, and that there is a unique edge out of v_i that carries one unit of flow. Continuing in this way, we construct a path P from s to t in which each edge carries one unit of flow. Applying this construction to each edge of the form (s, u_i) carrying one unit of flow, we produce k' paths from s to t, each consisting of edges that carry one unit of flow, and to each such path we can assign a single plane that performs all the flights contained in it. ■
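Written out, the construction is a handful of dictionary entries per flight. A minimal sketch with an encoding of our own choosing; edges map to (capacity, lower bound) pairs, ready for the lower-bound elimination and circulation routines sketched in Section 7.7.

    def airline_network(m, reachable_pairs, k):
        """m flights numbered 0..m-1; reachable_pairs: set of (i, j) with
        flight j reachable from flight i. Returns (edges, demand)."""
        edges = {}
        for i in range(m):
            edges[(('u', i), ('v', i))] = (1, 1)   # flight i must be served
            edges[('s', ('u', i))] = (1, 0)        # a plane may start with i
            edges[(('v', i), 't')] = (1, 0)        # a plane may end with i
        for i, j in reachable_pairs:
            edges[(('v', i), ('u', j))] = (1, 0)   # same plane flies i, then j
        edges[('s', 't')] = (k, 0)                 # unused planes bypass the flights
        demand = {'s': -k, 't': k}                 # all other demands are 0
        return edges, demand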
Extensions: Modeling Other Aspects of the Problem — Airline scheduling consumes countless hours of CPU time in real life. We mentioned at the beginning, however, that our formulation here is really a toy problem; it ignores several obvious factors that would have to be taken into account in these applications. First, it ignores the fact that a given plane can only fly a certain number of hours before it must be temporarily taken out of service for more significant maintenance. Second, we are making up an optimal schedule for a single day (or at least for a single span of time) as though there were no yesterday or tomorrow; in fact, we also need the planes to be optimally positioned for the start of day N + 1 at the end of day N. Third, all these planes need to be staffed by flight crews, and while crews are also reused across multiple flights, a whole different set of constraints operates here, since human beings and airplanes experience fatigue at different rates. And these issues don't even begin to cover the fact that serving any particular flight segment is not a hard constraint; rather, the real goal is to optimize revenue, and so we can pick and choose among many possible flights to include in our schedule (not to mention designing a good fare structure for passengers) in order to achieve this goal.

Ultimately, the message is probably this: flow techniques are useful for solving problems of this type, and they are genuinely used in practice. Indeed, our solution above is a general approach to the efficient reuse of a limited set of resources in many settings. At the same time, running an airline efficiently in real life is a very difficult problem.

7.10 Image Segmentation

A central problem in image processing is the segmentation of an image into various coherent regions. For example, you may have an image representing a picture of three people standing in front of a complex background scene.

A natural but difficult goal is to identify each of the three people as coherent objects in the scene.

The Problem — One of the most basic problems to be considered along these lines is that of foreground/background segmentation: we wish to label each pixel in an image as belonging to either the foreground of the scene or the background. It turns out that a very natural model here leads to a problem that can be solved efficiently by a minimum cut computation.

Let V be the set of pixels in the underlying image that we're analyzing. We will declare certain pairs of pixels to be neighbors, and use E to denote the set of all pairs of neighboring pixels. In this way, we obtain an undirected graph G = (V, E). We will be deliberately vague on what exactly we mean by a "pixel," or what we mean by the "neighbor" relation. In fact, any graph G will yield an efficiently solvable problem, so we are free to define these notions in any way we want. Of course, it is natural to picture the pixels as constituting a grid of dots, and the neighbors of a pixel to be those that are directly adjacent to it in this grid, as shown in Figure 7.18(a).

Figure 7.18 (a) A pixel graph. (b) A sketch of the corresponding flow graph. Not all edges from the source or to the sink are drawn.

For each pixel i, we have a likelihood a_i that it belongs to the foreground, and a likelihood b_i that it belongs to the background. For our purposes, we will assume that these likelihood values are arbitrary nonnegative numbers provided as part of the problem, and that they specify how desirable it is to have pixel i in the foreground or background. Beyond this, it is not crucial precisely what physical properties of the image they are measuring, or how they were determined.

In isolation, we would want to label pixel i as belonging to the foreground if a_i > b_i, and to the background otherwise. However, decisions that we make about the neighbors of i should affect our decision about i: if many of i's neighbors are labeled "background," for example, we should be more inclined to label i as "background" too; this makes the labeling "smoother" by minimizing the amount of foreground/background boundary. Thus, for each pair (i, j) of neighboring pixels, there is a separation penalty p_ij ≥ 0 for placing one of i or j in the foreground and the other in the background.

We can now specify our Segmentation Problem precisely, in terms of the likelihood and separation parameters: it is to find a partition of the set of pixels into sets A and B (foreground and background, respectively) so as to maximize

q(A, B) = Σ_{i ∈ A} a_i + Σ_{j ∈ B} b_j − Σ_{(i,j) ∈ E: |A ∩ {i,j}| = 1} p_ij.

Thus we are rewarded for having high likelihood values and penalized for having neighboring pairs (i, j) with one pixel in A and the other in B. The problem, then, is to compute an optimal labeling—a partition (A, B) that maximizes q(A, B).

Designing and Analyzing the Algorithm — We notice right away that there is a resemblance between the minimum-cut problem and the problem of finding an optimal labeling. However, there are a few significant differences. First, we are seeking to maximize an objective function rather than minimize one. Second, there is no source and sink in the labeling problem, and, moreover, we need to deal with the values a_i and b_i that reside on the nodes. Third, we have an undirected graph G, whereas for the minimum-cut problem we want to work with a directed graph. Let's address these problems in order.

We deal with the fact that our Segmentation Problem is a maximization problem through the following observation. Let Q = Σ_i (a_i + b_i). The sum Σ_{i ∈ A} a_i + Σ_{j ∈ B} b_j is the same as the sum Q − Σ_{i ∈ A} b_i − Σ_{j ∈ B} a_j, so we can write

q(A, B) = Q − Σ_{i ∈ A} b_i − Σ_{j ∈ B} a_j − Σ_{(i,j) ∈ E: |A ∩ {i,j}| = 1} p_ij.

Thus we see that the maximization of q(A, B) is the same problem as the minimization of the quantity

q'(A, B) = Σ_{i ∈ A} b_i + Σ_{j ∈ B} a_j + Σ_{(i,j) ∈ E: |A ∩ {i,j}| = 1} p_ij.
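A quick numeric check of the identity q(A, B) = Q − q'(A, B), on a made-up two-pixel instance (the values are ours, chosen arbitrarily):

    a = {1: 3.0, 2: 1.0}          # foreground likelihoods
    b = {1: 0.5, 2: 2.0}          # background likelihoods
    p = {(1, 2): 1.5}             # separation penalty for the neighboring pair

    def separation(A):
        return sum(pen for (i, j), pen in p.items() if (i in A) != (j in A))

    def q(A, B):
        return sum(a[i] for i in A) + sum(b[j] for j in B) - separation(A)

    def q_prime(A, B):
        return sum(b[i] for i in A) + sum(a[j] for j in B) + separation(A)

    Q = sum(a[i] + b[i] for i in a)
    for A, B in ([{1, 2}, set()], [{1}, {2}], [set(), {1, 2}]):
        assert abs(q(A, B) - (Q - q_prime(A, B))) < 1e-9   # the identity holds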

As for the missing source and sink, we work by analogy with our constructions in previous sections: we create a new "super-source" s to represent the foreground, and a new "super-sink" t to represent the background. This also gives us a way to deal with the values a_i and b_i that reside at the nodes (whereas minimum cuts can only handle numbers associated with edges): we will attach each of s and t to every pixel, and use a_i and b_i to define appropriate capacities on the edges between pixel i and the source and sink, respectively.

Finally, to take care of the undirected edges, we model each neighboring pair (i, j) with two directed edges, (i, j) and (j, i), as we did in the undirected Disjoint Paths Problem. We will see that this works very well here too, since in any s-t cut, at most one of these two oppositely directed edges can cross from the s-side to the t-side of the cut (for if one does, then the other must go from the t-side to the s-side).

Specifically, we define the following flow network G' = (V', E'), shown in Figure 7.18(b). The node set V' consists of the set V of pixels, together with two additional nodes s and t. For each pixel i, we add an edge (s, i) with capacity a_i and an edge (i, t) with capacity b_i. For each neighboring pair of pixels i and j, we add the directed edges (i, j) and (j, i), each with capacity p_ij.

Now, an s-t cut (A, B) corresponds to a partition of the pixels into sets A and B. Let's consider how the capacity of the cut, c(A, B), relates to the quantity q'(A, B) that we are trying to minimize. We can group the edges that cross the cut (A, B) into three natural categories.

o Edges (s, j), where j ∈ B; such an edge contributes a_j to the capacity of the cut.
o Edges (i, t), where i ∈ A; such an edge contributes b_i to the capacity of the cut.
o Edges (i, j), where i ∈ A and j ∈ B; such an edge contributes p_ij to the capacity of the cut.

Figure 7.19 illustrates what each of these three kinds of edges looks like relative to a cut, on an example with four pixels.

Figure 7.19 An s-t cut on a graph constructed from four pixels. Note how the three types of terms in the expression for q'(A, B) are captured by the cut.

If we add up the contributions of these three kinds of edges, we get

c(A, B) = Σ_{i ∈ A} b_i + Σ_{j ∈ B} a_j + Σ_{(i,j) ∈ E: |A ∩ {i,j}| = 1} p_ij = q'(A, B).

So everything fits together perfectly. The flow network is set up so that the capacity of the cut (A, B) exactly measures the quantity q'(A, B): the three kinds of edges crossing the cut (A, B), as we have just defined them (edges from the source, edges to the sink, and edges involving neither the source nor the sink), correspond to the three kinds of terms in the expression for q'(A, B).

Thus, if we want to minimize q'(A, B) (and we have argued earlier that this is equivalent to maximizing q(A, B)), we just have to find a cut of minimum capacity. And this latter problem, of course, is something we know how to solve efficiently. Thus, through solving this minimum-cut problem, we have an optimal algorithm in our model of foreground/background segmentation.

(7.55) The solution to the Segmentation Problem can be obtained by a minimum-cut algorithm in the graph G' constructed above. For a minimum cut (A', B'), the partition (A, B) obtained by deleting s and t maximizes the segmentation value q(A, B).
we can also avoid bringing in the notion of infinite capacities by ~ The Problem Here’s a very general framework for modeling a set of decisions such as this. The Project Selection Problem is to select a feasible set of projects with maximum profit. for example. . defined analogously to the graph we used in Section 7. E). To form the graph G’. then we must have j ~ A. B’) is a minimum cut in this graph. we’re talking here about actual mining. i) with capacity pi. as well as the Max-Flow Min-Cut Theorem. and each edge (i. which can either be positive or negative. the revenue from the high-speed access service might not be enough to justify modernizing the routers. others negative. they’ll also be in a position to pursue a lucrative additional project with their corporate customers. the question is: Which projects should be pursued. What makes these types of decisions particularly tricky is that they interact in complex ways: in isolation. where you dig things out of the ground.this additional project will tip the balance. j) ~ E. There is an underlying set P of projects. P U {t}) is C = so the maximum-flow value in this network is at most C. The Open-Pit Mining Problem is to determine the most profitable set of blocks to extract. and each proiect i E P has an associated revenue p~. and maybe .{s} obeys the precedence constraints. but this ii~ turn would enable two other lucrative projects--and so forth. and we model this by an underlying directed acyclic graph G = (P.396 Chapter 7 Network Flow 7.) Certain proiects are prerequisites for other proiects. The algorithms of the previous sections. We will refer to requirements of this form as precedence constraints. that is. for this block considered in isolation. A set of projects A c_ p is feasible if the prerequisite of every project in A also belongs to A: for each i E A. Some of these net values will be positive. then A = A’.3 Open-pit mining is a surface mining operation in which blocks of earth are extracted from the surface to retrieve the ore contained in them. subject to the precedence constraints. The profit of a set of projects is defined to be 3 In contrast to the field of data mining. Pi of each block is estimated: This is the value of the ore minus the processing costs. and the net value. we also have j ~ A. but there is no problem in doing this: it is simply an edge for which the capacity condition imposes no upper bound at all. and there can be many projects that have project j as one of their prerequisites. This problem also became a hot topic of study in the mining literature. The conceptually cleanest way to ensure this is to give each of the edges in G capacity of oo. here it was called the Open-Pit Mining Problem. that the telecommunications giant CluNet is assessing the pros and cons of a project to offer some new type of high-speed access service to residential customers. However. We haven’t previously formalized what an infinite capacity would mean. which has motivated several of the problems we considered profit(A) = ~ P~tEA earlier. We will set the capacities on the edges in G later. (In other words. but it must be weighed against some costly preliminary projects that would be needed in order to make this service possible: increasing the fiber-optic capacity in the core of their network. However. if the node i E A has an edge (i. The nodes of G are the projects. For each node i ~ P with pi > 0. and the expenses needed for activities that can support these projects. For each node i ~ P with pi < 0. 
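The network G' is again a direct translation of the parameters. A minimal sketch, with names of our own choosing; feeding the capacities to any minimum s-t cut routine and reading off the source side of the cut yields the optimal foreground set A, as (7.55) states.

    def segmentation_network(pixels, a, b, p):
        """a[i], b[i]: likelihoods; p[(i, j)]: separation penalties over
        neighboring pairs. Returns the edge capacities of G'."""
        capacity = {}
        for i in pixels:
            capacity[('s', i)] = a[i]      # cut if i lands in the background
            capacity[(i, 't')] = b[i]      # cut if i lands in the foreground
        for (i, j), pen in p.items():      # both directions, as in the text,
            capacity[(i, j)] = pen         # so the pair is charged p_ij when
            capacity[(j, i)] = pen         # i and j end on opposite sides
        return capacity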
t) with capacity -p~.20. We want to ensure that if (A’. ~ Designing the Algorithm Here we will show that the Project Selection Problem can be solved by reducing it to a minimum-cut computation on an extended graph G’. Suppose.10 for image segmentation. hotvever.j) to indicate that project i can only be selected if project j is selected as well. Note that a project f can have many prerequisites. carry over to handle infinite capacities. The idea is to construct G’ from G in such a way that the source side of a minimum cut in G’ will correspond to an optimal set of projects to select. In the end. the entire area is divided into a set P of blocks. each of the lucrative opportunities and costly infrastructure-building steps in our example above will be referred to as a separate proiect. starting in the early 1960s.11 Project Selection Large (and small) companies are constantly faced with a balancing act between projects that can yield revenue. once the company has modernized the ronters. Before the mining operation begins. and buying a newer generation of high-speed routers. we can already see that the capacity of the cut ([s}. The full set of blocks has precedence constraints that essentially prevent blocks from being extracted before others on top of them are extracted. j) ~ E. we add a new source s and a new sink t to the graph G as shown in Figure 7. we add an edge (s. and which should be passed up? It’s a basic issue of balancing costs incurred with profitable opportunities that are made possible.11 Project Selection 397 7. we add an edge (i. and there is an edge (i. Marketing research shows that the service will yield a good amount of revenue. This problem falls into the framework of project selection--each block corresponds to a separate project. And these interactions chain together: the corporate project actually would require another expense.

A possible minimum-capacity cut is shown on the right. B’). then the set A = A’. as shown in Figure 7. [] Figure 7. so the cut with minimum capacity corresponds to the set of projects A with maximum profit.11 Project Selection 399 ects with (7.profit(A). B’) is a cut with capacity at most C. i¢A and pi>O Using the definition of C. and those entering the sink t. and the edges leaving the source s contribute Pi. and we declare A’-{s} to be the optimal set of projects.C -.20 The flow graph used to solve the Project Selection Problem. B’) of capacity at most C are in oneto-one correspondence with feasible sets of project A = A’. The capacity of such a cut (A’. Now we can prove the main goal of our construction. is c(A’.{s}.56) The capacity of the cut (A’. Because A satisfies the precedence constraints. We have therefore proved the following. B’) ---. as defined from a project set A satisfying the precedence constraints. Next. This implies that such cuts define feasible sets of projects.398 Chapter 7 Network Flow 7. and consider the s-t cut (A’. that the minimum cut in G’ determines the optimum set of projects. recall that edges of G have capacity more than C = Y~. B’) in G’.20. (7. The edges entering the sink t contribute -Pi i~A and pi<O Projects Projects to the capacity of the cut. j) E E crosses this cut.57) If (A’. Edges of G’ can be divided into three categories: those corresponding to the edge set E of G.Let A’ = A U {s} and B’ = (P-A) U {t}. B’). those leaving the source s.~ Analyzing the Algorithm First consider a set of projects A that satisfies the precedence constraints. and so these edges cannot cross a cut of capacity at most C. B’) is c(A’. giving each of these edges a capacity of C + 1 would accomplish this: The maximum possible flow value in G’ is at most C. If the set A satisfies the precedence constraints. and so no minimum cut can contain an edge with capacity above C.{s} satisfies the precedence constraints. which is Projects with positive value subset of proiects i~A and pi<O i~A and pi>O i~A as claimed. the edges in E do not cross the cut (A’. We can now state the algorithm: We compute a minimum cut (A’. we can rewrite this latter quantity as C~i~A and pi>0 Pi" The capacity of the cut (A’. and hence do not contribute to its capacity. B’). it will not matter which of these options we choose. we see that the cuts (A’. /. then no edge (i.~i~A Pi.i~P:pi>0 Pi. In the description below. B’). . B’) = C . Putting the previous two claims together. Proof. simply assigning each of these edges a capacity that is "effectively infinite:’ In our context. independent of the cut (A’. B’) is the sum of these two terms. The capacity of the cut can be expressed as follows. The capacity value C is a constant. We now turn to proving that this algorithm indeed gives the optimal solution.
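The construction takes one pass over the projects and one over the precedence edges. A minimal sketch (names ours), using the C + 1 convention for the effectively infinite capacities:

    def project_selection_network(p, prereqs):
        """p[i]: revenue of project i (negative for costs); prereqs:
        set of edges (i, j) meaning i requires j. Returns capacities."""
        C = sum(v for v in p.values() if v > 0)
        capacity = {}
        for i, pi in p.items():
            if pi > 0:
                capacity[('s', i)] = pi    # forgoing i forfeits its revenue
            elif pi < 0:
                capacity[(i, 't')] = -pi   # selecting i incurs its expense
        for i, j in prereqs:
            capacity[(i, j)] = C + 1       # never crossed by a minimum cut
        return capacity

A minimum s-t cut (A', B') in this network has capacity C − profit(A' − {s}), which is exactly what the analysis below establishes.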

7.12 Baseball Elimination

Over on the radio side the producer's saying, "See that thing in the paper last week about Einstein? ... Some reporter asked him to figure out the mathematics of the pennant race. You know, one team wins so many of their remaining games, the other teams win this number or that number. What are the myriad possibilities? Who's got the edge?" "The hell does he know?" "Apparently not much. He picked the Dodgers to eliminate the Giants last Friday." —Don DeLillo, Underworld

The Problem — Suppose you're a reporter for the Algorithmic Sporting News, and the following situation arises late one September. There are four baseball teams trying to finish in first place in the American League Eastern Division; let's call them New York, Baltimore, Toronto, and Boston. Currently, each team has the following number of wins: New York: 92, Baltimore: 91, Toronto: 91, Boston: 90. There are five games left in the season: these consist of all possible pairings of the four teams above, except for New York and Boston.

The question is: Can Boston finish with at least as many wins as every other team in the division (that is, finish in first place, possibly in a tie)?

If you think about it, you realize that the answer is no. One argument is the following. Clearly, Boston can end with at most 92 wins, and for this to happen, Boston must win both its remaining games and New York must lose both its remaining games. But this means that Baltimore and Toronto will both beat New York, so then the winner of the Baltimore-Toronto game will end up with the most wins, and Boston can't finish in first.

Here's an argument that avoids this kind of case analysis. Boston can finish with at most 92 wins. Cumulatively, the other three teams have 274 wins currently, and their three games against each other will produce exactly three more wins, for a final total of 277. But 277 wins over three teams means that one of them must end up with more than 92 wins.

So now you might start wondering: (i) Is there an efficient algorithm to determine whether a team has been eliminated from first place? And (ii) whenever a team has been eliminated from first place, is there an "averaging" argument like this that proves it?

In more concrete notation, suppose we have a set S of teams, and, for each x ∈ S, its current number of wins is w_x. Also, two teams x, y ∈ S still have to play g_xy games against one another. Finally, we are given a specific team z. We will use maximum-flow techniques to achieve the following two things. First, we give an efficient algorithm to decide whether z has been eliminated from first place—or, to put it in positive terms, whether it is possible to choose outcomes for all the remaining games in such a way that the team z ends with at least as many wins as every other team in S. Second, we prove the following clean characterization theorem for baseball elimination—essentially, that there is always a short "proof" when a team has been eliminated.

(7.59) Suppose that team z has indeed been eliminated. Then there exists a "proof" of this fact of the following form: z can finish with at most m wins, and there is a set of teams T ⊆ S so that

Σ_{x ∈ T} w_x + Σ_{x,y ∈ T} g_xy > m|T|.

(And hence one of the teams in T must end with strictly more than m wins.)

As a second, more complex illustration of how the averaging argument in (7.59) works, consider the following example. Suppose we have the same four teams as before, but now the current number of wins is New York: 90, Baltimore: 88, Toronto: 87, Boston: 79. The remaining games are as follows: Boston still has four games against each of the other three teams; Baltimore has one more game against each of New York and Toronto; and New York and Toronto still have six games left to play against each other. Clearly, things don't look good for Boston, but is it actually eliminated?

The answer is yes. Boston can end with at most 91 wins. Now consider the set of teams T = {New York, Toronto}. Together New York and Toronto already have 177 wins, and their six remaining games against each other will produce exactly six more, for a total of 183. But 183 wins over two teams means that one of them must end up with more than 91 wins, and so Boston can't finish in first. Interestingly, in this instance the set of all three teams ahead of Boston cannot constitute a similar proof: all three teams taken together have a total of 265 wins with 8 games left among them, for a total of 273, and 273/3 = 91—not enough by itself to prove that Boston couldn't end up in a multi-way tie for first. So it's crucial for the averaging argument that we choose the set T consisting of just New York and Toronto, and omit Baltimore.

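The certificate in (7.59) is easy to check numerically. A small sketch (names ours), applied to the second example:

    def average_bound(wins, games_left, T):
        """wins: team -> current wins; games_left: frozenset pair -> games
        remaining between that pair. Some team of T must end with at least
        the returned number of wins."""
        w = sum(wins[x] for x in T)
        g = sum(n for pair, n in games_left.items() if pair <= T)
        return (w + g) / len(T)

    wins = {'NY': 90, 'Balt': 88, 'Tor': 87, 'Bos': 79}
    games = {frozenset({'NY', 'Tor'}): 6}     # the games played within T
    print(average_bound(wins, games, frozenset({'NY', 'Tor'})))   # 91.5 > 91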
Designing and Analyzing the Algorithm

We begin by constructing a flow network that provides an efficient algorithm for determining whether z has been eliminated. Then, by examining the minimum cut in this network, we will prove (7.59).

Clearly, if there's any way for z to end up in first place, we should have z win all its remaining games. Let's suppose that this leaves it with m wins. We now want to carefully allocate the wins from all remaining games so that no other team ends with more than m wins. Allocating wins in this way can be solved by a maximum-flow computation, via the following basic idea. We have a source s from which all wins emanate; the ith win can pass through one of the two teams involved in the ith game; and we impose a capacity constraint saying that at most m − wx wins can pass through team x.

More concretely, we construct the following flow network G, as shown in Figure 7.21. First, let S′ = S − {z}, and let g* = Σ_{x,y∈S′} gxy, the total number of games left between all pairs of teams in S′. We include nodes s and t, a node vx for each team x ∈ S′, and a node uxy for each pair of teams x, y ∈ S′ with a nonzero number of games left to play against each other. We have the following edges:
- Edges (s, uxy) (wins emanate from s);
- Edges (uxy, vx) and (uxy, vy) (only x or y can win a game that they play against each other); and
- Edges (vx, t) (wins are absorbed at t).

Let's consider what capacities we want to place on these edges. We want wins to flow from s to uxy at saturation, so we give (s, uxy) a capacity of gxy. We want to ensure that team x cannot win more than m − wx games, so we give the edge (vx, t) a capacity of m − wx. Finally, an edge of the form (uxy, vx) should have at least gxy units of capacity, so that it has the ability to transport all the wins from uxy on to vx; our analysis will be cleanest if we give it infinite capacity. (We note that the construction still works even if this edge is given only gxy units of capacity, but the proof of (7.59) will become a little more complicated.)

Now, if there are outcomes for the remaining games in which z achieves at least a tie, we can use these outcomes to define a flow of value g*. Conversely, if there is a flow of value g*, then it is possible for the outcomes of all remaining games to yield a situation where no team has more than m wins. In summary, we have shown:

(7.60) Team z has been eliminated if and only if the maximum flow in G has value strictly less than g*.

Thus we can test in polynomial time whether z has been eliminated.

Characterizing When a Team Has Been Eliminated

Our network flow construction can also be used to prove (7.59). This illustrates a general way in which one can generate characterization theorems for problems that are reducible to network flow. The idea is that the Max-Flow Min-Cut Theorem gives a nice "if and only if" characterization for the existence of flow, and if we interpret this characterization in terms of our application, we get the comparably nice characterization here. For example, in Figure 7.21, which is based on our second example, the indicated cut shows that the maximum flow has value at most 7, whereas g* = 6 + 1 + 1 = 8.

Figure 7.21 The flow network for the second example. As the minimum cut indicates, the set T = {New York, Toronto} proves that Boston is eliminated.

Proof of (7.59). Suppose that z has been eliminated from first place. Then the maximum s-t flow in G has value g′ < g*; so there is an s-t cut (A, B) of capacity g′, and (A, B) is a minimum cut. Let T be the set of teams x for which vx ∈ A. We will now prove that T can be used in the "averaging argument" in (7.59).

First, consider the node uxy, and suppose one of x or y is not in T, but uxy ∈ A. Then the edge (uxy, vx) would cross from A into B, and hence the cut (A, B) would have infinite capacity. This contradicts the assumption that (A, B) is a minimum cut of capacity less than g*. So if one of x or y is not in T, then uxy ∈ B.

On the other hand, suppose both x and y belong to T, but uxy ∈ B. Consider the cut (A′, B′) that we would obtain by adding uxy to the set A and deleting it from the set B. The capacity of (A′, B′) is simply the capacity of (A, B), minus the capacity gxy of the edge (s, uxy): this edge (s, uxy) used to cross from A to B, and now it does not cross from A′ to B′. But since gxy > 0, this means that (A′, B′) has smaller capacity than (A, B), again contradicting our assumption that (A, B) is a minimum cut. So if both x and y belong to T, then uxy ∈ A.

In summary: uxy ∈ A if and only if both x, y ∈ T.

Now we just need to work out the minimum-cut capacity c(A, B) in terms of its constituent edge capacities. By the conclusion in the previous paragraph, the edges crossing from A to B have one of the following two forms:
- edges of the form (vx, t), where x ∈ T, and
- edges of the form (s, uxy), where at least one of x or y does not belong to T (in other words, {x, y} ⊄ T).

Thus we have

    c(A, B) = Σ_{x∈T} (m − wx) + Σ_{{x,y}⊄T} gxy = m|T| − Σ_{x∈T} wx + (g* − Σ_{{x,y}⊆T} gxy).

Since we know that c(A, B) = g′ < g*, this last equation implies

    m|T| − Σ_{x∈T} wx − Σ_{{x,y}⊆T} gxy < 0,

and hence

    Σ_{x∈T} wx + Σ_{{x,y}⊆T} gxy > m|T|.

For example, applying the argument in the proof of (7.59) to the instance in Figure 7.21, we see that the nodes for New York and Toronto are on the source side of the minimum cut, and, as we saw earlier, these two teams indeed constitute a proof that Boston has been eliminated.
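To make the whole construction concrete, the following Python sketch (ours, not the book's; all names are invented, and the max-flow routine is a plain breadth-first-search implementation) builds G, tests the condition in (7.60), and reads the set T of (7.59) off the source side of the final residual graph:

    from collections import deque

    def max_flow(cap, s, t):
        """Augment along shortest residual paths; cap[u][v] holds the
        residual capacity and is updated in place."""
        value = 0
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, r in cap[u].items():
                    if r > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return value
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(cap[u][v] for u, v in path)
            for u, v in path:
                cap[u][v] -= b
                cap[v][u] += b
            value += b

    def eliminated(wins, games, m):
        """wins: current wins of each team in S' = S - {z};
        games: {(x, y): g_xy}; m: wins z ends with after winning out.
        Returns (is_eliminated, T)."""
        cap = {"s": {}, "t": {}}
        def add(u, v, c):
            cap.setdefault(u, {})[v] = c
            cap.setdefault(v, {}).setdefault(u, 0)
        g_star = 0
        for (x, y), g in games.items():
            add("s", (x, y), g)               # wins emanate from s
            add((x, y), x, float("inf"))      # each win goes to x or to y
            add((x, y), y, float("inf"))
            g_star += g
        for x, w in wins.items():
            add(x, "t", max(m - w, 0))        # team x may absorb m - w_x more wins
        flow = max_flow(cap, "s", "t")
        # Source side of the min cut: nodes reachable in the residual graph.
        seen, queue = {"s"}, deque(["s"])
        while queue:
            u = queue.popleft()
            for v, r in cap[u].items():
                if r > 0 and v not in seen:
                    seen.add(v)
                    queue.append(v)
        return flow < g_star, {x for x in wins if x in seen}

    wins = {"NY": 90, "Balt": 88, "Tor": 87}    # the second example
    games = {("NY", "Balt"): 1, ("Balt", "Tor"): 1, ("NY", "Tor"): 6}
    print(eliminated(wins, games, m=91))        # (True, {'NY', 'Tor'})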

7.13 A Further Direction: Adding Costs to the Matching Problem

Let's go back to the first problem we discussed in this chapter, Bipartite Matching. Perfect matchings in a bipartite graph formed a way to model the problem of pairing one kind of object with another: jobs with machines, for example. But in many settings, there are a large number of possible perfect matchings on the same set of objects, and we'd like a way to express the idea that some perfect matchings may be "better" than others.

The Problem

A natural way to formulate a problem based on this notion is to introduce costs. It may be that we incur a certain cost to perform a given job on a given machine, and we'd like to match jobs with machines in a way that minimizes the total cost. Or there may be n fire trucks that must be sent to n distinct houses; each house is at a given distance from each fire station, and we'd like a matching that minimizes the average distance each truck drives to its associated house. In short, it is very useful to have an algorithm that finds a perfect matching of minimum total cost.

Formally, we consider a bipartite graph G = (V, E) whose node set, as usual, is partitioned as V = X ∪ Y so that every edge e ∈ E has one end in X and the other end in Y. Furthermore, each edge e has a nonnegative cost ce. For a matching M, we say that the cost of the matching is the total cost of all edges in M, that is, cost(M) = Σ_{e∈M} ce. The Minimum-Cost Perfect Matching Problem assumes that |X| = |Y| = n, and the goal is to find a perfect matching of minimum cost.

Designing and Analyzing the Algorithm

We now describe an efficient algorithm to solve this problem, based on the idea of augmenting paths but adapted to take the costs into account. Recall the construction of the residual graph used for finding augmenting paths. Let M be a matching. We add two new nodes s and t to the graph. We add edges (s, x) for all nodes x ∈ X that are unmatched and edges (y, t) for all nodes y ∈ Y that are unmatched. An edge e = (x, y) ∈ E is oriented from x to y if e is not in the matching M, and from y to x if e ∈ M. We will use GM to denote this residual graph. Note that all edges going from Y to X are in the matching M, while the edges going from X to Y are not. Any directed s-t path P in the graph GM corresponds to a matching one larger than M, by swapping edges along P; that is, the edges in P from X to Y are added to M and all edges in P that go from Y to X are deleted from M. As before, we will call a path P in GM an augmenting path, and we say that we augment the matching M using the path P.

The high-level structure of the algorithm is quite simple. For each value of i from 1 to n, the algorithm will iteratively construct matchings using i edges. If we have a minimum-cost matching of size i, then we seek an augmenting path to produce a matching of size i + 1; and rather than looking for any augmenting path (as was sufficient in the case without costs), we use the cheapest augmenting path, so that the larger matching will also have minimum cost. We will show that when the algorithm concludes with a matching of size n, it is a minimum-cost perfect matching.

For this purpose, we will search for a cheap augmenting path with respect to the following natural costs. The edges leaving s and entering t will have cost 0; an edge e oriented from X to Y will have cost ce (as including this edge in the path means that we add the edge to M); and an edge e oriented from Y to X will have cost −ce (as including this edge in the path means that we delete the edge from M). We will use cost(P) to denote the cost of a path P in GM.
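In code, this residual graph is only a few lines. The sketch below (ours, with invented names) returns an adjacency list whose edge costs follow exactly this convention:

    def residual_graph(X, Y, cost, M):
        """cost: {(x, y): c_e} for every edge of G; M: set of matched pairs.
        Returns {u: [(v, edge_cost), ...]} representing G_M."""
        adj = {u: [] for u in ["s", "t", *X, *Y]}
        matched_x = {x for (x, y) in M}
        matched_y = {y for (x, y) in M}
        for x in X:
            if x not in matched_x:
                adj["s"].append((x, 0))    # cost-0 edge into each unmatched x
        for y in Y:
            if y not in matched_y:
                adj[y].append(("t", 0))    # cost-0 edge out of each unmatched y
        for (x, y), c in cost.items():
            if (x, y) in M:
                adj[y].append((x, -c))     # reversed matching edge: deleting e refunds c_e
            else:
                adj[x].append((y, c))      # forward edge: adding e costs c_e
        return adj

An s-t path in this structure is an augmenting path, and its total edge cost is exactly the change in the cost of the matching, which is the content of (7.61) below.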

The following statement summarizes this construction.

(7.61) Let M be a matching and P be a path in GM from s to t. Let M′ be the matching obtained from M by augmenting along P. Then |M′| = |M| + 1 and cost(M′) = cost(M) + cost(P).

Given this statement, it is natural to suggest an algorithm to find a minimum-cost perfect matching: we iteratively find minimum-cost paths in GM, and use the paths to augment the matchings. But how can we be sure that the perfect matching we find is of minimum cost? Or, even worse, is this algorithm even meaningful? We can only find minimum-cost paths if we know that the graph GM has no negative cycles.

Analyzing Negative Cycles

In fact, understanding the role of negative cycles in GM is the key to analyzing the algorithm. First consider the case in which M is a perfect matching. Note that in this case the node s has no leaving edges, and t has no entering edges in GM (as our matching is perfect), and hence no cycle in GM contains s or t. A directed cycle C in GM can be used for augmentation just the same way we used directed paths to obtain larger matchings: augmenting M along C involves swapping edges along C in and out of M. The resulting new perfect matching M′ has cost cost(M′) = cost(M) + cost(C). This immediately gives the following.

(7.62) Let M be a perfect matching. If there is a negative-cost directed cycle C in GM, then M is not minimum cost.

Proof. We use the cycle C for augmentation. The resulting perfect matching M′ has cost cost(M′) = cost(M) + cost(C), but cost(C) < 0, and hence M is not of minimum cost. ∎

More importantly, the converse of this statement is true as well, so in fact a perfect matching M has minimum cost precisely when there is no negative cycle in GM.

(7.63) Let M be a perfect matching. If there are no negative-cost directed cycles C in GM, then M is a minimum-cost perfect matching.

Proof. Suppose the statement is not true, and let M′ be a perfect matching of smaller cost. Consider the set of edges in one of M and M′ but not in both. Observe that this set of edges corresponds to a set of node-disjoint directed cycles in GM (recall that no cycle in GM contains s or t). The cost of this set of directed cycles is exactly cost(M′) − cost(M). Assuming M′ has smaller cost than M, it must be that at least one of these cycles has negative cost. ∎

Our plan is thus to iterate through matchings of larger and larger size, maintaining the property that the graph GM has no negative cycles in any iteration. In this way, our computation of a minimum-cost path will always be well defined; and when we terminate with a perfect matching, we can use (7.63) to conclude that it has minimum cost. The one issue we still have to deal with is how to maintain this property: how do we know that, after an augmentation, the new residual graph still has no negative cycles?

Maintaining Prices on the Nodes

To achieve this, it will help to think about a numerical price p(v) associated with each node v. These prices will help both in understanding how the algorithm runs, and they will also help speed up the implementation.

To understand prices, it helps to keep in mind an economic interpretation of them. For this purpose, consider the following scenario. Assume that the set X represents people who need to be assigned to do a set of jobs Y. For an edge e = (x, y), the cost ce is a cost associated with having person x do job y. Now we will think of the price p(x) as an extra bonus we pay for person x to participate in this system, like a "signing bonus"; with this in mind, the cost for assigning person x to do job y becomes p(x) + ce. On the other hand, we will think of the price p(y) for nodes y ∈ Y as a reward, or value gained by taking care of job y (no matter which person in X takes care of it). This way the "net cost" of assigning person x to do job y becomes p(x) + ce − p(y): this is the cost of hiring x for a bonus of p(x), having him do job y for a cost of ce, and then cashing in on the reward p(y). We will call this the reduced cost of an edge e = (x, y) and denote it by c̄e = p(x) + ce − p(y). However, it is important to keep in mind that only the costs ce are part of the problem description; the prices (bonuses and rewards) will simply be a way to think about our solution.

Specifically, we say that a set of numbers {p(v) : v ∈ V} forms a set of compatible prices with respect to a matching M if
(i) for all unmatched nodes x ∈ X we have p(x) = 0 (that is, people not asked to do any job do not need to be paid);
(ii) for all edges e = (x, y) we have p(x) + ce ≥ p(y) (that is, every edge has a nonnegative reduced cost); and
(iii) for all edges e = (x, y) ∈ M we have p(x) + ce = p(y) (every edge used in the assignment has a reduced cost of 0).
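The three conditions are mechanical to verify; here is a short sketch (ours, with invented names) of the check:

    def compatible(p, X, cost, M):
        """p: price of each node; cost: {(x, y): c_e}; M: set of matched pairs."""
        matched_x = {x for (x, y) in M}
        if any(p[x] != 0 for x in X if x not in matched_x):
            return False                                         # condition (i)
        if any(p[x] + c < p[y] for (x, y), c in cost.items()):
            return False                                         # condition (ii)
        return all(p[x] + cost[(x, y)] == p[y] for (x, y) in M)  # condition (iii)

For example, with all prices 0 and M = ∅, conditions (i)-(iii) hold whenever every ce ≥ 0; a slightly better initialization is described at the end of the section.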

Why are such prices useful? Intuitively, compatible prices suggest that the matching is cheap: along the matched edges reward equals cost, while on all other edges the reward is no bigger than the cost. For a partial matching, this may not imply that the matching has the smallest possible cost for its size (it may be taking care of expensive jobs). However, we claim that if M is any matching for which there exists a set of compatible prices, then GM has no negative cycles; for a perfect matching M, this will imply that M is of minimum cost by (7.63).

To see why GM can have no negative cycles, we extend the definition of reduced cost to edges in the residual graph by using the same expression c̄e = p(v) + ce − p(w) for any edge e = (v, w) of GM. Observe that the definition of compatible prices implies that all edges in the residual graph GM have nonnegative reduced costs. Now, note that for any cycle C,

    cost(C) = Σ_{e∈C} ce = Σ_{e∈C} c̄e,

since all the terms on the right-hand side corresponding to prices cancel out. We know that each term on the right-hand side is nonnegative, and so clearly cost(C) is nonnegative. Thus we have the following fact.

(7.64) Let M be a matching and p be compatible prices. Then GM has no negative cycles.

There is a second, algorithmic reason why it is useful to have prices on the nodes. When you have a graph with negative-cost edges but no negative cycles, you can compute shortest paths using the Bellman-Ford Algorithm in O(mn) time. But if the graph in fact has no negative-cost edges, then you can use Dijkstra's Algorithm instead, which only requires time O(m log n): almost a full factor of n faster. In our case, having the prices around allows us to compute shortest paths with respect to the nonnegative reduced costs c̄, arriving at an equivalent answer. Indeed, suppose we use Dijkstra's Algorithm to find the minimum cost dp,M(v) of a directed path from s to every node v ∈ X ∪ Y subject to the costs c̄. Given the minimum costs dp,M(y) for an unmatched node y ∈ Y, the (nonreduced) cost of the path from s to t through y is dp,M(y) + p(y), and so we find the minimum cost in O(n) additional time. In summary, we can use one run of Dijkstra's Algorithm and O(n) extra time to find the minimum-cost path from s to t.

Updating the Node Prices

We took advantage of the prices to improve one iteration of the algorithm. In order to be ready for the next iteration, we need not only the minimum-cost path (to get the next matching), but also a way to produce a set of compatible prices with respect to the new matching.

To get some intuition on how to do this, consider an unmatched node x with respect to a matching M, and an edge e = (x, y), as shown in Figure 7.22. The prices p we used with matching M may result in a reduced cost c̄e > 0; that is, the assignment of person x to job y may not be viewed as cheap enough. If the new matching M′ includes edge e (that is, if e is on the augmenting path we use to update the matching), then we will want the reduced cost of this edge to be zero. We can arrange the zero reduced cost by either increasing the price p(y) (the reward) by c̄e, or by decreasing the price p(x) by the same amount. To keep prices nonnegative, we will increase the price p(y). However, node y may be matched in the matching M to some other node x′ via an edge e′ = (x′, y). Increasing the reward p(y) decreases the reduced cost of edge e′ to a negative value, and hence the prices are no longer compatible. To keep things compatible, we can increase p(x′) by the same amount. However, this change might cause problems on other edges. Can we update all prices and keep the matching and the prices compatible on all edges? Surprisingly, this can be done quite simply by using the distances from s to all other nodes computed by Dijkstra's Algorithm.

Figure 7.22 A matching M (the dark edges), and a residual graph used to increase the size of the matching.

(7.65) Let M be a matching, let p be compatible prices, and let M′ be a matching obtained by augmenting along the minimum-cost path from s to t. Then p′(v) = dp,M(v) + p(v) is a compatible set of prices for M′.

Proof. To prove compatibility, consider first an edge e′ = (x′, y) ∈ M. The only edge entering x′ is the directed edge (y, x′), whose reduced cost is 0 by condition (iii); hence dp,M(x′) = dp,M(y), and thus p′(x′) + ce′ = dp,M(y) + p(x′) + ce′ = dp,M(y) + p(y) = p′(y), and we get the desired equation on such edges. Next consider the edges (x, y) in M′ − M. These edges are along the minimum-cost path from s to t, and hence they satisfy dp,M(y) = dp,M(x) + c̄e; substituting for c̄e, we get p′(x) + ce = p′(y), as desired. Finally, we get the required inequality for all other edges, since all edges e = (x, y) ∉ M must satisfy dp,M(y) ≤ dp,M(x) + c̄e. ∎
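Combining (7.64) and (7.65) gives the full algorithm. The following Python sketch is ours, not the book's pseudocode: it works on a complete bipartite instance given as a cost matrix, and it initializes all prices to 0, which is also compatible with the empty matching since every ce ≥ 0.

    import heapq

    def min_cost_perfect_matching(cost):
        """cost[i][j] >= 0: cost of edge (x_i, y_j); |X| = |Y| = n.
        Successive cheapest augmenting paths; Dijkstra runs on the
        nonnegative reduced costs, and prices are updated as in (7.65)."""
        n = len(cost)
        INF = float("inf")
        px, py = [0] * n, [0] * n    # prices; compatible with M = empty set
        mx, my = [-1] * n, [-1] * n  # mx[i] = j exactly when (x_i, y_j) in M
        for _ in range(n):
            dx, dy = [INF] * n, [INF] * n
            prev = [-1] * n          # prev[j]: the x preceding y_j on its shortest path
            heap = []
            for i in range(n):
                if mx[i] == -1:      # cost-0 edges from s into unmatched x's
                    dx[i] = 0
                    heapq.heappush(heap, (0, i))
            while heap:
                d, i = heapq.heappop(heap)
                if d > dx[i]:
                    continue
                for j in range(n):
                    if mx[i] == j:
                        continue     # that edge is reversed in G_M
                    nd = d + px[i] + cost[i][j] - py[j]   # reduced cost >= 0
                    if nd < dy[j]:
                        dy[j] = nd
                        prev[j] = i
                        i2 = my[j]   # reversed matching edge (y_j -> x_i2): reduced cost 0
                        if i2 != -1 and nd < dx[i2]:
                            dx[i2] = nd
                            heapq.heappush(heap, (nd, i2))
            # The cheapest augmenting path exits at the cheapest unmatched y.
            t = min((j for j in range(n) if my[j] == -1), key=lambda j: dy[j])
            for i in range(n):       # price update of (7.65): p'(v) = d(v) + p(v)
                if dx[i] < INF:
                    px[i] += dx[i]
            for j in range(n):
                if dy[j] < INF:
                    py[j] += dy[j]
            j = t                    # augment: swap edges back along the path
            while j != -1:
                i = prev[j]
                j_next = mx[i]       # i's old partner, or -1 if i was unmatched
                mx[i], my[j] = j, i
                j = j_next
        return [(i, mx[i]) for i in range(n)], sum(cost[i][mx[i]] for i in range(n))

    cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
    print(min_cost_perfect_matching(cost)[1])   # minimum total cost: 5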

Finally, we have to consider how to initialize the algorithm, so as to get it underway. We initialize M to be the empty set, define p(x) = 0 for all x ∈ X, and define p(y), for y ∈ Y, to be the minimum cost of an edge entering y. Note that these prices are compatible with respect to M = ∅. We summarize the algorithm below.

    Start with M equal to the empty set
    Define p(x) = 0 for x ∈ X, and p(y) = min_{e into y} ce for y ∈ Y
    While M is not a perfect matching
        Find a minimum-cost s-t path P in GM using (7.64) with prices p
        Augment along P to produce a new matching M′
        Find a set of compatible prices with respect to M′ via (7.65)
    Endwhile

The final set of compatible prices yields a proof that GM has no negative cycles, and, by (7.63), this implies that M has minimum cost.

(7.66) The minimum-cost perfect matching can be found in the time required for n shortest-path computations with nonnegative edge lengths.

Extensions: An Economic Interpretation of the Prices

To conclude our discussion of the Minimum-Cost Perfect Matching Problem, we develop the economic interpretation of the prices a bit further. We consider the following scenario. Assume X is a set of n people each looking to buy a house, and Y is a set of n houses that they are all considering. Let v(x, y) denote the value of house y to buyer x. Since each buyer wants one of the houses, one could argue that the best arrangement would be to find a perfect matching M that maximizes Σ_{(x,y)∈M} v(x, y). We can find such a perfect matching by using our minimum-cost perfect matching algorithm with costs ce = −v(x, y) if e = (x, y).

The question we will ask now is this: Can we convince these buyers to buy the house they are allocated? On her own, each buyer x would want to buy the house y that has maximum value v(x, y) to her. How can we convince her to buy instead the house that our matching M allocated? We will use prices to change the incentives of the buyers. Suppose we set a price P(y) for each house y; that is, the person buying house y must pay P(y). With these prices in mind, a buyer will be interested in buying the house with maximum net value, that is, the house y that maximizes v(x, y) − P(y). We say that a perfect matching M and house prices P are in equilibrium if, for all edges (x, y) ∈ M and all other houses y′, we have

    v(x, y) − P(y) ≥ v(x, y′) − P(y′).

But can we find a perfect matching and a set of prices so as to achieve this state of affairs, with every buyer ending up happy? In fact, the minimum-cost perfect matching and an associated set of compatible prices provide exactly what we're looking for.

(7.67) Let M be a perfect matching of minimum cost, where ce = −v(x, y) for each edge e = (x, y), and let p be a compatible set of prices. Then the matching M and the set of prices {P(y) = −p(y) : y ∈ Y} are in equilibrium.

Proof. Consider an edge e = (x, y) ∈ M, and let e′ = (x, y′). Since M and p are compatible, we have p(x) + ce = p(y) and p(x) + ce′ ≥ p(y′). Subtracting these two inequalities to cancel p(x), and substituting the values of p and c, we get the desired inequality in the definition of equilibrium. ∎

Solved Exercises

Solved Exercise 1

Suppose you are given a directed graph G = (V, E), with a positive integer capacity ce on each edge e, a designated source s ∈ V, and a designated sink t ∈ V. You are also given an integer maximum s-t flow in G, defined by a flow value fe on each edge e.

Now suppose we pick a specific edge e ∈ E and increase its capacity by one unit. Show how to find a maximum flow in the resulting capacitated graph in time O(m + n), where m is the number of edges in G and n is the number of nodes.

Solution The point here is that O(m + n) is not enough time to compute a new maximum flow from scratch, so we need to figure out how to use the flow f that we are given. Intuitively, even after we add 1 to the capacity of edge e, we haven't changed the network very much, so the flow f can't be that far from maximum. In fact, it's not hard to show that the maximum flow value can go up by at most 1.

(7.68) Consider the flow network G′ obtained by adding 1 to the capacity of e. The value of the maximum flow in G′ is either v(f) or v(f) + 1.

Proof. The value of the maximum flow in G′ is at least v(f), since f is still a feasible flow in this network. It is also integer-valued. So it is enough to show that the maximum-flow value in G′ is at most v(f) + 1.

By the Max-Flow Min-Cut Theorem, there is some s-t cut (A, B) in the original flow network G of capacity v(f). Now we ask: What is the capacity of (A, B) in the new flow network G′? All the edges crossing (A, B) have the same capacity in G′ that they did in G, with the possible exception of e (in case e crosses (A, B)). But ce only increased by 1, and so the capacity of (A, B) in the new flow network G′ is at most v(f) + 1. ∎

Statement (7.68) suggests a natural algorithm. Starting with the feasible flow f in G′, we try to find a single augmenting path from s to t in the residual graph G′f. This takes time O(m + n). Now one of two things will happen. Either we will fail to find an augmenting path, and in this case we know that f is a maximum flow; or the augmentation succeeds, producing a flow f′ of value at least v(f) + 1, and in this case we know by (7.68) that f′ must be a maximum flow. So either way, we produce a maximum flow after a single augmenting-path computation.
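In code, the whole solution is one breadth-first search. Here is a sketch (ours; it assumes the residual graph of f is given as nested dictionaries res[u][v] of residual capacities, with reverse entries present):

    from collections import deque

    def increase_capacity_and_refit(res, e, s, t):
        """res: residual capacities of a maximum flow f in G; e = (u, v)
        is the edge whose capacity grows by 1. Updates res in place and
        returns the increase (0 or 1) in the flow value, per (7.68)."""
        u, v = e
        res[u][v] += 1                    # the one new unit of capacity
        parent = {s: None}                # one BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, r in res[a].items():
                if r > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return 0                      # f is still maximum
        b = t                             # push a single unit along the path
        while parent[b] is not None:
            a = parent[b]
            res[a][b] -= 1
            res[b][a] += 1
            b = a
        return 1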

Solved Exercise 2

You are helping the medical consulting firm Doctors Without Weekends set up the work schedules of doctors in a large hospital. They've got the regular daily schedules mainly worked out. Now, however, they need to deal with all the special cases and, in particular, make sure that they have at least one doctor covering each vacation day.

Here's how this works. There are k vacation periods (e.g., the week of Christmas, the July 4th weekend, the Thanksgiving weekend, ...), each spanning several contiguous days. Let Dj be the set of days included in the jth vacation period; we will refer to the union of all these days, ∪j Dj, as the set of all vacation days. There are n doctors at the hospital, and doctor i has a set of vacation days Si when he or she is available to work. (This may include certain days from a given vacation period but not others; so, for example, a doctor may be able to work the Friday, Saturday, or Sunday of Thanksgiving weekend, but not the Thursday.)

Give a polynomial-time algorithm that takes this information and determines whether it is possible to select a single doctor to work on each vacation day, subject to the following constraints. For a given parameter c, each doctor should be assigned to work at most c vacation days total, and only days when he or she is available. Moreover, for each vacation period j, each doctor should be assigned to work at most one of the days in the set Dj. (In other words, although a particular doctor may work on several vacation days over the course of a year, he or she should not be assigned to work two or more days of the Thanksgiving weekend, or two or more days of the July 4th weekend, etc.) The algorithm should either return an assignment of doctors satisfying these constraints or report (correctly) that no such assignment exists.

Solution This is a very natural setting in which to apply network flow, since at a high level we're trying to match one set (the doctors) with another set (the vacation days). The complication comes from the requirement that each doctor can work at most one day in each vacation period.

So to begin, let's see how we'd solve the problem without that requirement, in the simpler case where each doctor i has a set Si of days when he or she can work, each doctor should be scheduled for at most c days total, and each vacation day should be covered by one doctor. We have a node ui representing each doctor, attached to a node vℓ representing each day when he or she can work; the construction is pictured in Figure 7.23(a). We attach a super-source s to each doctor node ui by an edge of capacity c, and we attach each day node vℓ to a super-sink t by an edge with upper and lower bounds of 1. This way, assigned days can "flow" through doctors to days when they can work, the capacities ensure that each doctor works at most c days, and the lower bounds on the edges from the days to the sink guarantee that each day is covered. Finally, suppose there are d vacation days total; we put a demand of +d on the sink and −d on the source, and we look for a feasible circulation. (Recall that once we've introduced lower bounds on some edges, the algorithms in the text are phrased in terms of circulations with demands, not maximum flow.)

But now we have to handle the extra requirement that each doctor can work at most one day from each vacation period. To do this, we take each pair (i, j) consisting of a doctor i and a vacation period j, and we add a "vacation gadget" as follows. We include a new node wij with an incoming edge of capacity 1 from the doctor node ui, and with outgoing edges of capacity 1 to each day in vacation period j when doctor i is available to work. This gadget serves to "choke off" the flow from ui into the days associated with vacation period j, so that at most one unit of flow can go to them collectively. We do this for all such pairs (i, j); the construction is pictured in Figure 7.23(b). As before, we put a demand of +d on the sink and −d on the source, and we look for a feasible circulation. The total running time is the time to construct the graph, which is O(nd), plus the time to check for a single feasible circulation in this graph.

Figure 7.23 (a) Doctors are assigned to vacation days without restricting how many days in one vacation period a doctor can work. (b) The flow network is expanded with "gadgets" that prevent a doctor from working more than one day from each vacation period. The shaded sets correspond to the different vacation periods.

The correctness of the algorithm is a consequence of the following claim.

(7.69) There is a way to assign doctors to vacation days in a way that respects all constraints if and only if there is a feasible circulation in the flow network we have constructed.

Proof. First, if there is a way to assign doctors to vacation days in a way that respects all constraints, then we can construct the following circulation. If doctor i works on day ℓ of vacation period j, then we send one unit of flow along the path s, ui, wij, vℓ, t; we do this for all such (i, ℓ) pairs. Since the assignment of doctors satisfied all the constraints, the resulting circulation respects all capacities, and it sends d units of flow out of s and into t, so it meets the demands.

Conversely, suppose there is a feasible circulation. For this direction of the proof, we will show how to use the circulation to construct a schedule for all the doctors. First, by (7.52), there is a feasible circulation in which all flow values are integers. We now construct the following schedule: if the edge (wij, vℓ) carries a unit of flow, then we have doctor i work on day ℓ. Because of the capacities, the resulting schedule has each doctor work at most c days, at most one in each vacation period, and each day is covered by one doctor. ∎
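The construction translates directly into code. In the sketch below (ours; names invented), every lower-bound edge runs into the sink with upper bound 1, so in this particular network the feasible-circulation check reduces to asking whether a maximum s-t flow saturates all d day edges:

    from collections import deque

    def max_flow(cap, s, t):
        value = 0
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, r in cap[u].items():
                    if r > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return value
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(cap[u][v] for u, v in path)
            for u, v in path:
                cap[u][v] -= b
                cap[v][u] += b
            value += b

    def can_schedule(avail, periods, c):
        """avail[i]: set of days doctor i can work; periods[j]: set of
        days in vacation period j; c: cap on days per doctor."""
        cap = {"s": {}, "t": {}}
        def add(u, v, w):
            cap.setdefault(u, {})[v] = w
            cap.setdefault(v, {}).setdefault(u, 0)
        days = set().union(*periods) if periods else set()
        for i, S in enumerate(avail):
            add("s", ("doc", i), c)
            for j, D in enumerate(periods):
                if S & D:
                    add(("doc", i), ("gadget", i, j), 1)  # chokes period j off at one day
                    for day in S & D:
                        add(("gadget", i, j), ("day", day), 1)
        for day in days:
            add(("day", day), "t", 1)
        return max_flow(cap, "s", "t") == len(days)

    # Two doctors, two vacation periods of two days each:
    print(can_schedule([{1, 2, 3, 4}, {2, 4}], [{1, 2}, {3, 4}], c=2))  # True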

Exercises

1. (a) List all the minimum s-t cuts in the flow network pictured in Figure 7.24. The capacity of each edge appears as a label next to the edge.
(b) What is the minimum capacity of an s-t cut in the flow network in Figure 7.25? Again, the capacity of each edge appears as a label next to the edge.

Figure 7.24 What are the minimum s-t cuts in this flow network?

Figure 7.25 What is the minimum capacity of an s-t cut in this flow network?

2. Figure 7.26 shows a flow network on which an s-t flow has been computed. The capacity of each edge appears as a label next to the edge, and the numbers in boxes give the amount of flow sent on each edge. (Edges without boxed numbers, specifically the four edges of capacity 3, have no flow being sent on them.)
(a) What is the value of this flow? Is this a maximum (s, t) flow in this graph?
(b) Find a minimum s-t cut in the flow network pictured in Figure 7.26, and also say what its capacity is.

Figure 7.26 What is the value of the depicted flow? Is it a maximum flow? What is the minimum cut?

3. Figure 7.27 shows a flow network on which an s-t flow has been computed. The capacity of each edge appears as a label next to the edge, and the numbers in boxes give the amount of flow sent on each edge. (Edges without boxed numbers have no flow being sent on them.)
(a) What is the value of this flow? Is this a maximum (s, t) flow in this graph?
(b) Find a minimum s-t cut in the flow network pictured in Figure 7.27, and also say what its capacity is.

Figure 7.27 What is the value of the depicted flow? Is it a maximum flow? What is the minimum cut?

4. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample. Let G be an arbitrary flow network, with a source s, a sink t, and a positive integer capacity ce on every edge e. If f is a maximum s-t flow in G, then f saturates every edge out of s with flow (i.e., for all edges e out of s, we have f(e) = ce).

5. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample. Let G be an arbitrary flow network, with a source s, a sink t, and a positive integer capacity ce on every edge e; and let (A, B) be a minimum s-t cut with respect to these capacities {ce : e ∈ E}. Now suppose we add 1 to every capacity; then (A, B) is still a minimum s-t cut with respect to these new capacities {1 + ce : e ∈ E}.

6. Suppose you're a consultant for the Ergonomic Architecture Commission, and they come to you with the following problem. They're really concerned about designing houses that are "user-friendly," and they've been having a lot of trouble with the setup of light fixtures and switches in newly designed houses. Consider, for example, a one-floor house with n light fixtures and n locations for light switches mounted in the wall. You'd like to be able to wire up one switch to control each light fixture, in such a way that a person at the switch can see the light fixture being controlled.

Sometimes this is possible and sometimes it isn't. Consider the two simple floor plans for houses in Figure 7.28. There are three light fixtures (labeled a, b, c) and three switches (labeled 1, 2, 3). It is possible to wire switches to fixtures in Figure 7.28(a) so that every switch has a line of sight to its fixture (this can be done by wiring switch 1 to a, switch 2 to b, and switch 3 to c), but this is not possible in Figure 7.28(b).

Let's call a floor plan, together with n light fixture locations and n switch locations, ergonomic if it's possible to wire one switch to each fixture so that every fixture is visible from the switch that controls it. A floor plan will be represented by a set of m horizontal or vertical line segments in the plane (the walls), where the ith wall has endpoints (xi, yi), (x′i, y′i). Each of the n switches and each of the n fixtures is given by its coordinates in the plane. A fixture is visible from a switch if the line segment joining them does not cross any of the walls.

Give an algorithm to decide if a given floor plan is ergonomic. The running time should be polynomial in m and n. You may assume that you have a subroutine with O(1) running time that takes two line segments as input and decides whether or not they cross in the plane.

Figure 7.28 The floor plan in (a) is ergonomic, because we can wire switches to fixtures in such a way that each fixture is visible from the switch that controls it. The floor plan in (b) is not ergonomic, because no such wiring is possible.

7. Consider a set of mobile computing clients in a certain town who each need to be connected to one of several possible base stations. We'll suppose there are n clients, with the position of each client specified by its (x, y) coordinates in the plane. There are also k base stations; the position of each of these is specified by (x, y) coordinates as well. For each client, we wish to connect it to exactly one of the base stations. Our choice of connections is constrained in the following ways. There is a range parameter r: a client can only be connected to a base station that is within distance r. There is also a load parameter L: no more than L clients can be connected to any single base station.

Your goal is to design a polynomial-time algorithm for the following problem. Given the positions of a set of clients and a set of base stations, together with the range and load parameters, decide whether every client can be connected simultaneously to a base station, subject to the range and load conditions in the previous paragraph.

8. Network flow issues come up in dealing with natural disasters and other crises, since major unexpected events often require the movement and evacuation of large numbers of people in a short amount of time. Consider the following scenario. Due to large-scale flooding in a region, paramedics have identified a set of n injured people distributed across the region who need to be rushed to hospitals. There are k hospitals in the region, and each of the n people needs to be brought to a hospital that is within a half-hour's driving time of their current location (so different people will have different options for hospitals, depending on where they are right now). At the same time, one doesn't want to overload any one of the hospitals by sending too many patients its way. The paramedics are in touch by cell phone, and they want to collectively work out whether they can choose a hospital for each of the injured people in such a way that the load on the hospitals is balanced: each hospital receives at most ⌈n/k⌉ people. Give a polynomial-time algorithm that takes the given information about the people's locations and determines whether this is possible.

9. Statistically, the arrival of spring typically results in increased accidents and increased need for emergency medical treatment, which often requires blood transfusions. Consider the problem faced by a hospital that is trying to evaluate whether its blood supply is sufficient.

The basic rule for blood donation is the following. A person's own blood supply has certain antigens present (we can think of antigens as a kind of molecular signature), and a person cannot receive blood with a particular antigen if their own blood does not have this antigen present. Concretely, this principle underpins the division of blood into four types: A, B, AB, and O. Blood of type A has the A antigen, blood of type B has the B antigen, blood of type AB has both, and blood of type O has neither. Thus, patients with type A can receive only blood types A or O in a transfusion, patients with type B can receive only B or O, patients with type O can receive only O, and patients with type AB can receive any of the four types.⁴

(a) Let sO, sA, sB, and sAB denote the supply in whole units of the different blood types on hand. Assume that the hospital knows the projected demand for each blood type dO, dA, dB, and dAB for the coming week. Give a polynomial-time algorithm to evaluate if the blood on hand would suffice for the projected need.

(b) Consider the following example. Over the next week, the hospital expects to need at most 100 units of blood. The typical distribution of blood types in U.S. patients is roughly 45 percent type O, 42 percent type A, 10 percent type B, and 3 percent type AB. The hospital wants to know if the blood supply it has on hand would be enough if 100 patients arrive with the expected type distribution. There is a total of 105 units of blood on hand. The table below gives these demands, and the supply on hand.

    blood type   supply   demand
    O            50       45
    A            36       42
    B            11       8
    AB           8        3

Is the 105 units of blood on hand enough to satisfy the 100 units of demand? Find an allocation that satisfies the maximum possible number of patients. Use an argument based on a minimum-capacity cut to show why not all patients can receive blood. Also, provide an explanation for this fact that would be understandable to the clinic administrators, who have not taken a course on algorithms. (This explanation should not involve the words flow, cut, or graph in the sense we use them in this book.)

⁴ The Austrian scientist Karl Landsteiner received the Nobel Prize in 1930 for his discovery of the blood types A, B, O, and AB.

10. Suppose you are given a directed graph G = (V, E), with a positive integer capacity ce on each edge e, a source s ∈ V, and a sink t ∈ V. You are also given a maximum s-t flow in G, defined by a flow value fe on each edge e. The flow f is acyclic: there is no cycle in G on which all edges carry positive flow. The flow f is also integer-valued. Now suppose we pick a specific edge e* ∈ E and reduce its capacity by 1 unit. Show how to find a maximum flow in the resulting capacitated graph in time O(m + n), where m is the number of edges in G and n is the number of nodes.

11. We define the Escape Problem as follows. We are given a directed graph G = (V, E) (picture a network of roads); a certain collection of nodes X ⊂ V are designated as populated nodes, and a certain other collection S ⊂ V are designated as safe nodes. (Assume that X and S are disjoint.) In case of an emergency, we want evacuation routes from the populated nodes to the safe nodes. A set of evacuation routes is defined as a set of paths in G so that (i) each node in X is the tail of one path, (ii) the last node on each path lies in S, and (iii) the paths do not share any edges. Such a set of paths gives a way for the occupants of the populated nodes to "escape" to S, without overly congesting any edge in G.

(a) Given G, X, and S, show how to decide in polynomial time whether such a set of evacuation routes exists.

(b) Suppose we have exactly the same problem as in (a), but we want to enforce an even stronger version of the "no congestion" condition (iii). Thus we change (iii) to say "the paths do not share any nodes." With this new condition, show how to decide in polynomial time whether such a set of evacuation routes exists. Also, provide an example with the same G, X, and S, in which the answer is yes to the question in (a) but no to the question in (b).

12. In a standard s-t Maximum-Flow Problem, we assume edges have capacities, and there is no limit on how much flow is allowed to pass through a node. In this problem, we consider the variant of the Maximum-Flow and Minimum-Cut problems with node capacities. Let G = (V, E) be a directed graph, with source s ∈ V, sink t ∈ V, and nonnegative node capacities {cv ≥ 0} for each v ∈ V. Given a flow f in this graph, the flow through a node v is defined as f^in(v). We say that a flow is feasible if it satisfies the usual flow-conservation constraints and the node-capacity constraints: f^in(v) ≤ cv for all nodes. Give a polynomial-time algorithm to find an s-t maximum flow in such a node-capacitated network. Define an s-t cut for node-capacitated networks, and show that the analogue of the Max-Flow Min-Cut Theorem holds true.

13. Consider the following problem. You are given a flow network with unit-capacity edges: it consists of a directed graph G = (V, E), a source s ∈ V, and a sink t ∈ V; and ce = 1 for every e ∈ E. You are also given a parameter k. The goal is to delete k edges so as to reduce the maximum s-t flow in G by as much as possible. In other words, you should find a set of edges F ⊆ E so that |F| = k and the maximum s-t flow in G′ = (V, E − F) is as small as possible subject to this. Give a polynomial-time algorithm to solve this problem.

14. Your friends have written a very fast piece of maximum-flow code based on repeatedly finding augmenting paths as in Section 7.1. However, after you've looked at a bit of output from it, you realize that it's not always finding a flow of maximum value. The bug turns out to be pretty easy to find: your friends hadn't really gotten into the whole backward-edge thing when writing the code, and so their implementation builds a variant of the residual graph that only includes the forward edges. In other words, it searches for s-t paths in a graph G̃f consisting only of edges e for which f(e) < ce, and it terminates when there is no augmenting path consisting entirely of such edges. We'll call this the Forward-Edge-Only Algorithm. (Note that we do not try to prescribe how this algorithm chooses its forward-edge paths; it may choose them in any fashion it wants, provided that it terminates only when there are no forward-edge paths.)

It's hard to convince your friends they need to reimplement the code. In addition to its blazing speed, they claim, in fact, that it never returns a flow whose value is less than a fixed fraction of optimal. Do you believe this? The crux of their claim can be made precise in the following statement: There is an absolute constant b > 1 (independent of the particular input flow network), so that on every instance of the Maximum-Flow Problem, the Forward-Edge-Only Algorithm is guaranteed to find a flow of value at least 1/b times the maximum-flow value (regardless of how it chooses its forward-edge paths). Decide whether you think this statement is true or false, and give a proof of either the statement or its negation.

15. Suppose you and your friend Alanis live, together with n − 2 other people, at a popular off-campus cooperative apartment, the Upson Collective. Over the next n nights, each of you is supposed to cook dinner for the co-op exactly once, so that someone cooks on each of the nights.

Of course, everyone has scheduling conflicts with some of the nights (e.g., exams, concerts, etc.), so deciding who should cook on which night becomes a tricky task. For concreteness, let's label the people {p1, ..., pn} and the nights {d1, ..., dn}; and for person pi, there's a set of nights Si ⊂ {d1, ..., dn} when they are not able to cook. A feasible dinner schedule is an assignment of each person in the co-op to a different night, so that each person cooks on exactly one night, there is someone cooking on each of the nights, and if pi cooks on night dj, then dj ∉ Si.

(a) Describe a bipartite graph G so that G has a perfect matching if and only if there is a feasible dinner schedule for the co-op.

(b) Your friend Alanis takes on the task of trying to construct a feasible dinner schedule. After great effort, she constructs what she claims is a feasible schedule and then heads off to class for the day. Unfortunately, when you look at the schedule she created, you notice a big problem: n − 2 of the people at the co-op are assigned to different nights on which they are available (no problem there), but for the other two people, pi and pj, and the other two days, dk and dℓ, you discover that she has accidentally assigned both pi and pj to cook on night dk, and assigned no one to cook on night dℓ. You want to fix Alanis's mistake, but without having to recompute everything from scratch. Show that it's possible, using her "almost correct" schedule, to decide in only O(n²) time whether there exists a feasible dinner schedule for the co-op. (If one exists, you should also output it.)

16. Back in the euphoric early days of the Web, people liked to claim that much of the enormous potential in a company like Yahoo! was in the "eyeballs": the simple fact that millions of people look at its pages every day. Further, by convincing people to register personal data with the site, a site like Yahoo! can show each user an extremely targeted advertisement whenever he or she visits the site, in a way that TV networks or magazines couldn't hope to match. So if a user has told Yahoo! that he or she is a 20-year-old computer science major from Cornell University, the site can present a banner ad for apartments in Ithaca, New York; if, on the other hand, he or she is a 50-year-old investment banker from Greenwich, Connecticut, the site can display a banner ad pitching Lincoln Town Cars instead.

But deciding on which ads to show to which people involves some serious computation behind the scenes. Suppose that the managers of a popular Web site have identified k distinct demographic groups G1, G2, ..., Gk, each consisting of a subset of the user population. (These groups can overlap; for example, G1 can be equal to all residents of New York State, and G2 can be equal to all people with a degree in computer science.) The site has contracts with m different advertisers, to show a certain number of copies of their ads to users of the site. Here's what the contract with the ith advertiser looks like.
- For a subset Xi ⊆ {G1, ..., Gk} of the demographic groups, advertiser i wants its ads shown only to users who belong to at least one of the demographic groups in the set Xi.
- For a number ri, advertiser i wants its ads shown to at least ri users each minute.

Now consider the problem of designing a good advertising policy: a way to show a single ad to each user of the site. Suppose at a given minute, there are n users visiting the site. Because we have registration information on each of these users, we know that user j (for j = 1, 2, ..., n) belongs to a subset Uj ⊆ {G1, ..., Gk} of the demographic groups. The problem is: Is there a way to show a single ad to each user so that the site's contracts with each of the m advertisers is satisfied for this minute? (That is, for each i = 1, 2, ..., m, can at least ri of the n users, each belonging to at least one demographic group in Xi, be shown an ad provided by advertiser i?) Give an efficient algorithm to decide if this is possible, and if so, to actually choose an ad to show each user.

17. You've been called in to help some network administrators diagnose the extent of a failure in their network. The network is designed to carry traffic from a designated source node s to a designated target node t, so we will model it as a directed graph G = (V, E), in which the capacity of each edge is 1 and in which each node lies on at least one path from s to t. Now, when everything is running smoothly in the network, the maximum s-t flow in G has value k. However, the current situation (and the reason you're here) is that an attacker has destroyed some of the edges in the network, so that there is now no path from s to t using the remaining (surviving) edges. For reasons that we won't go into here, they believe the attacker has destroyed only k edges, the minimum number needed to separate s from t (i.e., the size of a minimum s-t cut), and we'll assume they're correct in believing this.

The network administrators are running a monitoring tool on node s, which has the following behavior. If you issue the command ping(v), for a given node v, it will tell you whether there is currently a path from s to v. (So ping(t) reports that no path currently exists; on the other hand, ping(s) always reports a path from s to itself.) Since it's not practical to go out and inspect every edge of the network, they'd like to determine the extent of the failure using this monitoring tool, through judicious use of the ping command.

So here's the problem you face: Give an algorithm that issues a sequence of ping commands to various nodes in the network and then reports the full set of nodes that are not currently reachable from s. You could do this by pinging every node in the network, but you'd like to do it using many fewer pings (given the assumption that only k edges have been deleted). In issuing this sequence, your algorithm is allowed to decide which node to ping next based on the outcome of earlier ping operations. Give an algorithm that accomplishes this task using only O(k log n) pings.

18. We consider the Bipartite Matching Problem on a bipartite graph G = (V, E). As usual, we say that V is partitioned into sets X and Y, and each edge has one end in X and the other in Y. If M is a matching in G, we say that a node y ∈ Y is covered by M if y is an end of one of the edges in M.

(a) Consider the following problem. We are given G and a matching M in G. For a given number k, we want to decide if there is a matching M′ in G so that (i) M′ has k more edges than M does, and (ii) every node y ∈ Y that is covered by M is also covered by M′. We call this the Coverage Expansion Problem, with input G, M, and k, and we will say that M′ is a solution to the instance. Give a polynomial-time algorithm that takes an instance of Coverage Expansion and either returns a solution M′ or reports (correctly) that there is no solution. (You should include an analysis of the running time and a brief proof of why it is correct.) Note: You may wish to also look at part (b) to help in thinking about this.

Example. Consider Figure 7.29, and suppose M is the matching consisting of the edge (x1, y2). Suppose we are asked the above question with k = 1. Then the answer to this instance of Coverage Expansion is yes: we can let M′ be the matching consisting (for example) of the two edges (x1, y1) and (x2, y2); M′ has one more edge than M, and y2 is still covered by M′.

Figure 7.29 An instance of Coverage Expansion.

(b) Give an example of an instance of Coverage Expansion, specified by G, M, and k, so that the following situation happens: the instance has a solution, but in any solution M′, the edges of M do not form a subset of the edges of M′.

(c) Let G be a bipartite graph, and let M be any matching in G. Consider the following two quantities.
- K1 is the size of the largest matching M′ so that every node y that is covered by M is also covered by M′.
- K2 is the size of the largest matching M″ in G.
Clearly K1 ≤ K2, since K2 is obtained by considering all possible matchings in G. Prove that in fact K1 = K2; that is, we can obtain a maximum matching even if we're constrained to cover all the nodes covered by our initial matching M.

19. You've periodically helped the medical consulting firm Doctors Without Weekends on various hospital scheduling issues, and they've just come to you with a new problem. For each of the next n days, the hospital has determined the number of doctors they want on hand; thus, on day i, they have a requirement that exactly pi doctors be present. There are k doctors, and each is asked to provide a list of days on which he or she is willing to work. Thus doctor j provides a set Lj of days on which he or she is willing to work.

The system produced by the consulting firm should take these lists and try to return to each doctor j a list L′j with the following properties.
(A) L′j is a subset of Lj, so that doctor j only works on days he or she finds acceptable.
(B) If we consider the whole set of lists L′1, ..., L′k, it causes exactly pi doctors to be present on day i, for i = 1, 2, ..., n.

(a) Describe a polynomial-time algorithm that implements this system. Specifically, give a polynomial-time algorithm that takes the numbers p1, p2, ..., pn and the lists L1, ..., Lk, and does one of the following two things:
- Return lists L′1, L′2, ..., L′k satisfying properties (A) and (B); or
- Report (correctly) that there is no set of lists L′1, L′2, ..., L′k that satisfies both properties (A) and (B).

(b) The hospital finds that the doctors tend to submit lists that are much too restrictive, and so it often happens that the system reports (correctly, but unfortunately) that no acceptable set of lists exists. Thus the hospital relaxes the requirements as follows. They add a new parameter c > 0, and the system now should try to return to each doctor j a list L′j with the following properties.
(A*) L′j contains at most c days that do not appear on the list Lj.
(B) (Same as before) If we consider the whole set of lists L′1, ..., L′k, it causes exactly pi doctors to be present on day i, for i = 1, 2, ..., n.
Describe a polynomial-time algorithm that implements this revised system. It should take the numbers p1, p2, ..., pn and the lists L1, ..., Lk, and do one of the following two things:
- Return lists L′1, L′2, ..., L′k satisfying properties (A*) and (B); or
- Report (correctly) that there is no set of lists L′1, L′2, ..., L′k that satisfies both properties (A*) and (B).

20. Your friends are involved in a large-scale atmospheric science experiment. They need to get good measurements on a set S of n different conditions in the atmosphere (such as the ozone level at various places), and they have a set of m balloons that they plan to send up to make these measurements. Each balloon can make at most two measurements. Unfortunately, not all balloons are capable of measuring all conditions, so for each balloon i = 1, ..., m, they have a set Si of conditions that balloon i can measure. Finally, to make the results more reliable, they plan to take each measurement from at least k different balloons. (Note that a single balloon should not measure the same condition twice.) They are having trouble figuring out which conditions to measure on which balloon.

Example. Suppose that k = 2, there are n = 4 conditions labeled c1, c2, c3, c4, and there are m = 4 balloons that can measure conditions, subject to the limitation that S1 = S2 = {c1, c2, c3} and S3 = S4 = {c1, c3, c4}. Then one possible way to make sure that each condition is measured at least k = 2 times is to have
- balloon 1 measure conditions c1, c2,
- balloon 2 measure conditions c2, c3,
- balloon 3 measure conditions c3, c4, and
- balloon 4 measure conditions c1, c4.

(a) Give a polynomial-time algorithm that takes the input to an instance of this problem (the n conditions, the sets Si for each of the m balloons, and the parameter k) and decides whether there is a way to measure each condition by k different balloons, while each balloon only measures at most two conditions.

(b) You show your friends a solution computed by your algorithm from (a), and to your surprise they reply, "This won't do at all--one of the conditions is only being measured by balloons from a single subcontractor." You hadn't heard anything about subcontractors before; it turns out there's an extra wrinkle they forgot to mention. Each of the balloons is produced by one of three different subcontractors involved in the experiment. A requirement of the experiment is that there be no condition for which all k measurements come from balloons produced by a single subcontractor.

For example, suppose balloon 1 comes from the first subcontractor, balloons 2 and 3 come from the second subcontractor, and balloon 4 comes from the third subcontractor. Then our previous solution no longer works, as both of the measurements for condition c3 were done by balloons from the second subcontractor. However, we could use balloons 1 and 2 to each measure conditions c1, c2, and use balloons 3 and 4 to each measure conditions c3, c4.

Explain how to modify your polynomial-time algorithm for part (a) into a new algorithm that decides whether there exists a solution satisfying all the conditions from (a), plus the new requirement about subcontractors.

access point 3) would form a test set of size 3. Here’s one way to divide the nodes of G into three categories of this sort. Suppose you’re looking at a flow network G with source s and sink t. n. Let mij denote the entry in row i and column j. negative.. so we have to be careful in how we try making this idea precise. or zero. in other words. Little Ida. an s-t of capacity strictly less than that of all other s-t cuts). and nonnegative edge capacities {ce}. another will occasionally make communal grocery runs to the nearby organic food emporium.. laptop I is within range of access points 1 and 2. This can be easily made to happen as follows: If it turns out that i owes j a positive amount x.428 Chapter 7 Network Flow Exercises 429 This way. all the diagonal entries of M are equal to 1. B). some nodes are clearly on the "sink side" of the main bottlenecks. then we will subtract off y from both sides and declare a~j = x . and laptop 3 is within range of access points 2 and 3. for certain ordered pairs (i. by trying out all the connections specified by the pairs in T. and for each ordered pair (i. Let G = (V. with the expectation that everything will get balanced out fairly at the end of the year. Over the course of a year. or centra!. we have ~ ~ B--that is. Give an algorithm that takes a flow network G and classifies each of " its nodes as being upstream. after a!l the swapping. Suppose you live in a big apartment with a lot of friends. so that everyone departs on good terms. sink t ~ V. (a) Give an example of a matrix M that is not rearrangeable. certain people will write checks to others. We will reqt~e that for any two people ~ and j. Swapping two columns is defined analogously. B). but for which at least one entry in each row and each column is equal to !. there is at least one minimum s-t cut (A. for all minimum s-t cuts (A. 23.. o We say a node v is central if it is neither upstream nor downstream. we have v ~ A--that is. B’) for which v ~ B’. G can have many minimum cuts. The problem is: Given the sets Se for each laptop (i. The running time of your algorithm should be within a constant factor of the time required to compute a single maximum flow. (a) Give an example of an instance of this problem for which there is no test set of size n. decide whether there is a test set of size at most k. access point 2). In terms of all these quantities. We say that M is rearrangeable if it is possible to swap some of the pairs of rows and some of the pairs of columns (in any sequefice) so that. 24. laptop 2 is within range of access point 2. (b) Give a polynomial-time algorithm that determines whether a matrix M with 0-1 entries is rearrangeable. Example. minus the sum of the amounts that i owes everyone else. ~ We say a node v is downstream if. i will write a check to j for an amount bi~ > O. (Recall that we assume each laptop is within range of at least one access point. downstream. . and a number k. we can be sure that each laptop and each access point have correctly functioning software. (Note that an imbalance can be positive. for all minimum s-t cuts (A. we now define the imbalance of a person ~ to be the sum of the amounts that i is owed by everyone else. A diagonal entry is one of the form mii for some i. and at least one minimum s-t cut (A’.. and a thud might sometimes use a credit card to cover the whole bill at the local Italian-Indian restaurant. v lies on the source side of every minimum cut. B) for which v ~ A. 
23. Let G = (V, E) be a directed graph, with source s ∈ V, sink t ∈ V, and nonnegative edge capacities {c_e}. Give a polynomial-time algorithm to decide whether G has a unique minimum s-t cut (i.e., an s-t cut of capacity strictly less than that of all other s-t cuts). (One way to approach this is sketched in code after Exercise 24 below.)

24. Let M be an n × n matrix with each entry equal to either 0 or 1. Let m_ij denote the entry in row i and column j. A diagonal entry is one of the form m_ii for some i.

Swapping rows i and j of the matrix M denotes the following action: we swap the values m_ik and m_jk for k = 1, 2, ..., n. Swapping two columns is defined analogously.

We say that M is rearrangeable if it is possible to swap some of the pairs of rows and some of the pairs of columns (in any sequence) so that, after all the swapping, all the diagonal entries of M are equal to 1.

(a) Give an example of a matrix M that is not rearrangeable, but for which at least one entry in each row and each column is equal to 1.

(b) Give a polynomial-time algorithm that determines whether a matrix M with 0-1 entries is rearrangeable.
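For part (b), here is a small sketch (ours, not quoted from the book) built on the observation that row and column swaps can realize any permutation, so M is rearrangeable exactly when rows and columns can be perfectly matched through entries equal to 1. networkx is again an assumed dependency.

import networkx as nx

# Sketch: M is rearrangeable iff the bipartite graph joining row i to column j
# whenever m[i][j] == 1 has a perfect matching; swaps can then move those n
# chosen 1-entries onto the diagonal.
def rearrangeable(m):
    n = len(m)
    G = nx.Graph()
    rows = [("row", i) for i in range(n)]
    G.add_nodes_from(rows)
    G.add_nodes_from(("col", j) for j in range(n))
    for i in range(n):
        for j in range(n):
            if m[i][j] == 1:
                G.add_edge(("row", i), ("col", j))
    return len(nx.bipartite.maximum_matching(G, top_nodes=rows)) // 2 == n

print(rearrangeable([[0, 1], [1, 0]]))  # True: swapping the two rows suffices
print(rearrangeable([[1, 0], [1, 0]]))  # False: both rows need column 1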

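Returning to Exercise 23, one folklore test, sketched here under the same networkx assumption and with our own naming: compute one maximum flow; the minimum cut is unique exactly when the nodes reachable from s in the residual graph and the nodes that can reach t there together cover all of V.

import networkx as nx
from collections import deque

# Sketch for Exercise 23: the "closest-to-s" min cut has source side
# A* = {nodes reachable from s in the residual graph}; the "closest-to-t"
# min cut has sink side B* = {nodes that can reach t in the residual graph}.
# The minimum cut is unique iff A* and B* already cover every node.
def unique_min_cut(G, s, t):
    # G: nx.DiGraph with an integer "capacity" attribute on every edge
    _, flow = nx.maximum_flow(G, s, t)
    forward = {v: [] for v in G}
    backward = {v: [] for v in G}
    for u, v, data in G.edges(data=True):
        if flow[u][v] < data["capacity"]:
            forward[u].append(v); backward[v].append(u)
        if flow[u][v] > 0:
            forward[v].append(u); backward[u].append(v)
    def reach(root, adj):
        seen, queue = {root}, deque([root])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y); queue.append(y)
        return seen
    return len(reach(s, forward)) + len(reach(t, backward)) == len(G)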
25. Suppose you live in a big apartment with a lot of friends. Over the course of a year, there are many occasions when one of you pays for an expense shared by some subset of the apartment, with the expectation that everything will get balanced out fairly at the end of the year. For example, one of you may pay the whole phone bill in a given month; another will occasionally make communal grocery runs to the nearby organic food emporium; and a third might sometimes use a credit card to cover the whole bill at the local Italian-Indian restaurant, Little Ida's.

In any case, it's now the end of the year and time to settle up. There are n people in the apartment, and for each ordered pair (i, j) there's an amount a_ij ≥ 0 that i owes j, accumulated over the course of the year. We will require that for any two people i and j, at least one of the quantities a_ij or a_ji is equal to 0. This can be easily made to happen as follows: If it turns out that i owes j a positive amount x, and j owes i a positive amount y < x, then we will subtract off y from both sides and declare a_ij = x - y while a_ji = 0. In terms of all these quantities, we now define the imbalance of a person i to be the sum of the amounts that i is owed by everyone else, minus the sum of the amounts that i owes everyone else. (Note that an imbalance can be positive, negative, or zero.)

In order to restore all imbalances to 0, so that everyone departs on good terms, certain people will write checks to others; in other words, for certain ordered pairs (i, j), i will write a check to j for an amount b_ij > 0.

We will say that a set of checks constitutes a reconciliation if, for each person i, the total value of the checks received by i, minus the total value of the checks written by i, is equal to the imbalance of i. Finally, you and your friends feel it is bad form for i to write j a check if i did not actually owe j money, so we say that a reconciliation is consistent if, whenever i writes a check to j, it is the case that a_ij > 0.

Show that, for any set of amounts a_ij, there is always a consistent reconciliation in which at most n - 1 checks get written, by giving a polynomial-time algorithm to compute such a reconciliation. (One possible construction is sketched in code following Exercise 27 below.)

26. You can tell that cellular phones are at work in rural communities, from the giant microwave towers you sometimes see sprouting out of corn fields and cow pastures. Let's consider a very simplified model of a cellular phone network in a sparsely populated area.

We are given the locations of n base stations, specified as points b_1, ..., b_n in the plane. We are also given the locations of n cellular phones, specified as points p_1, ..., p_n in the plane. Finally, we are given a range parameter Δ > 0. We call the set of cell phones fully connected if it is possible to assign each phone to a base station in such a way that

* Each phone is assigned to a different base station, and
* If a phone at p_i is assigned to a base station at b_j, then the straight-line distance between the points p_i and b_j is at most Δ.

Suppose that the owner of the cell phone at point p_1 decides to go for a drive, traveling continuously for a total of z units of distance due east. As this cell phone moves, we may have to update the assignment of phones to base stations (possibly several times) in order to keep the set of phones fully connected.

Give a polynomial-time algorithm to decide whether it is possible to keep the set of phones fully connected at all times during the travel of this one cell phone. (You should assume that all other phones remain stationary during this travel.) If it is possible, you should report a sequence of assignments of phones to base stations that will be sufficient in order to maintain full connectivity; if it is not possible, you should report a point on the traveling phone's path at which full connectivity cannot be maintained. You should try to make your algorithm run in O(n^3) time if possible.

Example. Suppose we have phones at p_1 = (0, 0) and p_2 = (2, 1); we have base stations at b_1 = (1, 1) and b_2 = (3, 1); and Δ = 2. Now consider the case in which the phone at p_1 moves due east a distance of 4 units, ending at (4, 0). Then it is possible to keep the phones fully connected during this motion: We begin by assigning p_1 to b_1 and p_2 to b_2, and we reassign p_1 to b_2 and p_2 to b_1 during the motion (for example, when p_1 passes the point (2, 0)).
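A sketch of one workable plan for Exercise 26 (ours, and simplified: it finds a failure point but does not output the intermediate reassignments): as the phone drives east, the set of allowed phone-station pairs changes only at the O(n) positions where its distance to some base station crosses Δ, so it suffices to test a perfect matching once per interval between consecutive breakpoints. networkx and the helper names are assumptions of this sketch.

import math
import networkx as nx

# Sketch for Exercise 26. The allowed pairs (distance <= delta) change only
# when the traveling phone crosses distance delta from some base station, so
# we test one perfect matching per interval between such breakpoints.
def fully_connected(phones, stations, delta):
    G = nx.Graph()
    tops = [("p", i) for i in range(len(phones))]
    G.add_nodes_from(tops)
    G.add_nodes_from(("b", j) for j in range(len(stations)))
    for i, (px, py) in enumerate(phones):
        for j, (bx, by) in enumerate(stations):
            if math.hypot(px - bx, py - by) <= delta:
                G.add_edge(("p", i), ("b", j))
    m = nx.bipartite.maximum_matching(G, top_nodes=tops)
    return len(m) // 2 == len(phones)

def failure_point(phones, stations, delta, z):
    # Phone 0 travels due east a distance z; all other phones stay put.
    # Returns None if connectivity can always be maintained, else a point
    # on the path where it cannot.
    x0, y0 = phones[0]
    cuts = {0.0, float(z)}
    for bx, by in stations:
        r2 = delta * delta - (by - y0) ** 2
        if r2 >= 0:                      # breakpoints where distance == delta
            r = math.sqrt(r2)
            for t in (bx - x0 - r, bx - x0 + r):
                if 0 < t < z:
                    cuts.add(t)
    ts = sorted(cuts)
    for lo, hi in zip(ts, ts[1:]):       # edge set is constant on (lo, hi)
        mid = (lo + hi) / 2
        if not fully_connected([(x0 + mid, y0)] + phones[1:], stations, delta):
            return (x0 + mid, y0)
    return None

# The example from the text: connectivity survives the 4-unit drive.
print(failure_point([(0, 0), (2, 1)], [(1, 1), (3, 1)], delta=2, z=4))  # None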
27. Some of your friends with jobs out West decide they really need some extra time each day to sit in front of their laptops, and the morning commute from Woodside to Palo Alto seems like the only option. So they decide to carpool to work.

Unfortunately, they all hate to drive, so they want to make sure that any carpool arrangement they agree upon is fair and doesn't overload any individual with too much driving. Some sort of simple round-robin scheme is out, because none of them goes to work every day, and so the subset of them in the car varies from day to day.

Here's one way to define fairness. Let the people be labeled {p_1, ..., p_k}. We say that the total driving obligation of p_j over a set of days is the expected number of times that p_j would have driven, had a driver been chosen uniformly at random from among the people going to work each day. More concretely, suppose the carpool plan lasts for d days, and on the i-th day a subset S_i of the people go to work. Then the above definition of the total driving obligation Δ_j for p_j can be written as Δ_j = Σ_{i: p_j ∈ S_i} 1/|S_i|. Ideally, we'd like to require that p_j drives at most Δ_j times; unfortunately, Δ_j may not be an integer.

So let's say that a driving schedule is a choice of a driver for each day--that is, a sequence p_{i_1}, p_{i_2}, ..., p_{i_d} with p_{i_t} ∈ S_t--and that a fair driving schedule is one in which each p_j is chosen as the driver on at most ⌈Δ_j⌉ days.

(a) Prove that for any sequence of sets S_1, ..., S_d, there exists a fair driving schedule.

(b) Give an algorithm to compute a fair driving schedule with running time polynomial in k and d.
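Here is a compact sketch of one flow formulation behind part (b) (an assumed construction under our own naming, not quoted from the book): each day must choose one driver from its set, and person p_j may drive at most ⌈Δ_j⌉ times; a fair schedule is exactly an integral flow of value d. Exact fractions avoid rounding ⌈Δ_j⌉ incorrectly.

import math
from fractions import Fraction
import networkx as nx

# Sketch for part (b): day i sends one unit to some person in S_i; person j
# can absorb at most ceil(Delta_j) units. A flow of value d is exactly a
# fair driving schedule (and part (a) says one always exists).
def fair_schedule(days):
    # days: list of lists of people; returns {day index: chosen driver}
    delta = {}
    for S in days:
        for p in S:
            delta[p] = delta.get(p, 0) + Fraction(1, len(S))
    G = nx.DiGraph()
    for i, S in enumerate(days):
        G.add_edge("source", ("day", i), capacity=1)
        for p in S:
            G.add_edge(("day", i), ("person", p), capacity=1)
    for p, d in delta.items():
        G.add_edge(("person", p), "sink", capacity=math.ceil(d))
    value, flow = nx.maximum_flow(G, "source", "sink")
    assert value == len(days)        # a fair schedule always exists
    return {i: p for i, S in enumerate(days)
            for (_, p), f in flow[("day", i)].items() if f == 1}

print(fair_schedule([["A", "B"], ["A", "B"], ["A"]]))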

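For Exercise 25, one constructive route (a sketch under our own conventions, using networkx's find_cycle): start by writing a check for every positive debt, then repeatedly shift money around any cycle in the check graph, ignoring edge directions, until some check drops to zero. Balances never change, and once no cycle remains the surviving checks, at most n - 1 of them, form a consistent reconciliation.

import networkx as nx

# Sketch for Exercise 25. Start with checks b = a and cancel cycles in the
# check graph (ignoring direction): adding eps to checks traversed forward
# and subtracting eps from checks traversed backward leaves every person's
# net balance unchanged, and eps can be chosen to zero out one check.
def reconcile(a):
    # a: dict (i, j) -> amount i owes j; at most one of a[i,j], a[j,i] is > 0
    G = nx.DiGraph()
    for (i, j), amount in a.items():
        if amount > 0:
            G.add_edge(i, j, amt=amount)
    while True:
        try:
            cycle = nx.find_cycle(G, orientation="ignore")
        except nx.NetworkXNoCycle:
            break
        backward = [(u, v) for u, v, d in cycle if d == "reverse"]
        if backward:
            eps = min(G[u][v]["amt"] for u, v in backward)
            for u, v, d in cycle:
                G[u][v]["amt"] += eps if d == "forward" else -eps
        else:                        # a directed cycle: shrink every check
            eps = min(G[u][v]["amt"] for u, v, _ in cycle)
            for u, v, _ in cycle:
                G[u][v]["amt"] -= eps
        zeroed = [(u, v) for u, v, _ in cycle if G[u][v]["amt"] == 0]
        G.remove_edges_from(zeroed)
    return {(u, v): d["amt"] for u, v, d in G.edges(data=True)}

# Three people owing in a cycle settle with n - 1 = 2 checks:
print(reconcile({(1, 2): 10, (2, 3): 10, (3, 1): 4}))  # {(1,2): 6, (2,3): 6}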
28. A group of students has decided to add some features to Cornell's on-line Course Management System (CMS), to handle aspects of course planning that are not currently covered by the software. They're beginning with a module that helps schedule office hours at the start of the semester.

Their initial prototype works as follows. The office-hour schedule will be the same from one week to the next, so it's enough to focus on the scheduling problem for a single week. The course administrator enters a collection of nonoverlapping one-hour time intervals I_1, I_2, ..., I_k when it would be possible for teaching assistants (TAs) to hold office hours; the eventual office-hour schedule will consist of a subset of some, but generally not all, of these time slots. Then each of the TAs enters his or her weekly schedule, so that each TA is available for each of his or her office-hour slots. Finally, the course administrator specifies, for parameters a, b, and c, that each TA should hold between a and b office hours, and they would like a total of exactly c office hours to be held over the course of the week.

The problem, then, is how to assign each TA to some of the office-hour time slots, so that each TA is available for each of his or her office-hour slots, and so that the right number of office hours gets held. (There should be only one TA at each office hour.)

Example. Suppose there are five possible time slots for office hours: I_1 = Mon 3-4 P.M.; I_2 = Tue 1-2 P.M.; I_3 = Wed 10-11 A.M.; I_4 = Wed 3-4 P.M.; and I_5 = Thu 10-11 A.M. There are two TAs: the first would be able to hold office hours at any time on Monday or Wednesday afternoons, and the second would be able to hold office hours at any time on Tuesday or Thursday. (In general, TA availability might be more complicated to specify than this, but we're keeping this example simple.) Finally, each TA should hold between a = 1 and b = 2 office hours, and we want exactly c = 3 office hours per week total.

One possible solution would be to have the first TA hold office hours in time slot I_1, and the second TA hold office hours in time slots I_2 and I_5.

(a) Give a polynomial-time algorithm that takes the input to an instance of this problem (the time slots, the TA schedules, and the parameters a, b, and c) and does one of the following two things:

* Constructs a valid schedule for office hours, or
* Reports (correctly) that there is no valid way to schedule office hours.

(b) This office-hour scheduling feature becomes very popular; in particular, course staffs observe that it's good to have a greater density of office hours closer to the due date of a homework assignment. So what they want to be able to do is to specify an office-hour density parameter for each day of the week: The number d_i specifies that they want to have at least d_i office hours on a given day i of the week.

For example, suppose that in our previous example, we add the constraint that we want at least one office hour on Wednesday and at least one office hour on Thursday. Then the previous solution does not work; but there is a possible solution in which we have the first TA hold office hours in time slot I_4, and the second TA hold office hours in time slots I_2 and I_5. (Another solution would be to have the first TA hold office hours in time slots I_1 and I_4, and the second TA hold office hours in time slot I_5.)

Give a polynomial-time algorithm that computes office-hour schedules under this more complex set of constraints. The algorithm should either construct a schedule or report (correctly) that none exists.
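Both parts fit the standard pattern of a circulation with lower bounds; below is a hedged sketch (our construction and naming, with networkx as an assumed dependency) in which edge lower bounds are folded into node demands so that network_simplex can test feasibility. Passing day_min handles the density constraints of part (b); omitting it gives part (a).

import networkx as nx
from collections import defaultdict

# Sketch for Exercise 28 as a feasible circulation. Every TA holds between a
# and b hours, every slot gets at most one TA, every day d gets at least
# day_min[d] hours, and exactly c hours are held in total. An edge with
# lower bound lo is modeled by shifting lo into the endpoint demands.
def office_hours(slots, avail, a, b, c, day_min=None):
    # slots: {slot: day}; avail: {ta: set of slots}; returns {slot: ta} or None
    day_min = day_min or {}
    demand = defaultdict(int)
    G = nx.DiGraph()
    def edge(u, v, lo, hi):
        G.add_edge(u, v, capacity=hi - lo, weight=0)
        demand[u] += lo
        demand[v] -= lo
    for ta, ok in avail.items():
        edge("root", ("ta", ta), a, b)
        for s in ok:
            edge(("ta", ta), ("slot", s), 0, 1)
    for s, d in slots.items():
        edge(("slot", s), ("day", d), 0, 1)
    for d in set(slots.values()):
        edge(("day", d), "hub", day_min.get(d, 0),
             sum(1 for s in slots if slots[s] == d))
    edge("hub", "root", c, c)                # exactly c office hours in total
    for v, dem in demand.items():
        G.add_node(v, demand=dem)
    try:
        _, flow = nx.network_simplex(G)
    except nx.NetworkXUnfeasible:
        return None
    assignment = {}
    for node in G:
        if isinstance(node, tuple) and node[0] == "ta":
            for succ, f in flow[node].items():
                if f == 1:
                    assignment[succ[1]] = node[1]
    return assignment

# The example from the text, with and without the part (b) density constraints:
slots = {"I1": "Mon", "I2": "Tue", "I3": "Wed", "I4": "Wed", "I5": "Thu"}
avail = {1: {"I1", "I4"}, 2: {"I2", "I5"}}
print(office_hours(slots, avail, a=1, b=2, c=3))
print(office_hours(slots, avail, 1, 2, 3, day_min={"Wed": 1, "Thu": 1}))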
29. Some of your friends have recently graduated and started a small company, which they are currently running out of their parents' garages in Santa Clara. They're in the process of porting all their software from an old system to a new, revved-up system; and they're facing the following problem.

They have a collection of n software applications, {1, 2, ..., n}, running on their old system, and they'd like to move some (or all) of these to the new system. If they move application i to the new system, they expect a net (monetary) benefit of b_i ≥ 0. The different software applications interact with one another; if applications i and j have extensive interaction, then the company will incur an expense if they move one of i or j to the new system but not both. Let's denote this expense by x_ij ≥ 0.

So, if the situation were really this simple, your friends would just port everything, achieving a total benefit of Σ_i b_i. Unfortunately, there's a problem. Due to small but fundamental incompatibilities between the two systems, there's no way to port application 1 to the new system; it will have to remain on the old system. Nevertheless, it might still pay off to port some of the other applications, accruing the associated benefit and incurring the expense of the interaction between applications on different systems.

So this is the question they pose to you: Which of the remaining applications, if any, should be moved? Give a polynomial-time algorithm to find a set S ⊆ {2, 3, ..., n} for which the sum of the benefits minus the expenses of moving the applications in S to the new system is maximized.

30. Consider a variation on the previous problem. In the new scenario, any application can potentially be moved, but now some of the benefits b_i for moving to the new system can be negative.
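Exercise 29 has the shape of a minimum-cut problem, and the sketch below (our encoding, not text from the book) shows one usual trick: a cut separates the applications that move (source side) from those that stay; cutting the edge from s to application i forfeits the benefit b_i, cutting between i and j pays the interaction expense x_ij, and an uncapacitated edge from application 1 to the sink pins it to the old system.

import networkx as nx

# Sketch for Exercise 29 via minimum cut. Source side = applications moved.
# Cutting s->i costs the lost benefit b_i; a split pair (i, j) cuts exactly
# one of its two x_ij edges; the capacity-free edge 1->t (treated by
# networkx as infinite) forces application 1 to stay.
def best_port(n, b, x):
    # b: {app: benefit} for apps 2..n; x: {(i, j): interaction expense}
    G = nx.DiGraph()
    for i in range(2, n + 1):
        G.add_edge("s", i, capacity=b.get(i, 0))
    G.add_edge(1, "t")                  # no capacity attribute = infinite
    for (i, j), cost in x.items():
        G.add_edge(i, j, capacity=cost)
        G.add_edge(j, i, capacity=cost)
    cut, (move, stay) = nx.minimum_cut(G, "s", "t")
    return sum(b.values()) - cut, {i for i in move if i != "s"}

# Two apps interact with the unportable app 1; moving both still pays off:
print(best_port(3, {2: 5, 3: 5}, {(1, 2): 1, (2, 3): 4}))  # (9, {2, 3})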