## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

**Math 381 Course Notes
**

University of Washington

Prof. Sara Billey

Winter Quarter, 2011

c R. J. LeVeque

2

Acknowledgments: These notes are based on many quarters of Math 381 at the University of

Washington. They have developed in a wiki fashion. The original notes were compiled by Randy

LeVeque in Applied Math at UW. Tim Chartier and Anne Greenbaum contributed signiﬁcantly to the

current version in 2001-2002. Jim Burke contributed a portion of the chapter on linear programming.

Sara Billey contributed a portion of the chapter on Stochastic Processes. We thank them for their

eﬀorts. Since this document is still evolving, we appreciate the input of our readers in improving the

text and its content.

Sara Billey adds:

I have gotten a lot of inspiration for this course from the following sources:

• ”Operations Research” by Wayne Winston.

• ”Mathematics in Service to the Community: Concepts and Models for Service-learning in the

Mathematical Sciences” (Maa Notes #66) edited by Charles R. Hadlock.

• ”Discrete Mathematical Models: With Applications to Social, Biological, and Environmental

Problems” by Fred Roberts.

• ”Freakonomics” by Steven Levitt and Stephen Dubner

• ”Three Cups of Tea: One Man’s Mission to Promote Peace . . . One School at a Time” by Greg

Mortenson and David Oliver Relin

• ”Mountains beyond mountains” by Tracy Kidder

• My friend Amy Smith: inventor of low-tech devices for use in developing countries. See

http://en.wikipedia.org/wiki/Amy_Smith

http://www.ted.com/index.php/talks/amy_smith_shares_simple_lifesaving_design.html

The following information is the bibtex citation for these notes if you want to cite them as a reference:

@Book{381notes,

author = {Sara Billey, James Burke,Tim Chartier, Anne Greenbaum and Randy LeVeque},

title = {Discrete Mathematical Modeling: Math 381 Course Notes},

year = {2010},

publisher = {University of Washington},

}

Math 381 Notes — University of Washington —Winter 2011

Contents

1 Introduction to Modeling 1

1.1 Lifeboats and life vests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 The modeling process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 The goals of 381 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Graph Theory, Network Flows and Algorithms 9

2.1 Graphs and networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 The traveling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Brute force search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.2 Counting the possibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.3 Enumerating all possibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.4 Branch and bound algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Complexity theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Shortest path problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4.1 Word ladders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4.2 Enumerating all paths and a branch and bound algorithm . . . . . . . . . . . . . 14

2.4.3 Dijkstra’s algorithm for shortest paths . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5 Minimum spanning trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.5.1 Kruskal’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.5.2 Prim’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.5.3 Minimum spanning trees and the Euclidean TSP . . . . . . . . . . . . . . . . . . 16

2.6 Eulerian closed chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.7 P, NP, and NP-complete problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Stochastic Processes 25

3.1 Basic Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Probability Distributions and Random Variables . . . . . . . . . . . . . . . . . . . . . . 26

3.2.1 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.3 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.4 Buﬀon’s Needle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4 Monte Carlo Methods 33

4.1 A ﬁsh tank modeling problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2 Monte Carlo simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.1 Comparison of strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.3 A hybrid strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.4 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.5 Fish tank simulation code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

i

ii CONTENTS

4.6 Poisson processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.7 Queuing theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5 Markov Chains and Related Models 55

5.1 A Markov chain model of a queuing problem . . . . . . . . . . . . . . . . . . . . . . . . 55

5.2 Review of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.3 Convergence to steady state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.4 Other examples of Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.4.1 Breakdown of machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.4.2 Work schedules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.4.3 The ﬁsh tank inventory problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.4.4 Card shuﬄing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.5 General properties of transition matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.6 Ecological models and Leslie Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.6.1 Harvesting strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6 Linear Programming 73

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6.1.1 What is optimization? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6.1.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6.1.3 LPs in Standard Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.2 A pig farming model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.2.1 Adding more constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.2.2 Solving the problem by enumeration . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.2.3 Adding a third decision variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

6.2.4 The simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

6.3 Complexity and the simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7 Integer Programming 89

7.1 The lifeboat problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.2 Cargo plane loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

7.3 Branch and bound algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

7.4 Traveling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

7.4.1 NP-completeness of integer programming . . . . . . . . . . . . . . . . . . . . . . 95

7.5 IP in Recreational Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

7.5.1 An Example Puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.5.2 Formulation of the IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.6 Recasting an IP as a 0-1 LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8 Evolutionary Trees 101

8.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

8.2 Maximum parsimony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

8.3 Maximum likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

8.4 Searching the forest for the optimal tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

9 Forest Resource Management 107

9.1 Timber harvest scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

9.2 Spatial structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

9.3 Building roads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.4 Some other issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

Math 381 Notes — University of Washington —Winter 2011

Chapter 1

Introduction to Modeling

1.1 Lifeboats and life vests

Recently there was an article in the Seattle Times about Washington State Ferries and the fact that

they do not carry enough lifeboats for all passengers. They do carry enough life vests to make up the

diﬀerence. It would be better to carry more lifeboats since your chances of surviving are better in a

lifeboat than in cold water. However, lifeboats take up much more space and so there is a tradeoﬀ —

the capacity of the ferries would be greatly reduced if they had to carry enough lifeboats for everyone.

Suppose the XYZ Boat-builders have come to us with the following problem. They are designing a

new ferry boat and want to determine the number of lifeboats and life vests to equip it with. They give

us the following information:

• Each vest holds 1 person and requires 0.05 m

3

of space to store.

• Each boat holds 20 people and requires 2.1 m

3

of space to store.

• We must have capacity for 1000 people in one or the other.

• The total space to be devoted to this equipment is at most 85 m

3

.

How many of each should they install?

One might go through the following steps in determining a solution:

1. First see if it’s possible to accommodate everyone in lifeboats, which would be best. This would

require (1000/20) ×2.1 = 105m

3

of space, so the answer is no, we must use some combination.

2. Make sure there’s enough space for life vests for everyone, otherwise the problem is impossible:

1000 ×.05 = 50m

3

, so it would be possible to provide only life vests and have space left over. But

for the best solution we’d like to use more of the available space and provide as many boats as

possible.

3. Figure out how many boats and vests are used if we exactly use all the available space and provide

exactly the necessary capacity. We can set this up as a mathematical problem. There are two

unknowns

x

1

= number of vests

x

2

= number of boats

1

2 Introduction to Modeling

and two equations to be satisﬁed, one saying that the total capacity must be 1000 people and the

other saying that the total volume must be 85m

3

. This is a linear system of equations:

C

1

x

1

+C

2

x

2

= C

V

1

x

1

+V

2

x

2

= V

(1.1)

where

C

1

= 1 (capacity of each vest)

C

2

= 20 (capacity of each boat)

C = 1000 (total capacity needed)

V

1

= 0.05 (volume of each vest)

V

2

= 2.1 (volume of each boat)

V = 85 (total volume available)

4. Solve the mathematical problem. Equation (1.1) is just a 2 ×2 linear system so it’s easy to solve,

obtaining

x

1

= 363.6364 (number of vests)

x

2

= 31.8182 (number of boats)

5. Interpret the mathematical solution in terms of the real problem. Clearly we can’t actually provide

fractional vests or boats, so we’ll have to ﬁx up these numbers. The only way to do this and keep

the volume from increasing is to reduce the number of boats to 31 and increase the number of

vests to keep the capacity at 1000. Since 31 ×20 = 620 this means we need 380 vests. The total

volume required is 84.1m

3

.

6. Present the answer to the XYZ Company. At which point they say: “You must be crazy. Every-

body knows that the lifeboats are arranged on symmetric racks on the two sides of the boat, and

so there must be an even number of lifeboats”. Fix up the solution using this new information.

There must be 30 boats and 400 vests.

7. Reconsider the solution to see if we’ve missed anything. Notice that the original mathematical

solution had x

2

almost equal to 32. This suggests that if we had just a little more space (85.2m

3

),

we could increase the number of boats to 32 instead of 30, a substantial improvement. Suggest

this to XYZ. They respond: “Well sure, the 85m

3

ﬁgure was just an approximation. We can easily

ﬁnd another 0.2m

3

if it means we can ﬁt in two more boats.

1.2 The modeling process

This simple example illustrates some of the issues involved in mathematical modeling:

• The problem is usually not presented as a standard mathematical problem, but must be worked

into this form.

• There may be many diﬀerent ways to model the same problem mathematically. In the above

example we reduced it to a linear system. But for more complicated problems of this same type

this would not be possible and we would have to deal directly with the fact that we are given

inequality constraints (e.g., the total volume must be no more than 85m

3

) rather than equalities.

The problem is better set up as a linear programming problem, a type of discrete optimization

problem in which we try to maximize the number of boats subject to two sets of constraints (an

upper bound on volume and a lower bound on capacity). Actually it is an integer programming

problem since we also have the constraint that the solution must be a pair of integers. There are

standard methods for solving these types of problems.

1.3 The goals of 381 3

• Once the “mathematical problem” has been solved (e.g., the linear system above), we need to

interpret the answer to this problem in terms of the original problem. Does it make sense?

• Often the information presented is not complete or necessarily accurate. Here, for example, we

weren’t told to begin with the information that the number of boats had to be even, or that the

available volume was only approximate.

• Often the problem is far too hard to model in detail and solve exactly. This example was quite

straightforward in this regard, but typically one must make decisions about what aspects to

consider and which can be safely ignored (or must be ignored to get a solvable problem). Often

there is no “right answer”, though some answers are much better than others!

Mathematical modeling is an ongoing process of overcoming these issues described above. We

summarize the process with the following picture:

Real world

prediction

Mathematical

Solution/Prediction

Mathematical

Problem/Model

Identify a real world

collect real world

data

Translation

T

e

s

tin

g

P

r

e

d

ic

tio

n

Interpretation

goal. Collect

Figure 1.1: The Modeling Method

From the picture we see that the Modeling Method is a cyclic process. We can always continue to

improve our model when new information comes in. The most important step is “Testing”. Whenever

we create a mathematical model, we need to assess the validity of the predictions. Did the proposed

solution work better? If so, by how much? If not, revise the model.

1.3 The goals of 381

The goals of Math 381 are the following:

• Learn about the modeling process. How does one take an ill-deﬁned “real world problem” and

reduce it to a mathematical problem that can be solved by standard techniques? What issues are

involved in dealing with real data and complex problems, where we can often only hope to model

some small piece of the total problem?

• In order to have any chance of making a model that reduces to a “standard mathematical problem”,

we must learn what some of these standard problems are. For example, in the above problem we

recognized early on that we could set it up as a linear system of equations, and we knew that this

would be a good thing to do since there are standard methods for solving such systems (such as

Gaussian elimination). We need some more tools in our tool bag in order to do modeling. We

will see a variety of standard problems and how each can arise in many diﬀerent contexts. These

will be introduced in the course of looking at a number of modeling problems.

• In order to learn about the modeling process, we need to study real world problems that eﬀect

real people. We will seek out and solve problems related to the community around us. The course

culminates in a ﬁnal modeling project. Final projects can be inspired by some of the challenges

faced by non-proﬁt organizations, government agencies, small businesses, or the university. This

4 Introduction to Modeling

is a service-learning course for which you can receive S credit on your transcript by satisfying the

requirements.

• The project also must be written up in an appropriate manner. This is an important part of any

real-world modeling project and an important skill to master. During the quarter you also will

read some papers written by others where modeling problems are presented and do some shorter

writing assignments. This will build your written communication skills. You will receive W credit

on your transcript for this course.

• Another goal of the course is to give you experience in working together with others on a mathe-

matical problem. Discussion will be encouraged in class and collaboration is encouraged outside

of class. The projects must be done in groups. This is the way most modeling is done in the real

world: by teams of mathematicians, or more commonly by a mathematician working together

with community members, scientists, engineers, or specialists in another discipline.

• Math 381 concentrates on discrete modeling, which means the mathematical problems typically

involve a discrete set of values or objects (e.g. the life vests, lifeboats, people, boxes, airplanes,

DNA sequences, dollars, ﬁsh tanks, etc.) Moreover there are typically a discrete set of possibilities

to choose between. Trying out all of the possibilities to see which is best might be one possible

approach, but is typically very ineﬃcient. We want to use mathematical techniques to make the

job easier, cheaper, or more beneﬁcial.

For example, it is clear that the solution to the problem above must consist of at most 1000 vests

(since this would accommodate everyone) and at most 40 lifeboats (since this would use all the

available space). One could try all possible combinations and ﬁnd the right mix by trial and error,

but as mathematicians we recognize that it is much easier to simply solve a 2 ×2 linear system.

There are many standard mathematical problems and techniques which are useful in discrete

modeling. Some of these are:

– Linear programming and integer programming,

– Combinatorics and combinatorial optimization,

– Graph theory,

– Network ﬂows and network optimization,

– Dynamic programming,

– Queuing theory,

– Stochastic processes and Markov chains,

– Monte Carlo simulation,

– Diﬀerence equations and recurrence relations,

– Discrete dynamical systems,

– Game theory,

– Decision analysis,

– Time series analysis and forecasting.

Many modeling problems are better solved with continuous models, in which case diﬀerent mathe-

matical techniques are used, for example diﬀerential equations. Such models are studied in AMath

383.

• The main emphasis of this course will be on the modeling process, taking a problem and setting

it up as a mathematical problem, and interpreting the results of the model. Along the way we

also will learn something about techniques for solving some of the standard problems listed above,

e.g., solving a linear programming problem by the “Simplex Method”. This is necessary in order

1.3 The goals of 381 5

to solve the modeling problem, but the study of methods for solving these problems is not the

main emphasis of this course.

There are other courses which concentrate on understanding the mathematical aspects of these

problems and the methods which have been developed to solve them. Some of these are:

– Math 461: Combinatorics

– Math 462: Graph Theory

– Math 407: Linear optimization

– Math 408: Nonlinear optimization

– Math 409: Discrete optimization

– Math 491, 492: Stochastic processes

– CSE 373: Data structures and algorithms

– CSE 417: Algorithms and Complexity

– Industrial Engineering 324,5,6: Operations Research

This course will provide an introduction to many of these topics which might motivate you to

take some of these other courses and pursue the mathematics in more depth.

There are also other related courses in other departments, including Industrial Engineering and

Management Science. See the description of the Operations Research Option in the ACMS B.S.

Degree Program for a list of some of these:

http://www.ms.washington.edu/acms

6 Introduction to Modeling

1.4 Exercises

1. Describe three problems that arise in everyday life which could be attacked using discrete math-

ematical modeling. Consider recent events in the news, life on campus, games you like to play,

small business issues, and clubs that you belong to. Write approximately one paragraph for each

problem.

2. Consider the problem about “Lifeboats and Lifevests.” Practice breaking down a modeling prob-

lem into the 4-stages of the “Mathematical Modeling Cycle” by carefully describing each stage of

this modeling problem. In particular, state the goal in real world terms and translate that goal

into a mathematical goal stated in terms of variables and equations. Deﬁne all of your notation

and assumptions so that a reader who has not read our course notes would be able to follow your

description.

3. Farmer Jane is a local organic farmer. She owns 45 acres of land. She is going to plant each acre

with pumpkins or strawberries. Each acre planted with pumpkin yields $200 proﬁt; each with

strawberries yields $300 proﬁt. The labor and fertilizer used for each acre are given in the table

below:

Pumpkin Strawberry

Labor : 3 workers 2 workers

Fertilizer : 2 tons 4 tons

One hundred vollunteer workers and 120 tons of composted fertilizer are available.

(a) Help Jane decide how many acres of each crop to plant in order to maximize proﬁt.

(b) Describe each step of the modeling cycle with regard to Jane’s farming situation.

(c) What new information might come to light during the testing phase of the modeling cycle?

4. Suppose we are trying to design a ferry schedule for a set of islands, similar to the San Juan

Islands in the Puget sound. There are 8 islands with diﬀerent personalities. Some are big tourist

centers, others are “bedroom communities”, with many people commuting to mainland jobs each

day, and others are still fairly rural. What are some of the issues you would need to address in

designing a ferry schedule for this set of islands? Write down some questions you would need to

answer before starting, and some ideas as to how you might go about modeling this and what

sorts of strategies you might consider in designing a schedule. Can you identify any mathematical

problems that would naturally arise?

5. On our class website, you can ﬁnd a network map of the United States from the Rand McNally

Road Atlas which shows distances and estimated driving times between various cities in the west.

(a) Suppose you want to drive from San Francisco to Denver. Using the data on the network

map, what is the best route to take? (First think about what “best” means!)

(b) Suppose you are planning your vacation for next summer and would like to visit a number

of diﬀerent places:

• Reno, Nevada

• Yellowstone National Park

• Denver, Colorado

• Salt Lake City

• San Francisco, California

• Flagstaﬀ, Arizona

Your trip will begin and end in Seattle. What is the best route to take? You don’t need to

ﬁnd the answer, but think about what is involved in solving this problem and write down

some of your thoughts about translating this goal into a mathematical problem. Again, what

1.4 Exercises 7

is meant by “best”? Are there other constraints you might want to consider before trying to

answer the question?

6. Read “Decision Rules for the Academy Awards Versus Those for Elections” by William Gehrlein

and Hemant Kher. Interfaces Vol. 34, No. 3, May-June 2004, pp. 226-234. This paper is included

in the appendix of the course notes.

(a) You have been asked to evaluate the voting strategy used by the Academy Awards applied

to job applications reviewed by a committee. Write a one paragraph summary of the voting

strategy used for the Academy Awards. Write a second paragraph indicating the pros and

cons of the this strategy in evaluating applications.

(b) Explain in your own words why the most popular movie of the year is not chosen to receive

the “Best Picture” award by the Academy. Use the vocabulary you have learned from this

article.

8 Introduction to Modeling

Math 381 Notes — University of Washington —Winter 2011

Chapter 2

Graph Theory, Network Flows and

Algorithms

2.1 Graphs and networks

Imagine you are planning a trip around the United States based on a map showing distances between

various cities and points of interest. The map has a labeled point for each city and lines connecting

nearby cities are labeled by the distance between the two cities. One might ask a question like “How

should we drive if we want to visit every city on the map?” This is an example of a network ﬂow problem

from graph theory. More generally, a graph is a collection of vertices or nodes which are connected by

edges.

A graph is connected if there is a way to get from any vertex to any other vertex by following edges.

Figure 2.1 shows a few types of (connected) graphs that arise in many diﬀerent applications:

• A directed graph (or digraph) has arrows on edges indicating what direction one can move between

nodes.

• A network has weights on the edges, often indicating the cost of moving from one vertex to the

neighbor, e.g. the distance or driving time in the map example. A network could also be directed,

for example if some routes on the map are one-way streets.

• A tree is a connected graph which contains no circuits. A family tree is one familiar example.

Graph Directed graph Network Tree

2

3

6

4

12

3

6

4

2

8

5

10

Figure 2.1: Some common graphs.

The vertices and edges of a graph might represent any number of diﬀerent things, depending on the

application. For example, the vertices in a digraph might represent legal positions in a game, with an

arrow from Vertex A to Vertex B if you can go from Position A to Position B by a single legal move.

Or the vertices could represent individuals in a hierarchical organization, with an edge from A to B if

Person A reports to Person B. These and other examples are discussed in [Rob76].

9

10 Graph Theory, Network Flows and Algorithms

2.2 The traveling salesman problem

A famous network ﬂow problem is the traveling salesman problem (or TSP for short). This name comes

from one particular application, but the same mathematical problem arises in many contexts.

General formulation of TSP: Given a network with N + 1 vertices, having an edge between

every pair of vertices, what is the shortest path through the network which starts at a particular vertex,

visits every other node exactly once, and returns to the original vertex?

Studying the TSP gives a good introduction to the mathematical theory of combinatorics and

complexity of algorithms. Consider the TSP for a salesman who wants to start in Seattle, visit 4 other

cities, and end up back in Seattle. What is the optimal order in which to visit the other cities?

2.2.1 Brute force search

One way to solve this problem is to simply enumerate all the possible routes, determine the total length

of each, and choose the shortest. This is called the “brute force approach” since we aren’t using any

sophisticated ideas to do an intelligent search for the best route, we’re just doing the obvious thing. It

may still take some thought, however, to ﬁgure out how to systematically enumerate and check each

possibility without missing any routes.

2.2.2 Counting the possibilities

The ﬁrst question is: how many possible routes are there? This will give us some guidance when

designing an algorithm to check them all, and also will tell us something about how long it will take

to do this brute force search. Counting questions like this are one aspect of the mathematical ﬁeld of

combinatorics.

To do a count for this 4-city TSP, note that we must choose one city to visit ﬁrst and there are 4

possibilities. Once we’ve chosen that city, we must choose which city to visit second and there are only

3 possibilities since this must be a diﬀerent city. Then we have to choose the city to visit third and

there are only 2 unvisited cities left. Finally the fourth city is determined since there is only one city

left. This is illustrated as a decision tree in Figure 2.2, where the four cities are labeled A, B, C, D.

Starting from the city of origin O, we have 4 choices, then 3 choices, then 2 and 1. The total number

of distinct routes is

4! = 4 · 3 · 2 · 1 = 24.

Similarly, for the TSP with N cities (actually N +1 cities including Seattle), the number of possible

routes we must check is

N! = N · (N −1) · (N −2) · · · 3 · 2 · 1.

Note that what we have counted is the number of ways one can permute the N cities. In general for

any set of N distinct objects there are N! possible orderings, or permutations of the N objects.

With only 4 cities we could quickly compute the traveling salesman’s path length for all 24 permu-

tations by hand and determine the shortest route. But unfortunately N! grows very rapidly with N,

e.g.,

5! = 120

10! = 3, 628, 800

20! = 2, 432, 902, 008, 176, 640, 000

50! ≈ 3 ×10

64

.

For a TSP with 10 cities we would need to check about 3.6 million possibilities, still fairly easy on a

computer. To get an idea of how long this would take, consider a 500 MHz PC. This is a computer

which performs 500 million “instructions” or “cycles” every second. Here “instruction” means a low-

level machine operation and it typically takes many cycles to perform one arithmetic operation such

2.2 The traveling salesman problem 11

as an addition or multiplication. These are called ﬂoating point operations since scientiﬁc notation

is usually used to represent a large range of values on the computer. For computations of the sort

mathematicians, engineers and scientists do, a useful measure of a computer’s speed is the number

of ﬂoating point operations per second, called ﬂops. Machine speed is often measured in megaﬂops

(millions of ﬂoating point operations per second). A 500 MHz machine might do about 40 megaﬂops.

Supercomputer speeds are often measured in gigaﬂops (billions of ﬂops) or teraﬂops (trillions of ﬂops).

Returning to the traveling salesman problem, note that to compute the length of a single route

requires about N additions, adding up the length between each pair of cities on the route, as well as

more work to keep track of which route to examine next, which route is shortest, etc. Let’s not worry

too much about the exact cost, but suppose that we are working on a machine that can examine ten

million routes per second. Then solving the 10-city TSP would take about 1/3 of a second. Not bad.

But already with 20 cities it would be diﬃcult to ﬁnd the answer using this brute force approach.

We now have about 2.4 × 10

18

routes to examine, and at a rate of 10 million per second this will take

2.4 × 10

11

seconds, which is about 7,715 years. Even if we move up to a supercomputer that is a

thousand times faster, it would take almost 8 years. For a TSP with 50 cities there is no conceivable

computer that could ﬁnd the solution in our lifetimes, using the brute force approach.

Many simple problems suﬀer from this kind of combinatorial explosion of possible solutions as the

problem size increases, which makes it impossible to simply enumerate and check every possibility.

Nonetheless such problems are solved daily, using more sophisticated mathematical techniques.

2.2.3 Enumerating all possibilities

For an N-city TSP with N suﬃciently small, we could solve by enumeration. This requires some

algorithm for systematically checking every possibility. Figure 2.2 shows one way to think about enu-

merating all the possible routes for the 4-city TSP. In the rightmost column is an enumeration of all 24

routes in the lexicographic order, so called because this is the order that would be obtained by putting

the 24 strings representing diﬀerent routes into alphabetical order. Note that the 3! = 6 routes starting

with A come ﬁrst, then the 6 routes starting with B, and so on. Within each set, the routes are grouped

by which city comes second, ordered from “lowest” to “highest” in lexicographic ordering.

2.2.4 Branch and bound algorithms

Instead of evaluating the cost of every path appearing in the tree of Figure 2.2, one might be able to

rule out some paths early if their cost exceeds that of the best path discovered so far. Consider, for

example, the network in Figure 2.3.

One might ﬁrst try a greedy algorithm that takes the shortest edge to a new city at each step. This

leads to the path O-ADCB-O, which has a total cost of 15. After discovering this, any path in Figure 2.2

that uses edge BD (or, equivalently, DB) can be eliminated, since the cost of that edge alone is 16. By

continuing to check the other paths that start with O-A, one ﬁnds that the path O-ABCD-O has a cost

of only 14. Examining next the paths that begin with O-D, and discarding any partial paths whose

cost exceeds 14, one ﬁnds that O-DABC-O costs only 11, and this enables one to eliminate even more

possibilities. Paths beginning with O-C and O-B still must be checked, but as soon as the cost of a

partial path exceeds 11, that branch of the tree can be eliminated from further consideration. This type

of algorithm is sometimes called a branch and bound algorithm. The key idea is to obtain some sort of

bound on how good the shortest path within a given subtree could possibly be and, if a less costly path

has already been identiﬁed, to prune away those branches of the tree, thereby reducing the required

computation.

If we are unlucky then we might not be able to prune much of the tree and in the worst-case scenario

we have to examine the entire decision tree anyway, which is no improvement in running time. However

in practice a good branch and bound algorithm will typically be much faster than pure enumeration,

and problems are solved this way which would otherwise be intractable.

12 Graph Theory, Network Flows and Algorithms

ABCD

ABDC

ACBD

ACDB

BDCA

BDAC

BCDA

BCAD

BADC

BACD

ADBC

ADCB

ABC

ACB

ACD

ADB

ADC

ABD

AB

AC

AD

A

BAC

BAD

BCA

BCD

BDA

BDC

BA

BC

BD

CABD

CADB

CBAD

CBDA

CDAB

CDBA

DABC

DACB

DBAC

DBCA

DCAB

DCBA

CAB

CAD

CBA

CBD

CDA

CDB

DAB

DAC

DBA

DBC

DCA

DCB

CA

CB

DB

DC

DA

CD

C

B

O

D

Figure 2.2: Decision tree for enumerating all possible routes in a traveling salesman problem starting

from City O and visiting four other cities A, B, C, and D. The right column shows an enumeration of

all permutations of the 4 cities, in lexicographic order.

O

A B

C D

2

Seattle =

5

4

3

2

1

6

1

16

9

Figure 2.3: A traveling salesman network starting from Seattle and visiting four other cities A, B, C,

and D.

2.3 Complexity theory 13

Many other algorithms have been developed for the traveling salesman problem and for various

special cases of this problem. An introduction to many of these can be found in [LLAS90].

2.3 Complexity theory

For a particular mathematical problem, say the traveling salesman problem, there is often a parameter

N which measures the “size” of the problem, e.g., the number of vertices in the network. If we have

an algorithm for solving the problem, then we are interested in knowing how much work the algorithm

requires as a function of N. In particular we want to know how fast this grows as N increases. For

some algorithms the amount of work required (or time required on a computer) is linear in N, meaning

it is roughly cN where c is some constant. If we double the size of the problem then the amount of work

required roughly doubles. We say that such an algorithm requires O(N) work. The “big Oh” stands

for “order” — it takes order N work and we’re usually not too concerned about exactly what the size

of the constant c is.

For some algorithms the work required increases much faster with N. An O(N

2

) algorithm requires

work that is proportional to N

2

, which means that doubling N increases the work required by a factor

of 4. If an algorithm requires O(N

p

) work for some p then it is said to be a polynomial time algorithm.

The larger p is, the more rapidly the work grows with N. Here are some examples of polynomial time

algorithms:

• Finding the largest of N values. Checking the value of each item and keeping track of the largest

value seen takes O(N) operations.

• Solving a system of N linear equations in N unknowns using the Gaussian elimination algorithm

takes O(N

3

) operations.

• If the linear system is upper triangular, then it only takes O(N

2

) operations using “back substi-

tution”.

• Dijkstra’s algorithm for ﬁnding the shortest path between two vertices in a network (see below)

requires O(N

2

) work in general.

For most problems we will discuss with polynomial time algorithms, the value of p is typically fairly

small, like 1, 2, or 3 in the examples above. Such problems can generally be solved fairly easily on a

computer unless N is extremely large.

There are other problems which are essentially much harder, however, and which apparently require

an amount of work which grows exponentially with the size of N, like O(2

N

) or like O(N!). The traveling

salesman problem is in this category. While many algorithms have been developed which work very well

in practice, there is no algorithm known which is guaranteed to solve all TSP’s in less than exponential

time. (See Section 2.7 for more about this.)

2.4 Shortest path problems

A class of problems that arise in many applications are network ﬂow problems in which we wish to ﬁnd

the shortest path between two given vertices in a network. The problem of ﬁnding the shortest route

on a map is one obvious example, but many other problems can be put into this form even if they don’t

involve “distance” in the usual sense.

2.4.1 Word ladders

A word ladder is a sequence of valid English words, each of which is identical to the previous word in

all but one letter. It’s fun to try to construct word ladders to go from some word to a very diﬀerent

word with as few intermediate words as possible. For example, we can make FLOUR into BREAD as

follows:

14 Graph Theory, Network Flows and Algorithms

FLOUR

FLOOR

FLOOD

BLOOD

BROOD

BROAD

BREAD

Given two English words with the same number of letters, how could we determine the shortest word

ladder between them? (Assuming there is one, or determine that none exists.) This can be set up as a

network ﬂow problem.

Suppose we are considering N-letter words, then we could in principle construct a graph where the

vertices consist of all N-letter words in the English language. We draw an edge from one vertex to

another if those two words are simple mutations of each other in which only a single letter has been

changed. A word ladder then corresponds to a path through this graph from one word to the other.

In fact this graph (with nearly 6000 vertices) has been constructed for N = 5 by Don Knuth and

is available on the web (see the pointer from the class webpage). Word ladders and algorithms for

manipulating these graphs are discussed at length in [Knu93], which describes a general platform for

combinatorial computing. Finding the shortest word ladder is equivalent to the shortest-path problem

through a network if we assign every edge in the graph the weight 1.

Given a network and two speciﬁc vertices, how do we ﬁnd the shortest path between them? For

small networks this is easy to do by trial and error, particularly if the graph is drawn in such a way

that the physical distance between the vertices roughly corresponds to the weight of the edge, as is the

case for a map. But the weights may have nothing to do with physical distance and at any rate if there

are thousands or millions of vertices and edges it may be impossible to even draw the graph.

2.4.2 Enumerating all paths and a branch and bound algorithm

One approach to solving a shortest-path problem would be to enumerate all possible paths between the

start and end vertices, compute the length of each path, and choose the shortest. To enumerate all

paths, we could build a “decision tree” similar to what was illustrated in Figure 2.2 for the TSP. The

problem with this approach is that for a graph with N vertices, there may be very many possible paths

to choose between. How many? In the worst case, there would be an edge connecting each vertex with

each of the N −1 other vertices. In this case we could visit any or all of the other vertices on the way

between the start and ﬁnish, and the number of possible paths would be exponential in N, just as for

the TSP.

The set of all possible paths could be searched more eﬃciently using a branch and bound algorithm,

similar to that described for the TSP. However, for this problem there turn out to be much more eﬃcient

algorithms which are guaranteed to give the answer with relatively little work.

2.4.3 Dijkstra’s algorithm for shortest paths

Dijkstra’s algorithm is a systematic procedure for ﬁnding the shortest path between any two vertices.

Actually this algorithm solves the problem by embedding it into what appears to be a harder problem,

that of ﬁnding the shortest route to all other vertices from a given vertex. The set of all shortest routes

is built up in N steps, by ﬁnding the closest node in Step 1, the next closest node in Step 2, and so on.

By the end we know the best route to every other node and in particular to the destination node.

Each step requires examining the distance from the node that was “solved” in the previous step to

all unsolved nodes which can be reached from this node. This requires examining at most N −1 edges,

so the work required is O(N) in each of the N steps, and hence the total work is O(N

2

).

To implement it on a computer for a large network requires some suitable data structure for storing

the network with information about which vertices are connected to one another and the weight of each

edge. One way to represent a network or directed network on a computer is with a matrix. The rows

2.5 Minimum spanning trees 15

and columns correspond to the nodes of the network and the (i, j) entry of the matrix is the weight

on the edge from node i to node j (or, if there is no edge from i to j, that entry can be set to ∞). A

matrix representation of the network in Figure 2.3 is:

O A B C D

− − − − −

O | 0 2 5 4 3

A | 2 0 2 9 1

B | 5 2 0 1 16

C | 4 9 1 0 6

D | 3 1 16 6 0

We’ll go through some simple examples of Dijkstra’s algorithm in class. It is described in many

books, e.g., [HL95] and [Rob84]. A writing assignment will be to produce your own description of this

algorithm.

2.5 Minimum spanning trees

Another type of problem that arises in many applications is to connect a collection of vertices at

the lowest possible cost. Examples include highway construction and telephone line installation: the

vertices represent cities and the roads or phone lines correspond to edges constructed so that there is

some path between every pair of vertices. In graph theoretic terms, this is accomplished via a minimal

cost spanning tree.

Recall that a tree is a graph that is connected and has no circuits. If a group of vertices is connected

by edges that do contain a circuit, then it is not hard to see that one (or more) of the edges can be

removed without destroying the connectedness of the graph. This is illustrated in Figure 2.4. Since each

edge has a nonnegative cost associated with it, clearly the least expensive way to connect the vertices

will be via a graph with no circuits (i.e., a spanning tree).

Figure 2.4: Graphs with circuits and corresponding spanning trees

In a spanning tree, the number of edges is always one less than the number of vertices: e = N −1.

Each graph in Figure 2.4 has N = 5 vertices, and the two trees have e = 4 edges. There are two well-

known algorithms for ﬁnding a minimal cost spanning tree in a connected network: Kruskal’s algorithm

and Prim’s algorithm.

2.5.1 Kruskal’s algorithm

Kruskal’s algorithm begins by sorting the edges of the graph in order of increasing cost. For the network

shown in Figure 2.5 this order is: {AB}(8), {CD}(10), {CE}(12), {DE}(22), {BC}(53), {BE}(54),

{AD}(63), and {AE}(70). Each edge is then considered in turn and, if it does not form a circuit with

edges already in the good set, it is added to the good set. This is continued until the good set has N−1

edges. Since the good set has no circuits, it is a tree. If all edges have been examined and the good

set still does not have N − 1 edges, it means that the original graph must not have been connected,

and this fact can be reported to the user. The ﬁrst edges added to the tree from Figure 2.5 are {AB},

16 Graph Theory, Network Flows and Algorithms

{CD}, and {CE}. Edge {DE} is then examined, but it is found to form a circuit with edges already

in the tree, so this edge is skipped and edge {BC} is added to the tree. Now the tree has N − 1 = 4

edges, so the algorithm is ﬁnished. The edges of the minimum spanning tree are: {AB}, {CD}, {CE},

and {BC}.

A B

C

22

D

E

8

53

10

63

70 54

12

Figure 2.5: Minimum spanning tree

2.5.2 Prim’s algorithm

Prim’s algorithm for ﬁnding a minimal cost spanning tree avoids the initial sorting phase of Kruskal’s

algorithm. The two algorithms generate the same result, however, when the minimal cost spanning tree

is unique.

Prim’s algorithm begins by choosing any vertex from the graph and including it in the tree T. For

the graph of Figure 2.5, we could choose, for instance, vertex D. It then ﬁnds that edge in the graph

joining a vertex in T to one not in T which has the lowest cost: {CD} in our example. This step is

repeated until the tree has N −1 edges or until there are no more edges joining vertices of T to vertices

not in T. In the latter case, it reports that the graph is not connected. In our example problem, the

next edge to be added is {CE}. We then search for the lowest cost edge joining C, D, or E to one of

the remaining nodes, and add edge {BC} to the tree. Finally the lowest cost edge between either B,

C, D, or E and A is added, namely, {AB}.

Exercise: Try using Prim’s algorithm with diﬀerent starting vertices. Convince yourself that it generates

the same minimal cost spanning tree (or one with the same cost, if there is more than one minimal cost

spanning tree), no matter where you start. Do you see why Kruskal’s algorithm and Prim’s algorithm

generate the same result?

Both Kruskal’s algorithm and Prim’s algorithm are polynomial time algorithms. The best imple-

mentations of these algorithms run in O(E log

2

N) time, where N is the number of vertices and E the

number of edges in the original graph. The number of edges in a graph is at most

_

N

2

_

=

N(N−1)

2

,

so the work required is O(N

2

log

2

N).

2.5.3 Minimum spanning trees and the Euclidean TSP

Often the weights on the edges of a network represent actual distances between cities or other destination

points. In this case, the weight or cost of going from vertex A to vertex B is the same as that of going

from B to A, and the costs also satisfy the triangle inequality:

cost(AB) + cost(BC) ≥ cost(AC).

When the weights on a network for the traveling salesman problem represent actual distances between

points in a plane, the problem is called a Euclidean TSP.

2.6 Eulerian closed chains 17

While there is no known polynomial time algorithm for solving the Euclidean TSP, there is an easy

way to ﬁnd a route whose cost is within a factor of two of the optimal: Construct a minimum spanning

tree. Starting at the origin node, move along edges of the minimum spanning tree until you can go no

further. Then backtrack along these edges until you reach a point from which you can traverse a new

edge to visit a new node or until all of the nodes have been visited and you can return to the origin.

This is illustrated in Figure 2.6, where the minimum spanning tree is marked and the suggested path

is ODOABCBAO. The total cost of this path is twice the cost of the minimum spanning tree. Another

possible path is OABCBAODO, which also traverses each edge of the minimum spanning tree twice.

5

8

7

C

9

5

7

13

11

12

10

O

A B

D

Figure 2.6: Euclidean TSP network with minimum spanning tree. Path OD-O-ABC-BA-O costs twice

as much as the minimum spanning tree, and tour ODABCO is within a factor of two of optimal.

The path constructed in this way is not a legitimate tour for the TSP, since it visits some nodes more

than once. But because the weights satisfy the triangle inequality, any path between nodes that visits

other nodes along the way that have already been visited can be replaced by a direct route between

the given nodes, and the cost will be no greater. Thus one arrives at a tour whose cost is at most twice

that of the minimum spanning tree. For the example in Figure 2.6, this tour is ODABCO and has a

total cost of 40. Had we chosen the other path, OABCBAODO, it would have been replaced by the

tour OABCDO, which has a total cost of 34 (and turns out to be the optimal tour for this problem).

Since the minimal cost spanning tree is the lowest cost set of edges that connects all of the vertices,

any traveling salesman tour that visits all of the vertices must cost at least as much as the minimal cost

spanning tree: 25 for our example. This gives a lower bound on the cost of the optimal tour, while the

tour derived above, whose cost is within a factor of two of this value, provides an upper bound. This

information can be used within a branch and bound algorithm to hone in on the actual optimal tour.

2.6 Eulerian closed chains

Still another type of graph thoeretic problem that arises in many applications is that of ﬁnding an

Eulerian closed chain: A path that starts at a given node, traverses each edge of the graph exactly once,

and returns to the starting point. This problem might arise, for example, in scheduling snowplows or

street sweepers. If the edges of the graph represent streets, then the object is to plow or sweep each

street once but, if possible, not to backtrack over streets that have already been serviced.

This type of problem is also popular in many mathematical games. In fact, it is said that graph

theory was invented by Euler in the process of settling the famous K¨onigsberg bridge problem. The

situation is pictured in Figure 2.7. There are seven bridges and the question is whether one can start

at a particular point, cross each bridge exactly once, and return to the starting point. If you try it for

a while, you can probably convince yourself that the answer is “No”.

To see this in a more formal way, let the land masses in the problem be represented by nodes and the

bridges by edges in a multigraph (a graph that can have more than one edge between a pair of nodes).

This multigraph is shown in Figure 2.8. Note that if we start at a particular node in the multigraph and

travel along one edge to a diﬀerent node, then we must leave that node along a diﬀerent edge. If there

is only one edge through the node then this will not be possible. Moreover, if there are three or ﬁve or

18 Graph Theory, Network Flows and Algorithms

D

A

B

C

Figure 2.7: The bridges of K¨onigsberg

D

A

B

C

Figure 2.8: Multigraph representing the bridges of K¨onigsberg

any odd number of edges through that node, then in order to traverse all of these edges, we will have

to enter then leave the node until there is only one unused edge remaining. But at that point we are

stuck, because we must enter the node in order to use that edge, but then we cannot leave. The same

holds for the starting node. If there is only one edge through it, then we can leave but never return,

and if there are an odd number of edges through it, then we can leave and return a certain number of

times but eventually we will be at the starting node with one unused edge remaining to traverse.

The degree of a vertex is deﬁned as the number of edges through that vertex. Using the above sort

of reasoning Euler proved the following:

Theorem (Euler[1736]). A multigraph G has an Eulerian closed chain if and only if it is connected

up to isolated vertices and every vertex of G has even degree.

The condition that the graph be connected up to isolated vertices just means that there is no edge that

cannot be reached at all.

There is no Eulerian closed chain in Figure 2.8 since, for example, vertex D has degree 3. (In fact,

all of the vertices in this multigraph have odd degree.)

A good way to establish the suﬃciency of Euler’s theorem is by describing an algorithmfor computing

an Eulerian closed chain in an even connected multigraph. Consider, for example, the multigraph in

Figure 2.9. Start at any vertex and travel to other vertices along unused edges until you can go no

further. For example, we might start at A, go to B, C, D, back to B, and then to A. For each of

the vertices visited we use an even number of edges — one to enter and one to leave — each time

we visit the vertex, so that the number of unused edges through each vertex remains even. Moreover,

once we leave the initial vertex, it is the only one with an odd number of unused edges through it; for

every other vertex, if we enter we can also leave. Therefore our path can end only back at the original

vertex. At this point, we may not have traversed all of the edges of the multigraph, so go to any vertex

that has some unused edges through it: vertex C, for example, and repeat. Take one of the unused

edges to a new vertex and again continue until you can go no further. For example, we might go from

C to E to F to G to E to C. This path can be combined with our previous one to form the chain:

2.7 P, NP, and NP-complete problems 19

A G

B

C

D

E

F

Figure 2.9: Multigraph with an Eulerian closed chain

ABC − EFGEC − DBA. There are still two unused edges in the multigraph, so we repeat by going

from D to F and back to D. Combining this with the previous chain gives an Eulerian closed chain:

ABCEFGECD −FD−BA.

The amount of work required to ﬁnd an Eulerian closed chain is proportional to the number of edges

in the multigraph.

2.7 P, NP, and NP-complete problems

We have seen that some problems, such as the shortest path problem, the minimum spanning tree

problem, and the Eulerian closed chain problem, can be solved in a number of steps that is polynomial

in the size of the problem. Such problems are in the class P, the set of all problems that can be solved

in P-time.

Some problems, such as the traveling salesman problem, might lie in the class P but probably don’t.

If so, nobody has yet found the P-time algorithm. This problem does lie in a larger class called NP,

however. NP stands for nondeterministic polynomial, and the exact deﬁnition is a bit complicated. The

basic idea is that for a problem in this class it is at least possible to check one possible solution in

polynomial time, e.g., for the TSP it is possible to compute the length of any one possible path with

only O(N) operations. For such problems there is at least a chance one could ﬁnd a P-time algorithm

to ﬁnd the best path, if we could search eﬃciently enough.

A problem which is not in NP is more clearly hopeless. For example, suppose we consider a variant

of the TSP in which we specify a number B and we wish to ﬁnd all tours on a given set of N cities which

have total length B or less. Then it is possible to construct examples in which there are exponentially

many paths in the solution set, so even writing down the solution if we knew it would take exponentially

long and clearly there’s no hope of a P-time algorithm to solve the problem.

Note that any problem in P is also in NP, so P is a subset of NP. Is it a proper subset? In other

words, are there problems in NP which are not in P? Or does P = NP? This is one of the great

unsolved problems of theoretical computer science.

Some progress was made toward this problem when Karp [Kar72] showed that many hard problems

lie in a set of problems within NP called the NP-complete problems, which have the following property:

if any one of them can be solved in P-time, then every problem in NP can be solved in P-time and so

in fact P = NP. The TSP is NP-complete, and so are many other famous and easy-to-state problems,

such as the Hamiltonian circuit problem and the integer programming problem. For more discussion of

such problems, see for example [GJ79], [Sip97].

20 Graph Theory, Network Flows and Algorithms

2.8 Exercises

1. As a mathematical modeler, you will often need to learn new math on your own from books at

the math library or at Odegard, from journal articles and from webpages. Here is a scavenger

hunt to get you started on this process. Note, you must submit all information requested for each

item below to receive full credit so read each problem carefully.

(a) Find a webpage which outlines standard format for a bibliography. Bookmark this page for

future reference. Print out the page specifying how to write the bibliography entry for a

book and submit this with your homework. Circle the webpage address on your printout or

write it on the printout.

(b) Go to the math library and ﬁnd a textbook on “Graph Theory”. Read the ﬁrst chapter

of the book. (Choose a diﬀerent one if you don’t like the ﬁrst one you open.) Submit the

bibliography entry for this textbook in the format found in (1a). Choose one exercise from

the ﬁrst chapter. For the chosen exercise, copy the problem, solve it, and carefully write out

the solution; then submit this information.

(c) Find a journal article by Kruskal from 1956 that appeared in the Proceedings of the American

Mathematical Society. (The UW reference librarians can be very helpful looking up research

articles.) Read the ﬁrst 3 pages of the article. Submit the full bibliography reference for this

article and a paragraph description of the main problems addressed in this paper.

2. An algorithm will be presented verbally to you in class for ﬁnding the shortest path between two

given nodes in a network.

Write a description of the algorithm aimed at students at your level, or other people with similar

background, who have not taken a course in graph theory algorithms (or this class). Explain what

the goal of the algorithm is, why it is important (perhaps in the context of a real problem), and

how it works. Explain it in such a way that someone reading only your document, with no other

books or lectures on the subject, could ﬁgure out the algorithm and apply it to solve a problem.

You will probably want to use illustrations and a speciﬁc example in order to make it clear, but

try also to be fairly concise (2 pages at most).

Write these up neatly enough that others can read them. We will exchange papers and have a

chance to comment on another person’s document, so please make it easy on each other!

3. The genetic code in a DNA molecule consists of a string of symbols, each of which is one of four

letters T,C,A and G.

(a) Count the number of all DNA words having length at most 3 by listing them all.

(b) Give a formula for computing the number of DNA words of length n and justify your formula

using the Basic Counting Principles discussed in class.

4. Suppose that there are n phone booths in a region and we wish to visit each of them twice, but not

in two consecutive times. Discuss the computational complexity of a naive algorithm for ﬁnding

an order of visits that minimizes the total travel time.

5. A house has been framed and now several jobs remain to be done. Electrical wiring must be

installed, a roof must be added, drywall must be put up and it must be painted, and ﬂooring

must be installed. The electrical work must be done before the drywalling, and the drywall must

be installed before it can be painted. The roof must be on the house before either painting or

ﬂooring work can be done. The electrical work requires 4 units of time, rooﬁng requires 6 units

of time, and drywalling requires 3 units of time. Painting requires 5 units of time if ﬂooring has

already been installed and 3 units of time otherwise. Flooring requires 3 units of time if the walls

have already been painted and 2 units of time otherwise. Draw a network showing the possible

scheduling orders for the diﬀerent jobs.

2.8 Exercises 21

(a) Encode the constraints for this problem as the Hasse diagram for a partial order.

(b) Determine the fastest route from start to ﬁnish doing all jobs in series.

6. Below is a map of Oneway City, a poorly planned town with one-way streets as indicated. A

number of locations are also indicated on the map.

O

B

D

C

A

(a) Draw the complete directed graph with vertices corresponding to O, A, B, and C (ignore D

for now) and two edges between each pair of vertices, one in each direction. Label each with

the distance (number of blocks) one must travel to get from one location to the other. For

example, the distance from A to C is 1 block and from C to A is 11 blocks.

(b) Suppose a delivery van leaving from O must visit A, B, C, and D and return to O. Use the

branch and bound algorithm to ﬁnd the best route. Show the portion of the tree that you

constructed in ﬁnding this route, indicating the partial distances computed and indicating

what parts of the tree you were able to prune oﬀ. (It is pretty obvious what the best route is

in this case; I constructed it to make the branch and bound algorithm fairly simple to apply.

You might want to experiment with similar problems where the answer isn’t so obvious.)

7. The following table shows distances (in km) between some major European cities:

London Paris Madrid Frankfurt Rome Vienna

London 343 1261 634 1444 1237

Paris 343 1050 471 1117 1037

Madrid 1261 1050 1434 1377 1812

Frankfurt 634 471 1434 960 604

Rome 1444 1117 1377 960 765

Vienna 1237 1037 1812 604 765

(a) Suppose a traveling salesman wishes to start in London, visit Paris, Madrid, Frankfurt,

Rome, and Vienna, (each exactly once) and return to London. How many possible routes

are there for him to take?

22 Graph Theory, Network Flows and Algorithms

(b) Construct a minimum spanning tree for the network that this table represents. Explain which

algorithm you are using, and indicate the order in which you consider edges for inclusion in

your tree. Draw a graph showing the minimum spanning tree.

(c) Determine a lower and upper bound on the distance the salesman must travel and explain

how you found these. Find a route that is within a factor of 2 of being optimal.

8. An upper-triangular linear system has the form

_

¸

¸

_

a

11

a

12

a

13

a

14

0 a

22

a

23

a

24

0 0 a

33

a

34

0 0 0 a

44

_

¸

¸

_

_

¸

¸

_

x

1

x

2

x

3

x

4

_

¸

¸

_

=

_

¸

¸

_

b

1

b

2

b

3

b

4

_

¸

¸

_

for example, in the case of a 4 × 4 system. Such systems are easily solved by back substitution

(solve the last equation for x

4

, then the previous equation for x

3

, etc.)

How many multiplication operations does this require? (Count a division the same as a multipli-

cation, so computing x

4

= b

4

/a

44

takes 1 “multiply”, for example.)

Exactly how many “multiplies” does it take to solve a general n ×n upper-triangular system? In

particular, how many to solve an upper-triangular system of 1000 equations?

9. Let G be the graph below. How many subgraphs does G have? (Give an exact answer.)

s s

s s

a b

c d

10. Suppose you own a summer rental property and at the beginning of the summer season, you

receive a number of requests for the property. For example, assume you rent out the property

for 10 weeks in total and you have received the following 6 requests specifying which week each

renter would like to start and how many weeks they want to stay (duration)

person start week duration

A 1 1

B 1 4

C 3 2

D 5 3

E 7 2

F 9 1

Say the rate of renting the property is $100 per week normally, but if anyone rents the property

for at least 4 weeks, then they are charged $70 per week.

How should you decide which requests to accept in order to maximize your income for the summer?

Formulate this type of problem in general using a graphical model and solve the problem for the

example above to demonstrate your method.

2.8 Exercises 23

11. The job scheduling problem arises in many industries. A shop has dozens of “jobs” that need

to be done each day. For example a print shop may need to print diﬀerent sorts of posters and

pamphlets, each requiring diﬀerent types of papers and types and colors of ink. Or a machine

shop at a large aerospace company may need to manufacture many diﬀerent types of parts on the

same set of machines, each requiring a special set up. In each case there is a certain amount of

time required for each job, which might be independent of the order we do the jobs, but there

may also be substantial time required to switch from one job to another. The amount of time

spent switching might depend very much on what order they are done in. For example, switching

between two print jobs that require the same paper and ink colors might take much less time than

switching between two jobs that require cleaning out one color of ink and loading another.

(a) Formulate a general scheduling job in mathematical language and determine the optimal

order of jobs to minimize the total time spent on the entire task. For starters, suppose for

each pair of jobs i and j we know the time t

ij

required to switch from i to j. Then relate

this to a “traveling salesman problem” with t

ij

corresponding to the “distance” from Node

i to Node j. We could also specify an “origin” node corresponding to the clean state found

at the beginning of the day and required at the end of the day.

(b) For traveling salesman problems involving physical distances, one often has the “triangle

inequality”: If i, j, and k represent three cities, then the distance d

ij

from i to j naturally

satisﬁes

d

ij

≤ d

ik

+d

kj

.

The analogous inequality might not hold for t

ij

in job-scheduling problems. Try to come up

with a realistic scenario in which it would not hold.

12. Read “The Missouri Lottery Optimizes Its Scheduling and Routing to Improve Eﬃciency and

Balance” by Jang, Crowe, Raskin and Perkins. Interfaces, vol. 36, no. 4, July-Aug. 2006, pp

302-313. This paper is included in the appendix to the course notes and on the class web site.

(a) Write a one paragraph summary of the problem presented in “The Missouri Lottery . . . “

from Interfaces. Write a second paragraph summarizing the techniques and merits of the

mathematical model used. Be prepared to answer questions on the paper in class.

(b) Say you have been hired to evaluate the Missouri team who your company is considering

hiring. Write a critique (at most one page) of the paper. Include both positive and negative

aspects if any. Also, include your recommendation to your boss on hiring this team.

24 Graph Theory, Network Flows and Algorithms

Math 381 Notes — University of Washington —Winter 2011

Chapter 3

Stochastic Processes

So far we have studied deterministic processes, i.e. situations where the outcome is completely deter-

mined by our decisions. In real life however, there are many situations which are out of our control

or where things happen with a certain probability. These situations are called stochastic processes or

random processes or probabilistic processes. For example, the amount of time we have to wait for the

next bus to arrive is a stochastic process. Also, the number of salmon returning to spawn in the UW

Hatchery each year, the number of games won by the Husky volleyball team, and the daily Dow Jones

Industrial Average are other examples.

A thorough understanding of statistical modeling is beyond the scope of this course. Still, your

modeling capabilities can be enhanced by a brief introduction to some fundamental concepts from

probability and statistics including the mean, median and variance of a probability distribution, random

variables and the Central Limit Theorem.

3.1 Basic Statistics

In general terms, a statistic is a number that captures important information about a data set. If

we capture the right information, we can use the statistic to predict future outcomes. The primary

example of a statistic is the average or mean. For example, if we learn that bus number 43 comes

every 10 minutes on average during rush hour, then it helps us predict what time we will arrive at our

destination.

For the data set X = {x

i

: i = 1, 2, . . . , n}, the mean of X, denoted µ or E(X), is deﬁned as

µ =

n

i=1

x

i

n

.

To ﬁnd the median, order the values x

i

in ascending (or descending order). If n is odd, then the median

equals x

m

where m =

n+1

2

. Otherwise, the median equals the average of x

m

and x

m+1

where m =

n

2

.

To analyze the behavior of averages, consider the following data set. Let X be the number of minutes

we wait for the bus on the ﬁrst 51 days of class.

3.759 1.939 6.273 7.165 1.146 5.973

0.099 9.048 6.991 5.113 6.649

4.199 5.692 3.972 7.764 3.654

7.537 6.318 4.136 4.893 1.400

7.939 2.344 6.552 1.859 5.668

9.200 5.488 8.376 7.006 8.230

8.447 9.316 3.716 9.827 6.739

3.678 3.352 4.253 8.066 9.994

6.208 6.555 5.947 7.036 9.616

25

26 Stochastic Processes

7.313 3.919 5.657 4.850 0.589

To aid in calculating the median, the following sorted list is also included.

0.099 3.716 5.488 6.555 7.939 9.994

0.589 3.759 5.657 6.649 8.066

1.146 3.919 5.668 6.739 8.230

1.400 3.972 5.692 6.991 8.376

1.859 4.136 5.947 7.006 8.447

1.939 4.199 5.973 7.036 9.048

2.344 4.253 6.208 7.165 9.200

3.352 4.850 6.273 7.313 9.316

3.654 4.893 6.318 7.537 9.616

3.678 5.113 6.552 7.764 9.827

The mean and median of X equal 5.715 and 5.973, respectively. Consider changing the largest value

of X from to 9.994 to 100. Now, the mean equals 7.483. Such a change in the mean is expected given

the large change in a value of the corresponding data set. On the other hand, the median remains

unchanged. Our change to X altered only the value of the largest element. Therefore, the value of the

middle element, which equals the median, remained unchanged.

Suppose instead, we had changed the largest value of X from 9.994 to another random number

between 0 and 10, say 5.484. Now, the mean and median equal 5.626 and 5.947, respectively. In such a

case, the mean still changes, albeit by a small amount. Take a moment and reﬂect on why the median

changes in this case. Larger data sets (say, of size 1,000 or 10,000) exhibit similar behavior although

the change in the mean is seen in less signiﬁcant digits.

Another important statistic is the variance of the data set. The variance gives us information about

how close the data is centered around the mean µ. The variance of the data set X = {x

i

: i = 1, 2, . . . , n}

is deﬁned as

1

Var(X) =

1

n −1

n

i=1

(x

i

−µ)

2

. (3.1)

The variance for the original data set above is 6.284. This tells us we would probably never have to

wait for the bus for more than 12 minutes. On the other hand, replacing 9.994 by 100 we get a variance

of 180.

3.2 Probability Distributions and Random Variables

In order to model stochastic processes, we ﬁrst need a mathematical notation that can describe a wide

variety of situations. We will introduce the concepts of a probability distribution and a random variable

through examples.

Let Y denote any one of the six possible outcomes that can be observed when a fair die is rolled.

Since a fair die is equally likely to roll any of the values from 1 to 6, there is a 1/6

th

chance of rolling

a given number. We will write this as P(Y = y) =

1

6

for y = 1, 2, . . . , 6. Equivalently, we are deﬁning

a probability distribution function p(y) = 1/6 for y = 1, 2, . . . , 6. Such a probability distribution can be

written as

y 1 2 3 4 5 6

p(y) 1/6 1/6 1/6 1/6 1/6 1/6

In this example, Y is a random variable; it is a variable that we cannot solve for explicitly but

instead one that takes on a diﬀerent ﬁxed value each time we run a trial by rolling the die. Random

1

Some authors deﬁned the sample variance to be Var(X) =

1

n

P

n

i=1

(x

i

−µ)

2

instead.

3.2 Probability Distributions and Random Variables 27

1 2 3 4 5 6

0

0.05

0.1

0.15

0.2

0.25

0.3

y

p

(

y

)

1 2 3 4 5 6

0

0.05

0.1

0.15

0.2

0.25

0.3

y

p

(

y

)

(a) (b)

Figure 3.1: Probability histograms (a) for a uniform distribution and (b) a nonuniform distribution.

1 2 3 4 5 6

0

2

4

6

8

10

12

14

1 2 3 4 5 6

0

10

20

30

40

50

60

70

80

90

1 2 3 4 5 6

0

100

200

300

400

500

600

700

800

900

(a) (b) (c)

Figure 3.2: Histograms for tossing a fair die 50, 500 and 5000 times.

variables are very useful in probability because we can manipulate them just like ordinary variables in

equations. For example, one could write P(2 < Y ≤ 5) to mean the probability that a roll is among the

numbers {3, 4, 5}. A random variable Y is said to be discrete if it can assume only a ﬁnite or countably

inﬁnite number of distinct values. Every discrete random variable comes with an associated probability

distribution function, P(Y = y) = p(y).

An experiment where every value is equally likely to occur is said to have a uniform distribution

as its underlying probability distribution. Figure 3.1 (a) depicts a probability histogram for a uniform

distribution. In contrast, Figure 3.1 (b) is an example of a probability histogram for a nonuniform

distribution.

Another way to recognize a uniform distribution is by noting the probability distribution has the

form

p(y) =

1

total # outcomes

.

Therefore, this type of probability is closely related to the study of counting methods known as combi-

natorics.

Back to our example, say we roll the die 100 times and construct a histogram of the data. In this

histogram, the height of column i is determined by the number of times we roll i in this experiment.

Each time we roll, we observe a diﬀerent value of Y . Since Y has the uniform distribution, the histogram

we construct in this experiment should have 6 columns with similar heights. Yet, each roll of the die

is independent from the previous rolls. In other words, the die has no memory of its last roll. As any

gambler will attest, the uniform distribution does not imply that a streak of 4’s is impossible.

28 Stochastic Processes

To see this graphically, histograms of 50, 500, and 5000 such experiments are depicted in Figure

3.2 (a), (b) and (c), respectively. As the number of experiments increases, the histograms converge

toward the corresponding uniform distribution. Note that even 10,000 experiments will not produce

a histogram that exactly replicates the distribution in Figure 3.1 (a). Seen from the perspective of a

gambler, indeed, guessing the value of a single roll of a fair die is mere chance even with knowledge of

a previous outcome. Yet, from the perspective of a casino, the underlying distribution can place the

odds heavily in the “house’s” favor over many such experiments.

There are many additional probability distributions. The modeler’s challenge is to choose the one

that most closely approximates the data.

3.2.1 Expected Value

The expected value E of a discrete random variable Y with probability function p(y) is deﬁned to be

E(Y ) =

y

yp(y).

Recall that the probability function for the rolling of a fair die is p(y) = 1/6 for y = 1, 2, . . . , 6.

Therefore, in the example E(Y ) equals

6

i=1

i/6 = 3.5.

3.3 The Central Limit Theorem

Let Y

1

, Y

2

, . . . , Y

n

be independent and identically distributed random variables. Then, Y = (1/n)(Y

1

+

Y

2

+ . . . + Y

n

) is the random variable measuring the mean over n trials. What can be said about the

distribution of Y ? The Central Limit Theorem answers this question.

Theorem 3.3.1 (Central Limit Theorem) Let Y

1

, Y

2

, . . . , Y

n

be independent and identically dis-

tributed random variables. Then the sample average Y tends to a normal distribution as the sample

size tends to inﬁnity. The mean of this distribution will be the mean of the population and the variance

of this distribution will be the variance of the population divided by the sample size. (This all holds

providing the mean and variance of the population is ﬁnite and there is random sampling).

Take a moment and consider the power of this theorem. The theorem allows any population.

In fact, two populations may have widely diﬀering underlying distributions such as the uniform and

nonuniform probability distributions in Figure 3.1. Yet, if multiple, independent experiments occur for

either population, the resulting distribution of the sample average tends toward a normal distribution

as sample size increases.

Let Y and X denote any one of the six possible outcomes available from tossing a fair and weighted

die, respectively. The probability function p(y) = 1/6 for y = 1, 2, . . . , 6. The probability distribution

for X is:

x 1 2 3 4 5 6

p(x) 1/8 1/3 1/12 5/24 1/12 1/6

Figure 3.1 (a) and (b) are the probability histograms for Y and X, respectively. Suppose we throw

the fair die 50 times and record the sample average of the 50 outcomes. Repeating this 1000 times and

plotting the results produces the histogram in Figure 3.3 (a). Repeating such an experiment with the

weighted die produces the histogram in Figure 3.3 (b). Both histograms reﬂect a normal distribution.

Should this be expected? Indeed, it is by the powerful Central Limit Theorem. If this is not readily

apparent, take a moment and compare our experiments to Theorem 3.3.1.

Note that convergence is slower for highly skewed distributions. That is, larger sample sizes are

needed for a histogram of sample averages to reﬂect the underlying normal distribution of such an

experiment. We will see more about convergence of histograms for diﬀerent experiments modeled with

Monte Carlo methods in the next chapter.

3.4 Buﬀon’s Needle 29

2.8 3 3.2 3.4 3.6 3.8 4 4.2 4.4

0

50

100

150

200

250

300

2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4 4.2

0

50

100

150

200

250

(a) (b)

Figure 3.3: Histograms taken from sample averages (a) for a uniform distribution and (b) a nonuniform

distribution.

E

1 unit

T

c

1 unit

&

&

&

&

&

&

T

c

r

D

θ

1

2

sin θ

&

&

&

&

&

&

(a) (b)

Figure 3.4: Buﬀon’s Needle experiment; (a) depicts the experiment where a needle of unit length is

randomly dropped between two lines of unit length apart. In (b), D denotes the distance between the

needle’s midpoint and the closest line; θ is the angle of the needle to the horizontal.

3.4 Buﬀon’s Needle

In 1777, Comte de Buﬀon described an experiment that relates the real number π to a random event.

Speciﬁcally, a straight needle of unit length is dropped at random onto a plane ruled with straight lines

of unit length apart as depicted in Figure 3.4 (a). What is the probability p that the needle will intersect

one of the lines?

Before the time of digital computers, laptops and calculators, the term “computer” referred to a job

title. Parallel computing meant calculating with a large number of mathematicians. Comte de Buﬀon

performed the experiment repeatedly to determine p. Such hand computation limited the feasibility of

mathematical experiments before the advent of the digital computer.

Comte de Buﬀon also derived the value of p mathematically. Let D denote the distance between

the needle’s midpoint and the closest line, and let θ be the angle of the needle to the horizontal. (Refer

to Figure 3.4 (b).) Here both D and θ are random variables. The needle crosses a line if

D ≤

1

2

sin θ.

Consider the plot of

1

2

sin θ as seen in Figure 3.5. The probability of the needle intersecting a line

is the ratio of the area of the shaded region to the area of the rectangle. This is the continuous version

30 Stochastic Processes

0 3.14

0

0.5

1/2sinθ

θ

Figure 3.5: Plot of

1

2

sinθ aids in analysis of Buﬀon’s Needle.

of the uniform distribution. The area of the rectangle is clearly π/2. The area of the shaded region is

_

π

0

1

2

sin θ dθ = 1,

which implies that the probability of a hit is

p =

1

_

π

2

_ =

2

π

.

Hence, an estimate of π is

2

_

total number of drops

total number of hits

_

.

Just imagine, the next time that you drop a pencil, you have completed the ﬁrst step in a Monte

Carlo experiment in estimating π. Yet, it takes heroic patience before an accurate estimate emerges.

We will see in class that Buﬀon’s Needle experiment indeed produces an estimate of π but is slow to

converge.

3.5 Exercises 31

3.5 Exercises

Recommended Reading Assignment: Pages 710-729 from ”Operations Research: Applications

and Algorithms” by Wayne Winston. This book is also on reserve in the Engineering Library for the

class.

1. Consider rolling two fair dice, one green and one red. Let R be the random variable which

determines the face value of the red die, and G be the random variable which determines the face

value of the green die.

(a) What is the population (possible values) for R, G, and R +G?

(b) What is the probability distribution for R, G, and R +G?

2. Consider the random variables R and G as in Problem 1. What is E(R), E(G), and E(R +G)?

3. In a horse race, the odds in favor of the ﬁrst horse winning in an 8-horse race are 2 to 5. The

odds against the second horse winning are 7 to 3. What is the probability that one of these two

horses will win?

4. A card is drawn at random from an ordinary deck of 52 cards.

(a) What is the probability that it is a black ace or a red queen?

(b) What is the probability that it is a face card or a black card?

(c) What is the probability that it is neither a heart or a queen?

5. In a certain country, the probability is

49

50

that a randomly selected ﬁghter plane returns from a

mission without mishap. Mia argues that this means there is one mission with a mishap in every

50 consecutive ﬂights. She concludes that if ﬁghter pilot Woody returns safely from 49 consecutive

missions, he should return home before the ﬁftieth mission. Is Mia right? Explain why or why

not.

6. In a lottery every week, 2,000,000 tickets are sold for $1 apiece. Say 4000 of these tickets pay oﬀ

$30 each, 500 pay oﬀ $800 each, and one ticket pays oﬀ $1,200,000, and no ticket pays oﬀ more

than one prize.

(a) What is the expected value of the winning amount for a player with a single ticket?

(b) What is the expected value of the winning amount for a player with ﬁve tickets?

7. The ages of subscribers to a certain mailing list are normally distributed with mean 35.5 years

and standard deviation 4.8. What is the probability that the ages of a random subscriber is

(a) more than 35.5 years?

(b) between 30 and 40 years?

8. To prepare an aircraft for departure, it takes a random amount of time between 20 and 27 minutes.

If the ﬂight is scheduled to depart at 9:00am and preparation begins at 8:37am, ﬁnd the probability

that the plane is prepared for departure on schedule. Explain any assumptions you need to make.

9. Prove that V AR(X +Y ) = V AR(X) +V AR(Y ) if X and Y are independent random variables.

32 Stochastic Processes

Math 381 Notes — University of Washington —Winter 2011

Chapter 4

Monte Carlo Methods

Monte Carlo methods model physical phenomenon through the repeated use of random numbers. It

can seem surprising that meaningful insights can be made from such results. Yet, Monte Carlo methods

are used eﬀectively in a wide range of problems including physics, mechanics, and economics.

Buﬀon’s Needle is a classic example of Monte Carlo methods and the ability of random numbers to

generate meaningful results. Note that no essential link exists between such methods and a computer.

Yet, computers enormously enhance the methods’ eﬀectiveness. Monte Carlo methods are essentially

carefully designed games of chance that enable research of interesting phenomenon.

While Monte Carlo methods have been used for centuries, important advancements in their use as

a research tool occurred during World War II with mathematicians including Neumann and Ulam and

the beginnings of the modern computer. Today, Monte Carlo methods are used in a wide variety of

ﬁelds to model the physical world.

With our knowledge of statistics and ideas regarding the use of random numbers in simulation, we

are ready to examine problems that we can model and study with Monte Carlo methods.

4.1 A ﬁsh tank modeling problem

The XYZ Tropical Fish Store has come to us for advice on the following problem. They carry 150 gallon

ﬁsh tanks which are quite large and take up a lot of shelf space in the store. They observe that on

average they only sell about 1 each week, so they don’t want to have too many in stock at any one time.

On the other hand, these are a high-proﬁt item and if they don’t have one in stock when a customer

comes in, the customer will go elsewhere. Moreover, it takes 5 days for a new one to be delivered by

the distributor after they place an order.

They are trying to decide on the best strategy for ordering the 150 gallon tanks. They are considering

these two in particular:

1. Strategy 1: Order a new tank each time one is sold. Then they will never have more than one

in stock at a time, so they won’t waste much shelf space, but they may often be caught without

any.

2. Strategy 2: Order a new tank once a week, arriving 5 days later. Since they sell one a week on

average it may be good to have one arrive each week on average. There’s probably less chance of

being caught without one in stock, but if there aren’t any customers for several weeks in a row

they will accumulate.

The question is: which strategy is better, or is there another that is better yet?

Think about some of the questions that might naturally arise if you were going to try to determine

an answer. What questions should you ask the store owners before you even try to model it? What

33

34 Monte Carlo Methods

other information would you need to know before you could hope to evaluate the strategies? Thinking

about this should help to suggest what is important in a mathematical model.

Here are a few examples:

• How much is the proﬁt on one of these ﬁsh tanks, relative to other items in the store that would

have to be eliminated in order to stock more ﬁsh tanks? (This is one thing we would need to

know in order to make any sort of quantitative comparison of diﬀerent strategies.)

• What is the cost of having additional tanks in stock? What is the cost of losing a customer by

not having a tank available? (Besides the lost sale itself, there might be the cost of losing that

customers business in the future, if they decide to buy all their ﬁsh and supplies from the same

place they buy the tank.)

• What do you mean by “we sell one a week on average”? Is there one person in town who comes in

every Wednesday at noon and always buys one tank? Or are there 52 people in town who always

give a tank as a Christmas present, so there are lots of sales in the month of December and none

in other months? Either scenario would lead to “one a week on average” over the year, but would

lead to very diﬀerent ordering strategies.

This is a classic example of an inventory problem. Variants of this problem are constantly being

solved by businesses, which often have to decide on an ordering strategy for many diﬀerent products

simultaneously, even thousands for a grocery store, for example.

Here we consider the two simple strategies mentioned above, based on some simpliﬁed assumptions

of customer behavior. In particular, we’ll assume the sales are uniform over the year. We’ll also assume

that “an average of once a week” means that each day there is a 1/7 chance that a customer will come.

This isn’t completely realistic: surely there’s some chance that two customers will appear in the same

day. Later we’ll consider this further. Note in making these assumptions that we follow a primary rule

of model building:

It is best to start with a simple model and reﬁne it incrementally. Trying to start with a

completely realistic model is usually hopeless.

4.2 Monte Carlo simulation

The approach we will consider at this stage is a simulation of how the competing strategies might work.

This means we write a computer program which simulates several weeks of how the store’s inventory

ﬂuctuates assuming one strategy or the other, and see which pays oﬀ better. Of course we don’t know

exactly when customers arrive, but we can model the arrival of customers at random times by generating

random numbers on the computer and using these values to indicate whether a customer arrives on

each day or not. For example, in matlab, the command

rand(1)

generates 1 random number which is uniformly distributed between 0 and 1. If this number is less than

1/7 we can say that a customer comes, if it is greater than 1/7 we can say that a customer does not

come.

On the next page is a simple matlab program which simulates the strategy of ordering a new ﬁsh

tank only when the one in stock is sold. In this case there is always at most one tank in stock, which

simpliﬁes the program. The program prints out information about what happens each day and keeps

track of statistics over the course of the simulation.

The parameters have been changed here to make it possible to see more action in a short simulation.

The probability of a customer each day is 1/3 and it only takes 2 days for a new tank to be delivered

after ordering. The program has been simpliﬁed by leaving out some commands to keep track of totals

and display results.

A more complete program which also allows comparison of diﬀerent strategies can be found in

Section 4.5, and downloaded from the class webpage.

4.2 Monte Carlo simulation 35

a = 1/3; % probability of a customer each day

days_for_delivery = 2; % days from order to delivery of new tanks

stock = 1; % number of tanks in stock

deliv = -1; % number of days until delivery of tank on order

% -1 means none on order

total_cust = 0;

total_sold = 0;

total_lost = 0;

for week = 1:3

for weekday = 1:7

sold = 0;

lost = 0;

if deliv == 0

stock = stock+1; % a new tank is delivered

end

if deliv >= 0

deliv = deliv-1; % days till next delivery

end

random_num = rand(1); % generate random number

if random_num < a % use this number to tell if a customer arrived

customers = 1;

else

customers = 0;

end

if customers==1

if stock>0 % we have a tank to sell the customer

sold = sold+1;

stock = stock-1;

if deliv < 0

deliv = days_for_delivery; % we sold a tank and now order

% another one.

end

else

lost = lost+1; % we didn’t have a tank and lost a customer

end

end % if customers==1

% keep track of total statistics:

total_cust = total_cust + customers;

total_sold = total_sold + sold;

total_lost = total_lost + lost;

disp([week weekday stock customers sold lost]); % display results for

% this day

end % loop on weekday

end % loop on week

% output total statistics now....

Figure 4.1: First pass at a matlab program for Monte Carlo simulation of the ﬁsh tank modeling

problem.

36 Monte Carlo Methods

Here are the results of one sample simulation:

week day stock cust sold lost

1 1 1 0 0 0

1 2 1 0 0 0

1 3 1 0 0 0

1 4 0 1 1 0

1 5 0 0 0 0

1 6 0 0 0 0

1 7 0 1 1 0

2 1 0 1 0 1

2 2 0 0 0 0

2 3 0 1 1 0

2 4 0 0 0 0

2 5 0 1 0 1

2 6 1 0 0 0

2 7 1 0 0 0

3 1 0 1 1 0

3 2 0 1 0 1

3 3 0 0 0 0

3 4 1 0 0 0

3 5 1 0 0 0

3 6 1 0 0 0

3 7 1 0 0 0

total over entire simulation:

customers sold lost

7 4 3

It’s hard to come to any conclusion from this one short simulation. The fact that we could only

serve 4 of the 7 customers was due to the particular pattern in which they arrived. If they had appeared

diﬀerently, the numbers could have been very diﬀerent.

To get a better feel for how this strategy works in the long run, we can run the simulation for many

more weeks. For example we might run it for 104 weeks (2 years) rather than only 3 weeks. We might

also run the simulation several times over to see if we get similar results even though the exact pattern

of customer arrivals will be diﬀerent in each case. Here are the results of 10 diﬀerent simulations, each

for 104 weeks:

customers sold lost fraction served

241 147 94 0.6100

223 135 88 0.6054

255 149 106 0.5843

266 158 108 0.5940

237 144 93 0.6076

235 144 91 0.6128

255 148 107 0.5804

250 147 103 0.5880

227 138 89 0.6079

249 152 97 0.6104

Notice that we expect roughly 104 × 7/3 = 243 customers to appear over this time period (since

we’re using the rate of 1 every 3 days). In all of the above results we have roughly this number. Much

4.2 Monte Carlo simulation 37

more interesting is the fact that in each simulation we end up being able to serve about 60% of these

customers. This might be one number we want to use in comparing diﬀerent strategies.

Markov chains. By doing the above simulations we came to the conclusion that with this strategy

and particular set of parameters, we could serve about 60% of the customers who come looking for a

150 gallon tank. It turns out that we can predict this number using mathematics without doing any

simulations, if we use the theory of Markov chains. We will study this later in the quarter.

4.2.1 Comparison of strategies

Section 4.5 contains a more complex program for this Monte Carlo simulation which allows comparison

of diﬀerent strategies. The variable order_when_out is a ﬂag which determines whether to order a new

tank whenever we run out. The variable fixed_delivery determines whether there is a ﬁxed schedule

of deliveries, once every so many days. Strategy 1 corresponds to

order_when_out = 1;

fixed_delivery = 0; % no standing order

while Strategy 2 uses

order_when_out = 0;

fixed_delivery = 7; % arrivals every 7th day

Percentage of customers served is one important number to compare. More important to the busi-

ness, however, might be the expected proﬁt over time with diﬀerent strategies. This depends not only

on how many customers we serve or lose, but also on the cost of any overstock. For Strategy 1 tested

above, there is never more than 1 tank in stock and so no overstock. Strategy 2 can easily lead to

overstock.

Here are the results of a few 104-week simulations with each strategy. We also compute the “proﬁt”

for each simulation and the average proﬁt over 100 simulations with each strategy. This proﬁt is based

on the assumed proﬁt per sale, cost of losing a customer, and cost per night of overstock as illustrated

in the program.

Strategy 1:

customers sold lost fraction_served overstock end_stock profit

106 63 43 0.594 0 1 830.00

100 61 39 0.610 0 1 830.00

104 58 46 0.558 0 1 700.00

104 60 44 0.577 0 0 760.00

94 55 39 0.585 0 1 710.00

90 59 31 0.656 0 0 870.00

etc...

108 58 50 0.537 0 1 660.00

101 59 42 0.584 0 0 760.00

94 54 40 0.574 0 0 680.00

99 61 38 0.616 0 1 840.00

98 55 43 0.561 0 1 670.00

average_profit = 787.9000

38 Monte Carlo Methods

Strategy 2:

customers sold lost fraction_served overstock end_stock profit

93 92 1 0.989 4795 13 -567.50

99 98 1 0.990 1802 7 1049.00

98 95 3 0.969 2716 10 512.00

96 96 0 1.000 3690 9 75.00

99 98 1 0.990 3098 7 401.00

87 87 0 1.000 6811 18 -1665.50

106 101 5 0.953 3450 4 245.00

etc...

94 94 0 1.000 2867 11 446.50

126 103 23 0.817 493 2 1583.50

106 101 5 0.953 2023 4 958.50

131 104 27 0.794 709 1 1455.50

109 104 5 0.954 637 1 1711.50

91 88 3 0.967 6135 17 -1337.50

91 90 1 0.989 4165 15 -292.50

101 101 0 1.000 3899 4 70.50

101 100 1 0.990 3755 5 112.50

average_profit = 342.7050

Note some interesting features of these simulations:

• The percentage of customers served is generally larger with Strategy 2, usually above 90%, while

with Strategy 1 the percentage is around 60%. However, the average proﬁt is larger with Strategy

1. This is because Strategy 2 generally leads to an overstock. Customers are rarely lost but there’s

a large cost associated with this overstock which reduces proﬁt.

• The results for Strategy 1 have much smaller variance than for Strategy 2. The proﬁt for each

individual simulation is typically not too far from the average. For Strategy 2 the values are all

over the place. Some simulations give much higher proﬁts than Strategy 1 while many simulations

resulted in negative proﬁts (a net loss). This is seen more strikingly by plotting histograms of the

proﬁts over each set of 100 simulations, as shown in Figure 4.2. If the business is lucky, proﬁts

may be higher with Strategy 2, but there is a great risk of being unlucky.

• Note one problem with Strategy 2 as implemented so far: There is no mechanism to stop automatic

delivery if the stock goes too high. Since customers arrive at the same average rate as new tanks,

a high stock will tend to stay high. Figure 4.3 shows the stock of tanks as a function of day in

the course of one particular unlucky simulation. By the end of two years the stock has grown to

20 tanks.

4.3 A hybrid strategy

A better ordering strategy might be to have a standing order for a tank to arrive every N days, where

N is larger than 7, and then to also order a new tank whenever we run out. In the program N is

denoted by fixed_delivery. Figure 4.4 shows histograms of the performance of this strategy for two

diﬀerent values of N, N = 9 and N = 14.

4.3 A hybrid strategy 39

−1500 −1000 −500 0 500 1000 1500 2000

0

5

10

15

20

25

30

Strategy 1

−3000 −2500 −2000 −1500 −1000 −500 0 500 1000 1500 2000

0

5

10

15

20

25

30

Strategy 2

Figure 4.2: Histograms of proﬁt observed over 100 simulations with two diﬀerent strategies.

0 100 200 300 400 500 600 700 800

0

5

10

15

20

25

Figure 4.3: History of number of tanks in stock each day for one particular simulation with Strategy 2.

−1500 −1000 −500 0 500 1000 1500 2000

0

5

10

15

20

25

30

Hybrid Strategy with $N=9$

−1500 −1000 −500 0 500 1000 1500 2000

0

5

10

15

20

25

30

Hybrid Strategy with $N=14$

Figure 4.4: Histograms of proﬁt observed over 100 simulations with hybrid strategies.

40 Monte Carlo Methods

4.4 Statistical analysis

In comparing the results of simulations with diﬀerent strategies, we need to be able to reach sensible

conclusions from a large quantity of “experimental data”. Typically we would also like to be able to

quantify how conﬁdent we are in the conclusions we reach, how sensitive these conclusions are to the

parameters we chose in the model, etc. This is the realm of statistics, and sensible analysis of Monte

Carlo simulations generally requires a good knowledge of this ﬁeld.

One statistic we have already looked at is the average proﬁt observed for each strategy. If one

strategy gives a larger average proﬁt over many simulations than another strategy, it might be the

better strategy. On the other hand we have also observed that some strategies give results with much

greater variance between simulations than other strategies. In practice the owners of the store don’t

really care about the average behavior over hundreds of simulations, they care about what will happen

in the one real experience they will have with whatever strategy they adopt. The point of doing many

simulations is to get some experience with what range of diﬀerent outcomes they might reasonably

expect to observe in practice. (Assuming of course that the model is good enough to match reality in

some sense.)

Looking at the histograms of Figure 4.2, we see that with Strategy 1 we might be able to predict

fairly conﬁdently that the proﬁt which will actually be observed (assuming the model is good enough

...) will lie somewhere between 600 and 1000. With Strategy 2, we can only predict that it will probably

lie somewhere between -2000 and 2000. Even if the average observed with Strategy 2 were higher than

the average proﬁt observed with Strategy 1, the strategy with the smaller variance might be preferable

in practice because there would be less risk involved.

If we do N simulations and obtain N diﬀerent values y

1

, y

2

, . . . , y

N

for the observed proﬁt, then

the average (sample mean) is deﬁned as

¯ y =

1

N

N

i=1

y

i

.

The sample variance is

s

2

=

1

N −1

N

i=1

(y

i

− ¯ y)

2

,

and the sample standard deviation is

s =

√

s

2

.

We can view the N values observed as N observations of a random variable chosen from some dis-

tribution. Typically we don’t know what this distribution is, and the point of doing a Monte Carlo

simulation is to gain some information about the distribution. With a large enough sample, the sample

mean should approximate the mean of the true distribution and the sample variance should approximate

its variance.

How big a sample is “large enough”? That depends on the distribution. Comparing the histograms

of Figure 4.2, for example, we see that 100 trials gives more information about the distribution of

results in Strategy 1 than with Strategy 2. To get a better idea of the underlying distributions, we

might repeat these experiments with many more simulations. Figure 4.5 shows histograms of the proﬁts

observed when 5000 simulations are performed with each strategy. Here we get a much better sense of

the distributions.

For Strategy 1 we merely conﬁrm what we already guessed from the 100 simulations of Figure 4.2.

The distribution has a fairly “Gaussian” shape with almost all results lying between 600 and 1000.

In fact it can be shown that the distribution of observed proﬁts should be approximately normally

distributed (Gaussian). This follows from the Central Limit Theorem as we will be better able to

understand after studying the Markov Chain interpretation of this model.

The sample mean from this particular set of runs is about 783 and the standard deviation is about

83. For a normally distributed random variable, it can be shown that the value observed will be farther

than twice the standard deviation away from the mean only about 5% of the time. So if the proﬁt with

4.5 Fish tank simulation code 41

300 400 500 600 700 800 900 1000 1100 1200 1300

0

50

100

150

200

250

300

350

400

450

500

Strategy 1

−6000 −5000 −4000 −3000 −2000 −1000 0 1000 2000 3000

0

50

100

150

200

250

300

350

Strategy 2

Figure 4.5: Histograms of proﬁt observed over 5000 simulations with two diﬀerent strategies.

Strategy 1 is essentially normally distributed then we expect that the proﬁt observed should lie between

783 −2 · 83 = 617 and 783 + 2 · 83 = 949 about 95% of the time.

For Strategy 2 we see in Figure 4.5 that the distribution of proﬁts is certainly not a normal distri-

bution. Note that the proﬁt would not be expected to exceed about 104 · 20 = 2080 even with a perfect

strategy (why not?), which partly accounts for the sharp cut-oﬀ at the higher end. The fact that a very

large overstock can accumulate accounts for the wide spread to the left. The average proﬁt observed is

439 and the standard deviation is 1020. In 26% of the simulations the observed proﬁt was negative.

There are many other statistical questions that one might ask about the data obtained from these

simulations, and various ways that the model might be improved. We will discuss some of these in class.

4.5 Fish tank simulation code

% set parameters:

nsim = 100; % number of different simulations to do

nweeks = 104; % number of weeks in each simulation

ndays = 7*nweeks; % number of days in each simulation

a = 1/7; % probability of a customer each day

days_for_delivery = 5; % days from order to delivery of new tanks

order_when_out = 1; % = 1 ==> order a new tank when stock==0

% = 0 ==> don’t order when out of tanks

fixed_delivery = 12; % >0 ==> standing order for a new tank

% every so many days

% profits and losses:

saleprofit = 20; % profit from selling one tank

lostloss = 10; % loss from losing a customer

overstockloss = .50; % cost of each tank overstock per night

% initialize:

profit = zeros(nsim,1);

% print column headings:

42 Monte Carlo Methods

disp(’customers sold lost fraction_served overstock end_stock profit’)

for sim=1:nsim

% initialize:

random_nums = rand(ndays,1); % array of random numbers to use each day

total_cust = 0;

total_sold = 0;

total_lost = 0;

stock = 1; % number of tanks in stock

deliv = -1; % number of days until delivery of tank on order

% -1 means none on order

overstock = 0; % increment every night by number of excess tanks in stock

% main loop for a single simulation:

day = 0;

for week = 1:nweeks

for weekday = 1:7

day = day+1; % day in the simulation

sold = 0;

lost = 0;

if deliv == 0

stock = stock+1; % a new tank is delivered

% at the beginning of the day

end

if deliv >= 0

deliv = deliv-1; % days till next delivery

end

if (mod(7*week + day, fixed_delivery) == 0)

% A new tank is delivered every so many days regardless of stock

stock = stock+1;

end

% use random number to decide how many customers arrived

% Here assume 0 or 1:

if random_nums(day) < a

customers = 1;

else

customers = 0;

end

if customers==1

if stock>0 % we have a tank to sell the customer

sold = sold+1;

stock = stock-1;

else

lost = lost+1; % we didn’t have a tank and lost a customer

end

end

if (order_when_out & stock==0 & deliv < 0)

4.6 Poisson processes 43

% none in stock and none on order

deliv = days_for_delivery; % order another

end

if stock > 1

overstock = overstock + (stock - 1);

end

% keep track of total statistics:

total_cust = total_cust + customers;

total_sold = total_sold + sold;

total_lost = total_lost + lost;

stock_record(day) = stock; % keep track of stock on each day

end % loop on day

end % loop on week

fraction_served = total_sold / total_cust;

profit(sim) = total_sold*saleprofit ...

- total_lost*lostloss - overstock*overstockloss;

% output total statistics:

disp(sprintf(’%6.0f %6.0f %8.0f %12.3f %11.0f %8.0f %12.2f’, ...

total_cust, total_sold, total_lost, fraction_served, overstock, stock, ...

profit(sim)))

end % loop on sim

% compute and print average profit over all simulations:

average_profit = sum(profit) / nsim

disp(’ ’)

disp([’ average profit = ’ sprintf(’%10.2f’, average_profit)])

% standard deviation:

variance = sum((profit - average_profit).^2) / (nsim-1);

standard_deviation = sqrt(variance);

disp([’ standard deviation = ’ sprintf(’%10.2f’, standard_deviation)])

disp(’ ’)

% plot histogram of profits over all simulations:

hist(profit,-1500:50:2000)

axis([-1500 2000 0 30])

4.6 Poisson processes

In the Monte Carlo simulation of the ﬁsh tank inventory problem, we assumed that each day there was

a probability a of a customer coming in. The matlab code contained the following lines:

random_num = rand(1); % generate random number

if random_num < a % use this number to tell if a customer arrived

customers = 1;

44 Monte Carlo Methods

else

customers = 0;

end

This assumes that each day there is either 0 or 1 customers. This is not very realistic, since some

days we would expect 2 or even more to appear if they come at random times. Moreover, we might

want to simulate a situation in which there are typically many customers each day, instead of only one

customer every few days.

More generally let a represent the average number of customers per day. If a is very small then we

can interpret a as the probability that a customer comes on any given day, and ignore the possibility of

more than one customer. This isn’t very realistic if a is somewhat larger. If a > 1 then it is impossible

to interpret a as a probability, and clearly unrealistic to assume there will be either 0 or 1 customer.

It would be better if we could predict the probabilities p

k

that exactly k customers will appear in a

given day and then use something like:

random_num = rand(1); % generate random number

if random_num < p0

customers = 0;

else if random_num < p0+p1

customers = 1;

else if random_num < p0+p1+p2

customers = 2;

else

customers = 3; % assume negligible chance of more

end

end

end

Here we have assumed that a is small enough that the chance of more than 3 customers is negligible,

i.e., p

0

+p

1

+p

2

is very close to 1.

One way to estimate these probabilities would be to do a Monte Carlo simulation. We could split a

day up into N much smaller pieces (hours, minutes, seconds or even smaller) and assume that in each

small piece there is really no chance that more than one customer will appear, and that the probability

of one customer appearing is p = a/N. We could simulate many days and see what fraction of the days

there are k customers for each k. (Note that if N is large enough then p = a/N will be very small even

if a > 1, so it is valid to view this as a probability.)

Here are the results of a 10 trials with N = 20 and a = 1. (Generated using poisson.m from

the webpage.) Each string has 20 bits, with 0 representing “no customer during this interval” and 1

representing “a customer during this interval”. The ﬁrst “day” two customers arrived, the second day

only one, etc.

4.6 Poisson processes 45

00000010001000000000

00000100000000000000

00010100000000000000

00000000000000000000

00000000000000000100

00000000000000000000

00000000000000000000

00000000100000000010

00110000000000000000

10100000000010000000

In this short simulation, 30% of the days no customer arrived, 20% had 1 customer, 40% had 2 customers,

and there was 1 day (10%) when 3 customers arrived. If we ran another 10-day simulation we’d probably

get diﬀerent results. To estimate the probabilities more reliably we should do a simulation with many

more days. Below are the results of simulations with 10, 000 days. This was repeated 3 times and we

get slightly diﬀerent results each time.

Run 1 Run 2 Run 3 Poisson

0 customers 0.3686 0.3580 0.3606 0.3679

1 customer 0.3722 0.3717 0.3767 0.3679

2 customers 0.1837 0.1911 0.1883 0.1839

3 customers 0.0575 0.0646 0.0596 0.0613

4 customers 0 0 0.0122 0.0153

5 customers 0.0151 0.0120 0.0021 0.0031

6 customers 0.0025 0.0024 0.0004 0.0005

7 customers 0.0004 0.0002 0.0001 0.0001

Note that by running the simulation 3 times we get some reassurance that the value of 10,000 is

large enough to give values that don’t change much when we rerun the simulation. If we had run it for

only 100 days each we would see quite a bit of variation from run to run and would know that we don’t

have reliable values. Here it seems fairly clear that p

0

≈ 0.36, p

1

≈ 0.37, p

2

≈ 0.18, etc.

Poisson distribution. In fact we can compute these values without doing a simulation. It can be

shown mathematically that probabilities follow a Poisson distribution, and that for any value of a, the

average number of customers per day, the probability of observing exactly k customers in a single day

is

p

k

=

a

k

e

−a

k!

.

In particular,

p

0

= e

−a

p

1

= ae

−a

p

2

=

1

2

a

2

e

−a

p

3

=

1

6

a

3

e

−a

The values for the case a = 1 are shown in the last column of the table above, and they compare well

with the values obtained from the three 10,000-day simulations.

Note that

∞

k=1

p

k

= e

−a

_

1 +a +

a

2

2!

+

a

3

3!

+· · ·

_

= e

−a

e

a

= 1.

46 Monte Carlo Methods

The probabilities all add up to one as we should expect.

The formulas for the Poisson distribution can be derived by using a combinatorial argument based on

the probability of the event occurring in each small time period, and the number of ways it is possible

to have 0, 1, 2, . . . events occurring over the entire day. For this to be valid we must assume that

customers appear randomly with probability a/N in any small increment of 1/N day, without regard

to the past history. The arrival of customers is then said to be a Poisson process. The sequence of tests

to see whether there is a customer in each interval also called a series of Bernoulli trials.

To see how these values arise, ﬁrst consider a simpler case where we look at a sequence of only 3

Bernoulli trials, with a probability p of “success” in each trial (denoted by 1) and probability 1 − p of

“failure” (denoted by 0). There are 2

3

= 8 possible strings that can arise from these 3 independent

trials. These are listed below along with the probability of observing each:

000 (1 −p)

3

001 (1 −p)

2

p

010 (1 −p)

2

p

011 (1 −p)p

2

100 (1 −p)

2

p

101 (1 −p)p

2

110 (1 −p)p

2

111 p

3

(4.1)

Note that there are 3 ways to observe 1 customer during the entire day, and 3 ways to observe 2

customers, but only 1 way to observe 0 or 3 customers. Adding up the probabilities we ﬁnd the

following values for p

(3)

k

, the probability of observing k customers when N = 3:

0 customers (1 −p)

3

= p

(3)

0

1 customers 3(1 −p)

2

p = p

(3)

1

2 customers 3(1 −p)p

2

= p

(3)

2

3 customers p

3

= p

(3)

3

(4.2)

Note that these probabilities sum to 1. We can see this easily using the fact that

(x +y)

3

= x

3

+ 3x

2

y + 3xy

2

+y

3

with x = 1 − p and y = p. In fact we can use this type of binomial expansion to easily compute the

probabilities p

(N)

k

when N is larger without enumerating all 2

N

possible strings and counting things up.

We have

(x +y)

N

= x

N

+

_

N

1

_

x

N−1

y +

_

N

2

_

x

N−2

y

2

+· · · +

_

N

N −1

_

xy

N−1

+y

N

. (4.3)

The values

_

N

k

_

(“N choose k”) are often called the binomial coeﬃcients. The probabilities we seek

are

p

(N)

0

= (1 −p)

N

p

(N)

1

=

_

N

1

_

(1 −p)

N−1

p = N(1 −p)

N−1

p

p

(N)

2

=

_

N

2

_

(1 −p)

N−2

p

2

=

1

2

(N

2

− N)(1 −p)

N−2

p

2

etc.

(4.4)

4.7 Queuing theory 47

This is called the binomial distribution for the total number of “successes” in N Bernoulli trials.

To relate this to the Poisson distribution, recall that we want to determine the probability of

observing k customers in the case where N is very large and p = a/N, where a is a ﬁxed value

representing the average number of customers per day. The probability p

0

is given by

p

0

= lim

N→∞

p

(N)

0

= lim

N→∞

_

1 −

a

N

_

N

= e

−a

. (4.5)

Similarly,

p

1

= lim

N→∞

N

_

1 −

a

N

_

N−1

_

a

N

_

= a lim

N→∞

_

1 −

a

N

_

N−1

= ae

−a

.

(4.6)

4.7 Queuing theory

As another example of Monte Carlo simulation, we will look at an example from queuing theory. Suppose

customers arrive at a service center or cashier and have to line up to wait for service. Queuing theory

is the study of how such lines are expected to behave, and the comparison of diﬀerent strategies. For

example, if there are multiple servers we might want to compare the strategy of having independent

lines for each (as is common in grocery stores) or have a single line with the next customer going to the

ﬁrst available server (as is common in banks and airport check-in counters). Which approach minimizes

the average time a customer has to wait, for example?

There is a large literature on this subject, see for example [HL95]. Here we will look at the simplest

example of a single server and discuss how Monte Carlo simulation might be used to study the expected

length of the line.

The matlab code in Figure 4.6 shows a simple simulation of a queue. Customer arrivals are modeled

with a Poisson process with average rate a per minute. They must wait in the queue until the single

server is free. The amount of time required to serve them is also random, with average rate of b people

per minute. In this simulation this is also assumed to be Poisson process with probability of roughly

b/N that they will be ﬁnished in any small subunit of time 1/N. (This may not be very realistic. A

better model might be that there is some minimal service time plus a random length of time after that,

or some other distribution of service times.)

One could model this by splitting each minute up into N subunits and generating a random number

to determine whether a new customer arrives, and a second random number to determine whether the

customer currently being served ﬁnishes.

Instead, this simulation works by ﬁrst generating a set of arrival times and service times for each

customer, and then computing the total time that each customer must wait on the basis of these times.

The times are generated using an exponential distribution:

Probability that inter-arrival time is larger than T = e

−aT

.

This can be shown to be the proper distribution of service times for a Poisson distribution in the

following manner. Suppose we split up the time since the last arrival into units T, 2T, 3T, . . . rather

than minutes. Then in each unit the expected number of arrivals is aT. According to the Poisson

distribution, the probability of exactly 0 arrivals in the ﬁrst of these periods is thus e

−aT

, but this is

exactly the probability that the next arrival time is more than T units later.

To generate random numbers from an exponential distribution with parameter a, we can ﬁrst gener-

ate a random number r uniformly distributed in the interval [0, 1] (using rand in matlab, for example)

and then compute

−

1

a

log(r).

48 Monte Carlo Methods

Note that log is the natural logarithm command in matlab.

Figure 4.7 shows the results of three simulations using the parameters a = 1 and b = 1.5, for 1000

customers each. Two plots are shown for each simulation: on the left is the length of the queue as a

function of time, and on the right is a histogram of the total time spent by each customer (including

both the time in line and the time being served). While the 3 simulations show variation in details,

some similarities are observed. Note that although the server can handle 50% more customers than

are arriving on average, the queue length can grow quite long due to the randomness of arrivals. In all

three simulations the queue length is more than 10 at times. Also, looking at the histogram we see that

there are some customers who have to wait as long as 10 or 12 minutes in each simulation, though the

average total time is more like 2 minutes.

Figure 4.8 shows another set of simulations in which the parameter b = 1.1 is used instead, in which

case the average service time 1/b ≈ 0.91 is even closer to the average interarrival time. Again the details

vary between simulations, but it is clear that now the queues will typically grow to be of length greater

than 20 at times and the average waiting time is much longer. There are now typically some customers

who have to wait more than 20 minutes.

From comparing these sets of results it should be clear that such simulations can be eﬀectively used

to predict how badly queues may behave if the proper parameters are estimated. One can also use

Markov chain ideas to study the expected behavior of such systems and determine expected values for

quantities such as the total waiting time. In fact it can be shown that for arbitrary parameters a < b,

the expected total waiting time is 1/(b −a). Note that this is consistent with what is observed in these

simulations.

We have only looked at the simplest situation of a single queue and a single server. This model is

called an M/M/1 queue. The ﬁrst M indicates that the arrival times are exponentially distributed (or

Markovian), the second M indicates that the service times are also Markovian, and the 1 indicates there

is a single server. More complicated models would be required to model a bank or super market with

multiple lines and servers, for example.

Various modiﬁcations of this model are possible. For example, we could assume a maximum queue

length beyond which customers go away rather than joining the queue. In this case there are a ﬁnite

set of states possible (diﬀerent queue lengths) and it is possible to derive a Markov chain model of

transitions, as we will look at later.

4.7 Queuing theory 49

% Simple queuing theory simulation, M/M/1 queue

% Single server, single queue:

a = 1; % average number of arrivals per minute

b = 1.5; % average number of people served per minute

ncust = 1000;

% Notation:

% at = arrival time of a person joining the queue

% st = service time once they reach the front

% ft = finish time after waiting and being served.

%

% initialize arrays:

at = zeros(ncust,1);

ft = zeros(ncust,1);

% Generate random arrival times assuming Poisson process:

r = rand(ncust,1);

iat = -1/a * log(r); % Generate inter-arrival times, exponential distribution

at(1) = iat(1); % Arrival time of first customer

for i=2:ncust

at(i) = at(i-1) + iat(i); % arrival times of other customers

end

% Generate random service times for each customer:

r = rand(ncust,1);

st = -1/b * log(r); % service times for each

% Compute time each customer finishes:

ft(1) = at(1)+st(1); % finish time for first customer

for i=2:ncust

% compute finish time for each customer as the larger of

% arrival time plus service time (if no wait)

% finish time of previous customer plus service time (if wait)

ft(i) = max(at(i)+st(i), ft(i-1)+st(i));

end

total_time = ft - at; % total time spent by each customer

wait_time = total_time - st; % time spent waiting before being served

ave_service_time = sum(st)/ncust

ave_wait_time = sum(wait_time)/ncust

ave_total_time = sum(total_time)/ncust

% Plot histogram of waiting times:

hist(total_time,0:.5:20)

Figure 4.6: Queuing theory simulation. The code shown here is simpliﬁed and doesn’t show the com-

putation of the queue length which is plotted here. See the Handouts webpage for the full m-ﬁle.

50 Monte Carlo Methods

0 200 400 600 800 1000 1200

0

2

4

6

8

10

12

14

16

18

queue length vs. time

0 2 4 6 8 10 12 14 16 18 20

0

50

100

150

200

250

300

350

400

histogram of waiting times

a =1

b =1.5

average total time =2.1487

average service time =0.6786

0 100 200 300 400 500 600 700 800 900 1000

0

2

4

6

8

10

12

14

16

queue length vs. time

0 2 4 6 8 10 12 14 16 18 20

0

50

100

150

200

250

300

350

400

histogram of waiting times

a =1

b =1.5

average total time =2.2084

average service time =0.6693

0 200 400 600 800 1000 1200

0

5

10

15

queue length vs. time

0 2 4 6 8 10 12 14 16 18 20

0

50

100

150

200

250

300

350

400

histogram of waiting times

a =1

b =1.5

average total time =2.1234

average service time =0.67806

Figure 4.7:

4.7 Queuing theory 51

0 200 400 600 800 1000 1200

0

5

10

15

20

25

30

queue length vs. time

0 5 10 15 20 25 30 35 40

0

20

40

60

80

100

120

140

160

180

200

histogram of waiting times

a =1

b =1.1

average total time =6.2679

average service time =0.91675

0 100 200 300 400 500 600 700 800 900 1000

0

5

10

15

20

25

30

35

queue length vs. time

0 5 10 15 20 25 30 35 40

0

20

40

60

80

100

120

140

160

180

200

histogram of waiting times

a =1

b =1.1

average total time =11.3831

average service time =0.88394

0 200 400 600 800 1000 1200

0

5

10

15

20

25

queue length vs. time

0 5 10 15 20 25 30 35 40

0

20

40

60

80

100

120

140

160

180

200

histogram of waiting times

a =1

b =1.1

average total time =4.77

average service time =0.87122

Figure 4.8:

52 Monte Carlo Methods

4.8 Exercises

1. The following data yield the arrival times and service times that each customer will require for

the ﬁrst 13 customers at a single server system. Upon arrival, a customer either enters service if

the server is free or joins the waiting line. When the server completes work on a customer, the

next one in line (i.e. the one who has been waiting the longest) enters service.

Arrival Times: 12 31 63 95 99 154 198 221 304 346 411 455 537

Service Times: 40 32 55 48 18 50 47 18 28 54 40 72 12

(a) Determine the departure times of these 13 customers.

(b) Repeat (a) when there are two servers and a customer can be served by either one.

(c) Repeat (a) under the new assumption that when the server completes a service, the next

customer to enter service is the one that has been waiting the least time.

2. Customers arrive at the post oﬃce at a rate of approximately three per minute. What is the

probability that the next customer does not arrive during the next 3 minutes? Explain your

assumptions.

3. Mr. Obama is waiting to make a phone call at the oﬃce. There are two phone lines available and

both are being used by the VP and the secretary of state. If the duration of each telephone call

is an exponential random variable with λ = 1/8, what is the probability that among the three

people, Mr. Obama, the VP and the secretary, that Mr. Obama will not be the last to ﬁnish his

call?

4. Suppose that 3% of families in a large city have an annual income of over 80, 000. What is the

probability that of 60 random families, at most three have an annual income of over 80, 000?

5. Cabs carrying passengers to an airport terminal arrive according to a Poisson process with 25

arrivals per hour on average. Assume for simplicity that each cab brings exactly one passenger

who is either a man or a woman with probabilities p = .55 and q = 1 − p = .45 respectively.

What is the distribution of N

W

(t) = the number of women arriving during the time interval [0, t]?

What assumptions must one make to answer this question?

6. Consider the Bernoulli process with a fair coin. A run is a sequence of trials in a row all coming up

heads or all coming up tails. Let M

n

be the random variable that counts the maximum length of

any run of heads in n trials. For example, M

16

(HTTHHTTHHHTTTTTH) = 3 corresponding

to the run of three heads starting in position 8. What is the expected value and variance of M

n

for 1 ≤ n ≤ 64? An exact formula would be best but a good approximation from a simulation

will suﬃce.

7. Read ”America West Airlines Develops Eﬃcient Boarding Strategies” by van den Briel, Villalobos,

Hogg, Lindemann and Mule. Interfaces, vol. 35, No. 3, May-June 2005, pp. 191-201.

(a) In terms of our 4-stage modeling process, which sections of this paper correspond with each

stage and each edge of the modeling cycle?

(b) Describe strategies BF5 and OI5 in words. What does BF and OI stand for?

(c) How was OI5 found? How was BF5 found?

(d) Given a single half row of an airplane with 3 seats per side, what is the expected number of

times someone gets up to let another passenger into their seat? For example, if the people

in seats B and C are already seated when the passenger in seat A arrives, then 2 people get

up. Discussed on page 193 and again on page 199.

4.8 Exercises 53

(e) Discuss the entries in Table 1 and Table 2. What does the number 69 in the row labeled

“Economy class[xxx]” represent in words? What should we compare 69 to in Table 2?

(f) Discuss Tables 3 and 4. What conclusions can you make from this data?

(g) What is the strategy that the authors recommend and what evidence did they give that it

is best? Quote from the paper.

(h) Is the recommended strategy optimal? If not, what do you think would be better?

54 Monte Carlo Methods

Math 381 Notes — University of Washington —Winter 2011

Chapter 5

Markov Chains and Related Models

5.1 A Markov chain model of a queuing problem

To introduce the theory of Markov chains, let’s consider a simple queuing theory model in which there

is a single server and there are only three “states” which the queue can be in at any given time:

State 1: No customers are present

State 2: A customer is being served

State 3: A customer is being served and another is waiting.

Assume that the queue is never longer: if a customer arrives when the queue is in State 3 then that

customer disappears.

Suppose that arrivals and service are Poisson processes, and over a given short time interval there

is a probability p of a new customer arriving and a probability q that any customer currently being

served will ﬁnish. If the time interval is very short then these are very small and we can assume there

is negligible chance of more than one customer arriving or ﬁnishing, and also negligible chance of both

one customer ﬁnishing and another arriving in the same period.

We could perform a Monte Carlo simulation of this queue by keeping track of what state it is in,

and each time period either staying in this state or moving to a diﬀerent state depending on the value

of some random number:

random_num = rand(1);

if random_num < p

# a new customer arrives:

if state<3

state = state + 1;

end

end

if random_num > 1-q

# a customer finishes:

if state>1

state = state - 1;

end

end

Note that we’ve done the tests in such a way that if p and q are both very small then we can’t have

both happen.

Figure 5.1 shows the transition diagram between the three states. For example, if we are in State 2,

then with probability p we move to State 3 (if a customer arrives), with probability q we move to State

1 (if a customer ﬁnishes), and with probability 1 −p −q we stay in State 2 (if neither happens).

55

56 Markov Chains and Related Models

p

q

p

q

0

0

1-q

1-p-q

1-p

state 1

state 3

state 2

Figure 5.1: Transition diagram for a simple queuing theory model with three states.

If we do a single simulation then we will move around between the three states. We might be

interested in determining what percentage of the time we are in each of the three states over the long

term. We could determine this for many diﬀerent simulations and collect some statistics. If we do this

with p = 0.05 and q = 0.08, for example, we will observe that about 50% of the time we are in state 1,

about 31% in State 2, and about 19% in State 3.

Another way to get these number is to run lots of simulations simultaneously and keep track of what

percentage of the simulations are in each state at each time period. After a while we might expect to

see the same percentages as reported above (why??).

Here’s what we observe if we do 10,000 simulations, all starting in State 1, and print out how many

of the simulations are in each of the three states at each time:

time period State 1 State 2 State 3

0 10000 0 0

1 9519 481 0

2 9073 891 36

3 8685 1240 75

4 8379 1491 130

5 8096 1702 202

6 7812 1914 274

etc...

95 4949 3134 1917

96 4913 3166 1921

97 4920 3161 1919

98 4958 3148 1894

99 4982 3072 1946

100 4995 3038 1967

etc...

Initially all simulations were in State 1. After one time period a customer had arrived in 481 of the

simulations, or 4.81% of the simulations, about what we’d expect since p = 0.05. Since it’s impossible

for 2 customers to arrive in a single period, there are still no simulations in State 3 after one period.

5.1 A Markov chain model of a queuing problem 57

After 2 periods, however, it is possible to be in State 3, and in fact 36 of the simulations reached this

state. How does this agree with the number you should expect?

Notice that after many time periods, for n around 100, we appear to have reached a sort of “steady

state”, with roughly 50% of the simulations in State 1, 31% in State 2, and 19% in State 3. Of course

each individual simulation keeps moving around between states, but the number in each stays roughly

constant. Note that these are the same percentages quoted above.

What we now want to explore is whether we can determine these values without having to do

thousands of simulations. In fact we can by using the theory of Markov chains. Let v

(n)

k

be the fraction

of simulations which are in State k after n steps (for k = 1, 2, 3), and let v

(n)

be the vector with these

three components. If all simulations start in State 1 then we have

v

(0)

=

_

_

1

0

0

_

_

. (5.1)

What fractions do we expect for v

(1)

? Since each simulation moves to State 2 with probability

p = 0.05, or stays in State 1 with probability 1 −p = 0.95, we expect

v

(1)

=

_

_

0.95

0.05

0

_

_

.

Now consider the general case. Suppose we know v

(n)

and we want to determine v

(n+1)

. From the

transition diagram of Figure 5.1 we expect

v

(n+1)

1

= (1 −p)v

(n)

1

+qv

(n)

2

+ 0 · v

(n)

3

v

(n+1)

2

= pv

(n)

1

+ (1 −p −q)v

(n)

2

+qv

(n)

3

v

(n+1)

3

= 0 · v

(n)

1

+pv

(n)

2

+ (1 −q)v

(n)

3

.

(5.2)

We can write this in matrix-vector notation as

v

(n+1)

= Av

(n)

, (5.3)

where the transition matrix A is

A =

_

_

1 −p q 0

p 1 −p −q q

0 p 1 −q

_

_

=

_

_

0.95 .08 0

0.05 0.87 0.08

0 0.05 0.92

_

_

. (5.4)

The right-most matrix above is for the speciﬁc values p = 0.05 and q = 0.08. From the relation (5.3)

we can compute

v

(1)

=

_

_

.95

.05

0

_

_

, v

(2)

=

_

_

.9065

.0910

.0025

_

_

, v

(3)

=

_

_

.8685

.1247

.0069

_

_

, etc. (5.5)

Note that these values agree reasonably well with what is seen in the simulation after 1, 2, and 3 steps.

To predict the percentages seen later, we need only multiply repeatedly by the matrix A. Alternatively,

note that v

(2)

= A

2

v

(0)

and in general v

(n)

= A

n

v

(0)

. We can compute

v

(99)

= A

99

v

(0)

=

_

_

0.496525

0.309994

0.193481

_

_

, v

(100)

= A

100

v

(0)

=

_

_

0.496498

0.309999

0.193502

_

_

.

Note that again the values in the simulation are close to these values. Also notice that these two values

are very close to each other. Just as we conjectured from the simulation, we have reached a steady

58 Markov Chains and Related Models

state in which the percentages do not change very much from one time period to the next. If we kept

applying A we would ﬁnd little change in future steps, even after thousands of steps, e.g.,

v

(1000)

= A

1000

v

(0)

=

_

_

0.4961240310077407

0.3100775193798381

0.1937984496123990

_

_

, v

(10000)

= A

10000

v

(0)

=

_

_

0.4961240310076444

0.3100775193797780

0.1937984496123613

_

_

.

As n increases, the vectors v

(n)

are converging to a special vector ˆ v which has the property that

Aˆ v = ˆ v, (5.6)

so that a step leaves the percentages unchanged. Note that the relation (5.6) means that this vector is

an eigenvector of the matrix A, with eigenvalue λ = 1.

5.2 Review of eigenvalues and eigenvectors

Let A be a square m×m matrix with real elements. Recall from linear algebra that a scalar value λ is

an eigenvalue of the matrix A if there is a nonzero vector r (the corresponding eigenvector) for which

Ar = λr. (5.7)

If r is an eigenvector, then so is αr for any scalar value α. The set of all eigenvectors corresponding to

a given eigenvalue forms a linear subspace of lR

m

.

In general A has m eigenvalues λ

1

, . . . , λ

m

, which can be determined by ﬁnding the roots of a

polynomial of degree m in λ. This arises from the fact that if (5.7) holds, then

(A −λI)r = 0

where I is the m×m identity matrix. Since r is a nonzero null-vector of this matrix, it must be singular.

Hence its determinant is zero,

det(A −λI) = 0.

Computing the determinant gives the polynomial.

If the eigenvalues are distinct then the eigenvectors r

i

(corresponding to the diﬀerent λ

i

) will be

linearly independent and so we can form a matrix R with these vectors as columns which will be a

nonsingular matrix. Then Ar

i

= λ

i

r

i

leads to

AR = RΛ (5.8)

where Λ is the diagonal matrix with the values λ

i

on the diagonal. (Note that the order is important!

In general AR = RA and RΛ = ΛR since matrix multiplication is not commutative.)

Since R is nonsingular, it has an inverse R

−1

and we can rewrite (5.8) as

A = RΛR

−1

. (5.9)

(We might be able to do all this even if the eigenvalues are not distinct, but this case is a bit more

subtle.)

One use of the decomposition (5.9) is to compute powers of the matrix A. Note that

A

2

= (RΛR

−1

)(RΛR

−1

)

= RΛ(R

−1

R)ΛR

−1

= RΛΛR

−1

= R(Λ

2

)R

−1

(5.10)

5.3 Convergence to steady state 59

and similarly,

A

n

= RΛ

n

R

−1

(5.11)

for any n. This is easy to work with because Λ

n

is just the diagonal matrix with the values λ

n

i

on the

diagonal.

In matlab it is easy to compute eigenvalues and eigenvectors. For example, for the matrix A of

(5.4), we can ﬁnd the eigenvalues by doing:

>> p = .05; q = .08;

>> A = [ 1-p q 0; ...

p 1-p-q q; ...

0 p 1-q];

>> eig(A)

ans =

1.0000

0.8068

0.9332

If we also want to ﬁnd the eigenvectors, we can do

>> [R,Lambda] = eig(A)

R =

-0.8050 0.4550 0.7741

-0.5031 -0.8146 -0.1621

-0.3144 0.3597 -0.6120

Lambda =

1.0000 0 0

0 0.8068 0

0 0 0.9332

This returns the matrices R and Λ.

5.3 Convergence to steady state

Note that one of the eigenvalues of A computed above is equal to 1 and the others are both smaller

than 1 in magnitude. As we will see, this means the Markov process converges to a steady state. Recall

that

|λ

n

| →0 as n →∞ if |λ| < 1,

|λ

n

| = 1 for all n if |λ| = 1,

|λ

n

| →∞ as n →∞ if |λ| > 1,

(5.12)

So as n →∞, the matrix A

n

approaches

60 Markov Chains and Related Models

>> R * [1 0 0; 0 0 0; 0 0 0] * inv(R)

ans =

0.4961 0.4961 0.4961

0.3101 0.3101 0.3101

0.1938 0.1938 0.1938

as we can easily conﬁrm, e.g.,

>> A^100

ans =

0.4965 0.4960 0.4954

0.3100 0.3101 0.3102

0.1935 0.1939 0.1944

Now recall that when we multiply a matrix A by a vector v, we are forming a new vector which is

a linear combination of the columns of the matrix. The elements of v give the weights applied to each

column. If a

j

is the j’th column of A and v

j

is the j’th element of v, then the vector Av is

Av = a

1

v

1

+a

2

v

2

+a

3

v

3

.

In particular, if v

(0)

= [1 0 0]

′

, then Av

(0)

is just the ﬁrst column of A. Similarly, A

100

v

(0)

is the ﬁrst

column of the matrix A

100

.

Note that for the example we have been considering, all three columns of A

100

are nearly the same,

and for larger n the three columns of A

n

would be even more similar. This has an important consequence:

Suppose we started with diﬀerent “initial conditions” v

(0)

instead of v

(0)

= [1 0 0]

′

, which corresponded

to all simulations being in State 1 to start. For example we might start with v

(0)

= [0.5 0.3 0.2]

′

if half

the simulations start in State 1, 30% in State 2, and 20% in State 3. Then what would distribution of

states would we see in the long run?

After n steps the expected distribution is given by A

n

v

(0)

. After 100 steps,

>> v100 = A^100 * [.5 .3 .2]’

v100 =

0.4961

0.3101

0.1938

The same steady state as before. Since all three columns of A

100

are essentially the same, and the

elements of v

(0)

add up to 1, we expect A

100

v

(0)

to be essentially the same for any choice of v

(0)

.

Even though the steady state behavior is the same no matter how we start, the transient behavior

over the ﬁrst few days will be quite diﬀerent depending on how we start. For example, starting with

v

(0)

= [0.5 0.3 0.2]

′

instead of (5.1) gives

v

(1)

=

_

_

.499

.302

.199

_

_

, v

(2)

=

_

_

.4982

.3036

.1982

_

_

, v

(3)

=

_

_

.4976

.3049

.1975

_

_

, etc. (5.13)

5.4 Other examples of Markov Chains 61

instead of (5.5).

Another way to look at the behavior of v as we multiply repeatedly by A is to decompose the initial

vector v

(0)

as a linear combination of the eigenvectors of A:

v

(0)

= c

1

r

1

+c

2

r

2

+c

3

r

3

. (5.14)

Since the vectors r

1

, r

2

and r

3

are linearly independent, this can always be done for any v

(0)

and the

solution c = [c

1

c

2

c

3

]

′

is unique. In fact the vector c is the solution of the linear system

Rc = v

(0)

,

since this is exactly what (5.14) states, writing the matrix-vector multiply as a linear combination of

the columns.

Now consider what happens if we multiply v

(0)

from (5.14) by the matrix A:

v

(1)

= Av

(0)

= c

1

Ar

1

+c

2

Ar

2

+c

3

Ar

3

= c

1

λ

1

r

1

+c

2

λ

2

r

2

+c

3

λ

3

r

3

.

(5.15)

Multiplying by A again gives

v

(2)

= Av

(1)

= c

1

λ

1

Ar

1

+c

2

λ

2

Ar

2

+c

3

λ

3

Ar

3

= c

1

(λ

1

)

2

r

1

+c

2

(λ

2

)

2

r

2

+c

3

(λ

3

)

2

r

3

.

(5.16)

In general we have

v

(n)

= A

n

v

(0)

= c

1

(λ

1

)

n

r

1

+c

2

(λ

2

)

n

r

2

+c

3

(λ

3

)

n

r

3

. (5.17)

Again recall from (5.12), that as n gets large, (λ

j

)

n

→ 0 for any eigenvalue that is smaller than one in

magnitude. For this particular problem λ

1

= 1 while the others are smaller, and so

v

(n)

→c

1

r

1

for large n. This means the steady state is a multiple of r

1

, which is again an eigenvector of A, as we

already knew.

Note that the smaller |λ| is, the faster λ

n

goes to zero. So from (5.17) we can see how quickly we

expect to approximately reach a steady state. This depends also on how large the values c

j

are, which

depends on the particular initial data chosen. If v

(0)

happens to be a steady state already, a multiple

of r

1

, then c

2

= c

3

= 0.

5.4 Other examples of Markov Chains

5.4.1 Breakdown of machines

A certain oﬃce machine has three states: Working, Temporarily Broken, and Permanently Broken. Each

day it is in one of these states. The transition diagram between the states is shown in Figure 5.2. If the

machine is working on Day 1, what is the probability that it is still functional (i.e., not permanently

broken) a year from now? (In this example we will consider at a single machine and talk about the

probability that this machine is in each state. Alternatively we could consider a pool of many machines

and consider what fraction of the machines are in each state on a given day.)

We can answer this question by computing the 365’th power of the transition matrix:

>> A = [.90 .6 0; ...

.095 .4 0; ...

.005 0 1];

62 Markov Chains and Related Models

0.9

0.4

0

0.6

0.005 0

0

1

0.095

state 1

state 3

state 2

Figure 5.2: Transition diagram for the modeling machine breakdown. The three states are: State 1:

Working, State 2: Temporarily Broken, and State 3: Permanently Broken.

>> A^365

ans =

0.1779 0.1792 0

0.0284 0.0286 0

0.7937 0.7922 1.0000

The ﬁrst column of this matrix gives the probability of being in each state on Day 366 (with our

assumption that it is working on Day 1, so v

(0)

= [1 0 0]). We see that the probability that the machine

is permanently broken in 0.7937 and so there is about a 21% chance that the machine is still functional.

Note that the probabilities would be about the same if we’d started in the Temporarily Broken state

on Day 1 (by looking at the second column of this matrix). Why does this make sense? Why is the

third column so diﬀerent?

How about after 5 years, or 1825 days? We ﬁnd

>> A^1825

ans =

0.0003 0.0003 0

0.0001 0.0001 0

0.9996 0.9996 1.0000

so by this time the machine is almost surely dead whatever state we started in.

For this Markov chain the steady state solution is the vector [0 0 1]

′

, as we can see by computing

the eigenvalues and eigenvectors:

>> [R, Lambda] = eig(A)

5.4 Other examples of Markov Chains 63

R =

0 0.7096 -0.6496

0 -0.7045 -0.1036

1.0000 -0.0051 0.7532

Lambda =

1.0000 0 0

0 0.3043 0

0 0 0.9957

State 3 (Permanently Broken) is called an absorbing state. No matter what state we start in, we

will eventually end up in State 3 with probability 1, and once in State 3 we can never leave.

5.4.2 Work schedules

Suppose the XYZ Factory is operating every day and has a work schedule where each employee works

every other day. Say that an employee is in State 1 on work days and in State 2 on oﬀ days. Then the

transition diagram is simply

0

1

1

0

state 1 state 2

since with probability 1 the employee switches states each day. In this case the transition matrix is

A =

_

0 1

1 0

_

.

If 70% of the employees are working the ﬁrst day then we have

v

(0)

=

_

0.7

0.3

_

, v

(1)

=

_

0.3

0.7

_

, v

(2)

=

_

0.7

0.3

_

, v

(3)

=

_

0.3

0.7

_

,

and so on. In this case we do not reach a single steady state, but we do have a very regular pattern of

cycling back and forth between two states. This can again be understood by looking at the eigenvalues

and eigenvectors of the matrix:

>> A = [0 1; ...

1 0];

64 Markov Chains and Related Models

>> [R, Lambda] = eig(A)

R =

0.7071 0.7071

-0.7071 0.7071

Lambda =

-1 0

0 1

Both eigenvalues have magnitude 1. If we decompose v

(0)

in terms of the eigenvectors,

v

(0)

= c

1

r

1

+c

2

r

2

then we have

v

(n)

= c

1

(−1)

n

r

1

+c

2

r

2

.

The ﬁrst terms ﬂips sign every day and accounts for the oscillation.

In the special case where v

(0)

is a multiple of r

2

, we would expect c

1

= 0 and in this case the solution

should be truly steady. Since the elements of v must sum to 1, the only multiple of r

2

we can consider

is

v

(0)

=

_

0.5

0.5

_

.

This makes sense: if the factory starts with 50% of employees working on the ﬁrst day then 50% will

be working every day.

5.4.3 The ﬁsh tank inventory problem

The Monte Carlo simulation shown in Figure 4.1 gave results indicating that with this particular ordering

strategy and set of parameters, we could serve about 60% of the customers who come looking for a 150

gallon ﬁsh tank. We can derive this using a Markov chain. For the situation modeled in this program

the store can be assumed to always be in one of six states each day:

1. A customer arrives and there’s a tank in stock,

2. A customer arrives, there’s none in stock but one due to arrive the next day,

3. A customer arrives, there’s none in stock but one due to arrive in two days.

4. No customer arrives, but there’s a tank in stock,

5. No customer arrives, there’s none in stock but one due to arrive the next day,

6. No customer arrives, there’s none in stock but one due to arrive in two days.

Working out the probabilities of moving from each state to each of the others, we ﬁnd the transition

matrix

A =

_

¸

¸

¸

¸

¸

¸

_

0 p 0 p p 0

0 0 p 0 0 p

p 0 0 0 0 0

0 1 −p 0 1 −p 1 −p 0

0 0 1 −p 0 0 1 − p

1 −p 0 0 0 0 0

_

¸

¸

¸

¸

¸

¸

_

(5.18)

5.4 Other examples of Markov Chains 65

where p = 1/3 is the probability of a customer arriving on any given day. This matrix has three non-zero

eigenvalues including λ = 1. The eigenvector corresponding to λ = 1, when normalized so the elements

sum to 1, is found to be

ˆ v =

1

15

_

¸

¸

¸

¸

¸

¸

_

3

1

1

6

2

2

_

¸

¸

¸

¸

¸

¸

_

=

_

¸

¸

¸

¸

¸

¸

_

0.2000

0.0667

0.0667

0.4000

0.1333

0.1333

_

¸

¸

¸

¸

¸

¸

_

The components of this vector are the expected fraction of days that the store is in each of the six

possible states, in the long term steady state. From this we can see that the fraction of time a customer

arrives and is served can be computed from the components of ˆ v as

fraction served = ˆ v

1

/(ˆ v

1

+ ˆ v

2

+ ˆ v

3

) =

3

3 + 1 + 1

= 0.6.

Exactly 60%, as observed in the Monte Carlo simulation.

5.4.4 Card shuﬄing

Another sort of application of Markov chains is to model a riﬄe shuﬄe of a deck of cards. This

application received a great deal of attention some years back when it was used to determine how many

shuﬄes are necessary to randomize a deck of cards [Kol90]. The answer turns out to be 7.

The model was developed by Gilbert[Gil55], Shannon and by Reeds [Ree81], and it was used by

Bayer and Diaconis [BD92] to answer the question about randomizing an initially sorted deck of cards.

The riﬄe shuﬄe is modeled as follows: A deck of n cards is divided into two heaps, with the probability

of a heap containing k cards being given by a binomial distribution,

_

n

k

_

/2

n

. The heaps are then

riﬄed together into a single pile, with the probability of dropping the next card from a particular heap

being proportional to the number of cards in that heap.

This is generally considered to be a reasonable model of the way humans perform a riﬄe shuﬄe,

although this has been debated. With this model, the process of card shuﬄing is represented as a

Markov chain. Given the current ordering of the cards, one can write down the probability of achieving

any other ordering after the next shuﬄe. Unfortunately, with a deck of 52 cards there are 52! possible

orderings, and this would seem to render the problem intractable from a computational point of view.

The great contribution of Bayer and Diaconis was to notice that the number of “states” of this

system could be drastically reduced. Given only the number of rising sequences in the current ordering,

one could determine the probability of any given number of rising sequences after the next shuﬄe.

A rising sequence is deﬁned as a maximal sequence of consecutively numbered cards found (perhaps

interspersed with other cards) during one pass through the deck. For example, the ordering 146253

has three rising sequences: 123, 45, and 6. With this observation the number of states of the system

is reduced from n! to n, and for a deck of n = 52 cards it now becomes an easy problem to solve

numerically.

An excellent description of the problem and solution from more of a linear algebra point of view

can be found in J´onsson and Trefethen [JT98], where they also supply a MATLAB code that forms the

probability transition matrix P and a related matrix A = P −P

∞

whose kth power gives the diﬀerence

between the state of the system after k shuﬄes and that after inﬁnitely many shuﬄes. A version of

their code is available on the class web page. The entries of the probability transition matrix are

P

ji

= 2

−n

_

n + 1

2i −j

_

α

j

α

i

,

66 Markov Chains and Related Models

5 10 15 20 25 30 35 40 45 50

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

Number of rising sequences

p

r

o

b

a

b

i

l

i

t

y

Steady−State Distribution of Rising Sequences

Figure 5.3: Steady-state distribution of number of rising sequences

where α

j

denotes the number of permutations of {1, . . . , n} that have exactly j rising sequences. The

α

j

’s are known as Eulerian numbers, and they satisfy the recurrence:

A

r,k

= kA

r−1,k

+ (r −k + 1)A

r−1,k−1

,

where A

11

= 1, A

1k

= 0 for k = 1, and α

j

= A

nj

. See [Slo00] for more details on this sequence.

The eigenvalues of the probability transition matrix are known to be 2

−j

, j = 0, 1, . . . , n − 1, and

the eigenvector corresponding to eigenvalue 1 is known as well:

v = (α

1

, . . . , α

n

)

T

/n!

The entries of this vector are the probabilities of each possible number of rising sequences after the deck

has been shuﬄed inﬁnitely many times. The vector v is plotted in Figure 5.3. It can be determined

from the entries of v that, for example, with probability .9999 a well-shuﬄed deck has between 19 and

34 rising sequences.

This begins to give us a clue as to why the system behaves as it does. If the deck is initially sorted in

increasing order — so that there is one rising sequence and the initial state vector is s

0

= (1, 0, . . . , 0)

T

— then after one shuﬄe there will be at most two rising sequences. Do you see why? When you take a

deck with two rising sequences, cut it in two and riﬄe the two portions together, the resulting ordering

can have at most four rising sequences, etc. After j shuﬄes there are at most 2

j

rising sequences, so

that for j ≤ 4 it is easy to determine that the ordering is not random. Try the experiment. Shuﬄe a

deck of cards four times and count the number of rising sequences. It will be no more than 16. Then

shuﬄe several more times (at least three more times) and with very high probability you will count at

least 19 rising sequences.

Figure 5.4 shows a plot of the diﬀerence between the state vector after k shuﬄes and that after

inﬁnitely many shuﬄes:

1

2

n

i=1

|v

i

− (P

k

s

0

)

i

|. This is called the total variation norm and is the

measure used by Bayer and Diaconis. The deck is deemed to be suﬃciently well shuﬄed when this

diﬀerence drops below .5. Notice that this happens after k = 7 shuﬄes.

5.5 General properties of transition matrices

Transition matrices are special types of matrices which have some general properties which we summa-

rize here:

• Every element in a transition matrix must lie between 0 and 1 since it is a probability.

5.5 General properties of transition matrices 67

0 1 2 3 4 5 6 7 8 9 10

0

0.2

0.4

0.6

0.8

1

1.2

1.4

Number of shuffles

D

i

s

t

a

n

c

e

f

r

o

m

r

a

n

d

o

m

(

i

n

t

o

t

a

l

v

a

r

i

a

t

i

o

n

n

o

r

m

)

Figure 5.4: Convergence toward the steady-state in total variation norm

• Each column of a transition matrix sums to 1. This is because the elements in the j’th column

represent the probabilities of moving from State j to each of the other states. Since we must move

somewhere, all of these probabilities must add up to 1.

• On the other hand the rows do not need to sum to 1. (Note: In many books transition matrices

are deﬁned diﬀerently, with a

ij

being the probability of moving from State i to State j instead of

being the probability of moving from State j to State i as we have used. In this case the transition

matrix is the transpose of ours, and the rows sum to 1 instead of the columns.)

• If w is the row vector with all components equal to 1, then wA = w. This is just a restatement

of the fact that all columns must sum to 1. But this also shows that w is a left eigenvector of A

with eigenvalue λ = 1. This proves that λ = 1 is always an eigenvalue of any transition matrix.

(Recall that in general the left eigenvectors of A, row vectors ℓ for which ℓA = λℓ, are diﬀerent

from the right eigenvectors, but the eigenvalues are the same. In fact the left eigenvectors ℓ are

the rows of the matrix R

−1

, the inverse of the matrix of right eigenvectors.)

• It can also be shown that all eigenvalues of a transition matrix satisfy |λ| ≤ 1, so some components

of an eigen-expansion such as (5.14) decay while others have constant magnitude, but none can

grow exponentially.

• If λ

j

= 1 then the eigenvector r

j

can be used to ﬁnd a potential steady state of the system.

Since the elements of v must sum to 1, we must normalize r

j

to have this sum. Recall that

any scalar multiple c

j

r

j

is also an eigenvector, so we choose c

j

= 1/

i

r

ij

where r

ij

are the

elements of the vector r

j

. For example, if we use Matlab and obtain an eigenvector such as

r

1

= [−0.8050 −0.5031 −0.3144]

′

, we can compute the normalized eigenvector as

ˆ v =

r

1

−0.8050 −0.5031 −0.3144

=

_

_

0.4961

0.3101

0.1938

_

_

This is the expected steady state.

• Note that normalizing an eigenvector gives a reasonable result only if all the elements of the

eigenvector have the same sign. What can you hypothesize about eigenvectors of A corresponding

to eigenvalues λ

j

with |λ

j

| < 1?

68 Markov Chains and Related Models

5.6 Ecological models and Leslie Matrices

Similar matrix techniques can be used in other situations that are not exactly “Markov chains”. As an

example, suppose we want to study the population of a certain species of bird. These birds are born

in the spring and live at most 3 years, and we will keep track of the population just before breeding.

There will be 3 types of birds: Age 0 (born the previous spring), Age 1, and Age 2. Let v

(n)

0

, v

(n)

1

and

v

(n)

2

represent the number of females in each age class just before breeding in Year n. To model how

the population changes we need to know:

• Survival rates. Suppose 20% of Age 0 birds survive to the next spring, and 50% of Age 1 birds

survive to become Age 2.

• Fecundity rates. Suppose females that are 1 year old produce a clutch of a certain size, of which

3 females are expected to survive to the next breeding season. Females that are 2 years old lay

more eggs and suppose 6 females are expected to survive to the next spring.

Then we have the following model:

v

(n+1)

0

= 3v

(n)

1

+ 6v

(n)

2

v

(n+1)

1

= 0.2v

(n)

0

v

(n+1)

2

= 0.5v

(n)

1

(5.19)

In matrix-vector form:

v

(n+1)

=

_

_

0 3 6

0.2 0 0

0 0.5 0

_

_

_

¸

_

v

(n)

0

v

(n)

1

v

(n)

2

_

¸

_. (5.20)

Note that in this case the columns of the matrix do not sum to 1, and some of the entries do not

represent probabilities.

However, we can predict the future population by iterating with this matrix just as we did using

transition matrices for a Markov chain, and the eigenvalues and eigenvectors will give useful information

about what we expect to see in the long run. This form of matrix, based on survival rates and fecundities,

is called a Leslie matrix. (See [Jef78, SK87] for more discussion.)

Suppose we start with a population of 3000 females, 1000 of each age. Then we ﬁnd

v

(0)

=

_

_

1000

1000

1000

_

_

, v

(1)

=

_

_

9000

200

500

_

_

, v

(2)

=

_

_

3600

1800

100

_

_

, v

(3)

=

_

_

6000

720

900

_

_

,

and so on. If we plot the population in each age group as a function of Year, we see what is shown in

Figure 5.5(a). Clearly the population is growing exponentially. Figure 5.5(b) shows the fraction of the

population which is in each age class as a function of year. This settles down to a steady state.

After 20 years the population is v

(20)

= [20833 3873 1797]

′

while the percentage in each age class is

[0.7861 0.1461 0.0678]

′

. We can compute that between the last two years the total population grew by

a factor of 1.0760.

Since v

(20)

= A

20

v

(0)

, we can predict all this from the eigenvalues and eigenvectors of A. We ﬁnd

A =

0 3.0000 6.0000

0.2000 0 0

0 0.5000 0

5.6 Ecological models and Leslie Matrices 69

0 5 10 15 20 25

0

0.5

1

1.5

2

2.5

x 10

4

population of each age group

Age 0

Age 1

Age 2

0 5 10 15 20 25

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

fraction of population in each age group

Figure 5.5: Bird population with a

32

= 0.5 and initial data v

(0)

= [1000 1000 1000]

′

.

>> [R,Lambda] = eig(A)

R =

0.9796 -0.6990 - 0.6459i -0.6990 + 0.6459i

0.1821 0.0149 + 0.2545i 0.0149 - 0.2545i

0.0846 0.1110 - 0.1297i 0.1110 + 0.1297i

Lambda =

1.0759 0 0

0 -0.5380 + 0.5179i 0

0 0 -0.5380 - 0.5179i

>> R(:,1)/sum(R(:,1))

ans =

0.7860

0.1461

0.0679

We see that λ

1

= 1.0759 is greater than 1 and accurately predicts the growth rate of the total

population. The other two eigenvalues are complex and have magnitude 0.7468, so the corresponding

components in the initial vector v

(0)

die out. This transient behavior which results from the fact that

the initial population distribution (equal number of each age) is not the steady state distribution. The

ﬁrst eigenvector r

1

, normalized to have sum 1, gives the fraction expected in each age class in the steady

state. If we had started with a population that was distributed this way, then we would see smooth

behavior from the beginning with no transient. (See the example in the next section.)

70 Markov Chains and Related Models

5.6.1 Harvesting strategies

We can use this model to investigate questions such as the following. Suppose we want to harvest a

certain fraction of this population, either to control the exponential growth or as a food source (or

both). How much should we harvest to control the growth without causing extinction?

Harvesting will decrease the survival rates. We might like to decrease the rates to a point where the

dominant eigenvalue is very close to 1 in value. If we decrease the rate too much, then all eigenvalues

will be less than 1 and the population size will decrease exponentially to extinction.

For example, suppose we harvest an additional 20% of the 1-year olds, so the survival rate a

32

falls

from 0.5 to 0.3. Then we ﬁnd that the largest eigenvalue of A is 0.9830, giving the graphs shown in

Figure 5.6.

0 5 10 15 20 25

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

population of each age group

Age 0

Age 1

Age 2

0 5 10 15 20 25

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

fraction of population in each age group

Figure 5.6: Bird population with a

32

= 0.3 and initial data v

(0)

= [1000 1000 1000]

′

.

Of course it might be more reasonable to assume the population starts out with a steady-state

distribution of ages. For the new matrix the relevant scaled eigenvector is r

1

= [.7902 0.1608 0.0491]

′

and for a population of 3000 birds this suggests we should take v

(0)

= 3000r

1

= [2371 482 147]

′

. Then

we obtain the boring graphs shown in Figure 5.7.

Note that we now have exponential decay of the population and eventual extinction. However, over

the 20-year span shown here the total population is predicted to decrease from 3000 to 3000(0.983)

20

=

2129 and probably our model is not even accurate out this far. (Assuming it’s accurate at all!) To

judge this, one would have to investigate what assumptions the model is based on and how far in the

future these assumptions will be valid.

0 5 10 15 20 25

0

500

1000

1500

2000

2500

population of each age group

0 5 10 15 20 25

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

fraction of population in each age group

Figure 5.7: Bird population with a

32

= 0.3 and initial data having the expected steady-state distribution

of ages.

5.7 Exercises 71

5.7 Exercises

1. (Adapted from Ecological Simulation Primer by Swartzman and Kaluzny)

The ﬁre danger during the summer in Mount Baker National Forest is classiﬁed into one of three

danger levels. These are 1 =low, 2 =moderate, 3 =high. The probability of daily transitions

between these states is given by the following ﬂow diagram:

state 1

state 2

state 3

.4

.3

.5

.2

.5

.1

0

.5

.5

(a) Write the model in matrix form to project the ﬁre danger probability from one day to the

next.

(b) If we are in State 1 today, what is the probability that we will be in State 2 the day after

tomorrow?

(c) If the matrix you found is correct, then it has eigenvalues and eigenvectors given by

Lambda =

1.0000 0 0

0 0.0697 0

0 0 0.4303

R =

-0.4699 -0.5551 -0.7801

-0.7832 0.7961 0.1813

-0.4072 -0.2410 0.5988

Based on these, what is the equilibrium probability of being in each state?

2. (Adapted from Finite Mathematics by R. F. Brown and B. W. Brown)

In an automobile-leasing company’s study of its customers who lease a new car every year, it is

found that 30% of those who leased sedans the preceding year changed to minivans, whereas 40%

of those who had leased minivans changed to sedans. The rest of the customers repeated the type

of vehicle they had before.

(a) Sketch the transition diagram.

(b) If 20% of the cars leased to customers last year were minivans, what percentage of customers

will lease minivans this year?

(c) Write the Markov chain in matrix form and determine the 2 ×2 transition matrix A.

72 Markov Chains and Related Models

(d) Determine the eigenvalues and eigenvectors of A by hand, based on the determinant of A−λI.

(Hint: you know what one of the eigenvalues should be.)

(e) What is the equilibrium distribution of sedan and minivan leases?

3. A certain 2-year college considers students to be either “freshmen” or “sophmores” each year

depending on how many credits they have at the start of the year. The administration has

determined that 60% of the freshmen each year go on to be sophmores the next year, while 15%

remain freshmen and 25% do not return. Each year 80% of sophmores graduate, while 15% remain

sophmores and 5% drop out. This might be modeled as a 4-state Markov chain.

(a) Peter is a freshman. What is the probability that he will ever graduate?

(b) Mary is a sophmore. What is the probability that she will ever graduate?

(c) Do you think a 4-state Markov chain model really makes sense in this case? Recall that we

must assume the transition probabilities depend only on the state a student is currently in,

not their past history. Comment on whether this is realistic. How might you improve the

model by adding more states?

4. Continuing the previous problem, suppose the admissions oﬃce is setting a new policy for the

number of students to admit. They keep track of how many students graduate in Year n, call this

G

n

, and plan to admit βG

n

new freshmen in Year n+1, where β is some parameter. They might

like to take β = 1 and admit as many new students as graduate, but since not all students ever

graduate, this would lead to a decline in enrollment. Instead they plan to use this strategy with

β = 2, admitting twice as many new students as there are graduates.

(a) With this strategy, do you expect the enrollment to rise or decline in the long run?

(b) Can you ﬁnd a better value of β to use if the goal is to have a steady enrollment? (It’s ok to

use trial and error combined with matlab to ﬁnd the “right” value.)

(c) With this strategy, about what percentage of students will be freshmen each year?

Note: the matrix you want to look at in this case is similar to a Leslie matrix rather than a

Markov chain transition matrix.

Math 381 Notes — University of Washington —Winter 2011

Chapter 6

Linear Programming

6.1 Introduction

Section 6.1 of the notes is adapted from an introduction to linear programming written by Professor J.

Burke for Math 407. Some of the discussion in the original notes which introduces duality theory and

sensitivity analysis has been eliminated. Burke’s original notes can be found on the webpage

http://www.math.washington.edu/~burke/crs/407/HTML/notes.html

Other introductions to linear programming can be found in many references, including [HL95] and

[Chv83].

6.1.1 What is optimization?

A mathematical optimization problem is one in which some function is either maximized or minimized

relative to a given set of alternatives. The function to be minimized or maximized is called the objective

function and the set of alternatives is called the feasible region (or constraint region). In this course,

the feasible region is always taken to be a subset of lR

n

(real n-dimensional space) and the objective

function is a function from lR

n

to lR.

We further restrict the class of optimization problems that we consider to linear programming

problems (or LPs). An LP is an optimization problem over lR

n

wherein the objective function is a

linear function, that is, the objective has the form

c

1

x

1

+c

2

x

2

+· · · +c

n

x

n

for some c

i

∈ lR i = 1, . . . , n, and the feasible region is the set of solutions to a ﬁnite number of linear

inequality and equality constraints, of the form

a

i1

x

1

+a

i2

x

2

+· · · +a

in

x

n

≤ b

i

i = 1, . . . , s

and

a

i1

x

1

+a

i2

x

2

+· · · + a

in

x

n

= b

i

i = s + 1, . . . , m.

Linear programming is an extremely powerful tool for addressing a wide range of applied optimization

problems. A short list of application areas is resource allocation, production scheduling, warehousing,

layout, transportation scheduling, facility location, ﬂight crew scheduling, parameter estimation, . . . .

6.1.2 An Example

To illustrate some of the basic features of LP, we begin with a simple two-dimensional example. In

modeling this example, we will review the four basic steps in the development of an LP model:

1. Determine and label the decision variables.

73

74 Linear Programming

2. Determine the objective and use the decision variables to write an expression for the objective

function.

3. Determine the explicit constraints and write a functional expression for each of them.

4. Determine the implicit constraints.

PLASTIC CUP FACTORY

A local family-owned plastic cup manufacturer wants to optimize their production mix in

order to maximize their proﬁt. They produce personalized beer mugs and champaign glasses.

The proﬁt on a case of beer mugs is $25 while the proﬁt on a case of champaign glasses

is $20. The cups are manufactured with a machine called a plastic extruder which feeds

on plastic resins. Each case of beer mugs requires 20 lbs. of plastic resins to produce while

champaign glasses require 12 lbs. per case. The daily supply of plastic resins is limited to

at most 1800 pounds. About 15 cases of either product can be produced per hour. At the

moment the family wants to limit their work day to 8 hours.

We will model the problem of maximizing the proﬁt for this company as an LP. The ﬁrst step in

our modeling process is to determine the decision variables. These are the variables that represent the

quantiﬁable decisions that must be made in order to determine the daily production schedule. That is,

we need to specify those quantities whose values completely determine a production schedule and its

associated proﬁt. In order to determine these quantities, one can ask the question “If I were the plant

manager for this factory, what must I know in order to implement a production schedule?” The best

way to determine the decision variables is to put oneself in the shoes of the decision maker and then

ask the question “What do I need to know in order to make this thing work?” In the case of the plastic

cup factory, everything is determined once it is known how many cases of beer mugs and champaign

glasses are to be produced each day.

Decision Variables:

B = # of cases of beer mugs to be produced daily.

C = # of cases of champaign glasses to be produced daily.

You will soon discover that the most diﬃcult part of any modeling problem is the determination of

decision variables. Once these variables are correctly determined then the remainder of the modeling

process usually goes smoothly.

After specifying the decision variables, one can now specify the problem objective. That is, one can

write an expression for the objective function.

Objective Function:

Maximize proﬁt where proﬁt = 25B + 20C

The next step in the modeling process is to express the feasible region as the solution set of a ﬁnite

collection of linear inequality and equality constraints. We separate this process into two steps:

1. determine the explicit constraints, and

2. determine the implicit constraints.

The explicit constraints are those that are explicitly given in the problem statement. In the problem

under consideration, there are explicit constraints on the amount of resin and the number of work hours

that are available on a daily basis.

Explicit Constraints:

resin constraint: 20B + 12C ≤ 1800

6.1 Introduction 75

work hours constraint:

1

15

B +

1

15

C ≤ 8.

This problem also has other constraints called implicit constraints. These are constraints that are

not explicitly given in the problem statement but are present nonetheless. Typically these constraints

are associated with “natural” or “common sense” restrictions on the decision variable. In the cup

factory problem it is clear that one cannot have negative cases of beer mugs and champaign glasses.

That is, both B and C must be non-negative quantities.

Implicit Constraints:

0 ≤ B, 0 ≤ C.

The entire model for the cup factory problem can now be succinctly stated as

P : max 25B + 20C

subject to 20B + 12C ≤ 1800

1

15

B +

1

15

C ≤ 8

0 ≤ B, C

Since this problem is two dimensional it is possible to provide a graphical solution. The ﬁrst step

toward a graphical solution is to graph the feasible region. To do this, ﬁrst graph

the line associated with each of the linear inequality constraints. Then determine on which side of each

of these lines the feasible region must lie (don’t forget the implicit constraints!). Once the correct side

is determined it is helpful to put little arrows on the line to remind yourself of the correct side. Then

shade in the resulting feasible region.

The next step is to draw in the vector representing the gradient of the objective function at the

origin. Since the objective function has the form

f(x

1

, x

2

) = c

1

x

1

+c

2

x

2

,

the gradient of f is the same at every point in lR

2

;

∇f(x

1

, x

2

) =

_

c

1

c

2

_

.

Recall from calculus that the gradient always points in the direction of increasing function values.

Moreover, since the gradient is constant on the whole space, the level sets of f associated with diﬀerent

function values are given by the lines perpendicular to the gradient. Consequently, to obtain the location

of the point at which the objective is maximized we simply set a ruler perpendicular to the gradient and

then move the ruler in the direction of the gradient until we reach the last point (or points) at which

the line determined by the ruler intersects the feasible region. In the case of the cup factory problem

this gives the solution to the LP as

_

B

C

_

=

_

45

75

_

We now recap the steps followed in the solution procedure given above:

Step 1: Graph each of the linear constraints and indicate on which side of the constraint the feasible

region must lie. Don’t forget the implicit constraints!

Step 2: Shade in the feasible region.

Step 3: Draw the gradient vector of the objective function.

Step 4: Place a straightedge perpendicular to the gradient vector and move the straightedge either in

the direction of the gradient vector for maximization, or in the opposite direction of the gradient

vector for minimization to the last point for which the straightedge intersects the feasible region.

The set of points of intersection between the straightedge and the feasible region is the set of

solutions to the LP.

76 Linear Programming

feasible region

B

optimal value = $2625

20B+12C=1800

[B C] = [45 75]

B/15 + C/15 = 8

C

objective normal n=[25 20]’

14

15

13

12

1 4

11

10

9

8

5

6

7

4

3

2

1

2 3 5 6 7 8 9 10 11 12 13 14 15

Figure 6.1:

6.1 Introduction 77

The solution procedure described above for two dimensional problems reveals a great deal about the

geometric structure of LPs that remains true in n dimensions. We will explore this geometric structure

more fully as the course evolves. But for the moment, we continue to study this 2 dimensional LP to

see what else can be revealed about the structure of this problem.

Before leaving this section, we make a ﬁnal comment on the modeling process described above. We

emphasize that there is not one and only one way to model the Cup Factory problem, or any problem

for that matter. In particular, there are many ways to choose the decision variables for this problem.

Clearly, it is suﬃcient for the shop manager to know how many hours each days should be devoted

to the manufacture of beer mugs and how many hours to champaign glasses. From this information

everything else can be determined. For example, the number of cases of beer mugs that get produced is

15 times the number of hours devoted to the production of beer mugs. Therefore, as you can see there

are many ways to model a given problem. But in the end, they should all point to the same optimal

process.

6.1.3 LPs in Standard Form

Recall that a linear program is a problem of maximization or minimization of a linear function subject

to a ﬁnite number of linear inequality and equality constraints. This general deﬁnition leads to an

enormous variety of possible formulations. In this section we propose one ﬁxed formulation for the

purposes of developing an algorithmic solution procedure. We then show that every LP can be recast

in this form. We say that an LP is in standard form if it has the form

P : maximize c

1

x

1

+c

2

x

2

+· · · +c

n

x

n

subject to a

i1

x

1

+a

i2

x

2

+· · · +a

in

x

n

≤ b

i

for i = 1, 2, . . . , m

0 ≤ x

j

for j = 1, 2, . . . , n .

Using matrix notation, we can rewrite this LP as

P : maximize c

T

x

subject to Ax ≤ b

0 ≤ x ,

where the inequalities Ax ≤ b and 0 ≤ x are to be interpreted componentwise.

Transformation to Standard Form

Every LP can be transformed to an LP in standard form. This process usually requires a transformation

of variables and occasionally the addition of new variables. In this section we provide a step-by-step

procedure for transforming any LP to one in standard form.

minimization → maximization

To transform a minimization problem to a maximization problem just multiply the objective

function by −1.

linear inequalities

If an LP has an inequality constraint of the form

a

i1

x

1

+a

i2

x

2

+· · · +a

in

x

n

≥ b

i

.

This inequality can be transformed to one in standard form by multiplying the inequality through

by −1 to get

−a

i1

x

1

−a

i2

x

2

−· · · −a

in

x

n

≤ −b

i

.

linear equation

78 Linear Programming

The linear equation

a

i1

x

1

+· · · +a

in

x

n

= b

i

can be written as two linear inequalities

a

i1

x

1

+· · · +a

in

x

n

≤ b

i

and

a

i1

x

1

+· · · +a

in

x

n

≥ b

i

.

The second of these inequalities can be transformed to standard form by multiplying through by

−1.

variables with lower bounds

If a variable x

i

has lower bound l

i

which is not zero (l

i

≤ x

i

), one obtains a non-negative variable

w

i

with the substitution

x

i

= w

i

+l

i

.

In this case, the bound l

i

≤ x

i

is equivalent to the bound 0 ≤ w

i

.

variables with upper bounds

If a variable x

i

has an upper bound u

i

(x

i

≤ u

i

) one obtains a non-negative variable w

i

with the

substitution

x

i

= u

i

−w

i

.

In this case, the bound x

i

≤ u

i

is equivalent to the bound 0 ≤ w

i

.

variables with interval bounds

An interval bound of the form l

i

≤ x

i

≤ u

i

can be transformed into one non-negativity constraint

and one linear inequality constraint in standard form by making the substitution

x

i

= w

i

+l

i

.

In this case, the bounds l

i

≤ x

i

≤ u

i

are equivalent to the constraints

0 ≤ w

i

and w

i

≤ u

i

−l

i

.

free variables

Sometimes a variable is given without any bounds. Such variables are called free variables. To

obtain standard form every free variable must be replaced by the diﬀerence of two non-negative

variables. That is, if x

i

is free, then we get

x

i

= u

i

−v

i

with 0 ≤ u

i

and 0 ≤ v

i

.

To illustrate the ideas given above, we put the following LP into standard form.

minimize 3x

1

− x

2

subject to −x

1

+ 6x

2

− x

3

+ x

4

≥ −3

7x

2

+ x

4

= 5

x

3

+ x

4

≤ 2

−1 ≤ x

2

, x

3

≤ 5, −2 ≤ x

4

≤ 2.

6.1 Introduction 79

First, we rewrite the objective as

maximize −3x

1

+x

2

.

Next we replace the ﬁrst inequality constraint by the constraint

x

1

−6x

2

+x

3

−x

4

≤ 3.

The equality constraint is replaced by the two inequality constraints

7x

2

+ x

4

≤ 5

−7x

2

− x

4

≤ −5.

Observe that the variable x

1

is free, so we replace it by

x

1

= y

1

−y

2

with 0 ≤ y

1

, 0 ≤ y

2

.

The variable x

2

has a non-zero lower bound so we replace it by

x

2

= y

3

−1 with 0 ≤ y

3

.

The variable x

3

is bounded above, so we replace it by

x

3

= 5 −y

4

with 0 ≤ y

4

.

The variable x

4

is bounded below and above so we replace it by

x

4

= y

5

−2 with 0 ≤ y

5

and y

5

≤ 4.

Putting it all together we get the following LP in standard form:

maximize −3y

1

+ 3y

2

+ y

3

subject to y

1

− y

2

− 6y

3

− y

4

− y

5

≤ −10

7y

3

+ y

5

≤ 14

− 7y

3

− y

5

≤ −14

− y

4

+ y

5

≤ −1

y

5

≤ 4

0 ≤ y

1

, y

2

, y

3

, y

4

, y

5

.

80 Linear Programming

6.2 A pig farming model

Suppose a pig farmer can choose between two diﬀerent kinds of feed for his pigs: corn or alfalfa. He can

feed them all of one or the other or some mixture. The pigs must get at least 200 units of carbohydrates

and at least 180 units of protein in their daily diet. The farmer knows that:

• corn has 100 units/kg of carbohydrate and 30 units/kg of protein

• alfalfa has 40 units/kg of carbohydrate and 60 units/kg of protein.

Here are a few questions the farmer might be faced with in choosing a good diet for the pigs:

1. If the farmer uses only corn, how many kg/day does he need for each pig?

2. If the farmer uses only alfalfa, how many kg/day does he need for each pig?

3. Suppose corn and alfalfa each cost 30 cents / kg. How expensive would each of the above diets

be?

4. With these prices, is there some mixture of the two grains that would satisfy the dietary require-

ments and be cheaper?

5. What is the best mix if the price of alfalfa goes up to $1/kg? What if it goes up only to 60 cents

/ kg?

The problem of choosing the mix which minimizes cost can be set up as a linear programming

problem

minimize c

1

x

1

+c

2

x

2

subject to

100x

1

+ 40x

2

≥ 200

30x

1

+ 60x

2

≥ 180

x

1

, x

2

≥ 0.

(6.1)

where x

1

and x

2

are the quantity of corn and alfalfa to use (in kilograms) and c

1

and c

2

are the cost

per kilogram of each. Multiplying the objective function and constraints by −1 would allow us to put

this in the standard form discussed above.

Linear programming problems can be solved in matlab, which requires a slightly diﬀerent standard

form, since it solves a minimization problem rather than a maximization problem:

>> help linprog

LINPROG Linear programming.

X=LINPROG(f,A,b) solves the linear programming problem:

min f’*x subject to: A*x <= b

x

X=LINPROG(f,A,b,Aeq,beq) solves the problem above while additionally

satisfying the equality constraints Aeq*x = beq.

X=LINPROG(f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper

bounds on the design variables, X, so that the solution is in

the range LB <= X <= UB. Use empty matrices for LB and UB

if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below;

6.2 A pig farming model 81

set UB(i) = Inf if X(i) is unbounded above.

X=LINPROG(f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0. This

option is only available with the active-set algorithm. The default

interior point algorithm will ignore any non-empty starting point.

X=LINPROG(f,A,b,Aeq,Beq,LB,UB,X0,OPTIONS) minimizes with the default

optimization parameters replaced by values in the structure OPTIONS, an

argument created with the OPTIMSET function. See OPTIMSET for details.

Use options are Display, Diagnostics, TolFun, LargeScale, MaxIter.

Currently, only ’final’ and ’off’ are valid values for the parameter

Display when LargeScale is ’off’ (’iter’ is valid when LargeScale is ’on’).

[X,FVAL]=LINPROG(f,A,b) returns the value of the objective function at X:

FVAL = f’*X.

[X,FVAL,EXITFLAG] = LINPROG(f,A,b) returns EXITFLAG that

describes the exit condition of LINPROG.

If EXITFLAG is:

> 0 then LINPROG converged with a solution X.

0 then LINPROG reached the maximum number of iterations without converging.

< 0 then the problem was infeasible or LINPROG failed.

[X,FVAL,EXITFLAG,OUTPUT] = LINPROG(f,A,b) returns a structure

OUTPUT with the number of iterations taken in OUTPUT.iterations, the type

of algorithm used in OUTPUT.algorithm, the number of conjugate gradient

iterations (if used) in OUTPUT.cgiterations.

[X,FVAL,EXITFLAG,OUTPUT,LAMBDA]=LINPROG(f,A,b) returns the set of

Lagrangian multipliers LAMBDA, at the solution: LAMBDA.ineqlin for the

linear inequalities A, LAMBDA.eqlin for the linear equalities Aeq,

LAMBDA.lower for LB, and LAMBDA.upper for UB.

NOTE: the LargeScale (the default) version of LINPROG uses a primal-dual

method. Both the primal problem and the dual problem must be feasible

for convergence. Infeasibility messages of either the primal or dual,

or both, are given as appropriate. The primal problem in standard

form is

min f’*x such that A*x = b, x >= 0.

The dual problem is

max b’*y such that A’*y + s = f, s >= 0.

82 Linear Programming

To solve the optimization problem with costs c

1

= c

2

= 30, we can do the following:

>> f = [30;30];

>> A = -[100 40; 30 60]

>> b = -[200;180];

>> vlb = [0;0];

>> linprog(f,A,b,[],[],vlb)

Optimization terminated successfully.

ans =

1.0000

2.5000

This solution is illustrated graphically in Figure 6.2(a). If the cost of alfalfa goes up to $1.00/kg:

>> f=[30;100];

>> linprog(f,A,b,[],[],vlb)

Optimization terminated successfully.

ans =

6.0000

0

This solution is illustrated graphically in Figure 6.2(b). In class we will study the graphical interpreta-

tion of these and other related problems.

6.2.1 Adding more constraints

Suppose we add some new constraints to the problem. For example, the farmer might want to keep

the pigs from getting too fat by requiring that the total daily diet should be at most 5 kg, adding the

constraint x

1

+ x

2

≤ 5. In addition, suppose pigs should eat at most 3 kg of any particular grain, so

that x

1

≤ 3 and x

2

≤ 3. This problem can be solved as follows:

>> A = -[100 40; 30 60; -1 -1];

>> b = -[200; 180; -5];

>> vlb = [0; 0];

>> vub = [3; 3];

>> linprog(f,A,b,[],[],vlb,vub)

Optimization terminated successfully.

ans =

3.0000

1.5000

This solution is illustrated graphically in Figure 6.3.

6.2.2 Solving the problem by enumeration

Each constraint corresponds to a straight line, so the feasible set is a convex polygon. The objective

function we are minimizing is also linear, so the solution is at one of the corners of the feasible set.

Corners correspond to points where two constraints are satisﬁed simultaneously, and can be found by

solving a linear system of two equations (from these two constraints) in two unknowns x

1

and x

2

.

If we knew which two constraints were binding at the solution, we could easily solve the problem.

Unfortunately we don’t generally know this in advance.

6.2 A pig farming model 83

(a)

−1 0 1 2 3 4 5 6 7 8

−1

0

1

2

3

4

5

6

7

8

(b)

−1 0 1 2 3 4 5 6 7 8

−1

0

1

2

3

4

5

6

7

8

Figure 6.2: Graphical solution of the pig-farming problems in the x

1

-x

2

plane. In each case the solid

lines show constraints and the dashed line is the iso-cost line passing through the optimal solution,

which is one of the corners of the feasible region. Shade in the feasible region in each ﬁgure!

−1 0 1 2 3 4 5 6 7 8

−1

0

1

2

3

4

5

6

7

8

Figure 6.3: Graphical solution of the pig-farming problem with more constraints. The solid lines show

constraints and the dashed line is the iso-cost line passing through the optimal solution, which is one

of the corners of the feasible region. Shade in the feasible region!

84 Linear Programming

One approach to solving the problem would be to consider all possible corners by solving all possible

sets of 2 equations selected from the various constraints. For each choice of 2 equations we could:

• Solve the linear system for these two constraints,

• Check to see if the resulting solution is feasible (it might violate other constraints!),

• If it is feasible, compute and record the value of the objective function at this point.

After doing this for all choices of two constraints, the best value of the objective function found gives

the solution.

The problem with this, for larger problems anyway, is that there is a combinatorial explosion of

possible choices. If there are m constraints and two decision variables, then there are

_

m

2

_

=

1

2

m(m−1) = O(m

2

) possible choices.

6.2.3 Adding a third decision variable

If there are more decision variables, then the problem is even worse. For example, if we consider 3

grains instead of only 2, then we would have three variables x

1

, x

2

, and x

3

to consider. In this case the

pictures are harder to draw. Each constraint corresponds to a plane in 3-space and the feasible set is a

convex body with planar sides. Changing the value of the objective function sweeps out another set of

planes, and again the optimal solution occurs at a corner of the feasible set.

These corners are points where 3 of the constraints are simultaneously satisﬁed, so we can ﬁnd all

possible corners by considering all choices of 3 constraints from the set of m constraints. There are now

_

m

3

_

= O(m

3

) ways to do this.

6.2.4 The simplex algorithm

Practical linear programming problems often require solving for hundreds of variables with thousands

of constraints. In this case it is impossible to enumerate and check all possible “corners” of the feasible

set, which is now an object in a linear space with dimension in the hundreds.

The simplex algorithm is a method that has proved exceedingly eﬀective at ﬁnding solutions in much

less time (though, like branch and bound algorithms, in the worst case it can still have exponential

running time). The basic idea is to start by identifying one corner of the feasible region and then

systematically move to “adjacent” corners until the optimal value of the objective function is obtained.

To gain a basic understanding of the simplex method, consider the following LP

max 2x

1

−3x

2

subject to 2x

1

+x

2

≤ 12

x

1

−3x

2

≤ 5

0 ≤ x

1

, x

2

(6.2)

The corresponding feasible region is depicted in Figure 6.4.

The simplex method uses a series of elementary matrix operations. Therefore, the system of lin-

ear equalities in LP (6.2) must be reformulated into a system of linear equalities constraints. The

introduction of slack variables facilitates such a conversion of the constraints.

For instance, the inequality, 2x

1

+x

2

≤ 12, becomes 2x

1

+x

2

+x

3

= 12 with the addition of the slack

variable x

3

≥ 0. Loosely speaking, x

3

makes up the slack in the inequality constraint 2x

1

+ x

2

≤ 12.

Take a moment and verify that the equality and inequality constraints are equivalent. Such veriﬁcation

is an important part of mathematical modeling.

6.2 A pig farming model 85

0 1 2 3 4 5 6 7

0

2

4

6

8

10

12

x1

x

2

2x1 + x2 ≤ 12

x1 − 3x2 ≤ 5

Figure 6.4: Feasible region for LP.

The constraint x

1

−3x

2

≤ 5 still remains to be converted into a linear equality with a slack variable.

In an attempt to reduce decision variables, one may be tempted to use x

3

again as a slack variable,

which would form the linear equality x

1

− 3x

2

+ x

3

≤ 5. Note that our linear system (6.2) does not

imply that 2x

1

+x

2

−12 = x

1

−3x

2

−5, or loosely speaking that there are equal amounts of “slack” in

each constraint. Forcing the same slack variable in each constraint changes the problem we are solving!

Take a moment and consider the implications of using x

3

for each constraint in (6.2)!

Therefore, another slack variable x

4

≥ 0 is introduced into the second constraint, which becomes

x

1

−3x

2

+x

4

= 5. In general, each inequality constraint should use a unique slack variable.

Therefore, LP (6.2) becomes

max z = 2x

1

−3x

2

subject to 2x

1

+x

2

+x

3

= 12

x

1

−3x

2

+x

4

= 5

0 ≤ x

i

, i = 1, 2, 3, 4

(6.3)

Notice, z equals the optimum of the objective function.

Having obtained a system of linear equality constraints, our system is prepared for matrix compu-

tation. Traditionally, the simplex method forms tableaux, which are equivalent to matrices. The initial

simplex tableau for LP (6.3) is

z x

1

x

2

x

3

x

4

RHS

1 -2 3 0 0 0 ←− (z −2x

1

+ 3x

3

= 0)

0 2 1 1 0 12 ←− (2x

1

+x

2

+x

3

= 12)

0 1 -3 0 1 5 ←− (x

1

−3x

2

+x

4

= 5)

Solutions can be read directly from the tableau. Since the column associated with variable x

1

contains more than 1 nonzero value, x

1

= 0. In contrast, the column associated with x

3

contains only

one nonzero value, which implies that x

3

= 12/1 = 12. Therefore, this tableau leads to the solution

86 Linear Programming

x

1

= x

2

= 0, x

3

= 12, x

4

= 5 and z = 0. Considering more details of the method requires that we

establish additional terminology regarding a simplex tableau.

z x

1

x

2

x

3

x

4

RHS

1 -2 3 0 0 0 ←− indicators

0 2 1 1 0 12

0 1 -3 0 1 5

Basic variables are nonzero variables in the tableau and correspond to a corner point of the feasible

region. In the current tableau, x

3

and x

4

are basic variables associated with the corner point (0,0) (since

x

1

= 0 and x

2

= 0). The basic idea of the simplex algorithm is to change which variables in the system

are basic variables in order to improve the value of z. The numbers in shaded boxes in the top row of

the tableau are called indicators. Let α

i

denote the indicator in the column associated with variable

x

i

. Therefore, the algorithmic steps for the simplex algorithm as applied to the simplex tableau are as

follows:

1. Check the indicators for possible improvement – If all the indicators are nonnegative, then the

solution is optimal else go to Step 2.

2. Check for unboundedness – If α

i

< 0 and no positive element in the column associated with variable

x

i

exists, then the problem has an unbounded objective value. Otherwise, ﬁnite improvement in

the objective function is possible; go to Step 3.

3. Select the entering variable – The most negative indicator identiﬁes the entering variable. That

is, select α

i

such that α

i

< 0, and α

i

is the most negative indicator. The entering variable is x

i

and will be made nonzero. The column associated with x

i

is called the pivot column.

4. Select the departing variable – This step determines which basic variable will become zero. Find

a quotient for each row by dividing each number from the right side of the tableau by the corre-

sponding number from the pivot column. The pivot row is the row corresponding to the smallest

quotient. The pivot element is the element in the pivot row and pivot column.

5. Pivot creating a new tableau - Use elementary row operations on the old tableau so the column

associated with x

i

contains a zero in every entry except for a 1 in the pivot position. Go to Step

1.

Applying these steps to our tableau, we ﬁnd the following:

z x

1

x

2

x

3

x

4

RHS

1 -2 3 0 0 0 Quotients

0 2 1 1 0 12 12/2 = 6

0 1 -3 0 1 5 5/1 = 1 ⇒ pivot row

↑

pivot column

Verify these results. Note that x

1

is the entering variable and x

4

, the departing variable. For x

1

to

become nonzero, we apply elementary row calculations to form the following tableau:

z x

1

x

2

x

3

x

4

RHS

1 0 -3 0 2 10

0 0 7 1 -2 2

0 1 -3 0 1 5

6.3 Complexity and the simplex algorithm 87

This tableau leads to the solution x

1

= 5, x

2

= 0, x

3

= 2, x

4

= 0 and z = 10. The basic variables are

indeed x

1

and x

3

. Refer to Figure 6.4 to verify that our solution is a corner point of the feasible solution.

Moreover, the current tableau represents an increase in the optimal value! Yet, the indicators identify

that possible improvement still exists by selecting x

2

as an entering variable. The step of selecting the

departing variable is left to the reader. Elementary row operations produce the following tableau:

z x

1

x

2

x

3

x

4

RHS

1 0 0 3/7 8/7 76/7

0 0 1 1/7 -2/7 2/7

0 1 0 3/7 1/7 41/7

This tableau leads to the solution x

1

=

41

7

, x

2

=

2

7

, x

3

= x

4

= 0 and z =

76

7

. The indicators are all

positive, which represents that the optimal value z cannot be improved. Therefore, the solution has

been found.

Implementation of the simplex algorithm requires a great deal of linear algebra and bookkeeping.

Imagine performing elementary row operations on a simplex tableau with thousands of decision vari-

ables. Fortunately, software packages exist that are aimed speciﬁcally at solving linear programming

and related problems. One popular package is lindo, which is available on some campus computers.

The matlab function linprog implements the simplex algorithm. Program packages such as lindo

and matlab demonstrate the need for eﬀective computation in the ﬁeld of optimization.

6.3 Complexity and the simplex algorithm

The simplex algorithm was invented by George Dantzig in 1947 [Dan63]. Klee (from UW) and Minty

[KM72] proved that the simplex algorithm is exponential in the worst case, but is very eﬃcient in prac-

tice. Khachian [Kha79] invented the “ellipsoid” algorithm for ﬁnding solutions to linear programming

problems in 1979 that he showed runs in polynomial time. Another more eﬃcient algorithm due to

Karmarkar[Kar84] uses an interior-point method instead of walking around the boundary. This algo-

rithm appeared in 1984. In 2001, Spielman and Teng [ST02] introduced a new complexity class they

call polynomial smoothed complexity and they showed that the simplex algorithm is the classic example

in this class. Their work attempts to identify algorithms which are exponential in the worst case but

which work eﬃciently in practice like the simplex algorithm.

88 Linear Programming

Math 381 Notes — University of Washington —Winter 2011

Chapter 7

Integer Programming

7.1 The lifeboat problem

Many linear programming problems arising from real problems have the additional complication that

the desired solution must have integer values. For example, in the lifeboat problem considered in

Chapter 1, we must determine how best to equip a ferry with lifeboats and life vests, maximizing the

number of lifeboats subject to certain constraints. The solution to the resulting linear programming

problem will typically require some fractional number of lifeboats and life vests, which is not possible.

Instead we must ﬁnd the best integer solution.

Our ﬁrst guess at how to solve such a problem might be to simply round oﬀ the values that come

from the LP solution. This usually does not work, however. For example, for the values considered in

Chapter 1, the LP solution was

x

1

= 363.6364 (number of vests)

x

2

= 31.8182 (number of boats)

Trying to round this to x

1

= 364, x

2

= 32 would give an infeasible solution (requiring too much volume).

The best integer solution was found to be x

1

= 381, x

2

= 31. We had to reduce x

2

to 31 and then the

number of vests x

1

went up substantially.

Figure 7.1 shows the graphical solution to this problem with a diﬀerent set of parameters:

C

1

= 1 (capacity of each vest)

C

2

= 2 (capacity of each boat)

C = 11 (total capacity needed)

V

1

= 1 (volume of each vest)

V

2

= 3.1 (volume of each boat)

V = 15 (total volume available)

The dashed line x

2

= 3.636 shows the maximum value of the objective function x

2

obtained at the

LP solution x

1

= 3.727, x

2

= 3.636. The feasible integer points are indicated by the dark dots, and

we see that the best integer solution is x

1

= 8, x

2

= 2. In this case we can’t even use the approach of

Chapter 1, of reducing the x

2

to the nearest integer and increasing x

1

as needed.

For most IP (integer programming) problems, there is no direct way to ﬁnd the solution. Recall that

for the LP problem we know the optimum occurs at a corner of the feasible set, and the corners can be

found by solving an appropriate linear system of equations corresponding to the binding constraints.

For the IP problem typically none of the inequality constraints are exactly binding at the solution.

In simple cases one might be able to enumerate all integer points in the feasible set and check each,

but for problems with many variables and with a large range of possible integer values for each, this is

usually out of the question.

89

90 Integer Programming

0 2 4 6 8 10 12 14 16 18 20

−1

0

1

2

3

4

5

6

7

8

Figure 7.1: Graphical solution of the lifeboat problem in the x

1

-x

2

plane. The solid lines show con-

straints and the dashed line is the objective function contour line passing through the optimal solution,

which is one of the corners of the feasible region. When integer constraints are added, only the points

marked by dots are feasible.

7.2 Cargo plane loading 91

Some IP problems have a suﬃciently special form that the solution is easily found. For example,

certain transportation problems, such as in the example we considered of canneries and warehouses, have

the property that the LP solution can be shown to be integer-valued. For other classes of IP problems

there are special techniques which are eﬀective.

For many problems a branch and bound approach must be used, in which linear programming

problems are used to obtain bounds. This is illustrated in the next section for a relatively simple class

of problems called zero-one integer programming, since each decision variable is restricted to take only

the values 0 or 1. Many practical problems can be put in this form. Often each value represents a

yes-no decision that must be made.

7.2 Cargo plane loading

Suppose you have a cargo plane with 1000 cubic meters of space available, and you can choose from

the following sets of cargos to load the plane:

Cargo Volume Proﬁt

1 410 200

2 600 60

3 220 20

4 450 40

5 330 30

For each cargo the volume and proﬁt expected from shipping this cargo are listed. There are two

possible scenarios:

• We are allowed to take any amount of each cargo up to the total volume, with the proﬁt adjusted

accordingly. (E.g., taking half the cargo yields half the proﬁt.)

• We must take all of the cargo or none of it.

The second scenario is more realistic and gives a zero-one integer programming problem. The ﬁrst

scenario gives a linear programming problem with continuous variables, and is easier to solve, using the

simplex method for example. It will be useful to consider problems of the ﬁrst type in the process of

solving the integer programming problem, as discussed in the next section.

Let V

j

be the volume of the j’th cargo, P

j

the proﬁt, and V = 1000 the total volume available in

the cargo plane. Introduce the decision variable x

j

to represent the fraction of the j’th cargo that we

decide to take. In the ﬁrst scenario above, x

j

can take any value between 0 and 1, while for the actual

problem each x

j

must be either 0 or 1.

The integer programming problem then has the form

maximize

j

P

j

x

j

(7.1)

subject to

j

V

j

x

j

≤ V (7.2)

0 ≤ x

j

≤ 1 (7.3)

x

j

integer (7.4)

If we drop constraint (7.4) then we have the linear programming problem corresponding to the ﬁrst

scenario again.

For this small problem we can easily solve the integer programming problem by simply enumerating

all possible choices of cargos. If we have k cargos to choose from, then for each cargo the value x

j

can

take one of two values and so there are 2

k

possible combinations of cargo. In this case k = 5 and there

92 Integer Programming

are 32 possibilities to consider. Not all will be feasible since the volume constraint may be exceeded.

Here is the enumeration of all 32 possibilities, with the proﬁt arising from each (which is set to zero for

infeasible combinations):

cargo choice profit

1 2 3 4 5

0 0 0 0 0 0

0 0 0 0 1 30

0 0 0 1 0 40

0 0 0 1 1 70

0 0 1 0 0 20

0 0 1 0 1 50

0 0 1 1 0 60

0 0 1 1 1 90

0 1 0 0 0 60

0 1 0 0 1 90

0 1 0 1 0 0

0 1 0 1 1 0

0 1 1 0 0 80

0 1 1 0 1 0

0 1 1 1 0 0

0 1 1 1 1 0

1 0 0 0 0 200

1 0 0 0 1 230

1 0 0 1 0 240

1 0 0 1 1 0

1 0 1 0 0 220

1 0 1 0 1 250

1 0 1 1 0 0

1 0 1 1 1 0

1 1 0 0 0 0

1 1 0 0 1 0

1 1 0 1 0 0

1 1 0 1 1 0

1 1 1 0 0 0

1 1 1 0 1 0

1 1 1 1 0 0

1 1 1 1 1 0

The best combination is 10101, meaning we should take Cargos 1, 3, and 5. For a larger problem it

might not be possible to enumerate and check all possibilities. Moreover the real problem might be

more complicated, for example a shipping company might have many planes and diﬀerent sets of cargos

and want to choose the optimal way to split them up between planes.

7.3 Branch and bound algorithm

Integer programming problems are often solved using branch and bound algorithms. This is easily

illustrated for the cargo plane problem since this is a particularly simple form of integer programming,

a 0-1 integer program in which each decision variable can take only the values 0 or 1. This suggests

building up a binary tree by considering each cargo in turn. Part of such a tree is shown in Figure 7.2.

7.3 Branch and bound algorithm 93

cargo: ?????

UB: 259

cargo: 1????

UB: 259

cargo: 0????

UB: 96.4

cargo: 11???

UB 0

cargo: 10???

UB 253.6

cargo: 101??

UB 253.6

cargo: 100??

UB 253.1

cargo: 1011?

UB 0

cargo: 1010?

UB 250

cargo: 10101

profit 250

cargo: 10100

profit 220

cargo: 1001?

UB 252.7

cargo: 10011

profit 0

cargo: 10010

profit 240

cargo: 1000?

UB 230

Figure 7.2: Illustration of the branch and bound algorithm for solving the cargo-loading problem. In

each box the values of some 0-1 integer values are speciﬁed and then the linear programming problem is

solved with the other variables allowed to take any value between 0 and 1. This gives an upper bound

UB on the best that could be obtained with this partial cargo. If UB is 0 then this choice is infeasible.

The root is to the left, where the cargo is represented as ????? meaning all ﬁve decision variables are

unspeciﬁed. From here we branch to 1???? and 0????, in which we either take or do not take Cargo

1. Each of these has two branches depending on whether we take or do not take Cargo 2. Continuing

in this way we could build up a full tree whose leaves would be the 32 possible choices of cargos.

The goal of the branch and bound algorithm is to avoid having to build up the entire tree by

obtaining a bound at each branching point for the best possible solution we could ﬁnd down that branch

of the tree. In this case we want an upper bound on the best proﬁt we could possibly ﬁnd with any

selection that contains the pattern of zeroes and ones already speciﬁed, along with any possible choice

of zeroes and ones for the cargos which are still denoted by ?. It is exactly these selections which lie

further down the tree on this branch.

We want to obtain this bound by doing something simpler than enumerating all possibilities down

this branch and computing each proﬁt — this is what we want to avoid. The essential idea is to instead

solve a linear programming problem in which the variables still denoted by ? are allowed to take any

value between 0 and 1. This is more easily solved than the integer programming problem. The solution

to this problem may involve non-integer values of some x

j

, but whatever proﬁt it determines (perhaps

with these partial cargos) is an upper bound on the proﬁt that could be obtained with the integer

constraints imposed. Make sure you understand why!

For example, at the root node ?????, we obtain an upper bound on the proﬁt of any selection

of cargos by solving the linear programming problem (7.1) with the constraints (7.2) and (7.3) only,

dropping all integer constraints. The solution is x

1

= 1, x

2

= 0.9833 and x

3

= x

4

= x

5

= 0, with a

proﬁt of 259. This isn’t a valid solution to the integer problem, but any valid selection must have a

proﬁt that is no larger than this.

At the node 1???? we solve the linear programming problem (7.1) with the constraints (7.2) and

(7.3) and also the constraint x

1

= 1, by changing the bounds on x

1

to 1 ≤ x

1

≤ 1. (Or simply reduce

the number of decision variables and modify the objective function appropriately.) The solution is the

same as found at the root node, since the solution there had x

1

= 1.

At the node 0???? we solve the linear programming problem (7.1) with the constraints (7.2) and

(7.3) and also the constraint x

1

= 0. This gives an upper bound on the proﬁt of only 96.36 if Cargo 1

is not taken. Again this requires taking fractional cargos and any integer selection with x

1

= 0 will be

even worse.

94 Integer Programming

It looks more promising to take Cargo 1 than to not take it, and so we explore down this branch

ﬁrst. We next try 11???, taking Cargo 1 and Cargo 2 and solving a linear programming problem for

the other variables, but we ﬁnd this is infeasible since Cargo 1 and 2 together require too much volume.

At this point we can prune oﬀ this branch of the tree since every selection which lies down path must

be infeasible.

Continuing this way, we eventually work down to a leaf and ﬁnd an actual integer selection with an

associated proﬁt. From this point on we can also prune oﬀ any branches for which the upper bound on

the proﬁt is smaller than the actual proﬁt which can be obtained by a valid selection. So, for example,

all of the tree lying beyond the node 0???? can be pruned oﬀ without further work since the upper

bound of 96.36 is smaller than leaf values we have already found.

7.4 Traveling salesman problem

We now return to the Traveling Salesman Problem (TSP) of Chapter 2, and see how this can be set up

as an integer programming problem. Consider a graph with N vertices labeled 1 through N, and let d

ij

be the “distance” from i to j. Recall that we wish to ﬁnd the path through all nodes which minimizes

the total distance traveled. To set this up as an IP problem, let x

ij

be 1 if the edge from Node i to

Node j is used and 0 otherwise. For example, with N = 5 the path 1 →2 → 3 →5 →4 →1 would be

encoded by setting

x

12

= x

23

= x

35

= x

54

= x

41

= 1

and the other 15 x

ij

values to 0. (For a graph with N nodes there will be N(N −1) decision variables.)

Now we wish to

minimize

N

i=1

N

j=1

d

ij

x

ij

(7.5)

subject to various constraints. One constraint is that we must use exactly one edge out of each node.

This leads to

N

j=1

x

ij

= 1. (7.6)

for each i = 1, 2, . . . , N. Also, we must use exactly one edge into each node, so

N

i=1

x

ij

= 1. (7.7)

for each j = 1, 2, . . . , N.

This is not enough, for the constraints would allow things like

x

12

= x

23

= x

31

= 1 and x

45

= x

54

= 1 (7.8)

with all the other x

ij

= 0. This is not a single TSP tour, but rather two disconnected tours. To

rule out the possibility of subtours, we must add other constraints. This can be done in various ways.

One possibility is to introduce a new set of decision variables u

1

, . . . , u

N

which do not appear in the

objective function but which have the (N −1)

2

constraints

u

i

−u

j

+Nx

ij

≤ N −1, for i, j = 2, 3, . . . , N. (7.9)

Note that u

1

does not appear in these constraints.

The idea is that if we have a set of x

ij

which correspond to a valid TSP tour, then it should be

possible to choose some values u

i

so that these constraints are satisﬁed, so that all such tours are still

feasible. On the other hand for a set of x

ij

that contain subtours, such as (7.8), it should be impossible

to ﬁnd a set of u

i

satisfying (7.9) and hence such sets will no longer be feasible.

7.5 IP in Recreational Mathematics 95

To see why it is impossible to satisfy these constraints if there are subtours, consider (7.8), for

example. The second subtour deﬁned by x

45

= x

54

= 1 does not involve Node 1. The constraints (7.9)

for i = 4, j = 5 and for i = 5, j = 4 yield

u

4

−u

5

+ 5 ≤ 4

u

5

−u

4

+ 5 ≤ 4

Adding these together gives

10 ≤ 8

which cannot be satisﬁed no matter what values we assign to u

4

and u

5

. So we cannot possibly

satisfy both of the constraints above simultaneously. A similar argument works more generally: If

there are subtours then there is always a subtour that doesn’t involve Node 1 and writing down the

constraints (7.9) for each (i, j) in this subtour gives a set of equations that, when added together, leads

to cancellation of all the u

i

values and an inequality that can never be satisﬁed.

On the other hand, we wish to show that if we consider a valid TSP tour (with no subtours) then we

can ﬁnd an assignment of u

i

values that satisﬁes all these constraints. The basic idea is to set u

1

= 0

and then follow the tour starting at Node 1, setting the u

i

variables according to the order in which we

visit the nodes. This is best illustrated with an example:

1 →2 →5 → 4 →3 →1

yields

u

1

= 0, u

2

= 1, u

5

= 2, u

4

= 3, u

3

= 4.

We can show that this satisﬁes all the constraints in (7.9).

First, consider the pairs (i, j) for which x

ij

= 0. In this case (7.9) reads

u

i

−u

j

≤ N −1

This will certainly be true since all the values we assigned to u variables lie between 0 and N −1.

Next, consider the pairs (i, j) for which x

ij

= 1. This means we use the edge from Node i to j in

our tour, Node j is visited immediately after Node i, and hence u

j

= u

i

+ 1. So we can compute that

u

i

−u

j

+Nx

ij

= −1 +N = N −1

and hence the inequality in (7.9) is satisﬁed (with equality in this case).

7.4.1 NP-completeness of integer programming

We just showed that it is possible to reformulate any traveling salesman problem as an integer program-

ming problem. This can be used to prove that the general integer programming problem is NP-complete.

For if there were a polynomial-time algorithm to solve the general integer programming problem, then

we could use it to solve the TSP, and hence have a polynomial-time algorithm for the TSP.

7.5 IP in Recreational Mathematics

Integer Programming also applies to problems in recreational math such as the challenging brain teasers

found in puzzle books available at bookstores or on the World Wide Web at sites such as rec.puzzles.

Whether tinkering with such problems in your mind or applying IP, these problems oﬀer fun challenges.

96 Integer Programming

castle

x x x x

x

x

x x x x

x

x

castle

x

1

x

2

x

3

x

4

x

5

x

6

x

7

x

8

T

North

(a) (b)

Figure 7.3: The well-guarded castle puzzle involves 12 guards posted around a square castle. (a) Each

guard is depicted with a solid dot. Note that 1 soldier is posted at each corner of the castle and 2

soldiers on each wall. (b) The associated IP can be formulated with 8 decision variables.

7.5.1 An Example Puzzle

To illustrate IP’s in recreational math, we begin with a simple brain teaser based on a puzzle from the

book, The Moscow Puzzles, by B. A. Kordemsky. [Kor92]

The Well-guarded Castle

There is unrest in the land, and King Husky declares that the castle be guarded on every

side. Guards will be posted along a wall or at a corner of the square castle. When posted

on a corner, a soldier can watch 2 walls. Other soldiers are posted along the face of a wall

and watch only 1 side of the castle. The king orders 12 of his bravest soldiers to guard

the stone fortress and initially plans to post 2 guards on each wall and 1 soldier on each

corner as depicted in Figure 7.5.1 (a). Aid King Husky in securing the kingdom by ﬁnding

an arrangement of the soldiers such that there are 5 soldiers watching each wall.

Since the problem serves as a brain teaser, you may be able to derive an answer by inspection or

trial and error. Imagine a much larger problem with thousands of soldiers. In such cases, the process

needed to ﬁnd a solution could easily move beyond the scope of mental calculation. Therefore, this

puzzle serves as an illustrative example of formulating such problems as integer programming problems.

7.5.2 Formulation of the IP

The solution process begins by determining the decision variables. As in linear programming, the best

way to determine these variables is to put oneself in the shoes of the decision maker and ask the question

“What do I need to know in order to fully determine the solution?” In the case of the well-guarded

castle, we must know the number of soldiers positioned along each wall and at each corner. Hence,

we deﬁne 8 decision variables x

i

for i = 1, 2, . . . , 8, where x

i

equals the number of soldiers posted at

position i. The positions are numbered counterclockwise beginning at the southwest corner as seen in

Figure 7.5.1 (b).

Recall, the king declared that 5 soldiers watch each wall. Again, a soldier on a corner can watch 2

walls while a soldier along a wall watches only 1. This leads to the following constraints:

southern wall constraint: x

1

+x

2

+x

3

= 5

eastern wall constraint: x

3

+x

4

+x

5

= 5

northern wall constraint: x

5

+x

6

+x

7

= 5

western wall constraint: x

7

+x

8

+x

1

= 5

7.6 Recasting an IP as a 0-1 LP 97

What is the puzzle’s objective? That is, what is a suitable objective function? As you may have

noticed, there are several solutions to this puzzle. For instance, the guards can be positioned such that

x

1

= x

5

= 4 and x

2

= x

4

= x

6

= x

8

= 1, or by symmetry, x

3

= x

7

= 4 and x

2

= x

4

= x

6

= x

8

= 1.

Objective functions to maximize x

1

+ x

5

or x

3

+ x

7

could produce these answers, respectively. It is

also possible to reduce the number of guards on the corners such that x

1

= x

5

= x

3

= x

7

= 2, and

x

2

= x

4

= x

6

= x

8

= 1. Take a moment and consider what objective function could produce this

answer. As we see, the choice of an objective function determines which of such solutions is optimal! It

should be noted that in some cases, if you desire a particular solution, then additional constraints may

be needed.

Suppose King Husky wants the total number of soldiers watching the corners to be at a minimum.

Then, our objective function becomes: minx

1

+x

3

+x

5

+x

7

.

One constraint remains. The king issues 12 guards, which implies

8

i=1

x

i

= 12.

The entire model for the well-guarded castle puzzle can now be succinctly stated as

P : min x

1

+x

3

+x

5

+x

7

subject to x

1

+x

2

+x

3

= 5

x

3

+x

4

+x

5

= 5

x

5

+x

6

+x

7

= 5

x

7

+x

8

+x

1

= 5

8

i=1

x

i

= 12

0 ≤ x

i

≤ 5, i = 1, 2, . . . , 8

In class, we will look at other puzzles and pose the problems as IP problems. The interested reader

is encouraged to visit the web site, “Puzzles: Integer Programming in Recreational Mathematics”

(http://www.chlond.demon.co.uk/academic/puzzles.html) to ﬁnd additional puzzles.

7.6 Recasting an IP as a 0-1 LP

In the last section, P posed the well-guarded castle puzzle as an IP. Yet, posing an IP does not imply

that it is solved. King Husky (like an executive of a company who hires a consulting ﬁrm to model

a phenomenon) demands an answer from a model that he can trust. Presenting King Husky with a

properly posed IP and no solution might result in one’s being thrown over that mighty wall!

Solving P as an LP allows variables in the solution to contain nonzero decimal values. Such a

solution relaxes the constraint that the decision variables are integers. Hence, an IP that does not

enforce integral solutions is called a relaxed LP. Imagine presenting King Husky with the solution that

1.1 soldiers be posted on the southern wall. Even a cruel leader would recognize the foolishness of such

an action. Still, solving an IP as a relaxed LP can yield intuition on the ﬁnal solution. Yet, deciphering

the optimal or even near optimal integer solution from a real numbered solution is often a diﬃcult, if

not impossible, task.

A 0-1 LP (also called a 0-1 IP) is a linear programming problem where decision variables can be

considered as boolean variables attaining only the values 0 or 1. Moreover, every IP can be recast as a

0-1 LP. Techniques such as branch and bound can be applied to solve a 0-1 LP. Our solution can then

be recast from the 0-1 LP back to the integer solution of the corresponding IP.

Assume Q is an IP, and x

i

is an integer decision variable of Q. We introduce the decision variables

98 Integer Programming

d

i1

, d

i2

, . . . d

ik

and let

x

i

=

k

j=1

2

j−1

d

ij

= d

i1

+ 2d

i2

+. . . + 2

k−1

d

ik

,

where d

ij

= 0 or 1, j = 1, 2, . . . , k.

(7.10)

For P, the decision variables are bounded below by 0 and above by 5. Therefore, it is suitable to

let x

1

= d

11

+ 2d

12

+ 4d

13

. Note that if d

11

= d

13

= 1 and d

12

= 0 then x

1

= 3. Similarly, every

possible value of x

1

can be determined by a suitable linear combination of d

11

, d

12

, and d

13

. Note that

the decision variables d

ij

can also be viewed as the binary representation of the associated value of x

i

.

Completing similar substitutions for each decision variable in P, the corresponding 0-1 LP becomes:

min d

11

+ 2d

12

+ 4d

13

+d

31

+ 2d

32

+ 4d

33

+d

51

+ 2d

52

+ 4d

73

+d

71

+ 2d

72

+ 4d

73

subject to

8

i=1

3

j=1

2

j−1

d

ij

= 12

d

11

+ 2d

12

+ 4d

13

+d

21

+ 2d

22

+ 4d

23

+d

31

+ 2d

32

+ 4d

33

= 5

d

31

+ 2d

32

+ 4d

33

+d

41

+ 2d

42

+ 4d

43

+d

51

+ 2d

52

+ 4d

53

= 5

d

51

+ 2d

52

+ 4d

53

+d

61

+ 2d

62

+ 4d

63

+d

71

+ 2d

72

+ 4d

73

= 5

d

71

+ 2d

72

+ 4d

73

+d

81

+ 2d

82

+ 4d

83

+d

11

+ 2d

12

+ 4d

13

= 5

0 ≤ d

ij

≤ 1, i = 1, 2, . . . , 8, j = 1, 2, 3.

Such a problem can be solved using a branch and bound algorithm or linear programming software.

What if our problem did not have an upper bound on the decision variables? In such cases, the IP

is recast to a 0-1 LP with a large bound on each decision variable. Then, the bound is incremented by

a ﬁxed value (say, between 10 and 50) until the solution no longer changes at which point, the solution

is found.

Integer programming can model a diverse range of problems. In this chapter, you learned techniques

and some of the diﬃculties of solving IP problems.

7.7 Exercises 99

7.7 Exercises

1. Complete the formulation of Model 3: Toll Booth Scheduling.

2. Complete the formulation of Model 11: A Packing Problem.

3. Read ”UPS Optimizes Its Air Network” by Armacost, Barnhart, Ware, Wilson. Interfaces Vol.

34, No.1., Jan-Feb 2004, pp. 15-25. Give a short answer (1-2 sentences) for each question below.

(a) What sort of mathematical model is being used in the software they call VOLCANO?

(b) How can VOLCANO save money for UPS?

(c) What is the biggest factor in this savings?

(d) What sort of improvements on the routes did VOLCANO ﬁnd that the planners didn’t see?

Why?

(e) How has UPS actually used the suggested solution to their problem?

(f) Look at the appendix. Explain at least one equation on the conventional side and on the

composite variables side.

(g) Is this a good modeling paper? Why or why not?

(h) Which section did you ﬁnd most informative?

100 Integer Programming

Math 381 Notes — University of Washington —Winter 2011

Chapter 8

Evolutionary Trees

8.1 The problem

There are many important problems in modern genetics which require statistical and discrete mathe-

matical modeling and algorithms. We will look at just one of these, the construction of phylogenies, or

evolutionary trees, that show which species gave rise to others.

We start with a very superﬁcial (and not completely accurate) introduction to molecular biology.

The genetic material of all living organisms consist of molecules of DNA, deoxyribonucleic acid. For

our purposes, these can be viewed simply as long chains of four types of molecules strung together.

The four basic molecules (called bases) are: adenine (A), thymine (T), cytosine (C), and guanine (G).

So a gene can be represented as a sequence of these letters, e.g., AAGCGA, but typically consisting of

hundreds or thousands of bases. (Actually base pairs, since DNA is a double strand, or double helix,

which is important in the replication mechanism.)

Each gene is a blueprint for creating a particular protein. A protein is a complex molecule made

by stringing together a diﬀerent set of building blocks, the 20 amino acids, in various orders. A gene,

one strand of DNA, is a code which tells how to assemble one particular protein. Each set of three

bases determines one amino acid. For example, AAG is code for “Lysine”, while “CGA” is code for

“Arginine”, so the string AAGCGA codes for these two amino acids. Note that there are 4

3

= 64

distinct strings of 3 bases, and only 20 diﬀerent amino acids, so typically several diﬀerent triples (or

codons) code for the same amino acid. Some codons mean diﬀerent things, like “stop” at the end of a

protein.

When DNA is replicated from one generation to the next, mistakes are sometimes made. These

mutations may cause resulting proteins to not function properly. Some mutations are fatal for the

oﬀspring, but many are not and will simply be carried down to future generations. Some random

mutations lead to changes which are beneﬁcial and drive the evolutionary process. By comparing

similar sequences of DNA from diﬀerent species it is often possible to deduce information about how

the species are related and how they evolved.

To get a feel for this, let’s ﬁrst look at how some strings might evolve if random perturbations occur

in each generation (with small probability). We consider only one type of mutation, substitutions in

which one base is replaced by another. Other mutations, such as insertion or deletion of several bases

in a sequence, are also important but harder to deal with. (One important and diﬃcult problem in

this connection is the sequence alignment problem, trying to ﬁgure out how sequences from diﬀerent

organisms are best matched up.)

Consider two species starting to diverge, which start with identical sequences for some particular

piece of DNA:

Species 1 TAGCTTCATCGTTGACTTCTACTAAAAGCA

Species 2 TAGCTTCATCGTTGACTTCTACTAAAAGCA

101

102 Evolutionary Trees

Now suppose these species diverge, reproducing independently, with some small chance of mutation at

each site in each generation. This can be modeled by Monte Carlo simulation (see evolution.m on the

webpage). After 100 generations with a 0.1% chance of mutation of each base in each generation, these

have become

Species 1 CAGCTTCATCGTTGACTTCTACTAAAAGCA

Species 2 TAGCTTCATCGTTGACTTCCACTAAAAGCA

Suppose that at this point we split oﬀ another species, replicating Species 2 to obtain Species 3:

Species 1 CAGCTTCATCGTTGACTTCTACTAAAAGCA

Species 2 TAGCTTCATCGTTGACTTCCACTAAAAGCA

Species 3 TAGCTTCATCGTTGACTTCCACTAAAAGCA

and now evolve each independently for 100 generations to obtain

Species 1 CAGCTTCATCGTTGACTTCTACTAAAAGTA

Species 2 TAGCTTCATCGTTGACTTGCACTAAAAGCT

Species 3 TAGCTTCATCGTTGACCTCCACTAAAAGCA

At this point all three are diﬀerent, though we expect more similarity between 2 and 3 than between

these and Species 1. Repeat this process, replicating Species 3 as Species 4 and then evolving again to

obtain

Species 1 CAGCTTCATCGTTGACTTCTACTAAAAGTA

Species 2 TTGCTTCATCGTTGAATTGGACTAAAAGCT

Species 3 TAGATTCATCGTTGACCTCCGCTAAAAGCA

Species 4 TAGCTTCATAGTTGACCTCCTCTAAAAGCA

Finally, suppose we now replicate Species 1 as Species 5,

Species 1 CAGCTTCATCGTTGACTTCTACTAAAAGTA

Species 2 TTGCTTCATCGTTGAATTGGACTAAAAGCT

Species 3 TAGATTCATCGTTGACCTCCGCTAAAAGCA

Species 4 TAGCTTCATAGTTGACCTCCTCTAAAAGCA

Species 5 CAGCTTCATCGTTGACTTCTACTAAAAGTA

and evolve again to obtain

Species 1 CAGCTTAATCGTTCACTTCTACTAAAAGTA

Species 2 TTGCTCCAACGTTGAATTCGACTAAAGGCT

Species 3 TAGATTCATCGTTGACCTCCGCCAAAAGCG

Species 4 TAGCTTCATAGTTGACCTCCTCTTAAAGCA

Species 5 CAGCTTCATCGTTGACTTCTTATATAAGTA

The relationship between these ﬁve species is illustrated in Figure 8.1.

The basic question in determining phylogenies is the following. Given a set of DNA sequences such

as the ones above (but often much longer in practice, and usually many diﬀerent sequences are used),

can we infer the relationship between the species, and construct the proper evolutionary tree?

For a given set of species there are many possible trees. The basic idea is that we need a way to

compare two possible trees with one another, and say that which one is a better ﬁt to the given data.

This requires some sort of goodness measure that we can apply to a given tree T to obtain a value f(T).

We are looking for the tree with the best value, so we must solve an optimization problem to maximize

this function, which is deﬁned over all possible trees.

8.2 Maximum parsimony 103

(a) (b)

1 5 3 4 2 1 2 3 4 5

Figure 8.1: (a) The true phylogeny for the example in the text. (b) A distinct tree.

(a) (b)

1 5 3 4 2

C C T T T

C

T

T

T

1 2 3 4 5

C T T C T

T

T

T

T

Figure 8.2: Possible assignments of bases at each branch point for the ﬁrst site, shown for two distinct

potential trees. The tree shown in ﬁgure (a) requires only 2 mutations to explain the bases observed at

this site, while the tree shown in (b) requires 2 mutations.

There are three basic steps in ﬁnding the “optimal” tree:

1. Decide how we should measure the goodness of one tree against another, which deﬁnes the function

f(T). Below we will brieﬂy consider two possibilities, parsimony and likelihood.

2. Once we have a speciﬁc measure, we need an eﬃcient algorithm to compute a goodness value f(T)

for any potential tree T.

3. Finally, we need to eﬃciently search through all possible trees to locate the best. In general we

cannot simply enumerate them all because the number of possible trees grows exponentially in

the number of species. The problem of ﬁnding the best is typically NP-complete.

8.2 Maximum parsimony

“Maximum parsimony” is a term used by geneticists when constructing ancestral trees. Parsimony

means frugality or simplicity, and the idea here is to ﬁnd the tree which requires the least number of

mutations to explain the given data. As an example, suppose we want to compare the two possible

trees shown in Figure 8.1(a) and (b) for the ﬁve species above. Consider the ﬁrst base in each of the ﬁve

DNA strings. The bases seen can be explained, for example, by the histories indicated in Figure 8.2.

Using the tree of Figure 8.2(a), only one mutation is required to explain this data, as shown by the line

cutting one branch of the tree. (This is in fact the true history of this site from the simulation that

generated these species.)

104 Evolutionary Trees

(a) (b)

1 5 3 4 2

{C} {C} {T} {T}

{T}

{T}

{C}

{T}

{C,T}

1 2 3 4 5

{C} {C} {T} {T} {T}

{T}

{T}

{C,T}

{C,T}

Figure 8.3: Illustration of Fitch’s algorithm applied to the ﬁrst site using two diﬀerent trees.

Using the tree of Figure 8.2(b), on the other hand, there must have been at least two mutations

to produce this pattern of bases. (Perhaps more, if we assign letters to the branch points diﬀerently,

but never less.) We can deﬁne the parsimony of a tree to be the (minimum) total number of mutations

required to explain all of the variation seen. To compute the parsimony we must consider each base in

turn (all 30 of them in our example) and ﬁnd the assignment of bases to branch points which gives the

least number of mutations, and hence the parsimony.

There is an eﬃcient algorithm for doing this, Fitch’s Algorithm, which is illustrated in Figure 8.3.

This algorithm will be discussed in class, but here’s the basic idea. Keep a counter called parsimony

which is initialized to zero. For each site in the DNA sequence, we perform the following algorithm.

At each node on the tree we determine a set S of possible bases, working our way up from the leaves

toward the root. The set at each leaf consists simply of the base observed in the corresponding species.

At each node further up we compute the set S based on the sets S

1

and S

2

of its two children according

to these rules:

• If S

1

∩ S

2

= ∅ then we set S = S

1

∩ S

2

. In other words, we take the intersection of the two sets

whenever this is nonempty.

• If S

1

∩ S

2

= ∅, then we set S = S

1

∪ S

2

, the union of the two sets. In this case we also increment

parsimony by 1, since the evolutionary tree must have a mutation along one branch or the other

below this node.

While the idea of parsimony seems attractive. it has some limitations. Note in particular that it

does not give any information on the order in which branching occurred, or the time scale between

branchings. As far as the parsimony measure is concerned, the tree in Figure 8.1(a) is equivalent to

the tree shown in Figure 8.4(a). More generally it is impossible to say where the root of the tree is, so

maximizing parsimony gives only an unrooted tree. This same tree could just as well be drawn as shown

in Figure 8.4(b). All of these trees have the same parsimony. (Note all of these trees are fundamentally

distinct from the tree in Figure 8.1, which has a diﬀerent parsimony.)

Maximizing parsimony can be shown to fail in some cases, in the sense that it may converge to

the wrong tree as we include more data, by increasing the number of sequences considered from each

species. In particular, it is fallible if diﬀerent sites evolve with diﬀerent mutation rates.

8.3 Maximum likelihood

Parsimony makes no assumptions about the rates of mutations or about the time elapsed between

branching of species. The maximum likelihood approach is more closely tied to probability and the

statistical nature of the data.

8.3 Maximum likelihood 105

(a) (b)

2 1

3

4 5

1

5

2

3

4

Figure 8.4: Two trees which are equivalent to the tree shown in Figure 8.1(a) in terms of parsimony.

Maximum likelihood estimation is a standard statistical tool used in many ﬁelds, not just genetics.

The basic idea is easily explained by a standard example from coin tossing. Suppose we are given a

trick coin which has a probability p of coming up heads, but we don’t know what p is. We ﬂip the coin

100 times and observe 62 heads. What is the “most likely” value of p? The obvious answer is 0.62. We

want to derive this mathematically.

Recall that, if the true probability of heads is p, then the probability of observing exactly 62 heads

in 100 tosses is obtained from the binomial distribution as

L(p) =

_

100

62

_

p

62

(1 −p)

38

. (8.1)

We can now ask, what value of p will maximize this expression? In other words, what value of p makes

the observed data most likely to have arisen from 100 ﬂips of the coin? In this context L is called the

likelihood. We can compare the value of L with one value of p against the value of L with some other

value of p. The value of p for which L is larger is “more likely” to be correct.

For example, evaluating L from (8.1) for p = 0.62 gives L(0.62) = 0.082. This is the probability of

getting exactly 62 heads in 100 tosses with a coin which is biased to land on heads 62% of the time. This

may seem like a small likelihood, but we can’t judge this without comparing it to the likelihood of getting

exactly 62 heads if p had some diﬀerent value. We can compute, for example, that L(0.5) = 0.0045, so

it is much less probable that one would roll exactly 62 heads with a fair coin. Hence the value p = 0.62

is more likely than p = 0.5. Often statisticians look at the likelihood ratio and say, for example, that it

is about 18 times more likely that p = 0.62 than that p = 0.5 since L(0.62)/L(0.5) = 18.33.

To determine the most likely value of p, we must maximize L with respect to p. This seems tough

since it’s a complicated expression, but we can instead consider log L. Since log L is a monotonically

increasing function of L, if we ﬁnd the value of p which maximizes log L then this also maximizes L.

Taking the log converts all the products into sums and simpliﬁes things:

log L = log

_

100

62

_

+ 62 log(p) + 38 log(1 −p).

This is easy to maximize by diﬀerentiating with respect to p and setting the derivative to zero, the

standard technique from calculus. This gives

62

p

+

38

1 −p

= 0

from which we easily ﬁnd p = 0.62.

The same idea can be used in comparing two potential phylogenies. In this case, along with a

given tree structure, we also make some hypothesis about the “length” of the branches in terms of the

probability of a mutation at each site between one branching and the next. This “length” depends on

the time elapsed (number of generations) as well as the mutation rate per generation.

106 Evolutionary Trees

If we hypothesize a particular structure for the tree, then it turns out it is possible to compute the

probability that the observed genetic sequences from each species arose via random mutations based on

this structure. This probability depends on the tree, and so we can view it as a likelihood function for

that tree. We now want to choose the tree structure which will maximize this likelihood.

We won’t go into the manner in which this likelihood is computed, since it’s a bit complicated to

explain. It can be done, however, in a recursive manner starting at the leaves and working up to the

root. See [Fel81] for more details.

8.4 Searching the forest for the optimal tree

Once we have decided on a measure of goodness, parsimony or likelihood, for example, and have an

algorithm for computing this measure for any given tree, we are still faced with the daunting task of

ﬁnding the tree which maximizes this measure. Even if we work simply with unrooted trees without

trying to also estimate the length of the branches, the number of possible trees grows rapidly with the

number of species. For the example we have considered with 5 species, there are 1 · 3 · 5 · 7 = 105

distinct unrooted trees. With six species the number grows to 1 · 3 · 5 · 7 · 9 = 945 and this pattern

continues, multiplying by the next odd integer each time we add a new species. This grows faster than

the factorial function, and in general the problem of ﬁnding the optimal tree is NP-complete.

In practice various search strategies are used to try to ﬁnd the optimal tree (or at least a good tree)

without enumerating and checking all possible trees. It may be possible to start with only two species

and add species one at a time, deciding in each case where to insert the additional species in the existing

tree (as might naturally be done if a new species is discovered). This by itself wouldn’t generally lead

to the optimal tree, so it is also necessary to consider “local rearrangements” where the structure of

some small piece of the tree is rearranged to try to get a better tree. A branch and bound algorithm

can sometimes be developed to determine what paths are worth pursuing.

Math 381 Notes — University of Washington —Winter 2011

Chapter 9

Forest Resource Management

9.1 Timber harvest scheduling

Many forest resource management models lead to large-scale linear or integer programming problems.

We will explore just a few of the issues involved in timber harvesting, one particular application of

interest in the Northwest. We are all familiar with the patchwork quilt appearance of our forests due to

clear cutting. How do timber companies and the forest service decide where and when to cut? How are

decisions made about allocating land for diﬀerent uses, such as harvesting, wildlife habitat, recreation?

The reading assignment in Chapter 6 of [Dyk84] explores a relatively simple case, in which only the

age of the trees is considered, along with the value of the trees based on their age. The problem is then

to maximize proﬁt over a certain time period by harvesting at appropriate times. Once harvested, new

trees can be planted and harvested later. Because trees grow so slowly, harvesting schedules must often

be planned out far in advance, usually decades at least. Of course the models are constantly being

updated and rerun, so the solution obtained by looking over the next hundred years, say, is really only

used in the ﬁrst decade (or less), and a few years down the road a new optimal solution for the future

is obtained based on current data and predictions.

Similar scheduling problems arising in deciding when to replant, thin, or fertilize existing trees.

Sensitivity analysis is often important: what if we change the parameters or the strategy in some

way, how much eﬀect does that have? This is especially important because the data available is often

incomplete or statistical in nature, so it is interesting to see how sensitive the results are to changes in

the data.

9.2 Spatial structure

Even more interesting and challenging problems arise if we also consider the spatial arrangement of the

forests available for harvesting. There are typically many constraints that exist on which combinations

of trees can be harvested at a given time.

Figure 9.1(a) shows a hypothetical map of a forest region divided into sections based on some

characteristics such as

• The age of the trees,

• The kind of trees,

• The diﬃculty of harvesting trees, due to topography or road access,

• Ownership (a company may have leased rights for harvesting trees on land belonging to others,

who may impose constraints on when they want them harvested).

• The type of wildlife habitat provided, e.g., endangered species habitat,

107

108 Forest Resource Management

1

2

3 4 5

6

7

8

9

10

1

2

3 4 5

6

7

8

9

10

Road 1

Road 2

Road 3

Existing

Road

Road 4

Figure 9.1: (a) Map of forest sections. (b) With planned logging roads.

• Proximity to streams or rivers, since these “riparian zones” may be environmentally delicate.

Sections 3 and 8 in Figure 9.1(a) might correspond to a riparian zone, for example.

• Other environmental or social issues such as proximity to towns, parks, or trails, visibility from

highways.

Let h

ij

represent the fraction of Section i to be harvested in Decade j. Note that we have a zero-one

IP problem if each decision variable is forced to be 0 or 1.

A variety of new constraints may be imposed based on the characteristics of the sections or their

spatial location. A few examples are:

• Government regulations may limit the size of an area that is clearcut at any one time. For example,

it may be possible to harvest Section 4 or Section 5 in Figure 9.1(a), but not both at the same

time. This would lead to a linear constraint of the form

h

4j

+h

5j

≤ 1

for each decade j. If the time between harvesting adjacent sections is required to be longer, we

might have a constraint like

h

4j

+h

5,j−1

+h

5j

+h

5,j+1

≤ 1

for each j, stating that there must be at least one decade between the two harvest periods. This

is again a linear constraint which can be used in an IP approach.

• Protecting wildlife habitat may put constraints on what Sections can be harvested, or on what

combination of sections can be harvested based on proximity and habitat combinations.

9.3 Building roads

A new set of interesting constraints arise if we couple the harvesting problem together with the problem

of building new logging roads to reach the areas being harvested. Figure 9.1(b) shows the previous map

with a set of proposed roads superimposed, coming oﬀ of an existing road at the bottom of the map.

The roads are broken up into pieces which can be built independently, and 4 of these are numbered on

the map. We can then introduce decision variables r

ij

which is 1 if Road i is built in Decade j and 0

otherwise. Each section of road also has a cost C

ij

of building it in Decade j. Note that building Road

4 requires a bridge across the river in Section 3, so C

4j

is likely to be high.

9.4 Some other issues 109

In order to harvest a given section it is necessary to ﬁrst build roads reaching this section, giving

a new set of constraints. For example, Road 2 must be built in order to harvest Section 6, and so we

require

h

61

≤ r

21

h

62

≤ r

21

+r

22

h

63

≤ r

21

+r

22

+r

23

and so on for each decade under consideration. There is a set of constraints of this form for each section.

Another set of constraints arises from the fact that we cannot build Road 2 until we have built Road

1. So we need

r

21

≤ r

11

r

22

≤ r

11

+r

12

r

23

≤ r

11

+r

12

+r

13

and so on for each decade. There is a set of constraints of this form for each road.

The problem is now to maximize

i,j

P

ij

h

ij

− C

ij

r

ij

subject to all of these various constraints. It

is clear that the number of decision variables and constraints can be very large in scheduling problems

of this form. Real problems for millions of acres can easily involve 10,000 variables and hundreds of

thousands of constraints.

9.4 Some other issues

A number of other issues arise in the context of forest management. One question is how to split up

land into the sorts of sections indicated in the map above, especially if there are large areas that are

fairly homogeneous but must be split up into manageable pieces, or pieces of a size that one is allowed

to harvest at one time. Or conversely there may be lots of small parcels of similar forest scattered

around that must be aggregated into a single unit to make a tractable mathematical model. One big

question is whether to use a top-down or bottom-up approach to aggregating:

• Traditional models start at top, splitting up land into large quantities that should be harvested

at one time. Must then ﬁnd lots of small parcels of this type of land to actually cut.

• This can lead to a solution that can’t be implemented. The big model doesn’t see all the low-level

constraints.

• Bottom-up approach would look for perhaps the best 20 solutions for each district, then feed these

into a larger-scale model to choose the best combination of these.

• Working upward guarantees implementability but may not give the optimal solution.

Forest ﬁre protection is another consideration, and also leads to other sorts of mathematical modeling:

• What is the best cutting pattern to minimize risk of ﬁres?

• How sensitive is the harvesting schedule to disruption by ﬁre? If one area is destroyed before

being harvested does it throw everything oﬀ? (Sensitivity analysis.)

• Simulation of forest ﬁre events using historic data, weather data, fuel levels, etc.

110 Forest Resource Management

• Where should ﬂeets of tanker planes be stationed to maximize ability to ﬁght a ﬁre anywhere over

a large region. (An important question in Canada, for example, where there are still huge areas

of forest.)

Other sorts of problems arise in milling trees or the manufacturing wood products, e.g.,

• How to cut trees into logs, saw logs into lumber? Need an algorithm that can decide in real time

at the mill how to cut each tree as it arrives.

• Value of each size lumber will change with time as orders are ﬁlled.

• This may feed back into optimization in the woods — what sort of trees should be cut next?

Perhaps particular trees must be found depending on order desired.

• This could also tie into crew scheduling problems since diﬀerent cutting crews have diﬀerent

specialties.

• There are also truck scheduling problems between forests and mills. and questions about where

to optimally locate new mills based on where cutting will be done.

• Trimming models are used in deciding how best to peel logs for veneer.

Math 381 Notes — University of Washington —Winter 2011

Bibliography

[BD92] D. Bayer and P. Diaconis. Trailing the dovetail shuﬄe to its lair. Ann. Appl. Prob., 2:294–313,

1992.

[Chv83] Vasek Chvatal. Linear Programming. W. H. Freeman, New York, 1983. T57.74.C54.

[DA93] A. Dolan and J. Aldous. Networks and Algorithms, An Introductory Approach. John Wiley

& Sons, Chichester, 1993.

[Dan63] George Dantzig. Linear programming and extensions. Princeton University Press, Princeton,

NJ, 1963.

[Dyk84] D. P. Dykstra. Mathematical Programming for Natural Resource Management. McGraw-Hill,

1984.

[Fel81] J. Felsenstein. Evolutionary trees from dna sequences: a maximum likelihood approach. J.

Mol. Evol., 17:368–376, 1981.

[Gil55] E. Gilbert. Bell labs technical report. ., 1955.

[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A guide to the Theory of

NP-Completeness. W. H. Freeman, San Francisco, 1979.

[HL95] F. S. Hillier and G. J. Lieberman. Introduction to operations research. McGraw-Hill, New

York, 1995.

[Jef78] J. N. R. Jeﬀers. An Introduction to Systems Analysis: with Ecological Applications. University

Park Press, London, 1978.

[JT98] G. J´onsson and L. N. Trefethen. A numerical analyst looks at the ‘cutoﬀ phenomenon’ in

card shuﬄing and other markov chains. In Numerical Analysis 1997, D. F. Griﬃths, D. J.

Higham, and G. A. Watson, eds., pages 150–178. Addison Wesley Longman, Ltd., 1998.

[Kar72] R. M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Compu-

tations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, N.Y., pages

85–103. Plenum, 1972.

[Kar84] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica,

4:373–395, 1984.

[Kha79] L. Khachian. A polynomial algorithm in linear programming. Soviet Math. Dokl., 20:191–194,

1979.

[KM72] Victor Klee and G. Minty. How good is the simplex algorithm. In Inequalities III, pages

159–172. Academic Press, 1972.

[Knu93] D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-

Wesley, New York, 1993.

111

112 BIBLIOGRAPHY

[Kol90] G. Kolata. In card shuﬄing, 7 is winning number. In New York Times, page C1, 1990.

[Kor92] B. Kordemsky. The Moscow Puzzles. Dover, reprint edition, 1992.

[LLAS90] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. The Traveling sales-

man problem : a guided tour of combinatorial optimization. Wiley-Interscience, Chichester,

1990.

[Mak73] D. P. Maki. Mathematical models and applications, with emphasis on the social, life, and

management sciences. Prentice-Hall, Englewood Cliﬀs, 1973.

[Min78] E. Minieka. Optimization algorithms for networks and graphs. Marcel Dekker, New York,

1978.

[Ree81] J. Reeds. Theory of shuﬄing. unpublished manuscript, 1981.

[Rob76] Fred S. Roberts. Discrete mathematical models, with applications to social, biological, and

environmental problems. Prentice-Hall, Englewood Cliﬀs, N.J., 1976.

[Rob84] Fred S. Roberts. Applied combinatorics. Prentice-Hall, Englewood Cliﬀs, N.J., 1984.

[Ros90] S. M. Ross. A Course in Simulation. Macmillan Publishing Company, New York, 1990.

[SCCH97] G. C. Smith, C. L. Cheeseman, and R. S. Clifton-Hadley. Modelling the control of bovine

tuberculosis in badgers in england: culling and the release of lactating females. J. Applied

Ecology, 34:1375–1386, 1997.

[Sip97] M. Sipser. Introduction to the theory of computation. PWS Pub. Co., Boston, 1997.

[SK87] G. L. Swartzman and S. P. Kaluzny. Ecological Simulation Primer. MacMillan Publishing,

1987.

[Ski90] S. S. Skiena. Implementing Discrete Mathematics: Combinatorics and Graph Theory with

Mathematica. Addison-Wesley, Redwood City, CA, 1990.

[Slo00] Neil Sloane. The on-line encyclopedia of integer sequences. In A008292.

http://www.reseach.att.com/ njas/sequences, 2000.

[ST02] Daniel Spielman and Shang-Hua Teng. Smoothed analysis of algorithms. In Proceedings of

the International Congress of Mathematicians, volume I, pages 597–606, 2002.

[UMA95] UMAP Journal. MCM, The First Ten Years, special edition of the UMAP Journal. COMAP,

1995.

[ZP97] Z. Zhou and W. Pan. Analysis of the viability of a giant panda population. J. Applied

Ecology, 34:363–374, 1997.

2

Acknowledgments: These notes are based on many quarters of Math 381 at the University of Washington. They have developed in a wiki fashion. The original notes were compiled by Randy LeVeque in Applied Math at UW. Tim Chartier and Anne Greenbaum contributed signiﬁcantly to the current version in 2001-2002. Jim Burke contributed a portion of the chapter on linear programming. Sara Billey contributed a portion of the chapter on Stochastic Processes. We thank them for their eﬀorts. Since this document is still evolving, we appreciate the input of our readers in improving the text and its content. Sara Billey adds: I have gotten a lot of inspiration for this course from the following sources: • ”Operations Research” by Wayne Winston. • ”Mathematics in Service to the Community: Concepts and Models for Service-learning in the Mathematical Sciences” (Maa Notes #66) edited by Charles R. Hadlock. • ”Discrete Mathematical Models: With Applications to Social, Biological, and Environmental Problems” by Fred Roberts. • ”Freakonomics” by Steven Levitt and Stephen Dubner • ”Three Cups of Tea: One Man’s Mission to Promote Peace . . . One School at a Time” by Greg Mortenson and David Oliver Relin • ”Mountains beyond mountains” by Tracy Kidder • My friend Amy Smith: inventor of low-tech devices for use in developing countries. See http://en.wikipedia.org/wiki/Amy_Smith http://www.ted.com/index.php/talks/amy_smith_shares_simple_lifesaving_design.html

The following information is the bibtex citation for these notes if you want to cite them as a reference: @Book{381notes, author = {Sara Billey, James Burke,Tim Chartier, Anne Greenbaum and Randy LeVeque}, title = {Discrete Mathematical Modeling: Math 381 Course Notes}, year = {2010}, publisher = {University of Washington}, }

Math 381 Notes — University of Washington —Winter 2011

Contents

1 Introduction to Modeling 1.1 Lifeboats and life vests . 1.2 The modeling process . 1.3 The goals of 381 . . . . 1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 3 6 9 9 10 10 10 11 11 13 13 13 14 14 15 15 16 16 17 19 20 25 25 26 28 28 29 31 33 33 34 37 38 40 41

2 Graph Theory, Network Flows and Algorithms 2.1 Graphs and networks . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 The traveling salesman problem . . . . . . . . . . . . . . . . . . . 2.2.1 Brute force search . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Counting the possibilities . . . . . . . . . . . . . . . . . . 2.2.3 Enumerating all possibilities . . . . . . . . . . . . . . . . . 2.2.4 Branch and bound algorithms . . . . . . . . . . . . . . . . 2.3 Complexity theory . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Shortest path problems . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Word ladders . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2 Enumerating all paths and a branch and bound algorithm 2.4.3 Dijkstra’s algorithm for shortest paths . . . . . . . . . . . 2.5 Minimum spanning trees . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Kruskal’s algorithm . . . . . . . . . . . . . . . . . . . . . 2.5.2 Prim’s algorithm . . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Minimum spanning trees and the Euclidean TSP . . . . . 2.6 Eulerian closed chains . . . . . . . . . . . . . . . . . . . . . . . . 2.7 P, N P, and NP-complete problems . . . . . . . . . . . . . . . . . 2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Stochastic Processes 3.1 Basic Statistics . . . . . . . . . . . . . 3.2 Probability Distributions and Random 3.2.1 Expected Value . . . . . . . . . 3.3 The Central Limit Theorem . . . . . . 3.4 Buﬀon’s Needle . . . . . . . . . . . . . 3.5 Exercises . . . . . . . . . . . . . . . . 4 Monte Carlo Methods 4.1 A ﬁsh tank modeling problem . . 4.2 Monte Carlo simulation . . . . . 4.2.1 Comparison of strategies . 4.3 A hybrid strategy . . . . . . . . . 4.4 Statistical analysis . . . . . . . . 4.5 Fish tank simulation code . . . . . . . . . . . . . . . . . . . . . . . . . . . . Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8. . . . . . . . . . . . . . . . . . . . . . . . . . .2 Formulation of the IP . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . .2 Review of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . .7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Complexity and the simplex algorithm . . . . .1 The lifeboat problem . . . . . . . . . . . . . . . . 9. . . . . . . . . . . . . . . . . . 6 Linear Programming 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 IP in Recreational Mathematics . . . . . . . . . . . . . . . . .2 Cargo plane loading . . . . . . . . . . . . . . . . . . .6 4. . . . . . . . . . . . . . . . . . . .5. . . . . . . .6 Recasting an IP as a 0-1 LP . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Work schedules . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . 8. . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . 8 Evolutionary Trees 8. 5. . . . . . . . .3 Branch and bound algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 The ﬁsh tank inventory problem . . . . .5 General properties of transition matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 An Example . . . . . . . . .1 Timber harvest scheduling . . . . . . . . . . . . . . . . . . . . 6. 6. . . . . . . . . . . . 5. . . . . .2 A pig farming model . . . .4 Other examples of Markov Chains . .ii CONTENTS 4. . . 7. . . . . . . . .2. .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. .3 Convergence to steady state . . . . . . . . . . . . . . . . . . . 7. . . .2. . .7 4. . . . . . .7 Exercises . . . . . . . . . . . . . . .6 Ecological models and Leslie Matrices . . . . . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Spatial structure . .5. . . . . . . . . . . .1 Breakdown of machines . . . 5. . . . . . 43 47 52 55 55 58 59 61 61 63 64 65 66 68 70 71 73 73 73 73 77 80 82 82 84 84 87 89 89 91 92 94 95 95 96 96 97 99 101 101 103 104 106 107 107 107 108 109 5 Markov Chains and Related Models 5. . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . . . . . .1 An Example Puzzle . . . . . . . . . . . . . . .1.4. . 7 Integer Programming 7. . .1 Harvesting strategies . . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. 7. . . . . . . . . . . . . . . . . . . . . . . . . .2 Maximum parsimony . . . . . . . . . 9. . . . . . . . . . . . . . . . . . . . .2 Solving the problem by enumeration 6. . . . . . . . . . . . . . . . . . . .4 Searching the forest for the optimal tree 9 Forest Resource Management 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Traveling salesman problem . . . . . . . . . . . .1 Adding more constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 The problem . . . Exercises . . . . . . .1 NP-completeness of integer programming 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Building roads . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . . . 9. .3 Maximum likelihood . . . . . 5. . . . . . . . . . . . . . . . . . . . . .4 Card shuﬄing . .1 A Markov chain model of a queuing problem 5. . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . .4 The simplex algorithm . . . . . . . . . . . . . . . . . . . .8 Poisson processes . . . 5. . . . . . .6. . . . . . . . . . . . . . . . . . . Queuing theory . .3 Adding a third decision variable . . . . . . . . . . . . . . . . .1 What is optimization? . . . . 6. . . . . . . . . . . . . 5. . . . .3 LPs in Standard Form . . .1 Introduction . .4. . . 6. . .4 Some other issues .

First see if it’s possible to accommodate everyone in lifeboats. • We must have capacity for 1000 people in one or the other. They do carry enough life vests to make up the diﬀerence. so the answer is no. 3. Make sure there’s enough space for life vests for everyone. Figure out how many boats and vests are used if we exactly use all the available space and provide exactly the necessary capacity. They are designing a new ferry boat and want to determine the number of lifeboats and life vests to equip it with. How many of each should they install? One might go through the following steps in determining a solution: 1.Math 381 Notes — University of Washington —Winter 2011 Chapter 1 Introduction to Modeling 1. It would be better to carry more lifeboats since your chances of surviving are better in a lifeboat than in cold water. which would be best.1 = 105m3 of space. They give us the following information: • Each vest holds 1 person and requires 0. Suppose the XYZ Boat-builders have come to us with the following problem. However. 2. We can set this up as a mathematical problem.05 = 50m3 . There are two unknowns x1 x2 = = number of vests number of boats 1 .1 Lifeboats and life vests Recently there was an article in the Seattle Times about Washington State Ferries and the fact that they do not carry enough lifeboats for all passengers. we must use some combination. • Each boat holds 20 people and requires 2.05 m3 of space to store. But for the best solution we’d like to use more of the available space and provide as many boats as possible.1 m3 of space to store. This would require (1000/20) × 2. so it would be possible to provide only life vests and have space left over. otherwise the problem is impossible: 1000 × . lifeboats take up much more space and so there is a tradeoﬀ — the capacity of the ferries would be greatly reduced if they had to carry enough lifeboats for everyone. • The total space to be devoted to this equipment is at most 85 m3 .

1. . Solve the mathematical problem. Interpret the mathematical solution in terms of the real problem. There must be 30 boats and 400 vests. Everybody knows that the lifeboats are arranged on symmetric racks on the two sides of the boat. Fix up the solution using this new information. and so there must be an even number of lifeboats”.1 (volume of each boat) = 85 (total volume available) (1. but must be worked into this form. a substantial improvement.8182 (number of boats) 5. Present the answer to the XYZ Company. Actually it is an integer programming problem since we also have the constraint that the solution must be a pair of integers.2m3 if it means we can ﬁt in two more boats. Suggest this to XYZ. so we’ll have to ﬁx up these numbers. At which point they say: “You must be crazy.1m3. we could increase the number of boats to 32 instead of 30. • There may be many diﬀerent ways to model the same problem mathematically. The problem is better set up as a linear programming problem. But for more complicated problems of this same type this would not be possible and we would have to deal directly with the fact that we are given inequality constraints (e. a type of discrete optimization problem in which we try to maximize the number of boats subject to two sets of constraints (an upper bound on volume and a lower bound on capacity). 6.1) 4.6364 (number of vests) = 31. They respond: “Well sure. Equation (1..05 (volume of each vest) = 2.2m3 ). Clearly we can’t actually provide fractional vests or boats. This suggests that if we had just a little more space (85. The only way to do this and keep the volume from increasing is to reduce the number of boats to 31 and increase the number of vests to keep the capacity at 1000.1) is just a 2 × 2 linear system so it’s easy to solve. This is a linear system of equations: C1 x1 + C2 x2 = C V1 x1 + V2 x2 = V where C1 C2 C V1 V2 V = 1 (capacity of each vest) = 20 (capacity of each boat) = 1000 (total capacity needed) = 0. the 85m3 ﬁgure was just an approximation.2 Introduction to Modeling and two equations to be satisﬁed. 7.2 The modeling process This simple example illustrates some of the issues involved in mathematical modeling: • The problem is usually not presented as a standard mathematical problem. obtaining x1 x2 = 363.g. There are standard methods for solving these types of problems. Reconsider the solution to see if we’ve missed anything. Notice that the original mathematical solution had x2 almost equal to 32. the total volume must be no more than 85m3 ) rather than equalities. In the above example we reduced it to a linear system. one saying that the total capacity must be 1000 people and the other saying that the total volume must be 85m3 . We can easily ﬁnd another 0. Since 31 × 20 = 620 this means we need 380 vests. The total volume required is 84.

1. revise the model. Does it make sense? • Often the information presented is not complete or necessarily accurate. though some answers are much better than others! Mathematical modeling is an ongoing process of overcoming these issues described above. we weren’t told to begin with the information that the number of boats had to be even. but typically one must make decisions about what aspects to consider and which can be safely ignored (or must be ignored to get a solvable problem).. we need to interpret the answer to this problem in terms of the original problem. in the above problem we recognized early on that we could set it up as a linear system of equations. We will see a variety of standard problems and how each can arise in many diﬀerent contexts.3 The goals of 381 The goals of Math 381 are the following: • Learn about the modeling process. Did the proposed solution work better? If so. Collect collect real world data Translation Mathematical Problem/Model Testing Real world prediction Interpretation Mathematical Solution/Prediction Figure 1.1: The Modeling Method From the picture we see that the Modeling Method is a cyclic process. the linear system above). for example. Whenever we create a mathematical model. These will be introduced in the course of looking at a number of modeling problems. The course culminates in a ﬁnal modeling project. we need to assess the validity of the predictions. or that the available volume was only approximate.g. We summarize the process with the following picture: Identify a real world goal.3 The goals of 381 3 • Once the “mathematical problem” has been solved (e. • Often the problem is far too hard to model in detail and solve exactly. we need to study real world problems that eﬀect real people. and we knew that this would be a good thing to do since there are standard methods for solving such systems (such as Gaussian elimination). or the university. We need some more tools in our tool bag in order to do modeling. • In order to learn about the modeling process.1. small businesses. Here. This example was quite straightforward in this regard. We can always continue to improve our model when new information comes in. For example. How does one take an ill-deﬁned “real world problem” and reduce it to a mathematical problem that can be solved by standard techniques? What issues are involved in dealing with real data and complex problems. we must learn what some of these standard problems are. This Prediction . by how much? If not. government agencies. where we can often only hope to model some small piece of the total problem? • In order to have any chance of making a model that reduces to a “standard mathematical problem”. Final projects can be inspired by some of the challenges faced by non-proﬁt organizations. Often there is no “right answer”. We will seek out and solve problems related to the community around us. The most important step is “Testing”.

lifeboats. the life vests. DN A sequences.) Moreover there are typically a discrete set of possibilities to choose between. which means the mathematical problems typically involve a discrete set of values or objects (e. Discussion will be encouraged in class and collaboration is encouraged outside of class. in which case diﬀerent mathematical techniques are used. This is necessary in order . This will build your written communication skills. but is typically very ineﬃcient. Many modeling problems are better solved with continuous models. • Another goal of the course is to give you experience in working together with others on a mathematical problem. and interpreting the results of the model. – Network ﬂows and network optimization. – Diﬀerence equations and recurrence relations. – Dynamic programming. – Discrete dynamical systems. – Graph theory. One could try all possible combinations and ﬁnd the right mix by trial and error. – Game theory. boxes. engineers. – Stochastic processes and Markov chains. or specialists in another discipline.4 Introduction to Modeling is a service-learning course for which you can receive S credit on your transcript by satisfying the requirements. – Time series analysis and forecasting.. but as mathematicians we recognize that it is much easier to simply solve a 2 × 2 linear system. We want to use mathematical techniques to make the job easier. • The project also must be written up in an appropriate manner. cheaper.g. • Math 381 concentrates on discrete modeling. There are many standard mathematical problems and techniques which are useful in discrete modeling. ﬁsh tanks. This is an important part of any real-world modeling project and an important skill to master. The projects must be done in groups. – Monte Carlo simulation. taking a problem and setting it up as a mathematical problem. • The main emphasis of this course will be on the modeling process. scientists. people. solving a linear programming problem by the “Simplex Method”. or more beneﬁcial. Trying out all of the possibilities to see which is best might be one possible approach. During the quarter you also will read some papers written by others where modeling problems are presented and do some shorter writing assignments. – Decision analysis. You will receive W credit on your transcript for this course. or more commonly by a mathematician working together with community members. e. Some of these are: – Linear programming and integer programming. for example diﬀerential equations. For example. etc. Such models are studied in AMath 383. – Queuing theory.g. – Combinatorics and combinatorial optimization. This is the way most modeling is done in the real world: by teams of mathematicians. it is clear that the solution to the problem above must consist of at most 1000 vests (since this would accommodate everyone) and at most 40 lifeboats (since this would use all the available space). Along the way we also will learn something about techniques for solving some of the standard problems listed above. dollars. airplanes.

There are also other related courses in other departments. See the description of the Operations Research Option in the ACMS B.washington. There are other courses which concentrate on understanding the mathematical aspects of these problems and the methods which have been developed to solve them. including Industrial Engineering and Management Science. 492: Stochastic processes – CSE 373: Data structures and algorithms – CSE 417: Algorithms and Complexity – Industrial Engineering 324.5. Degree Program for a list of some of these: http://www.1.ms. Some of these are: – Math 461: Combinatorics – Math 462: Graph Theory – Math 407: Linear optimization – Math 408: Nonlinear optimization – Math 409: Discrete optimization – Math 491. but the study of methods for solving these problems is not the main emphasis of this course.edu/acms .6: Operations Research This course will provide an introduction to many of these topics which might motivate you to take some of these other courses and pursue the mathematics in more depth.S.3 The goals of 381 5 to solve the modeling problem.

and others are still fairly rural. with many people commuting to mainland jobs each day. each with strawberries yields $300 proﬁt. There are 8 islands with diﬀerent personalities. Again. what . What are some of the issues you would need to address in designing a ferry schedule for this set of islands? Write down some questions you would need to answer before starting. The labor and fertilizer used for each acre are given in the table below: Pumpkin Strawberry Labor : 3 workers 2 workers Fertilizer : 2 tons 4 tons One hundred vollunteer workers and 120 tons of composted fertilizer are available.” Practice breaking down a modeling problem into the 4-stages of the “Mathematical Modeling Cycle” by carefully describing each stage of this modeling problem. Suppose we are trying to design a ferry schedule for a set of islands. Deﬁne all of your notation and assumptions so that a reader who has not read our course notes would be able to follow your description. California Flagstaﬀ. Using the data on the network map. 3. you can ﬁnd a network map of the United States from the Rand McNally Road Atlas which shows distances and estimated driving times between various cities in the west. what is the best route to take? (First think about what “best” means!) (b) Suppose you are planning your vacation for next summer and would like to visit a number of diﬀerent places: • • • • • • Reno. Each acre planted with pumpkin yields $200 proﬁt. She is going to plant each acre with pumpkins or strawberries. small business issues. Some are big tourist centers. (b) Describe each step of the modeling cycle with regard to Jane’s farming situation. (a) Help Jane decide how many acres of each crop to plant in order to maximize proﬁt. state the goal in real world terms and translate that goal into a mathematical goal stated in terms of variables and equations. and some ideas as to how you might go about modeling this and what sorts of strategies you might consider in designing a schedule. Write approximately one paragraph for each problem. 2. Nevada Yellowstone National Park Denver. (c) What new information might come to light during the testing phase of the modeling cycle? 4. similar to the San Juan Islands in the Puget sound.4 Exercises 1. games you like to play. others are “bedroom communities”. Consider recent events in the news. and clubs that you belong to. Describe three problems that arise in everyday life which could be attacked using discrete mathematical modeling.6 Introduction to Modeling 1. She owns 45 acres of land. Consider the problem about “Lifeboats and Lifevests. Colorado Salt Lake City San Francisco. On our class website. Can you identify any mathematical problems that would naturally arise? 5. What is the best route to take? You don’t need to ﬁnd the answer. (a) Suppose you want to drive from San Francisco to Denver. In particular. but think about what is involved in solving this problem and write down some of your thoughts about translating this goal into a mathematical problem. life on campus. Arizona Your trip will begin and end in Seattle. Farmer Jane is a local organic farmer.

4 Exercises 7 is meant by “best”? Are there other constraints you might want to consider before trying to answer the question? 6. Write a second paragraph indicating the pros and cons of the this strategy in evaluating applications. May-June 2004. No. 3. pp. . This paper is included in the appendix of the course notes. Write a one paragraph summary of the voting strategy used for the Academy Awards.1. 226-234. 34. Read “Decision Rules for the Academy Awards Versus Those for Elections” by William Gehrlein and Hemant Kher. (b) Explain in your own words why the most popular movie of the year is not chosen to receive the “Best Picture” award by the Academy. (a) You have been asked to evaluate the voting strategy used by the Academy Awards applied to job applications reviewed by a committee. Use the vocabulary you have learned from this article. Interfaces Vol.

8 Introduction to Modeling .

9 . Or the vertices could represent individuals in a hierarchical organization. Figure 2. The map has a labeled point for each city and lines connecting nearby cities are labeled by the distance between the two cities. with an edge from A to B if Person A reports to Person B. The vertices and edges of a graph might represent any number of diﬀerent things. the vertices in a digraph might represent legal positions in a game. e.g. often indicating the cost of moving from one vertex to the neighbor. Graph Directed graph 2 4 3 5 12 10 4 2 3 6 8 6 Network Tree Figure 2. More generally. • A network has weights on the edges. for example if some routes on the map are one-way streets. a graph is a collection of vertices or nodes which are connected by edges.1 Graphs and networks Imagine you are planning a trip around the United States based on a map showing distances between various cities and points of interest. the distance or driving time in the map example.1: Some common graphs. • A tree is a connected graph which contains no circuits. depending on the application.Math 381 Notes — University of Washington —Winter 2011 Chapter 2 Graph Theory. One might ask a question like “How should we drive if we want to visit every city on the map?” This is an example of a network ﬂow problem from graph theory. A network could also be directed. A family tree is one familiar example. Network Flows and Algorithms 2. with an arrow from Vertex A to Vertex B if you can go from Position A to Position B by a single legal move. A graph is connected if there is a way to get from any vertex to any other vertex by following edges. These and other examples are discussed in [Rob76]. For example.1 shows a few types of (connected) graphs that arise in many diﬀerent applications: • A directed graph (or digraph) has arrows on edges indicating what direction one can move between nodes.

In general for any set of N distinct objects there are N ! possible orderings. but the same mathematical problem arises in many contexts. Similarly. 2. and returns to the original vertex? Studying the TSP gives a good introduction to the mathematical theory of combinatorics and complexity of algorithms. still fairly easy on a computer. This is illustrated as a decision tree in Figure 2. having an edge between every pair of vertices. and also will tell us something about how long it will take to do this brute force search. consider a 500 MHz PC. To get an idea of how long this would take.2.1 Brute force search One way to solve this problem is to simply enumerate all the possible routes. D. Counting questions like this are one aspect of the mathematical ﬁeld of combinatorics.2 Counting the possibilities The ﬁrst question is: how many possible routes are there? This will give us some guidance when designing an algorithm to check them all.6 million possibilities. 008. and choose the shortest. Note that what we have counted is the number of ways one can permute the N cities. This is a computer which performs 500 million “instructions” or “cycles” every second. e. 000 50! ≈ 3 × 1064 . This is called the “brute force approach” since we aren’t using any sophisticated ideas to do an intelligent search for the best route. For a TSP with 10 cities we would need to check about 3. But unfortunately N ! grows very rapidly with N . Here “instruction” means a lowlevel machine operation and it typically takes many cycles to perform one arithmetic operation such . however. Then we have to choose the city to visit third and there are only 2 unvisited cities left. It may still take some thought. and end up back in Seattle. we’re just doing the obvious thing. determine the total length of each. 640. where the four cities are labeled A. 432. B. The total number of distinct routes is 4! = 4 · 3 · 2 · 1 = 24. With only 4 cities we could quickly compute the traveling salesman’s path length for all 24 permutations by hand and determine the shortest route. for the TSP with N cities (actually N + 1 cities including Seattle).2 The traveling salesman problem A famous network ﬂow problem is the traveling salesman problem (or TSP for short). 800 20! = 2. what is the shortest path through the network which starts at a particular vertex. This name comes from one particular application. C. General formulation of TSP: Given a network with N + 1 vertices. visit 4 other cities. Network Flows and Algorithms 2. then 3 choices. What is the optimal order in which to visit the other cities? 2. then 2 and 1.10 Graph Theory.2. to ﬁgure out how to systematically enumerate and check each possibility without missing any routes. Once we’ve chosen that city. 176. note that we must choose one city to visit ﬁrst and there are 4 possibilities.g. Finally the fourth city is determined since there is only one city left. we must choose which city to visit second and there are only 3 possibilities since this must be a diﬀerent city. we have 4 choices. 902..2. Consider the TSP for a salesman who wants to start in Seattle. or permutations of the N objects. the number of possible routes we must check is N ! = N · (N − 1) · (N − 2) · · · 3 · 2 · 1. Starting from the city of origin O. visits every other node exactly once. 628. 5! = 120 10! = 3. To do a count for this 4-city TSP.

which makes it impossible to simply enumerate and check every possibility. Paths beginning with O-C and O-B still must be checked. if a less costly path has already been identiﬁed. In the rightmost column is an enumeration of all 24 routes in the lexicographic order. Within each set. . After discovering this. For a TSP with 50 cities there is no conceivable computer that could ﬁnd the solution in our lifetimes. that branch of the tree can be eliminated from further consideration. Then solving the 10-city TSP would take about 1/3 of a second. which is no improvement in running time. We now have about 2. the network in Figure 2.2. and problems are solved this way which would otherwise be intractable. This requires some algorithm for systematically checking every possibility. called ﬂops. note that to compute the length of a single route requires about N additions.2. to prune away those branches of the tree.2. Consider. equivalently. one ﬁnds that the path O-ABCD-O has a cost of only 14. Machine speed is often measured in megaﬂops (millions of ﬂoating point operations per second). since the cost of that edge alone is 16.2 The traveling salesman problem 11 as an addition or multiplication.4 × 1011 seconds.2 that uses edge BD (or. which has a total cost of 15. and so on. Let’s not worry too much about the exact cost.4 × 1018 routes to examine. DB) can be eliminated. Returning to the traveling salesman problem. a useful measure of a computer’s speed is the number of ﬂoating point operations per second.3 Enumerating all possibilities For an N-city TSP with N suﬃciently small. Even if we move up to a supercomputer that is a thousand times faster.2 shows one way to think about enumerating all the possible routes for the 4-city TSP. engineers and scientists do. and this enables one to eliminate even more possibilities. one ﬁnds that O-DABC-O costs only 11. as well as more work to keep track of which route to examine next.3. However in practice a good branch and bound algorithm will typically be much faster than pure enumeration. etc. 2. Many simple problems suﬀer from this kind of combinatorial explosion of possible solutions as the problem size increases. but as soon as the cost of a partial path exceeds 11. Examining next the paths that begin with O-D. This type of algorithm is sometimes called a branch and bound algorithm. we could solve by enumeration. 2. but suppose that we are working on a machine that can examine ten million routes per second.2. These are called ﬂoating point operations since scientiﬁc notation is usually used to represent a large range of values on the computer. one might be able to rule out some paths early if their cost exceeds that of the best path discovered so far. A 500 MHz machine might do about 40 megaﬂops. and at a rate of 10 million per second this will take 2. so called because this is the order that would be obtained by putting the 24 strings representing diﬀerent routes into alphabetical order. Not bad. By continuing to check the other paths that start with O-A. Figure 2. which is about 7. One might ﬁrst try a greedy algorithm that takes the shortest edge to a new city at each step.4 Branch and bound algorithms Instead of evaluating the cost of every path appearing in the tree of Figure 2. which route is shortest. using the brute force approach. the routes are grouped by which city comes second. This leads to the path O-ADCB-O. then the 6 routes starting with B. Supercomputer speeds are often measured in gigaﬂops (billions of ﬂops) or teraﬂops (trillions of ﬂops). Note that the 3! = 6 routes starting with A come ﬁrst. adding up the length between each pair of cities on the route. But already with 20 cities it would be diﬃcult to ﬁnd the answer using this brute force approach. and discarding any partial paths whose cost exceeds 14. using more sophisticated mathematical techniques. for example. The key idea is to obtain some sort of bound on how good the shortest path within a given subtree could possibly be and. thereby reducing the required computation. it would take almost 8 years. If we are unlucky then we might not be able to prune much of the tree and in the worst-case scenario we have to examine the entire decision tree anyway. For computations of the sort mathematicians.715 years. any path in Figure 2. ordered from “lowest” to “highest” in lexicographic ordering. Nonetheless such problems are solved daily.

B. C.3: A traveling salesman network starting from Seattle and visiting four other cities A. in lexicographic order. C.12 Graph Theory.2: Decision tree for enumerating all possible routes in a traveling salesman problem starting from City O and visiting four other cities A. Network Flows and Algorithms AB A AC AD ABC ABD ACB ACD ADB ADC BAC BAD BCA BCD BDA BDC CAB CAD CBA CBD CDA CDB DAB DAC DBA DBC DCA DCB ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA BDAC BDCA CABD CADB CBAD CBDA CDAB CDBA DABC DACB DBAC DBCA DCAB DCBA BA B BC BD O CA C CB CD DA D DB DC Figure 2. . B. and D. A 2 B 2 5 Seattle = O 1 4 3 D 6 C 9 16 1 Figure 2. and D. The right column shows an enumeration of all permutations of the 4 cities.

• Dijkstra’s algorithm for ﬁnding the shortest path between two vertices in a network (see below) requires O(N 2 ) work in general. like O(2N ) or like O(N !).3 Complexity theory 13 Many other algorithms have been developed for the traveling salesman problem and for various special cases of this problem. If we double the size of the problem then the amount of work required roughly doubles. 2. Here are some examples of polynomial time algorithms: • Finding the largest of N values. Checking the value of each item and keeping track of the largest value seen takes O(N ) operations. In particular we want to know how fast this grows as N increases. If an algorithm requires O(N p ) work for some p then it is said to be a polynomial time algorithm. but many other problems can be put into this form even if they don’t involve “distance” in the usual sense. For some algorithms the work required increases much faster with N . • Solving a system of N linear equations in N unknowns using the Gaussian elimination algorithm takes O(N 3 ) operations. then it only takes O(N 2 ) operations using “back substitution”. We say that such an algorithm requires O(N ) work. • If the linear system is upper triangular. however. the number of vertices in the network. An O(N 2 ) algorithm requires work that is proportional to N 2 . say the traveling salesman problem. It’s fun to try to construct word ladders to go from some word to a very diﬀerent word with as few intermediate words as possible. there is no algorithm known which is guaranteed to solve all TSP’s in less than exponential time. e. meaning it is roughly cN where c is some constant. If we have an algorithm for solving the problem. For example. While many algorithms have been developed which work very well in practice.) 2. Such problems can generally be solved fairly easily on a computer unless N is extremely large. 2. we can make FLOUR into BREAD as follows: . like 1.g. 2..4 Shortest path problems A class of problems that arise in many applications are network ﬂow problems in which we wish to ﬁnd the shortest path between two given vertices in a network.4.7 for more about this. there is often a parameter N which measures the “size” of the problem. The “big Oh” stands for “order” — it takes order N work and we’re usually not too concerned about exactly what the size of the constant c is. The problem of ﬁnding the shortest route on a map is one obvious example. For most problems we will discuss with polynomial time algorithms. An introduction to many of these can be found in [LLAS90]. the more rapidly the work grows with N . and which apparently require an amount of work which grows exponentially with the size of N .3 Complexity theory For a particular mathematical problem. (See Section 2. or 3 in the examples above.1 Word ladders A word ladder is a sequence of valid English words. There are other problems which are essentially much harder. the value of p is typically fairly small. then we are interested in knowing how much work the algorithm requires as a function of N .2. each of which is identical to the previous word in all but one letter. The larger p is. For some algorithms the amount of work required (or time required on a computer) is linear in N . which means that doubling N increases the work required by a factor of 4. The traveling salesman problem is in this category.

Finding the shortest word ladder is equivalent to the shortest-path problem through a network if we assign every edge in the graph the weight 1.14 Graph Theory. that of ﬁnding the shortest route to all other vertices from a given vertex. and choose the shortest.2 Enumerating all paths and a branch and bound algorithm One approach to solving a shortest-path problem would be to enumerate all possible paths between the start and end vertices.3 Dijkstra’s algorithm for shortest paths Dijkstra’s algorithm is a systematic procedure for ﬁnding the shortest path between any two vertices. Suppose we are considering N -letter words. How many? In the worst case. just as for the TSP. so the work required is O(N ) in each of the N steps. To enumerate all paths. how do we ﬁnd the shortest path between them? For small networks this is easy to do by trial and error. 2. and the number of possible paths would be exponential in N . The set of all shortest routes is built up in N steps.4. In this case we could visit any or all of the other vertices on the way between the start and ﬁnish. how could we determine the shortest word ladder between them? (Assuming there is one. then we could in principle construct a graph where the vertices consist of all N -letter words in the English language. To implement it on a computer for a large network requires some suitable data structure for storing the network with information about which vertices are connected to one another and the weight of each edge. However.) This can be set up as a network ﬂow problem. The set of all possible paths could be searched more eﬃciently using a branch and bound algorithm. as is the case for a map. By the end we know the best route to every other node and in particular to the destination node. Given a network and two speciﬁc vertices. the next closest node in Step 2.2 for the TSP. there may be very many possible paths to choose between. or determine that none exists.4. and hence the total work is O(N 2 ). In fact this graph (with nearly 6000 vertices) has been constructed for N = 5 by Don Knuth and is available on the web (see the pointer from the class webpage). and so on. we could build a “decision tree” similar to what was illustrated in Figure 2. This requires examining at most N − 1 edges. Actually this algorithm solves the problem by embedding it into what appears to be a harder problem. But the weights may have nothing to do with physical distance and at any rate if there are thousands or millions of vertices and edges it may be impossible to even draw the graph. Word ladders and algorithms for manipulating these graphs are discussed at length in [Knu93]. We draw an edge from one vertex to another if those two words are simple mutations of each other in which only a single letter has been changed. there would be an edge connecting each vertex with each of the N − 1 other vertices. Each step requires examining the distance from the node that was “solved” in the previous step to all unsolved nodes which can be reached from this node. The rows . One way to represent a network or directed network on a computer is with a matrix. for this problem there turn out to be much more eﬃcient algorithms which are guaranteed to give the answer with relatively little work. 2. A word ladder then corresponds to a path through this graph from one word to the other. similar to that described for the TSP. Network Flows and Algorithms FLOUR FLOOR FLOOD BLOOD BROOD BROAD BREAD Given two English words with the same number of letters. which describes a general platform for combinatorial computing. by ﬁnding the closest node in Step 1. compute the length of each path. The problem with this approach is that for a graph with N vertices. particularly if the graph is drawn in such a way that the physical distance between the vertices roughly corresponds to the weight of the edge.

Each graph in Figure 2.e. It is described in many books. 2. then it is not hard to see that one (or more) of the edges can be removed without destroying the connectedness of the graph. In graph theoretic terms. A writing assignment will be to produce your own description of this algorithm. 2. {CE}(12). {BC}(53). it is added to the good set. [HL95] and [Rob84]. For the network shown in Figure 2. This is illustrated in Figure 2. Since the good set has no circuits.5 this order is: {AB}(8).4: Graphs with circuits and corresponding spanning trees In a spanning tree. {DE}(22). j) entry of the matrix is the weight on the edge from node i to node j (or.4 has N = 5 vertices. There are two wellknown algorithms for ﬁnding a minimal cost spanning tree in a connected network: Kruskal’s algorithm and Prim’s algorithm. a spanning tree). if there is no edge from i to j. and the two trees have e = 4 edges. The ﬁrst edges added to the tree from Figure 2. this is accomplished via a minimal cost spanning tree.3 is: O − 0 2 5 4 3 A − 2 0 2 9 1 B C − − 5 4 2 9 0 1 1 0 16 6 D − 3 1 16 6 0 O A B C D | | | | | We’ll go through some simple examples of Dijkstra’s algorithm in class. e. {CD}(10). Examples include highway construction and telephone line installation: the vertices represent cities and the roads or phone lines correspond to edges constructed so that there is some path between every pair of vertices..5 are {AB}. . Recall that a tree is a graph that is connected and has no circuits. that entry can be set to ∞). {BE}(54).2. clearly the least expensive way to connect the vertices will be via a graph with no circuits (i. A matrix representation of the network in Figure 2. Since each edge has a nonnegative cost associated with it. If all edges have been examined and the good set still does not have N − 1 edges. Each edge is then considered in turn and.5 Minimum spanning trees Another type of problem that arises in many applications is to connect a collection of vertices at the lowest possible cost. it means that the original graph must not have been connected.. If a group of vertices is connected by edges that do contain a circuit.g. Figure 2.5.5 Minimum spanning trees 15 and columns correspond to the nodes of the network and the (i. it is a tree. {AD}(63).4. This is continued until the good set has N − 1 edges. the number of edges is always one less than the number of vertices: e = N − 1.1 Kruskal’s algorithm Kruskal’s algorithm begins by sorting the edges of the graph in order of increasing cost. and this fact can be reported to the user. if it does not form a circuit with edges already in the good set. and {AE}(70).

so the algorithm is ﬁnished. for instance. A 70 63 E 22 D 10 12 C 8 54 53 B Figure 2. if there is more than one minimal cost spanning tree).3 Minimum spanning trees and the Euclidean TSP Often the weights on the edges of a network represent actual distances between cities or other destination points. For the graph of Figure 2. but it is found to form a circuit with edges already in the tree. This step is repeated until the tree has N − 1 edges or until there are no more edges joining vertices of T to vertices not in T . 2 2 2 so the work required is O(N log2 N ). In our example problem.16 Graph Theory. Network Flows and Algorithms {CD}. In the latter case. The number of edges in a graph is at most = N (N −1) . Edge {DE} is then examined. It then ﬁnds that edge in the graph joining a vertex in T to one not in T which has the lowest cost: {CD} in our example. namely. {CE}. {AB}. Do you see why Kruskal’s algorithm and Prim’s algorithm generate the same result? Both Kruskal’s algorithm and Prim’s algorithm are polynomial time algorithms. D. D. We then search for the lowest cost edge joining C. Now the tree has N − 1 = 4 edges. no matter where you start. {CD}. C. where N is the number of vertices and E the N number of edges in the original graph. and add edge {BC} to the tree. however.2 Prim’s algorithm Prim’s algorithm for ﬁnding a minimal cost spanning tree avoids the initial sorting phase of Kruskal’s algorithm. or E to one of the remaining nodes. when the minimal cost spanning tree is unique. and {CE}. so this edge is skipped and edge {BC} is added to the tree. In this case. Prim’s algorithm begins by choosing any vertex from the graph and including it in the tree T . The two algorithms generate the same result. it reports that the graph is not connected. vertex D.5.5. or E and A is added. When the weights on a network for the traveling salesman problem represent actual distances between points in a plane.5. and the costs also satisfy the triangle inequality: cost(AB) + cost(BC) ≥ cost(AC). and {BC}. Convince yourself that it generates the same minimal cost spanning tree (or one with the same cost. the problem is called a Euclidean TSP. The best implementations of these algorithms run in O(E log2 N ) time. the next edge to be added is {CE}. The edges of the minimum spanning tree are: {AB}. Finally the lowest cost edge between either B. 2.5: Minimum spanning tree 2. . the weight or cost of going from vertex A to vertex B is the same as that of going from B to A. we could choose. Exercise: Try using Prim’s algorithm with diﬀerent starting vertices.

This information can be used within a branch and bound algorithm to hone in on the actual optimal tour. and returns to the starting point. and return to the starting point. The total cost of this path is twice the cost of the minimum spanning tree. traverses each edge of the graph exactly once.6. not to backtrack over streets that have already been serviced. The o situation is pictured in Figure 2. where the minimum spanning tree is marked and the suggested path is ODOABCBAO. OABCBAODO. whose cost is within a factor of two of this value. this tour is ODABCO and has a total cost of 40. in scheduling snowplows or street sweepers. provides an upper bound. and the cost will be no greater. which has a total cost of 34 (and turns out to be the optimal tour for this problem). any traveling salesman tour that visits all of the vertices must cost at least as much as the minimal cost spanning tree: 25 for our example. 2. any path between nodes that visits other nodes along the way that have already been visited can be replaced by a direct route between the given nodes.6 Eulerian closed chains Still another type of graph thoeretic problem that arises in many applications is that of ﬁnding an Eulerian closed chain: A path that starts at a given node. A 5 O 7 10 5 D 13 9 C 12 11 7 8 B Figure 2. then the object is to plow or sweep each street once but. This multigraph is shown in Figure 2. let the land masses in the problem be represented by nodes and the bridges by edges in a multigraph (a graph that can have more than one edge between a pair of nodes). Path OD-O-ABC-BA-O costs twice as much as the minimum spanning tree. Thus one arrives at a tour whose cost is at most twice that of the minimum spanning tree.6: Euclidean TSP network with minimum spanning tree. Had we chosen the other path. it would have been replaced by the tour OABCDO. In fact. There are seven bridges and the question is whether one can start at a particular point. This gives a lower bound on the cost of the optimal tour. If the edges of the graph represent streets. there is an easy way to ﬁnd a route whose cost is within a factor of two of the optimal: Construct a minimum spanning tree. move along edges of the minimum spanning tree until you can go no further.7. cross each bridge exactly once. if there are three or ﬁve or . if possible. Since the minimal cost spanning tree is the lowest cost set of edges that connects all of the vertices. For the example in Figure 2. since it visits some nodes more than once. Note that if we start at a particular node in the multigraph and travel along one edge to a diﬀerent node. This is illustrated in Figure 2. it is said that graph theory was invented by Euler in the process of settling the famous K¨nigsberg bridge problem. But because the weights satisfy the triangle inequality. then we must leave that node along a diﬀerent edge. for example. while the tour derived above.2. Another possible path is OABCBAODO. This type of problem is also popular in many mathematical games. Then backtrack along these edges until you reach a point from which you can traverse a new edge to visit a new node or until all of the nodes have been visited and you can return to the origin. This problem might arise. The path constructed in this way is not a legitimate tour for the TSP. To see this in a more formal way.8.6.6 Eulerian closed chains 17 While there is no known polynomial time algorithm for solving the Euclidean TSP. If you try it for a while. you can probably convince yourself that the answer is “No”. which also traverses each edge of the minimum spanning tree twice. Moreover. If there is only one edge through the node then this will not be possible. and tour ODABCO is within a factor of two of optimal. Starting at the origin node.

we might go from C to E to F to G to E to C. The degree of a vertex is deﬁned as the number of edges through that vertex. so that the number of unused edges through each vertex remains even. all of the vertices in this multigraph have odd degree. The same holds for the starting node. (In fact. For example. so go to any vertex that has some unused edges through it: vertex C. vertex D has degree 3. for example. Moreover. for example. This path can be combined with our previous one to form the chain: . There is no Eulerian closed chain in Figure 2. then we can leave and return a certain number of times but eventually we will be at the starting node with one unused edge remaining to traverse. for example.8 since. But at that point we are stuck. because we must enter the node in order to use that edge. If there is only one edge through it.) A good way to establish the suﬃciency of Euler’s theorem is by describing an algorithm for computing an Eulerian closed chain in an even connected multigraph. Start at any vertex and travel to other vertices along unused edges until you can go no further. and repeat.8: Multigraph representing the bridges of K¨nigsberg o any odd number of edges through that node. it is the only one with an odd number of unused edges through it.7: The bridges of K¨nigsberg o A B D C Figure 2. Network Flows and Algorithms A B D C Figure 2. C. for every other vertex. we will have to enter then leave the node until there is only one unused edge remaining. A multigraph G has an Eulerian closed chain if and only if it is connected up to isolated vertices and every vertex of G has even degree. Therefore our path can end only back at the original vertex. and if there are an odd number of edges through it. we might start at A. we may not have traversed all of the edges of the multigraph. The condition that the graph be connected up to isolated vertices just means that there is no edge that cannot be reached at all. and then to A. once we leave the initial vertex. Consider. but then we cannot leave. back to B. At this point. then we can leave but never return.18 Graph Theory. go to B.9. Take one of the unused edges to a new vertex and again continue until you can go no further. then in order to traverse all of these edges. For example. Using the above sort of reasoning Euler proved the following: Theorem (Euler[1736]). D. For each of the vertices visited we use an even number of edges — one to enter and one to leave — each time we visit the vertex. if we enter we can also leave. the multigraph in Figure 2.

the set of all problems that can be solved in P-time. if we could search eﬃciently enough. If so. and so are many other famous and easy-to-state problems. N P. A problem which is not in N P is more clearly hopeless. and NP-complete problems C E 19 B A G D F Figure 2. might lie in the class P but probably don’t.g. The amount of work required to ﬁnd an Eulerian closed chain is proportional to the number of edges in the multigraph.. The TSP is NP-complete. Note that any problem in P is also in N P. [Sip97]. see for example [GJ79]. then every problem in N P can be solved in P-time and so in fact P = N P. Combining this with the previous chain gives an Eulerian closed chain: ABCEF GECD − F D − BA. Then it is possible to construct examples in which there are exponentially many paths in the solution set. and the Eulerian closed chain problem. There are still two unused edges in the multigraph. and NP-complete problems We have seen that some problems. are there problems in N P which are not in P? Or does P = N P? This is one of the great unsolved problems of theoretical computer science. 2. N P. The basic idea is that for a problem in this class it is at least possible to check one possible solution in polynomial time. and the exact deﬁnition is a bit complicated. e. For more discussion of such problems. such as the traveling salesman problem. NP stands for nondeterministic polynomial.2. which have the following property: if any one of them can be solved in P-time. . Is it a proper subset? In other words.9: Multigraph with an Eulerian closed chain ABC − EF GEC − DBA. For such problems there is at least a chance one could ﬁnd a P-time algorithm to ﬁnd the best path. such as the Hamiltonian circuit problem and the integer programming problem. nobody has yet found the P-time algorithm. however. so even writing down the solution if we knew it would take exponentially long and clearly there’s no hope of a P-time algorithm to solve the problem. the minimum spanning tree problem. for the TSP it is possible to compute the length of any one possible path with only O(N ) operations. Some problems. so P is a subset of N P. For example. This problem does lie in a larger class called NP. Such problems are in the class P. suppose we consider a variant of the TSP in which we specify a number B and we wish to ﬁnd all tours on a given set of N cities which have total length B or less.7 P. Some progress was made toward this problem when Karp [Kar72] showed that many hard problems lie in a set of problems within N P called the NP-complete problems. such as the shortest path problem.7 P. can be solved in a number of steps that is polynomial in the size of the problem. so we repeat by going from D to F and back to D.

Circle the webpage address on your printout or write it on the printout. Bookmark this page for future reference. then submit this information. For the chosen exercise.C. Write a description of the algorithm aimed at students at your level. The electrical work requires 4 units of time. and the drywall must be installed before it can be painted. and carefully write out the solution. you will often need to learn new math on your own from books at the math library or at Odegard. Choose one exercise from the ﬁrst chapter. Print out the page specifying how to write the bibliography entry for a book and submit this with your homework.) Read the ﬁrst 3 pages of the article. rooﬁng requires 6 units of time. (b) Go to the math library and ﬁnd a textbook on “Graph Theory”. from journal articles and from webpages. We will exchange papers and have a chance to comment on another person’s document. 5. (b) Give a formula for computing the number of DNA words of length n and justify your formula using the Basic Counting Principles discussed in class. Flooring requires 3 units of time if the walls have already been painted and 2 units of time otherwise. Discuss the computational complexity of a naive algorithm for ﬁnding an order of visits that minimizes the total travel time. A house has been framed and now several jobs remain to be done. Read the ﬁrst chapter of the book. but not in two consecutive times. (a) Count the number of all DNA words having length at most 3 by listing them all. each of which is one of four letters T. The roof must be on the house before either painting or ﬂooring work can be done. and ﬂooring must be installed. (a) Find a webpage which outlines standard format for a bibliography.20 Graph Theory. Explain it in such a way that someone reading only your document. Suppose that there are n phone booths in a region and we wish to visit each of them twice. Note. An algorithm will be presented verbally to you in class for ﬁnding the shortest path between two given nodes in a network. Network Flows and Algorithms 2. but try also to be fairly concise (2 pages at most). Electrical wiring must be installed.8 Exercises 1. could ﬁgure out the algorithm and apply it to solve a problem. Draw a network showing the possible scheduling orders for the diﬀerent jobs. who have not taken a course in graph theory algorithms (or this class). (Choose a diﬀerent one if you don’t like the ﬁrst one you open. The genetic code in a DNA molecule consists of a string of symbols. . 4. Explain what the goal of the algorithm is. you must submit all information requested for each item below to receive full credit so read each problem carefully.) Submit the bibliography entry for this textbook in the format found in (1a). As a mathematical modeler. (The UW reference librarians can be very helpful looking up research articles. so please make it easy on each other! 3. solve it. copy the problem.A and G. drywall must be put up and it must be painted. You will probably want to use illustrations and a speciﬁc example in order to make it clear. Painting requires 5 units of time if ﬂooring has already been installed and 3 units of time otherwise. a roof must be added. or other people with similar background. with no other books or lectures on the subject. and how it works. Here is a scavenger hunt to get you started on this process. Submit the full bibliography reference for this article and a paragraph description of the main problems addressed in this paper. and drywalling requires 3 units of time. The electrical work must be done before the drywalling. Write these up neatly enough that others can read them. (c) Find a journal article by Kruskal from 1956 that appeared in the Proceedings of the American Mathematical Society. 2. why it is important (perhaps in the context of a real problem).

Madrid. 6. How many possible routes are there for him to take? . Rome.) 7. You might want to experiment with similar problems where the answer isn’t so obvious. visit Paris. Below is a map of Oneway City. A number of locations are also indicated on the map. C. and C (ignore D for now) and two edges between each pair of vertices. Frankfurt. (b) Determine the fastest route from start to ﬁnish doing all jobs in series.2. B.8 Exercises 21 (a) Encode the constraints for this problem as the Hasse diagram for a partial order. B O C A D (a) Draw the complete directed graph with vertices corresponding to O. indicating the partial distances computed and indicating what parts of the tree you were able to prune oﬀ. one in each direction. and Vienna. Use the branch and bound algorithm to ﬁnd the best route. the distance from A to C is 1 block and from C to A is 11 blocks. Show the portion of the tree that you constructed in ﬁnding this route. I constructed it to make the branch and bound algorithm fairly simple to apply. The following table shows distances (in km) between some major European cities: London London Paris Madrid Frankfurt Rome Vienna 343 1261 634 1444 1237 Paris 343 1050 471 1117 1037 Madrid 1261 1050 1434 1377 1812 Frankfurt 634 471 1434 960 604 Rome 1444 1117 1377 960 765 Vienna 1237 1037 1812 604 765 (a) Suppose a traveling salesman wishes to start in London. and D and return to O. A. For example. Label each with the distance (number of blocks) one must travel to get from one location to the other. (b) Suppose a delivery van leaving from O must visit A. (It is pretty obvious what the best route is in this case. a poorly planned town with one-way streets as indicated. B. (each exactly once) and return to London.

How many subgraphs does G have? (Give an exact answer. then the previous equation for x3 . but if anyone rents the property for at least 4 weeks. Draw a graph showing the minimum spanning tree. so computing x4 = b4 /a44 takes 1 “multiply”. 8. etc. Suppose you own a summer rental property and at the beginning of the summer season.22 Graph Theory. Find a route that is within a factor of 2 of being optimal. For example.) Exactly how many “multiplies” does it take to solve a general n × n upper-triangular system? In particular. d s . How should you decide which requests to accept in order to maximize your income for the summer? Formulate this type of problem in general using a graphical model and solve the problem for the example above to demonstrate your method. and indicate the order in which you consider edges for inclusion in your tree. (c) Determine a lower and upper bound on the distance the salesman must travel and explain how you found these. Such systems are easily solved by back substitution (solve the last equation for x4 . you receive a number of requests for the property. in the case of a 4 × 4 system. Explain which algorithm you are using. how many to solve an upper-triangular system of 1000 equations? 9. An upper-triangular linear system has a11 a12 0 a22 0 0 0 0 the form a13 a23 a33 0 x1 a14 a24 x2 a34 x3 x4 a44 b1 b2 = b3 b4 for example. then they are charged $70 per week. assume you rent out the property for 10 weeks in total and you have received the following 6 requests specifying which week each renter would like to start and how many weeks they want to stay (duration) person start week duration A 1 1 B 1 4 C 3 2 D 5 3 E 7 2 F 9 1 Say the rate of renting the property is $100 per week normally.) c s s s a b 10.) How many multiplication operations does this require? (Count a division the same as a multiplication. Let G be the graph below. Network Flows and Algorithms (b) Construct a minimum spanning tree for the network that this table represents. for example.

2006. July-Aug. Try to come up with a realistic scenario in which it would not hold. . Then relate this to a “traveling salesman problem” with tij corresponding to the “distance” from Node i to Node j. 36. The amount of time spent switching might depend very much on what order they are done in. Interfaces. 12. Or a machine shop at a large aerospace company may need to manufacture many diﬀerent types of parts on the same set of machines. each requiring a special set up. and k represent three cities. Be prepared to answer questions on the paper in class. (b) For traveling salesman problems involving physical distances. one often has the “triangle inequality”: If i. no. For example a print shop may need to print diﬀerent sorts of posters and pamphlets. Read “The Missouri Lottery Optimizes Its Scheduling and Routing to Improve Eﬃciency and Balance” by Jang. . Write a critique (at most one page) of the paper. switching between two print jobs that require the same paper and ink colors might take much less time than switching between two jobs that require cleaning out one color of ink and loading another. For example. then the distance dij from i to j naturally satisﬁes dij ≤ dik + dkj . (a) Write a one paragraph summary of the problem presented in “The Missouri Lottery . Include both positive and negative aspects if any. pp 302-313. (b) Say you have been hired to evaluate the Missouri team who your company is considering hiring. The job scheduling problem arises in many industries. Also. Raskin and Perkins. Crowe. j. each requiring diﬀerent types of papers and types and colors of ink. This paper is included in the appendix to the course notes and on the class web site. We could also specify an “origin” node corresponding to the clean state found at the beginning of the day and required at the end of the day. In each case there is a certain amount of time required for each job. which might be independent of the order we do the jobs. but there may also be substantial time required to switch from one job to another. Write a second paragraph summarizing the techniques and merits of the mathematical model used. A shop has dozens of “jobs” that need to be done each day. “ from Interfaces.2. 4. . include your recommendation to your boss on hiring this team. suppose for each pair of jobs i and j we know the time tij required to switch from i to j. (a) Formulate a general scheduling job in mathematical language and determine the optimal order of jobs to minimize the total time spent on the entire task.8 Exercises 23 11. For starters. The analogous inequality might not hold for tij in job-scheduling problems. vol.

Network Flows and Algorithms .24 Graph Theory.

036 1. there are many situations which are out of our control or where things happen with a certain probability. A thorough understanding of statistical modeling is beyond the scope of this course. the mean of X.552 8.146 6.678 6.230 6.318 2.113 7. denoted µ or E(X). consider the following data set.668 8.066 7. . Also. These situations are called stochastic processes or random processes or probabilistic processes. For example.376 3. the number of games won by the Husky volleyball team.208 1. then the median equals xm where m = n+1 . If we capture the right information. and the daily Dow Jones Industrial Average are other examples. .616 25 5.273 6.739 9. your modeling capabilities can be enhanced by a brief introduction to some fundamental concepts from probability and statistics including the mean. i. 3.716 4. situations where the outcome is completely determined by our decisions. is deﬁned as µ= n i=1 xi n .136 6.939 9.199 7. the amount of time we have to wait for the next bus to arrive is a stochastic process. Let X be the number of minutes we wait for the bus on the ﬁrst 51 days of class.994 9. If n is odd.973 . For the data set X = {xi : i = 1.555 6. order the values xi in ascending (or descending order). 2 2 To analyze the behavior of averages. Otherwise. .253 5.006 9. To ﬁnd the median.352 6.Math 381 Notes — University of Washington —Winter 2011 Chapter 3 Stochastic Processes So far we have studied deterministic processes.488 9.859 7. In real life however.972 4. 3.e.447 3. then it helps us predict what time we will arrive at our destination.537 7. Still.400 5. the median equals the average of xm and xm+1 where m = n .893 1.947 7.048 5.759 0. 2.991 3.764 4.649 3.939 9.654 1.099 4. the number of salmon returning to spawn in the UW Hatchery each year.165 5.200 8.1 Basic Statistics In general terms.316 3. The primary example of a statistic is the average or mean. we can use the statistic to predict future outcomes. a statistic is a number that captures important information about a data set. . For example. n}.692 6. random variables and the Central Limit Theorem. median and variance of a probability distribution. if we learn that bus number 43 comes every 10 minutes on average during rush hour.344 5.827 8.

Larger data sets (say. Random 1 Some authors deﬁned the sample variance to be Var(X) = 1 n Pn i=1 (xi − µ)2 instead. Take a moment and reﬂect on why the median changes in this case. replacing 9.668 5.066 8.947 5.759 3. .146 1.483. The variance of the data set X = {xi : i = 1. This tells us we would probably never have to wait for the bus for more than 12 minutes.692 5. of size 1. In such a case.352 3. 2.313 3.654 3. 0.1) The variance for the original data set above is 6.827 9.939 2. .626 and 5.26 Stochastic Processes 7. On the other hand.253 4. .099 0. n} is deﬁned as 1 Var(X) = 1 n−1 n i=1 (xi − µ)2 .552 6. We will write this as P (Y = y) = 1 for y = 1. it is a variable that we cannot solve for explicitly but instead one that takes on a diﬀerent ﬁxed value each time we run a trial by rolling the die.716 3. 6.273 6.400 1. remained unchanged.715 and 5.994 The mean and median of X equal 5.318 6.972 4.316 9.616 9. respectively.739 6. 2. 3.313 7.113 5. Suppose instead. 2.447 9.488 5. Another important statistic is the variance of the data set.991 7. respectively. Equivalently.973 6. . Since a fair die is equally likely to roll any of the values from 1 to 6.2 Probability Distributions and Random Variables In order to model stochastic processes. Such a change in the mean is expected given the large change in a value of the corresponding data set. we had changed the largest value of X from 9.973. the value of the middle element. .000 or 10.136 4.657 5. which equals the median. albeit by a small amount.850 4.919 5.537 7.947. .589 1. .376 8.006 7.678 3. (3. we are deﬁning 6 a probability distribution function p(y) = 1/6 for y = 1.994 to another random number between 0 and 10.994 by 100 we get a variance of 180.649 6.589 To aid in calculating the median.165 7.036 7. Such a probability distribution can be written as y p(y) 1 1/6 2 1/6 3 1/6 4 1/6 5 1/6 6 1/6 In this example. Let Y denote any one of the six possible outcomes that can be observed when a fair die is rolled. Consider changing the largest value of X from to 9.484. . We will introduce the concepts of a probability distribution and a random variable through examples. 6.200 9. the following sorted list is also included.994 to 100. The variance gives us information about how close the data is centered around the mean µ. we ﬁrst need a mathematical notation that can describe a wide variety of situations.555 6. .230 8.919 3.850 0. say 5. .048 9. Y is a random variable. there is a 1/6th chance of rolling a given number.000) exhibit similar behavior although the change in the mean is seen in less signiﬁcant digits. . the mean still changes. . the median remains unchanged.764 7. the mean equals 7. On the other hand.199 4.939 8. .344 3.208 6. Our change to X altered only the value of the largest element.657 4. Now.284. the mean and median equal 5.893 5.859 1. Therefore. Now.

In contrast. each roll of the die is independent from the previous rolls. total # outcomes Therefore. In other words. Each time we roll. P (Y = y) = p(y). the histogram we construct in this experiment should have 6 columns with similar heights.25 0. A random variable Y is said to be discrete if it can assume only a ﬁnite or countably inﬁnite number of distinct values.15 0. For example.2 p(y) p(y) 0. Figure 3. the uniform distribution does not imply that a streak of 4’s is impossible. As any gambler will attest. say we roll the die 100 times and construct a histogram of the data.15 0.2 0.1 (b) is an example of a probability histogram for a nonuniform distribution. .1 0. the die has no memory of its last roll.1: Probability histograms (a) for a uniform distribution and (b) a nonuniform distribution.1 (a) depicts a probability histogram for a uniform distribution. Yet. 4.3 0. 5}. variables are very useful in probability because we can manipulate them just like ordinary variables in equations. we observe a diﬀerent value of Y .2 Probability Distributions and Random Variables 27 0. 500 and 5000 times. Another way to recognize a uniform distribution is by noting the probability distribution has the form 1 p(y) = .1 0.05 0. Back to our example.3 0. Every discrete random variable comes with an associated probability distribution function. In this histogram.05 0 1 2 3 y 4 5 6 0 1 2 3 y 4 5 6 (a) (b) Figure 3. the height of column i is determined by the number of times we roll i in this experiment. An experiment where every value is equally likely to occur is said to have a uniform distribution as its underlying probability distribution. one could write P (2 < Y ≤ 5) to mean the probability that a roll is among the numbers {3. Figure 3. 14 90 900 80 12 70 10 60 8 800 700 600 50 500 6 40 400 30 4 20 2 10 300 200 100 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 (a) (b) (c) Figure 3.3. Since Y has the uniform distribution. this type of probability is closely related to the study of counting methods known as combinatorics.2: Histograms for tossing a fair die 50.25 0.

3 (a).3. 6. . in the example E(Y ) equals 6 i/6 = 3.000 experiments will not produce a histogram that exactly replicates the distribution in Figure 3. Yn be independent and identically distributed random variables.1 (Central Limit Theorem) Let Y1 . There are many additional probability distributions. We will see more about convergence of histograms for diﬀerent experiments modeled with Monte Carlo methods in the next chapter. the resulting distribution of the sample average tends toward a normal distribution as sample size increases. Y2 . Note that even 10.28 Stochastic Processes To see this graphically. The mean of this distribution will be the mean of the population and the variance of this distribution will be the variance of the population divided by the sample size. . i=1 3. Theorem 3.2 (a).5. two populations may have widely diﬀering underlying distributions such as the uniform and nonuniform probability distributions in Figure 3. . 2. In fact.1. respectively. histograms of 50. 500.3 The Central Limit Theorem Let Y1 . If this is not readily apparent. and 5000 such experiments are depicted in Figure 3. . . . (b) and (c). Then the sample average Y tends to a normal distribution as the sample size tends to inﬁnity. if multiple. Yet. What can be said about the distribution of Y ? The Central Limit Theorem answers this question. The probability distribution for X is: x p(x) 1 1/8 2 1/3 3 1/12 4 5/24 5 1/12 6 1/6 Figure 3. respectively. 3. take a moment and compare our experiments to Theorem 3. . Then. . the underlying distribution can place the odds heavily in the “house’s” favor over many such experiments.1 Expected Value The expected value E of a discrete random variable Y with probability function p(y) is deﬁned to be E(Y ) = y yp(y).3 (b). Therefore. Let Y and X denote any one of the six possible outcomes available from tossing a fair and weighted die.2. Should this be expected? Indeed. Both histograms reﬂect a normal distribution. . Suppose we throw the fair die 50 times and record the sample average of the 50 outcomes. it is by the powerful Central Limit Theorem. The probability function p(y) = 1/6 for y = 1. independent experiments occur for either population. . Yet. Repeating such an experiment with the weighted die produces the histogram in Figure 3. + Yn ) is the random variable measuring the mean over n trials. As the number of experiments increases. . . . That is. larger sample sizes are needed for a histogram of sample averages to reﬂect the underlying normal distribution of such an experiment. 6. Take a moment and consider the power of this theorem. The theorem allows any population. the histograms converge toward the corresponding uniform distribution. The modeler’s challenge is to choose the one that most closely approximates the data. . . Repeating this 1000 times and plotting the results produces the histogram in Figure 3. . indeed.1 (a). from the perspective of a casino.1. Recall that the probability function for the rolling of a fair die is p(y) = 1/6 for y = 1. .3. Yn be independent and identically distributed random variables. guessing the value of a single roll of a fair die is mere chance even with knowledge of a previous outcome. respectively. Note that convergence is slower for highly skewed distributions. 2. .1 (a) and (b) are the probability histograms for Y and X. (This all holds providing the mean and variance of the population is ﬁnite and there is random sampling). Y = (1/n)(Y1 + Y2 + . Y2 . . Seen from the perspective of a gambler.

In (b). Let D denote the distance between the needle’s midpoint and the closest line.4: Buﬀon’s Needle experiment.8 3 3. Parallel computing meant calculating with a large number of mathematicians. The probability of the needle intersecting a line 2 is the ratio of the area of the shaded region to the area of the rectangle.4 (b). This is the continuous version .2 3. laptops and calculators.8 4 4.5.2 3.3. Such hand computation limited the feasibility of mathematical experiments before the advent of the digital computer.) Here both D and θ are random variables. The needle crosses a line if D≤ 1 sin θ.4 Buﬀon’s Needle In 1777. (Refer to Figure 3.4 3.6 3.4 0 2.8 4 4. and let θ be the angle of the needle to the horizontal. the term “computer” referred to a job title.4 Buﬀon’s Needle 29 300 250 250 200 200 150 150 100 100 50 50 0 2.2 (a) (b) Figure 3.6 3. a straight needle of unit length is dropped at random onto a plane ruled with straight lines of unit length apart as depicted in Figure 3.4 3. Speciﬁcally. What is the probability p that the needle will intersect one of the lines? Before the time of digital computers. D denotes the distance between the needle’s midpoint and the closest line. 2 Consider the plot of 1 sin θ as seen in Figure 3.4 2.6 2. 3. θ is the angle of the needle to the horizontal.3: Histograms taken from sample averages (a) for a uniform distribution and (b) a nonuniform distribution. Comte de Buﬀon performed the experiment repeatedly to determine p.4 (a). (a) depicts the experiment where a needle of unit length is randomly dropped between two lines of unit length apart.8 3 3. & & 1 unit E & & & & T 1 unit & & c θ r& & & & D T 1 2 sin θ c (a) (b) Figure 3. Comte de Buﬀon described an experiment that relates the real number π to a random event. Comte de Buﬀon also derived the value of p mathematically.2 4.

Yet. The area of the rectangle is clearly π/2. the next time that you drop a pencil. you have completed the ﬁrst step in a Monte Carlo experiment in estimating π.14 Figure 3. .5 1/2sinθ 0 0 θ 3. of the uniform distribution.30 Stochastic Processes 0. The area of the shaded region is π 0 1 sin θ dθ = 1. 1 π 2 = 2 .5: Plot of 1 2 sin θ aids in analysis of Buﬀon’s Needle. π Just imagine. 2 which implies that the probability of a hit is p= Hence. it takes heroic patience before an accurate estimate emerges. We will see in class that Buﬀon’s Needle experiment indeed produces an estimate of π but is slow to converge. an estimate of π is 2 total number of drops total number of hits .

What is the probability that the ages of a random subscriber is (a) more than 35. She concludes that if ﬁghter pilot Woody returns safely from 49 consecutive missions. In a horse race.000. The ages of subscribers to a certain mailing list are normally distributed with mean 35. Say 4000 of these tickets pay oﬀ $30 each.5 years? (b) between 30 and 40 years? 8.3. he should return home before the ﬁftieth mission. If the ﬂight is scheduled to depart at 9:00am and preparation begins at 8:37am. What is E(R). In a lottery every week. A card is drawn at random from an ordinary deck of 52 cards. 6. The odds against the second horse winning are 7 to 3. (a) What is the probability that it is a black ace or a red queen? (b) What is the probability that it is a face card or a black card? (c) What is the probability that it is neither a heart or a queen? 5. Let R be the random variable which determines the face value of the red die. .8. Consider the random variables R and G as in Problem 1. and G be the random variable which determines the face value of the green die. and R + G? 2. Prove that V AR(X + Y ) = V AR(X) + V AR(Y ) if X and Y are independent random variables. 1. ﬁnd the probability that the plane is prepared for departure on schedule. 2.000. (a) What is the population (possible values) for R. In a certain country. G. This book is also on reserve in the Engineering Library for the class. one green and one red. Consider rolling two fair dice. the odds in favor of the ﬁrst horse winning in an 8-horse race are 2 to 5.5 Exercises Recommended Reading Assignment: Pages 710-729 from ”Operations Research: Applications and Algorithms” by Wayne Winston. the probability is 49 that a randomly selected ﬁghter plane returns from a 50 mission without mishap. and E(R + G)? 3. G. and R + G? (b) What is the probability distribution for R. it takes a random amount of time between 20 and 27 minutes. Explain any assumptions you need to make. Is Mia right? Explain why or why not. E(G). To prepare an aircraft for departure. 500 pay oﬀ $800 each. and no ticket pays oﬀ more than one prize. 9. (a) What is the expected value of the winning amount for a player with a single ticket? (b) What is the expected value of the winning amount for a player with ﬁve tickets? 7. and one ticket pays oﬀ $1.5 Exercises 31 3.5 years and standard deviation 4.000 tickets are sold for $1 apiece. What is the probability that one of these two horses will win? 4. Mia argues that this means there is one mission with a mishap in every 50 consecutive ﬂights.200.

32 Stochastic Processes .

important advancements in their use as a research tool occurred during World War II with mathematicians including Neumann and Ulam and the beginnings of the modern computer. The question is: which strategy is better. or is there another that is better yet? Think about some of the questions that might naturally arise if you were going to try to determine an answer. the customer will go elsewhere. but if there aren’t any customers for several weeks in a row they will accumulate. 4. With our knowledge of statistics and ideas regarding the use of random numbers in simulation. On the other hand. While Monte Carlo methods have been used for centuries. and economics. Monte Carlo methods are essentially carefully designed games of chance that enable research of interesting phenomenon. Note that no essential link exists between such methods and a computer. computers enormously enhance the methods’ eﬀectiveness. mechanics. Monte Carlo methods are used eﬀectively in a wide range of problems including physics. but they may often be caught without any. They are trying to decide on the best strategy for ordering the 150 gallon tanks. There’s probably less chance of being caught without one in stock.1 A ﬁsh tank modeling problem The XYZ Tropical Fish Store has come to us for advice on the following problem. so they won’t waste much shelf space. What questions should you ask the store owners before you even try to model it? What 33 . Yet. They carry 150 gallon ﬁsh tanks which are quite large and take up a lot of shelf space in the store. They are considering these two in particular: 1. Yet. Buﬀon’s Needle is a classic example of Monte Carlo methods and the ability of random numbers to generate meaningful results. Moreover. It can seem surprising that meaningful insights can be made from such results. They observe that on average they only sell about 1 each week. we are ready to examine problems that we can model and study with Monte Carlo methods.Math 381 Notes — University of Washington —Winter 2011 Chapter 4 Monte Carlo Methods Monte Carlo methods model physical phenomenon through the repeated use of random numbers. Then they will never have more than one in stock at a time. Monte Carlo methods are used in a wide variety of ﬁelds to model the physical world. Today. Strategy 1: Order a new tank each time one is sold. so they don’t want to have too many in stock at any one time. it takes 5 days for a new one to be delivered by the distributor after they place an order. Since they sell one a week on average it may be good to have one arrive each week on average. 2. Strategy 2: Order a new tank once a week. these are a high-proﬁt item and if they don’t have one in stock when a customer comes in. arriving 5 days later.

We’ll also assume that “an average of once a week” means that each day there is a 1/7 chance that a customer will come. in matlab. A more complete program which also allows comparison of diﬀerent strategies can be found in Section 4. Later we’ll consider this further. relative to other items in the store that would have to be eliminated in order to stock more ﬁsh tanks? (This is one thing we would need to know in order to make any sort of quantitative comparison of diﬀerent strategies. . In this case there is always at most one tank in stock. The probability of a customer each day is 1/3 and it only takes 2 days for a new tank to be delivered after ordering.5.) • What is the cost of having additional tanks in stock? What is the cost of losing a customer by not having a tank available? (Besides the lost sale itself. Here we consider the two simple strategies mentioned above. and see which pays oﬀ better.2 Monte Carlo simulation The approach we will consider at this stage is a simulation of how the competing strategies might work. which often have to decide on an ordering strategy for many diﬀerent products simultaneously. even thousands for a grocery store. The program has been simpliﬁed by leaving out some commands to keep track of totals and display results. if it is greater than 1/7 we can say that a customer does not come. there might be the cost of losing that customers business in the future. The parameters have been changed here to make it possible to see more action in a short simulation.) • What do you mean by “we sell one a week on average”? Is there one person in town who comes in every Wednesday at noon and always buys one tank? Or are there 52 people in town who always give a tank as a Christmas present. for example. Here are a few examples: • How much is the proﬁt on one of these ﬁsh tanks. In particular. we’ll assume the sales are uniform over the year. This means we write a computer program which simulates several weeks of how the store’s inventory ﬂuctuates assuming one strategy or the other. but we can model the arrival of customers at random times by generating random numbers on the computer and using these values to indicate whether a customer arrives on each day or not. if they decide to buy all their ﬁsh and supplies from the same place they buy the tank. Trying to start with a completely realistic model is usually hopeless. Note in making these assumptions that we follow a primary rule of model building: It is best to start with a simple model and reﬁne it incrementally. which simpliﬁes the program. On the next page is a simple matlab program which simulates the strategy of ordering a new ﬁsh tank only when the one in stock is sold. The program prints out information about what happens each day and keeps track of statistics over the course of the simulation. This is a classic example of an inventory problem. Variants of this problem are constantly being solved by businesses. For example. and downloaded from the class webpage. so there are lots of sales in the month of December and none in other months? Either scenario would lead to “one a week on average” over the year. Of course we don’t know exactly when customers arrive. If this number is less than 1/7 we can say that a customer comes. based on some simpliﬁed assumptions of customer behavior.34 Monte Carlo Methods other information would you need to know before you could hope to evaluate the strategies? Thinking about this should help to suggest what is important in a mathematical model. 4. the command rand(1) generates 1 random number which is uniformly distributed between 0 and 1. This isn’t completely realistic: surely there’s some chance that two customers will appear in the same day. but would lead to very diﬀerent ordering strategies.

. end else lost = lost+1. end % loop on weekday end % loop on week % output total statistics now. lost = 0.. for week = 1:3 for weekday = 1:7 sold = 0. % probability of a customer each day days_for_delivery = 2. end random_num = rand(1). % number of tanks in stock % number of days until delivery of tank on order % -1 means none on order total_cust = 0. if random_num < a customers = 1.1: First pass at a matlab program for Monte Carlo simulation of the ﬁsh tank modeling problem. total_lost + lost. % we sold a tank and now order % another one. if deliv < 0 deliv = days_for_delivery. % display results for % this day disp([week weekday stock customers sold lost]). total_sold + sold. else customers = 0. total_sold = 0. Figure 4. end % a new tank is delivered % days till next delivery % generate random number % use this number to tell if a customer arrived if customers==1 if stock>0 % we have a tank to sell the customer sold = sold+1. stock = stock-1. total_lost = 0. % days from order to delivery of new tanks stock = 1. if deliv == 0 stock = stock+1. end if deliv >= 0 deliv = deliv-1.2 Monte Carlo simulation 35 a = 1/3.4... deliv = -1. % we didn’t have a tank and lost a customer end end % if customers==1 % keep track total_cust = total_sold = total_lost = of total statistics: total_cust + customers.

6054 0.36 Monte Carlo Methods Here are the results of one sample simulation: week 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 day 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 stock 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1 1 cust 0 0 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 0 sold 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 lost 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 total over entire simulation: customers 7 sold 4 lost 3 It’s hard to come to any conclusion from this one short simulation. the numbers could have been very diﬀerent. To get a better feel for how this strategy works in the long run. each for 104 weeks: customers 241 223 255 266 237 235 255 250 227 249 sold 147 135 149 158 144 144 148 147 138 152 lost 94 88 106 108 93 91 107 103 89 97 fraction served 0.6076 0.5880 0.5843 0. For example we might run it for 104 weeks (2 years) rather than only 3 weeks. Much . If they had appeared diﬀerently.5804 0. In all of the above results we have roughly this number.6128 0.6100 0.6104 Notice that we expect roughly 104 × 7/3 = 243 customers to appear over this time period (since we’re using the rate of 1 every 3 days). we can run the simulation for many more weeks. The fact that we could only serve 4 of the 7 customers was due to the particular pattern in which they arrived.6079 0. We might also run the simulation several times over to see if we get similar results even though the exact pattern of customer arrivals will be diﬀerent in each case. Here are the results of 10 diﬀerent simulations.5940 0.

585 0. 4.561 0 0 0 0 0 1 0 0 1 1 660.577 0. fixed_delivery = 0..584 0. The variable order_when_out is a ﬂag which determines whether to order a new tank whenever we run out. Here are the results of a few 104-week simulations with each strategy. This proﬁt is based on the assumed proﬁt per sale.1 Comparison of strategies Section 4. We will study this later in the quarter. This might be one number we want to use in comparing diﬀerent strategies.610 0.9000 .00 830. We also compute the “proﬁt” for each simulation and the average proﬁt over 100 simulations with each strategy. we could serve about 60% of the customers who come looking for a 150 gallon tank.00 680.594 0. Strategy 1 corresponds to order_when_out = 1. Strategy 2 can easily lead to overstock. The variable fixed_delivery determines whether there is a ﬁxed schedule of deliveries. This depends not only on how many customers we serve or lose. More important to the business.00 710..00 670. For Strategy 1 tested above.4. and cost per night of overstock as illustrated in the program.537 0. while Strategy 2 uses order_when_out = 0.574 0. % arrivals every 7th day % no standing order Percentage of customers served is one important number to compare.656 overstock 0 0 0 0 0 0 end_stock 1 1 1 0 1 0 profit 830.00 760. By doing the above simulations we came to the conclusion that with this strategy and particular set of parameters.00 average_profit = 787.2 Monte Carlo simulation 37 more interesting is the fact that in each simulation we end up being able to serve about 60% of these customers. Strategy 1: customers 106 100 104 104 94 90 etc. 108 101 94 99 98 sold 63 61 58 60 55 59 lost 43 39 46 44 39 31 fraction_served 0.00 840.00 700. might be the expected proﬁt over time with diﬀerent strategies. however. but also on the cost of any overstock.00 760. Markov chains.00 870. cost of losing a customer.558 0.5 contains a more complex program for this Monte Carlo simulation which allows comparison of diﬀerent strategies. if we use the theory of Markov chains. It turns out that we can predict this number using mathematics without doing any simulations.2. once every so many days. there is never more than 1 tank in stock and so no overstock.00 58 59 54 61 55 50 42 40 38 43 0. fixed_delivery = 7.616 0.

000 0. The proﬁt for each individual simulation is typically not too far from the average. • Note one problem with Strategy 2 as implemented so far: There is no mechanism to stop automatic delivery if the stock goes too high.00 10 512. 4.990 0. Figure 4.4 shows histograms of the performance of this strategy for two diﬀerent values of N .990 1.50 1583.954 0. • The results for Strategy 1 have much smaller variance than for Strategy 2.3 A hybrid strategy A better ordering strategy might be to have a standing order for a tank to arrive every N days.967 0.50 1455.38 Monte Carlo Methods Strategy 2: customers 93 99 98 96 99 87 106 etc. and then to also order a new tank whenever we run out. N = 9 and N = 14.794 0.989 1.2.953 0. while with Strategy 1 the percentage is around 60%. If the business is lucky.00 94 103 101 104 104 88 90 101 100 0 23 5 27 5 3 1 0 1 1.50 958.50 average_profit = 342.50 70. usually above 90%.989 0.50 -292.990 2867 493 2023 709 637 6135 4165 3899 3755 11 2 4 1 1 17 15 4 5 446. . Customers are rarely lost but there’s a large cost associated with this overstock which reduces proﬁt. However. By the end of two years the stock has grown to 20 tanks. This is seen more strikingly by plotting histograms of the proﬁts over each set of 100 simulations. but there is a great risk of being unlucky. Since customers arrive at the same average rate as new tanks.50 1711.00 7 401..000 0.50 112.50 4 245.3 shows the stock of tanks as a function of day in the course of one particular unlucky simulation. Figure 4. Some simulations give much higher proﬁts than Strategy 1 while many simulations resulted in negative proﬁts (a net loss). a high stock will tend to stay high. as shown in Figure 4.000 0. This is because Strategy 2 generally leads to an overstock.00 18 -1665. For Strategy 2 the values are all over the place. In the program N is denoted by fixed_delivery.50 7 1049.817 0.953 overstock 4795 1802 2716 3690 3098 6811 3450 end_stock profit 13 -567.00 9 75. the average proﬁt is larger with Strategy 1.969 1. where N is larger than 7.7050 Note some interesting features of these simulations: • The percentage of customers served is generally larger with Strategy 2. 94 126 106 131 109 91 91 101 101 sold 92 98 95 96 98 87 101 lost 1 1 3 0 1 0 5 fraction_served 0.000 0. proﬁts may be higher with Strategy 2..50 -1337.

4: Histograms of proﬁt observed over 100 simulations with hybrid strategies.3: History of number of tanks in stock each day for one particular simulation with Strategy 2.2: Histograms of proﬁt observed over 100 simulations with two diﬀerent strategies. . Hybrid Strategy with $N=9$ 30 30 Hybrid Strategy with $N=14$ 25 25 20 20 15 15 10 10 5 5 0 −1500 −1000 −500 0 500 1000 1500 2000 0 −1500 −1000 −500 0 500 1000 1500 2000 Figure 4.4.3 A hybrid strategy 39 Strategy 1 30 30 Strategy 2 25 25 20 20 15 15 10 10 5 5 0 −1500 −1000 −500 0 500 1000 1500 2000 0 −3000 −2500 −2000 −1500 −1000 −500 0 500 1000 1500 2000 Figure 4. 25 20 15 10 5 0 0 100 200 300 400 500 600 700 800 Figure 4.

then the average (sample mean) is deﬁned as y= ¯ The sample variance is s2 = and the sample standard deviation is s= 1 N −1 N 1 N N yi . We can view the N values observed as N observations of a random variable chosen from some distribution. To get a better idea of the underlying distributions. For a normally distributed random variable. . yN for the observed proﬁt.) will lie somewhere between 600 and 1000.5 shows histograms of the proﬁts observed when 5000 simulations are performed with each strategy. This is the realm of statistics. . etc. ¯ √ s2 . With a large enough sample. .2.2. Typically we would also like to be able to quantify how conﬁdent we are in the conclusions we reach. For Strategy 1 we merely conﬁrm what we already guessed from the 100 simulations of Figure 4. One statistic we have already looked at is the average proﬁt observed for each strategy.2. i=1 i=1 (yi − y)2 . Even if the average observed with Strategy 2 were higher than the average proﬁt observed with Strategy 1.4 Statistical analysis In comparing the results of simulations with diﬀerent strategies. . it might be the better strategy.) Looking at the histograms of Figure 4. So if the proﬁt with .. Figure 4. The point of doing many simulations is to get some experience with what range of diﬀerent outcomes they might reasonably expect to observe in practice. In practice the owners of the store don’t really care about the average behavior over hundreds of simulations. and the point of doing a Monte Carlo simulation is to gain some information about the distribution. If one strategy gives a larger average proﬁt over many simulations than another strategy. This follows from the Central Limit Theorem as we will be better able to understand after studying the Markov Chain interpretation of this model. With Strategy 2. the sample mean should approximate the mean of the true distribution and the sample variance should approximate its variance. The sample mean from this particular set of runs is about 783 and the standard deviation is about 83. it can be shown that the value observed will be farther than twice the standard deviation away from the mean only about 5% of the time. If we do N simulations and obtain N diﬀerent values y1 . Comparing the histograms of Figure 4. we need to be able to reach sensible conclusions from a large quantity of “experimental data”. In fact it can be shown that the distribution of observed proﬁts should be approximately normally distributed (Gaussian). The distribution has a fairly “Gaussian” shape with almost all results lying between 600 and 1000. we see that 100 trials gives more information about the distribution of results in Strategy 1 than with Strategy 2. How big a sample is “large enough”? That depends on the distribution. Typically we don’t know what this distribution is. we might repeat these experiments with many more simulations. On the other hand we have also observed that some strategies give results with much greater variance between simulations than other strategies.40 Monte Carlo Methods 4.. the strategy with the smaller variance might be preferable in practice because there would be less risk involved. how sensitive these conclusions are to the parameters we chose in the model. for example. Here we get a much better sense of the distributions. we see that with Strategy 1 we might be able to predict fairly conﬁdently that the proﬁt which will actually be observed (assuming the model is good enough . y2 . and sensible analysis of Monte Carlo simulations generally requires a good knowledge of this ﬁeld. they care about what will happen in the one real experience they will have with whatever strategy they adopt. we can only predict that it will probably lie somewhere between -2000 and 2000. (Assuming of course that the model is good enough to match reality in some sense.

% print column headings: % profit from selling one tank % loss from losing a customer % cost of each tank overstock per night . % initialize: profit = zeros(nsim.1). nweeks = 104.5 that the distribution of proﬁts is certainly not a normal distribution. The average proﬁt observed is 439 and the standard deviation is 1020. fixed_delivery = 12. % probability of a customer each day % days from order to delivery of new tanks % = 1 ==> order a new tank when stock==0 % = 0 ==> don’t order when out of tanks % >0 % ==> standing order for a new tank every so many days days_for_delivery = 5. ndays = 7*nweeks.4. order_when_out = 1.5 Fish tank simulation code % number of different simulations to do % number of weeks in each simulation % number of days in each simulation % set parameters: nsim = 100.50.5 Fish tank simulation code 41 Strategy 1 500 350 Strategy 2 450 300 400 250 350 300 200 250 150 200 150 100 100 50 50 0 300 400 500 600 700 800 900 1000 1100 1200 1300 0 −6000 −5000 −4000 −3000 −2000 −1000 0 1000 2000 3000 Figure 4. The fact that a very large overstock can accumulate accounts for the wide spread to the left. 4. For Strategy 2 we see in Figure 4. a = 1/7. Strategy 1 is essentially normally distributed then we expect that the proﬁt observed should lie between 783 − 2 · 83 = 617 and 783 + 2 · 83 = 949 about 95% of the time. % profits and losses: saleprofit = 20.5: Histograms of proﬁt observed over 5000 simulations with two diﬀerent strategies. which partly accounts for the sharp cut-oﬀ at the higher end. overstockloss = . Note that the proﬁt would not be expected to exceed about 104 · 20 = 2080 even with a perfect strategy (why not?). In 26% of the simulations the observed proﬁt was negative. We will discuss some of these in class. lostloss = 10. and various ways that the model might be improved. There are many other statistical questions that one might ask about the data obtained from these simulations.

% a new tank is delivered % at the beginning of the day end if deliv >= 0 deliv = deliv-1. lost = 0. total_sold = 0. stock = stock-1. % days till next delivery end if (mod(7*week + day. % array of random numbers to use each day total_cust = 0.1).42 Monte Carlo Methods disp(’customers sold lost fraction_served overstock end_stock profit’) for sim=1:nsim % initialize: random_nums = rand(ndays. end end % we have a tank to sell the customer % we didn’t have a tank and lost a customer if (order_when_out & stock==0 & deliv < 0) . % increment every night by number of excess tanks in stock % main loop for a single simulation: day = 0. else lost = lost+1. total_lost = 0. stock = 1. for week = 1:nweeks for weekday = 1:7 day = day+1. end if customers==1 if stock>0 sold = sold+1. % number of days until delivery of tank on order % -1 means none on order overstock = 0. end % use random number to decide how many customers arrived % Here assume 0 or 1: if random_nums(day) < a customers = 1. if deliv == 0 stock = stock+1. else customers = 0. % day in the simulation sold = 0. % number of tanks in stock deliv = -1. fixed_delivery) == 0) % A new tank is delivered every so many days regardless of stock stock = stock+1.

. total_lost = total_lost + lost..1). % order another end if stock > 1 overstock = overstock + (stock .3f %11. stock_record(day) = stock. average_profit)]) % standard deviation: variance = sum((profit .0f %12. end % keep track of total statistics: total_cust = total_cust + customers. .-1500:50:2000) axis([-1500 2000 0 30]) 4.0f %8.0f %12.0f %6. total_sold = total_sold + sold. profit(sim))) end % loop on sim % compute and print average profit over all simulations: average_profit = sum(profit) / nsim disp(’ ’) disp([’ average profit = ’ sprintf(’%10. total_lost.2f’.total_lost*lostloss .4.2f’. stock.^2) / (nsim-1). disp([’ standard deviation = ’ sprintf(’%10.. total_sold..6 Poisson processes In the Monte Carlo simulation of the ﬁsh tank inventory problem. The matlab code contained the following lines: random_num = rand(1). . profit(sim) = total_sold*saleprofit .. standard_deviation)]) disp(’ ’) % plot histogram of profits over all simulations: hist(profit. overstock. we assumed that each day there was a probability a of a customer coming in.. fraction_served. if random_num < a customers = 1.average_profit). total_cust.2f’.overstock*overstockloss.0f %8.6 Poisson processes 43 % none in stock and none on order deliv = days_for_delivery. % keep track of stock on each day end % loop on day end % loop on week fraction_served = total_sold / total_cust.. standard_deviation = sqrt(variance). % generate random number % use this number to tell if a customer arrived . % output total statistics: disp(sprintf(’%6.

seconds or even smaller) and assume that in each small piece there is really no chance that more than one customer will appear. i. end This assumes that each day there is either 0 or 1 customers. minutes. p0 + p1 + p2 is very close to 1.e.m from the webpage. and clearly unrealistic to assume there will be either 0 or 1 customer. (Note that if N is large enough then p = a/N will be very small even if a > 1. More generally let a represent the average number of customers per day. since some days we would expect 2 or even more to appear if they come at random times. we might want to simulate a situation in which there are typically many customers each day. This is not very realistic. (Generated using poisson.. and ignore the possibility of more than one customer. and that the probability of one customer appearing is p = a/N . instead of only one customer every few days. It would be better if we could predict the probabilities pk that exactly k customers will appear in a given day and then use something like: random_num = rand(1). so it is valid to view this as a probability. % assume negligible chance of more end end end Here we have assumed that a is small enough that the chance of more than 3 customers is negligible. . We could split a day up into N much smaller pieces (hours. One way to estimate these probabilities would be to do a Monte Carlo simulation. etc. This isn’t very realistic if a is somewhat larger. We could simulate many days and see what fraction of the days there are k customers for each k. else if random_num < p0+p1+p2 customers = 2. else customers = 3.44 Monte Carlo Methods else customers = 0. If a is very small then we can interpret a as the probability that a customer comes on any given day. Moreover.) Each string has 20 bits. else if random_num < p0+p1 customers = 1. If a > 1 then it is impossible to interpret a as a probability. % generate random number if random_num < p0 customers = 0. The ﬁrst “day” two customers arrived.) Here are the results of a 10 trials with N = 20 and a = 1. the second day only one. with 0 representing “no customer during this interval” and 1 representing “a customer during this interval”.

0004 Run 2 0. p1 ≈ 0. 40% had 2 customers.36. 000 days. If we had run it for only 100 days each we would see quite a bit of variation from run to run and would know that we don’t have reliable values.0001 Poisson distribution.37. Below are the results of simulations with 10. pk = k! In particular. and that for any value of a.0596 0.0024 0. 30% of the days no customer arrived.18.0151 0.4.0122 0. p0 p1 p2 p3 = e−a = ae−a 1 2 −a a e = 2 1 3 −a a e = 6 Note that by running the simulation 3 times we get some reassurance that the value of 10.0021 0. and they compare well with the values obtained from the three 10. In fact we can compute these values without doing a simulation. Run 1 0 1 2 3 4 5 6 7 customers customer customers customers customers customers customers customers 0.0031 0.0613 0. It can be shown mathematically that probabilities follow a Poisson distribution.1837 0.3717 0.3722 0. 20% had 1 customer.0025 0. This was repeated 3 times and we get slightly diﬀerent results each time. Here it seems fairly clear that p0 ≈ 0. To estimate the probabilities more reliably we should do a simulation with many more days.3580 0.3606 0.0646 0 0.000 is large enough to give values that don’t change much when we rerun the simulation.3767 0. p2 ≈ 0.0002 Run 3 0.6 Poisson processes 45 00000010001000000000 00000100000000000000 00010100000000000000 00000000000000000000 00000000000000000100 00000000000000000000 00000000000000000000 00000000100000000010 00110000000000000000 10100000000010000000 In this short simulation.0575 0 0. Note that ∞ pk k=1 = e−a 1 + a + a2 a3 + + ··· 2! 3! = e−a ea = 1.0005 0.1911 0. etc.0001 Poisson 0.3686 0.000-day simulations. The values for the case a = 1 are shown in the last column of the table above.1883 0.3679 0.1839 0.0153 0. If we ran another 10-day simulation we’d probably get diﬀerent results. the probability of observing exactly k customers in a single day is ak e−a . .0004 0.0120 0. the average number of customers per day.3679 0. and there was 1 day (10%) when 3 customers arrived.

We have (x + y)N = xN + The values are N k N 1 xN −1 y + N 2 xN −2 y 2 + · · · + N N −1 xy N −1 + y N . . We can see this easily using the fact that (x + y)3 = x3 + 3x2 y + 3xy 2 + y 3 with x = 1 − p and y = p. 2. .4) p1 p2 (N ) N 2 etc. ﬁrst consider a simpler case where we look at a sequence of only 3 Bernoulli trials. without regard to the past history.2) Note that these probabilities sum to 1. The arrival of customers is then said to be a Poisson process. The sequence of tests to see whether there is a customer in each interval also called a series of Bernoulli trials. The probabilities we seek p0 (N ) (N ) = (1 − p)N = = N 1 (1 − p)N −1 p = N (1 − p)N −1 p (1 − p)N −2 p2 = 1 2 (N − N )(1 − p)N −2 p2 2 (4. and the number of ways it is possible to have 0. events occurring over the entire day.1) (1 − p)3 p3 (1 − p)p2 (1 − p)p2 (1 − p)2 p (1 − p)p2 (1 − p)2 p Note that there are 3 ways to observe 1 customer during the entire day. with a probability p of “success” in each trial (denoted by 1) and probability 1 − p of “failure” (denoted by 0). In fact we can use this type of binomial expansion to easily compute the (N ) probabilities pk when N is larger without enumerating all 2N possible strings and counting things up. There are 23 = 8 possible strings that can arise from these 3 independent trials.3) (“N choose k”) are often called the binomial coeﬃcients. To see how these values arise. (4. 1. the probability of observing k customers when N = 3: 0 customers (1 − p)3 = p0 (3) (3) 3 customers p3 = p3 2 customers 3(1 − p)p2 = p2 (3) 1 customers 3(1 − p)2 p = p1 (3) (4. . Adding up the probabilities we ﬁnd the (3) following values for pk . but only 1 way to observe 0 or 3 customers. . For this to be valid we must assume that customers appear randomly with probability a/N in any small increment of 1/N day. These are listed below along with the probability of observing each: 000 001 010 011 100 101 110 111 (1 − p)2 p (4. The formulas for the Poisson distribution can be derived by using a combinatorial argument based on the probability of the event occurring in each small time period. and 3 ways to observe 2 customers.46 Monte Carlo Methods The probabilities all add up to one as we should expect.

1] (using rand in matlab. This can be shown to be the proper distribution of service times for a Poisson distribution in the following manner.4. and a second random number to determine whether the customer currently being served ﬁnishes.7 Queuing theory 47 This is called the binomial distribution for the total number of “successes” in N Bernoulli trials.7 Queuing theory As another example of Monte Carlo simulation. see for example [HL95]. Instead. According to the Poisson distribution. rather than minutes. N →∞ −a a N a 1− N N −1 a N (4. Customer arrivals are modeled with a Poisson process with average rate a per minute. recall that we want to determine the probability of observing k customers in the case where N is very large and p = a/N . The amount of time required to serve them is also random. Here we will look at the simplest example of a single server and discuss how Monte Carlo simulation might be used to study the expected length of the line. but this is exactly the probability that the next arrival time is more than T units later. we will look at an example from queuing theory. To relate this to the Poisson distribution. . with average rate of b people per minute. Suppose we split up the time since the last arrival into units T. Suppose customers arrive at a service center or cashier and have to line up to wait for service. this simulation works by ﬁrst generating a set of arrival times and service times for each customer.6) N −1 4. for example) and then compute 1 − log(r). . The probability p0 is given by p0 = lim p0 N →∞ (N ) = lim N →∞ 1− a N N = e−a . . In this simulation this is also assumed to be Poisson process with probability of roughly b/N that they will be ﬁnished in any small subunit of time 1/N . 2T. The matlab code in Figure 4. and the comparison of diﬀerent strategies. where a is a ﬁxed value representing the average number of customers per day.5) Similarly. or some other distribution of service times. The times are generated using an exponential distribution: Probability that inter-arrival time is larger than T = e−aT . Queuing theory is the study of how such lines are expected to behave. a . for example? There is a large literature on this subject. Which approach minimizes the average time a customer has to wait. A better model might be that there is some minimal service time plus a random length of time after that. For example.6 shows a simple simulation of a queue. if there are multiple servers we might want to compare the strategy of having independent lines for each (as is common in grocery stores) or have a single line with the next customer going to the ﬁrst available server (as is common in banks and airport check-in counters). the probability of exactly 0 arrivals in the ﬁrst of these periods is thus e−aT . To generate random numbers from an exponential distribution with parameter a. and then computing the total time that each customer must wait on the basis of these times. we can ﬁrst generate a random number r uniformly distributed in the interval [0. (4. p1 = lim N 1 − N →∞ = a lim = ae . Then in each unit the expected number of arrivals is aT . (This may not be very realistic. They must wait in the queue until the single server is free. 3T.) One could model this by splitting each minute up into N subunits and generating a random number to determine whether a new customer arrives.

as we will look at later.91 is even closer to the average interarrival time. There are now typically some customers who have to wait more than 20 minutes. In this case there are a ﬁnite set of states possible (diﬀerent queue lengths) and it is possible to derive a Markov chain model of transitions. . In fact it can be shown that for arbitrary parameters a < b. some similarities are observed. we could assume a maximum queue length beyond which customers go away rather than joining the queue. for 1000 customers each. Also.1 is used instead. Figure 4.7 shows the results of three simulations using the parameters a = 1 and b = 1. For example. the queue length can grow quite long due to the randomness of arrivals. While the 3 simulations show variation in details. More complicated models would be required to model a bank or super market with multiple lines and servers. in which case the average service time 1/b ≈ 0. Note that this is consistent with what is observed in these simulations. and the 1 indicates there is a single server. Various modiﬁcations of this model are possible. Figure 4.8 shows another set of simulations in which the parameter b = 1. for example. In all three simulations the queue length is more than 10 at times. One can also use Markov chain ideas to study the expected behavior of such systems and determine expected values for quantities such as the total waiting time. but it is clear that now the queues will typically grow to be of length greater than 20 at times and the average waiting time is much longer. looking at the histogram we see that there are some customers who have to wait as long as 10 or 12 minutes in each simulation. Two plots are shown for each simulation: on the left is the length of the queue as a function of time. the expected total waiting time is 1/(b − a). From comparing these sets of results it should be clear that such simulations can be eﬀectively used to predict how badly queues may behave if the proper parameters are estimated. the second M indicates that the service times are also Markovian.48 Monte Carlo Methods Note that log is the natural logarithm command in matlab. This model is called an M/M/1 queue. Note that although the server can handle 50% more customers than are arriving on average. The ﬁrst M indicates that the arrival times are exponentially distributed (or Markovian).5. Again the details vary between simulations. We have only looked at the simplest situation of a single queue and a single server. though the average total time is more like 2 minutes. and on the right is a histogram of the total time spent by each customer (including both the time in line and the time being served).

% average number of people served per minute ncust = 1000. % Generate inter-arrival times. % service times for each % Compute time each customer finishes: ft(1) = at(1)+st(1). % Single server.1). % Generate random arrival times assuming Poisson process: r = rand(ncust. ft = zeros(ncust.at. st = -1/b * log(r).4. end % arrival times of other customers % Generate random service times for each customer: r = rand(ncust. % Arrival time of first customer for i=2:ncust at(i) = at(i-1) + iat(i). end total_time = ft . % total time spent by each customer % time spent waiting before being served ave_service_time = sum(st)/ncust ave_wait_time = sum(wait_time)/ncust ave_total_time = sum(total_time)/ncust % Plot histogram of waiting times: hist(total_time.0:. See the Handouts webpage for the full m-ﬁle.5. wait_time = total_time .7 Queuing theory 49 % Simple queuing theory simulation. The code shown here is simpliﬁed and doesn’t show the computation of the queue length which is plotted here.st. ft(i-1)+st(i)). exponential distribution at(1) = iat(1).6: Queuing theory simulation.1). . % finish time for first customer for i=2:ncust % compute finish time for each customer as the larger of % arrival time plus service time (if no wait) % finish time of previous customer plus service time (if wait) ft(i) = max(at(i)+st(i). single queue: M/M/1 queue a = 1. iat = -1/a * log(r). % % initialize arrays: at = zeros(ncust.1). % average number of arrivals per minute b = 1.1). % Notation: % at = arrival time of a person joining the queue % st = service time once they reach the front % ft = finish time after waiting and being served.5:20) Figure 4.

1487 14 300 average service time =0. time 18 histogram of waiting times 400 a =1 16 350 b =1.5 average total time =2.50 Monte Carlo Methods queue length vs.7: . time 16 histogram of waiting times 400 a =1 14 350 b =1.1234 300 10 average service time =0.6786 12 250 10 200 8 150 6 100 4 2 50 0 0 200 400 600 800 1000 1200 0 0 2 4 6 8 10 12 14 16 18 20 queue length vs. time 15 histogram of waiting times 400 a =1 350 b =1.67806 250 200 150 5 100 50 0 0 200 400 600 800 1000 1200 0 0 2 4 6 8 10 12 14 16 18 20 Figure 4.5 average total time =2.2084 12 300 average service time =0.6693 10 250 8 200 6 150 4 100 2 50 0 0 100 200 300 400 500 600 700 800 900 1000 0 0 2 4 6 8 10 12 14 16 18 20 queue length vs.5 average total time =2.

2679 15 100 average service time =0.1 140 20 120 average total time =6.8: . time 30 histogram of waiting times 200 180 25 a =1 160 b =1.3831 100 15 average service time =0.7 Queuing theory 51 queue length vs.4. time 35 histogram of waiting times 200 180 30 a =1 160 b =1.77 100 average service time =0. time 25 histogram of waiting times 200 180 a =1 20 160 b =1.88394 80 10 60 40 5 20 0 0 100 200 300 400 500 600 700 800 900 1000 0 0 5 10 15 20 25 30 35 40 queue length vs.91675 80 10 60 40 5 20 0 0 200 400 600 800 1000 1200 0 0 5 10 15 20 25 30 35 40 queue length vs.87122 10 80 60 5 40 20 0 0 200 400 600 800 1000 1200 0 0 5 10 15 20 25 30 35 40 Figure 4.1 25 140 120 20 average total time =11.1 140 15 120 average total time =4.

35. which sections of this paper correspond with each stage and each edge of the modeling cycle? (b) Describe strategies BF5 and OI5 in words. Arrival Times: 12 Service Times: 40 31 32 63 55 95 48 99 18 154 50 198 47 221 18 304 28 346 54 411 40 455 72 537 12 (a) Determine the departure times of these 13 customers. 000. a customer either enters service if the server is free or joins the waiting line. Customers arrive at the post oﬃce at a rate of approximately three per minute.52 Monte Carlo Methods 4.45 respectively. Read ”America West Airlines Develops Eﬃcient Boarding Strategies” by van den Briel. Hogg. The following data yield the arrival times and service times that each customer will require for the ﬁrst 13 customers at a single server system. 000? 5. Upon arrival. what is the expected number of times someone gets up to let another passenger into their seat? For example.55 and q = 1 − p = . Suppose that 3% of families in a large city have an annual income of over 80. Obama will not be the last to ﬁnish his call? 4. at most three have an annual income of over 80. M16 (HT T HHT T HHHT T T T T H) = 3 corresponding to the run of three heads starting in position 8. the one who has been waiting the longest) enters service. Assume for simplicity that each cab brings exactly one passenger who is either a man or a woman with probabilities p = .e. vol. (b) Repeat (a) when there are two servers and a customer can be served by either one. For example. 3. 7. what is the probability that among the three people. There are two phone lines available and both are being used by the VP and the secretary of state. Interfaces. 2. t]? What assumptions must one make to answer this question? 6. that Mr. A run is a sequence of trials in a row all coming up heads or all coming up tails. What is the probability that of 60 random families. if the people in seats B and C are already seated when the passenger in seat A arrives. What does BF and OI stand for? (c) How was OI5 found? How was BF5 found? (d) Given a single half row of an airplane with 3 seats per side. then 2 people get up. Lindemann and Mule. Obama is waiting to make a phone call at the oﬃce. What is the probability that the next customer does not arrive during the next 3 minutes? Explain your assumptions. Discussed on page 193 and again on page 199. 191-201. Let Mn be the random variable that counts the maximum length of any run of heads in n trials. . What is the distribution of NW (t) = the number of women arriving during the time interval [0. pp. What is the expected value and variance of Mn for 1 ≤ n ≤ 64? An exact formula would be best but a good approximation from a simulation will suﬃce. When the server completes work on a customer. the VP and the secretary. Mr. Obama. the next one in line (i. Villalobos. the next customer to enter service is the one that has been waiting the least time. (c) Repeat (a) under the new assumption that when the server completes a service. (a) In terms of our 4-stage modeling process. Consider the Bernoulli process with a fair coin. May-June 2005. If the duration of each telephone call is an exponential random variable with λ = 1/8. Cabs carrying passengers to an airport terminal arrive according to a Poisson process with 25 arrivals per hour on average. Mr. No.8 Exercises 1. 3.

4.8 Exercises

53

(e) Discuss the entries in Table 1 and Table 2. What does the number 69 in the row labeled “Economy class[xxx]” represent in words? What should we compare 69 to in Table 2? (f) Discuss Tables 3 and 4. What conclusions can you make from this data? (g) What is the strategy that the authors recommend and what evidence did they give that it is best? Quote from the paper. (h) Is the recommended strategy optimal? If not, what do you think would be better?

54

Monte Carlo Methods

Math 381 Notes — University of Washington —Winter 2011

Chapter 5

**Markov Chains and Related Models
**

5.1 A Markov chain model of a queuing problem

To introduce the theory of Markov chains, let’s consider a simple queuing theory model in which there is a single server and there are only three “states” which the queue can be in at any given time: State 1: State 2: State 3: No customers are present A customer is being served A customer is being served and another is waiting.

Assume that the queue is never longer: if a customer arrives when the queue is in State 3 then that customer disappears. Suppose that arrivals and service are Poisson processes, and over a given short time interval there is a probability p of a new customer arriving and a probability q that any customer currently being served will ﬁnish. If the time interval is very short then these are very small and we can assume there is negligible chance of more than one customer arriving or ﬁnishing, and also negligible chance of both one customer ﬁnishing and another arriving in the same period. We could perform a Monte Carlo simulation of this queue by keeping track of what state it is in, and each time period either staying in this state or moving to a diﬀerent state depending on the value of some random number: random_num = rand(1); if random_num < p # a new customer arrives: if state<3 state = state + 1; end end if random_num > 1-q # a customer finishes: if state>1 state = state - 1; end end Note that we’ve done the tests in such a way that if p and q are both very small then we can’t have both happen. Figure 5.1 shows the transition diagram between the three states. For example, if we are in State 2, then with probability p we move to State 3 (if a customer arrives), with probability q we move to State 1 (if a customer ﬁnishes), and with probability 1 − p − q we stay in State 2 (if neither happens). 55

81% of the simulations. 95 96 97 98 99 100 etc.08. We could determine this for many diﬀerent simulations and collect some statistics. about what we’d expect since p = 0. there are still no simulations in State 3 after one period. Since it’s impossible for 2 customers to arrive in a single period.05 and q = 0. We might be interested in determining what percentage of the time we are in each of the three states over the long term. After one time period a customer had arrived in 481 of the simulations. for example. all starting in State 1.. Initially all simulations were in State 1. and print out how many of the simulations are in each of the three states at each time: time period 0 1 2 3 4 5 6 etc. After a while we might expect to see the same percentages as reported above (why??). Here’s what we observe if we do 10. we will observe that about 50% of the time we are in state 1. and about 19% in State 3.. If we do a single simulation then we will move around between the three states.. about 31% in State 2.1: Transition diagram for a simple queuing theory model with three states. Another way to get these number is to run lots of simulations simultaneously and keep track of what percentage of the simulations are in each state at each time period. or 4.000 simulations.56 Markov Chains and Related Models 1-p p 1-p-q state 1 q state 2 0 q p 0 state 3 1-q Figure 5. 4949 4913 4920 4958 4982 4995 3134 3166 3161 3148 3072 3038 1917 1921 1919 1894 1946 1967 State 1 10000 9519 9073 8685 8379 8096 7812 State 2 0 481 891 1240 1491 1702 1914 State 3 0 0 36 75 130 202 274 . If we do this with p = 0.05..

or stays in State 1 with probability 1 − p = 0.5) 0 .193481 0. and 3 steps.05.05 and q = 0. If all simulations start in State 1 then we have 1 v (0) = 0 . v (100) = A100 v (0) = 0. with roughly 50% of the simulations in State 1. From the transition diagram of Figure 5. v (2) = .1 we expect v1 (n+1) (n+1) = (1 − p)v1 = pv1 =0· (n) (n) v1 (n) + qv2 (n) + 0 · v3 (n) (n) (n) v2 (n+1) v3 + (1 − p − q)v2 + (n) pv2 + qv3 (5. we appear to have reached a sort of “steady state”.05 0. 2. note that v (2) = A2 v (0) and in general v (n) = An v (0) . To predict the percentages seen later.08.05 0. for n around 100. we have reached a steady .8685 v (1) = . Alternatively. We can write this in matrix-vector notation as v (n+1) = Av (n) .95 . 0 Now consider the general case. Note that these are the same percentages quoted above.3) q 1−p−q p 0 0. We can compute 0. and let v (n) be the vector with these three components. From the relation (5. where the transition matrix A is 1−p A= p 0 (5.08 0 q = 0. 2.1) 0 What fractions do we expect for v (1) ? Since each simulation moves to State 2 with probability p = 0.92 (5. and 19% in State 3.193502 Note that again the values in the simulation are close to these values.95 .95 v (1) = 0.4) The right-most matrix above is for the speciﬁc values p = 0. 31% in State 2.309994 .5.3) we can compute . 1−q 0 0. Let vk be the fraction of simulations which are in State k after n steps (for k = 1. (5.496525 0. Suppose we know v (n) and we want to determine v (n+1) . we expect 0.2) + (1 − (n) q)v3 . 0. 3).1 A Markov chain model of a queuing problem 57 After 2 periods. What we now want to explore is whether we can determine these values without having to do (n) thousands of simulations. In fact we can by using the theory of Markov chains. Also notice that these two values are very close to each other. etc. we need only multiply repeatedly by the matrix A.08 . Of course each individual simulation keeps moving around between states.9065 .1247 . however. v (3) = . it is possible to be in State 3.309999 .0069 Note that these values agree reasonably well with what is seen in the simulation after 1.05 . Just as we conjectured from the simulation.95.87 0.05 . How does this agree with the number you should expect? Notice that after many time periods.0025 .0910 . (5.496498 v (99) = A99 v (0) = 0. and in fact 36 of the simulations reached this state. but the number in each stays roughly constant.

5. In general A has m eigenvalues λ1 . Note that A2 = (RΛR−1 )(RΛR−1 ) = RΛ(R−1 R)ΛR−1 = RΛΛR−1 = R(Λ2 )R−1 (5. The set of all eigenvectors corresponding to a given eigenvalue forms a linear subspace of lRm . det(A − λI) = 0. then (A − λI)r = 0 where I is the m×m identity matrix. λm . Recall from linear algebra that a scalar value λ is an eigenvalue of the matrix A if there is a nonzero vector r (the corresponding eigenvector) for which Ar = λr. 0.4961240310077407 0. but this case is a bit more subtle. (Note that the order is important! In general AR = RA and RΛ = ΛR since matrix multiplication is not commutative.8) as A = RΛR−1 .4961240310076444 v (1000) = A1000 v (0) = 0. If we kept applying A we would ﬁnd little change in future steps.3100775193797780 .2 Review of eigenvalues and eigenvectors Let A be a square m × m matrix with real elements.10) .1937984496123613 As n increases.7) If r is an eigenvector.58 Markov Chains and Related Models state in which the percentages do not change very much from one time period to the next. . .9) (We might be able to do all this even if the eigenvalues are not distinct. Computing the determinant gives the polynomial. v (10000) = A10000 v (0) = 0. Hence its determinant is zero. with eigenvalue λ = 1. e. This arises from the fact that if (5. (5..g. Note that the relation (5. (5.6) means that this vector is an eigenvector of the matrix A.6) so that a step leaves the percentages unchanged. . it has an inverse R−1 and we can rewrite (5.8) where Λ is the diagonal matrix with the values λi on the diagonal. even after thousands of steps. which can be determined by ﬁnding the roots of a polynomial of degree m in λ. v ˆ (5.3100775193798381 .1937984496123990 0.9) is to compute powers of the matrix A. the vectors v (n) are converging to a special vector v which has the property that ˆ Aˆ = v . . then so is αr for any scalar value α.7) holds. it must be singular.) One use of the decomposition (5. Then Ari = λi ri leads to AR = RΛ (5.) Since R is nonsingular. If the eigenvalues are distinct then the eigenvectors ri (corresponding to the diﬀerent λi ) will be linearly independent and so we can form a matrix R with these vectors as columns which will be a nonsingular matrix. 0. Since r is a nonzero null-vector of this matrix.

5.3 Convergence to steady state

59

**and similarly, An = RΛn R−1
**

n

(5.11) λn i

for any n. This is easy to work with because Λ is just the diagonal matrix with the values on the diagonal. In matlab it is easy to compute eigenvalues and eigenvectors. For example, for the matrix A of (5.4), we can ﬁnd the eigenvalues by doing: >> p = .05; q = .08; >> A = [ 1-p p 0 >> eig(A) ans = 1.0000 0.8068 0.9332 q 0; 1-p-q q; p 1-q]; ... ...

If we also want to ﬁnd the eigenvectors, we can do >> [R,Lambda] = eig(A) R = -0.8050 -0.5031 -0.3144 0.4550 -0.8146 0.3597 0.7741 -0.1621 -0.6120

Lambda = 1.0000 0 0 0 0.8068 0 0 0 0.9332

This returns the matrices R and Λ.

5.3

Convergence to steady state

Note that one of the eigenvalues of A computed above is equal to 1 and the others are both smaller than 1 in magnitude. As we will see, this means the Markov process converges to a steady state. Recall that |λn | → 0 as n → ∞ if |λ| < 1, |λn | = 1 for all n if |λ| = 1,

n

(5.12)

|λ | → ∞ as n → ∞ if |λ| > 1,

So as n → ∞, the matrix A approaches

n

60

Markov Chains and Related Models

>> R * [1 0 0; 0 0 0; 0 0 0] * inv(R) ans = 0.4961 0.3101 0.1938 0.4961 0.3101 0.1938 0.4961 0.3101 0.1938

as we can easily conﬁrm, e.g., >> A^100 ans = 0.4965 0.3100 0.1935 0.4960 0.3101 0.1939 0.4954 0.3102 0.1944

Now recall that when we multiply a matrix A by a vector v, we are forming a new vector which is a linear combination of the columns of the matrix. The elements of v give the weights applied to each column. If aj is the j’th column of A and vj is the j’th element of v, then the vector Av is Av = a1 v1 + a2 v2 + a3 v3 . In particular, if v (0) = [1 0 0]′ , then Av (0) is just the ﬁrst column of A. Similarly, A100 v (0) is the ﬁrst column of the matrix A100 . Note that for the example we have been considering, all three columns of A100 are nearly the same, and for larger n the three columns of An would be even more similar. This has an important consequence: Suppose we started with diﬀerent “initial conditions” v (0) instead of v (0) = [1 0 0]′ , which corresponded to all simulations being in State 1 to start. For example we might start with v (0) = [0.5 0.3 0.2]′ if half the simulations start in State 1, 30% in State 2, and 20% in State 3. Then what would distribution of states would we see in the long run? After n steps the expected distribution is given by An v (0) . After 100 steps, >> v100 = A^100 * [.5 .3 .2]’ v100 = 0.4961 0.3101 0.1938 The same steady state as before. Since all three columns of A100 are essentially the same, and the elements of v (0) add up to 1, we expect A100 v (0) to be essentially the same for any choice of v (0) . Even though the steady state behavior is the same no matter how we start, the transient behavior over the ﬁrst few days will be quite diﬀerent depending on how we start. For example, starting with v (0) = [0.5 0.3 0.2]′ instead of (5.1) gives .499 .4982 .4976 v (1) = .302 , v (2) = .3036 , v (3) = .3049 , etc. (5.13) .199 .1982 .1975

5.4 Other examples of Markov Chains

61

instead of (5.5). Another way to look at the behavior of v as we multiply repeatedly by A is to decompose the initial vector v (0) as a linear combination of the eigenvectors of A: v (0) = c1 r1 + c2 r2 + c3 r3 . (5.14)

Since the vectors r1 , r2 and r3 are linearly independent, this can always be done for any v (0) and the solution c = [c1 c2 c3 ]′ is unique. In fact the vector c is the solution of the linear system Rc = v (0) , since this is exactly what (5.14) states, writing the matrix-vector multiply as a linear combination of the columns. Now consider what happens if we multiply v (0) from (5.14) by the matrix A: v (1) = Av (0) = c1 Ar1 + c2 Ar2 + c3 Ar3 = c1 λ1 r1 + c2 λ2 r2 + c3 λ3 r3 . Multiplying by A again gives v (2) = Av (1) = c1 λ1 Ar1 + c2 λ2 Ar2 + c3 λ3 Ar3 = c1 (λ1 )2 r1 + c2 (λ2 )2 r2 + c3 (λ3 )2 r3 . In general we have v (n) = An v (0) = c1 (λ1 )n r1 + c2 (λ2 )n r2 + c3 (λ3 )n r3 . (5.17) Again recall from (5.12), that as n gets large, (λj )n → 0 for any eigenvalue that is smaller than one in magnitude. For this particular problem λ1 = 1 while the others are smaller, and so v (n) → c1 r1 for large n. This means the steady state is a multiple of r1 , which is again an eigenvector of A, as we already knew. Note that the smaller |λ| is, the faster λn goes to zero. So from (5.17) we can see how quickly we expect to approximately reach a steady state. This depends also on how large the values cj are, which depends on the particular initial data chosen. If v (0) happens to be a steady state already, a multiple of r1 , then c2 = c3 = 0. (5.16) (5.15)

5.4

5.4.1

**Other examples of Markov Chains
**

Breakdown of machines

A certain oﬃce machine has three states: Working, Temporarily Broken, and Permanently Broken. Each day it is in one of these states. The transition diagram between the states is shown in Figure 5.2. If the machine is working on Day 1, what is the probability that it is still functional (i.e., not permanently broken) a year from now? (In this example we will consider at a single machine and talk about the probability that this machine is in each state. Alternatively we could consider a pool of many machines and consider what fraction of the machines are in each state on a given day.) We can answer this question by computing the 365’th power of the transition matrix: >> A = [.90 .095 .005 .6 .4 0 0; ... 0; ... 1];

095 0.0001 0. so v (0) = [1 0 0]).0000 so by this time the machine is almost surely dead whatever state we started in.2: Transition diagram for the modeling machine breakdown. or 1825 days? We ﬁnd >> A^1825 ans = 0.005 0 0 0 state 3 1 Figure 5.62 Markov Chains and Related Models 0. Note that the probabilities would be about the same if we’d started in the Temporarily Broken state on Day 1 (by looking at the second column of this matrix). and State 3: Permanently Broken. as we can see by computing the eigenvalues and eigenvectors: >> [R. We see that the probability that the machine is permanently broken in 0.7937 and so there is about a 21% chance that the machine is still functional. State 2: Temporarily Broken.0003 0.0001 0.0284 0. >> A^365 ans = 0.9996 0. Why does this make sense? Why is the third column so diﬀerent? How about after 5 years.9996 0 0 1.7922 0 0 1. For this Markov chain the steady state solution is the vector [0 0 1]′ .1779 0.4 state 1 0.0286 0.0003 0.7937 0.6 state 2 0.1792 0. The three states are: State 1: Working.0000 The ﬁrst column of this matrix gives the probability of being in each state on Day 366 (with our assumption that it is working on Day 1.9 0. Lambda] = eig(A) .

Then the transition diagram is simply 0 1 0 state 1 1 state 2 since with probability 1 the employee switches states each day.9957 State 3 (Permanently Broken) is called an absorbing state.3043 0 0 0 0.7532 Lambda = 1.7 0. 5.3 0.7 .0000 0. we will eventually end up in State 3 with probability 1.1036 0.2 Work schedules Suppose the XYZ Factory is operating every day and has a work schedule where each employee works every other day. . In this case we do not reach a single steady state.7045 -0.0000 0 0 0 0. In this case the transition matrix is A= 0 1 1 0 .5. Say that an employee is in State 1 on work days and in State 2 on oﬀ days.3 0.3 .7 .4 Other examples of Markov Chains 63 R = 0 0 1.7 0. If 70% of the employees are working the ﬁrst day then we have v (0) = 0. .3 . 0].7096 -0.4. but we do have a very regular pattern of cycling back and forth between two states. v (1) = 0. v (2) = 0. v (3) = 0. This can again be understood by looking at the eigenvalues and eigenvectors of the matrix: >> A = [0 1 1...0051 -0. and so on. and once in State 3 we can never leave. No matter what state we start in.6496 -0.

5 v (0) = . The ﬁrst terms ﬂips sign every day and accounts for the oscillation.7071 Both eigenvalues have magnitude 1. No customer arrives. No customer arrives. Working out the probabilities of moving from each state to each of the others. we would expect c1 = 0 and in this case the solution should be truly steady. 0. but there’s a tank in stock. there’s none in stock but one due to arrive in two days. In the special case where v (0) is a multiple of r2 .18) 0 1−p 0 1−p 1−p 0 0 0 1−p 0 0 1−p 1−p 0 0 0 0 0 . there’s none in stock but one due to arrive the next day. 4. A customer arrives. 3.64 Markov Chains and Related Models >> [R. No customer arrives. A customer arrives and there’s a tank in stock. If we decompose v (0) in terms of the eigenvectors. Lambda] = eig(A) R = 0. we ﬁnd the transition matrix 0 p 0 p p 0 0 0 p 0 0 p p 0 0 0 0 0 A= (5.5 This makes sense: if the factory starts with 50% of employees working on the ﬁrst day then 50% will be working every day.7071 Lambda = -1 0 0 1 0. 6. A customer arrives. we could serve about 60% of the customers who come looking for a 150 gallon ﬁsh tank.7071 0. 5. Since the elements of v must sum to 1.1 gave results indicating that with this particular ordering strategy and set of parameters. For the situation modeled in this program the store can be assumed to always be in one of six states each day: 1. We can derive this using a Markov chain. v (0) = c1 r1 + c2 r2 then we have v (n) = c1 (−1)n r1 + c2 r2 . 2. there’s none in stock but one due to arrive in two days.3 The ﬁsh tank inventory problem The Monte Carlo simulation shown in Figure 4. there’s none in stock but one due to arrive the next day.7071 -0. 5. the only multiple of r2 we can consider is 0.4.

45. This is generally considered to be a reasonable model of the way humans perform a riﬄe shuﬄe. as observed in the Monte Carlo simulation.4 Other examples of Markov Chains 65 where p = 1/3 is the probability of a customer arriving on any given day. Unfortunately. 3 = 0. Shannon and by Reeds [Ree81]. the process of card shuﬄing is represented as a Markov chain. /2n . one could determine the probability of any given number of rising sequences after the next shuﬄe. The great contribution of Bayer and Diaconis was to notice that the number of “states” of this system could be drastically reduced. The heaps are then k riﬄed together into a single pile.4 Card shuﬄing Another sort of application of Markov chains is to model a riﬄe shuﬄe of a deck of cards. This matrix has three non-zero eigenvalues including λ = 1. The riﬄe shuﬄe is modeled as follows: A deck of n cards is divided into two heaps. and 6.0667 = v= ˆ 15 6 0. with a deck of 52 cards there are 52! possible orderings. when normalized so the elements sum to 1.1333 2 0.6. and this would seem to render the problem intractable from a computational point of view.5.4. The answer turns out to be 7. For example. The eigenvector corresponding to λ = 1. With this model. with the probability n of a heap containing k cards being given by a binomial distribution.1333 The components of this vector are the expected fraction of days that the store is in each of the six possible states. the ordering 146253 has three rising sequences: 123. Given only the number of rising sequences in the current ordering. Given the current ordering of the cards. with the probability of dropping the next card from a particular heap being proportional to the number of cards in that heap. From this we can see that the fraction of time a customer arrives and is served can be computed from the components of v as ˆ fraction served = v1 /(ˆ1 + v2 + v3 ) = ˆ v ˆ ˆ Exactly 60%.4000 2 0. An excellent description of the problem and solution from more of a linear algebra point of view can be found in J´nsson and Trefethen [JT98]. A version of their code is available on the class web page. The entries of the probability transition matrix are Pji = 2−n n+1 2i − j αj . With this observation the number of states of the system is reduced from n! to n. where they also supply a MATLAB code that forms the o probability transition matrix P and a related matrix A = P − P ∞ whose kth power gives the diﬀerence between the state of the system after k shuﬄes and that after inﬁnitely many shuﬄes. This application received a great deal of attention some years back when it was used to determine how many shuﬄes are necessary to randomize a deck of cards [Kol90]. αi . The model was developed by Gilbert[Gil55]. A rising sequence is deﬁned as a maximal sequence of consecutively numbered cards found (perhaps interspersed with other cards) during one pass through the deck. and it was used by Bayer and Diaconis [BD92] to answer the question about randomizing an initially sorted deck of cards.0667 1 1 0. and for a deck of n = 52 cards it now becomes an easy problem to solve numerically. in the long term steady state. one can write down the probability of achieving any other ordering after the next shuﬄe. although this has been debated.2000 1 0. is found to be 3 0. 3+1+1 5.

16 0.9999 a well-shuﬄed deck has between 19 and 34 rising sequences. . 5. with probability .1 0. and αj = Anj . The deck is deemed to be suﬃciently well shuﬄed when this diﬀerence drops below . See [Slo00] for more details on this sequence. . . Then shuﬄe several more times (at least three more times) and with very high probability you will count at least 19 rising sequences. . where A11 = 1. for example. It will be no more than 16.k−1 . .k + (r − k + 1)Ar−1. 1.4 shows a plot of the diﬀerence between the state vector after k shuﬄes and that after n 1 inﬁnitely many shuﬄes: 2 i=1 |vi − (P k s0 )i |.14 0.08 0. αn )T /n! The entries of this vector are the probabilities of each possible number of rising sequences after the deck has been shuﬄed inﬁnitely many times. .5. The αj ’s are known as Eulerian numbers. n − 1. Notice that this happens after k = 7 shuﬄes. . If the deck is initially sorted in increasing order — so that there is one rising sequence and the initial state vector is s0 = (1. and the eigenvector corresponding to eigenvalue 1 is known as well: v = (α1 . . j = 0. This is called the total variation norm and is the measure used by Bayer and Diaconis. . n} that have exactly j rising sequences. .06 0. . 0. . Figure 5. Do you see why? When you take a deck with two rising sequences.5 General properties of transition matrices Transition matrices are special types of matrices which have some general properties which we summarize here: • Every element in a transition matrix must lie between 0 and 1 since it is a probability. The eigenvalues of the probability transition matrix are known to be 2−j . cut it in two and riﬄe the two portions together.2 0.3: Steady-state distribution of number of rising sequences where αj denotes the number of permutations of {1. Shuﬄe a deck of cards four times and count the number of rising sequences.12 probability 0. The vector v is plotted in Figure 5. the resulting ordering can have at most four rising sequences. This begins to give us a clue as to why the system behaves as it does.18 0. . After j shuﬄes there are at most 2j rising sequences. Try the experiment. 0)T — then after one shuﬄe there will be at most two rising sequences. etc. It can be determined from the entries of v that.66 Markov Chains and Related Models Steady−State Distribution of Rising Sequences 0. .3.02 0 5 10 15 20 25 30 35 Number of rising sequences 40 45 50 Figure 5. so that for j ≤ 4 it is easy to determine that the ordering is not random. . . A1k = 0 for k = 1. .04 0.k = kAr−1. and they satisfy the recurrence: Ar.

) • It can also be shown that all eigenvalues of a transition matrix satisfy |λ| ≤ 1.5. all of these probabilities must add up to 1. the inverse of the matrix of right eigenvectors. • On the other hand the rows do not need to sum to 1.3144]′.8 0. This proves that λ = 1 is always an eigenvalue of any transition matrix. if we use Matlab and obtain an eigenvector such as r1 = [−0. This is just a restatement of the fact that all columns must sum to 1. so some components of an eigen-expansion such as (5.2 Distance from random (in total variation norm) 1 0.5 General properties of transition matrices 67 1.6 0.3144 0.3101 −0. are diﬀerent from the right eigenvectors.4 0. (Note: In many books transition matrices are deﬁned diﬀerently.4: Convergence toward the steady-state in total variation norm • Each column of a transition matrix sums to 1. In fact the left eigenvectors ℓ are the rows of the matrix R−1 .5031 − 0. (Recall that in general the left eigenvectors of A. This is because the elements in the j’th column represent the probabilities of moving from State j to each of the other states. row vectors ℓ for which ℓA = λℓ.4 1. Since the elements of v must sum to 1. What can you hypothesize about eigenvectors of A corresponding to eigenvalues λj with |λj | < 1? . Since we must move somewhere.5031 − 0. with aij being the probability of moving from State i to State j instead of being the probability of moving from State j to State i as we have used. and the rows sum to 1 instead of the columns. But this also shows that w is a left eigenvector of A with eigenvalue λ = 1.4961 r1 v= ˆ = 0. so we choose cj = 1/ i rij where rij are the elements of the vector rj .) • If w is the row vector with all components equal to 1.2 0 0 1 2 3 4 5 6 Number of shuffles 7 8 9 10 Figure 5. • If λj = 1 then the eigenvector rj can be used to ﬁnd a potential steady state of the system. For example. In this case the transition matrix is the transpose of ours.1938 This is the expected steady state. • Note that normalizing an eigenvector gives a reasonable result only if all the elements of the eigenvector have the same sign. but none can grow exponentially. but the eigenvalues are the same.8050 − 0.14) decay while others have constant magnitude. we can compute the normalized eigenvector as 0. we must normalize rj to have this sum. Recall that any scalar multiple cj rj is also an eigenvector. then wA = w.8050 − 0.

Let v0 . • Fecundity rates. 1000 of each age.1461 0.20) Note that in this case the columns of the matrix do not sum to 1.5(a). we can predict the future population by iterating with this matrix just as we did using transition matrices for a Markov chain. We ﬁnd A = 0 0. based on survival rates and fecundities.6 Ecological models and Leslie Matrices Similar matrix techniques can be used in other situations that are not exactly “Markov chains”.0678]′.7861 0. Age 1. (n) 0 v2 (5. of which 3 females are expected to survive to the next breeding season. Females that are 2 years old lay more eggs and suppose 6 females are expected to survive to the next spring. Since v (20) = A20 v (0) . and the eigenvalues and eigenvectors will give useful information about what we expect to see in the long run. we can predict all this from the eigenvalues and eigenvectors of A. Suppose 20% of Age 0 birds survive to the next spring. Suppose females that are 1 year old produce a clutch of a certain size. and 50% of Age 1 birds survive to become Age 2. As an example. 1000 500 100 900 and so on. and we will keep track of the population just before breeding. This settles down to a steady state.2000 0 3. However. Clearly the population is growing exponentially. and Age 2. We can compute that between the last two years the total population grew by a factor of 1. v1 and (n) v2 represent the number of females in each age class just before breeding in Year n.0000 0 0. SK87] for more discussion. suppose we want to study the population of a certain species of bird.68 Markov Chains and Related Models 5. To model how the population changes we need to know: • Survival rates.19) (n+1) v2 (n) 0. If we plot the population in each age group as a function of Year. we see what is shown in Figure 5.0760.5 (n) v 6 0 (n) 0 v1 . v (1) = 200 .) Suppose we start with a population of 3000 females.0000 0 0 . and some of the entries do not represent probabilities.5(b) shows the fraction of the population which is in each age class as a function of year. v (3) = 720 .2 0 0 0. This form of matrix. These birds are born in the spring and live at most 3 years. (n) (n) There will be 3 types of birds: Age 0 (born the previous spring). Then we ﬁnd 1000 9000 3600 6000 v (0) = 1000 . (See [Jef78.2v0 (n) (5. Then we have the following model: v0 v1 (n+1) = 3v1 = (n) + 6v2 (n) (n+1) = 0. After 20 years the population is v (20) = [20833 3873 1797]′ while the percentage in each age class is [0.5000 6. v (2) = 1800 . is called a Leslie matrix.5v1 In matrix-vector form: v (n+1) 0 3 = 0. Figure 5.

The ﬁrst eigenvector r1 .7860 0.5179i >> R(:.5 Age 1 0.1297i Lambda = 1.6 0. then we would see smooth behavior from the beginning with no transient.1)) ans = 0.2545i 0.6990 .2545i 0.1110 . normalized to have sum 1. so the corresponding components in the initial vector v (0) die out.0.5380 .6990 + 0.0759 0 0 0 -0.2 Age 2 0 0.5 1 0.6459i 0.5179i 0 0 0 -0.6459i 0.6 Ecological models and Leslie Matrices 69 2.1461 0.0149 .0149 + 0.) .4 0.0679 We see that λ1 = 1.1297i -0.9 2 Age 0 0. This transient behavior which results from the fact that the initial population distribution (equal number of each age) is not the steady state distribution. The other two eigenvalues are complex and have magnitude 0.0.0846 -0. (See the example in the next section.5: Bird population with a32 = 0.5380 + 0.7468.1821 0.5.7 1.0.9796 0.5 and initial data v (0) = [1000 1000 1000]′ . >> [R. gives the fraction expected in each age class in the steady state.5 0.Lambda] = eig(A) R = 0.0759 is greater than 1 and accurately predicts the growth rate of the total population.1110 + 0.0.3 0.8 0. If we had started with a population that was distributed this way.1)/sum(R(:.5 x 10 4 population of each age group 1 fraction of population in each age group 0.1 0 5 10 15 20 25 0 0 5 10 15 20 25 Figure 5.

4 0. For example. We might like to decrease the rates to a point where the dominant eigenvalue is very close to 1 in value.3.0491]′ and for a population of 3000 birds this suggests we should take v (0) = 3000r1 = [2371 482 147]′ . Note that we now have exponential decay of the population and eventual extinction. one would have to investigate what assumptions the model is based on and how far in the future these assumptions will be valid. population of each age group 2500 1 fraction of population in each age group 0.1 Age 2 0 0 5 10 15 20 25 0 0 5 10 15 20 25 0.6: Bird population with a32 = 0.5 to 0.70 Markov Chains and Related Models 5. then all eigenvalues will be less than 1 and the population size will decrease exponentially to extinction.1 Harvesting strategies We can use this model to investigate questions such as the following.4 3000 0.9 2000 0.6.7: Bird population with a32 = 0.6 5000 0. Then we ﬁnd that the largest eigenvalue of A is 0.1 0 0 5 10 15 20 25 0 0 5 10 15 20 25 Figure 5.1608 0.2 0.7. Then we obtain the boring graphs shown in Figure 5. If we decrease the rate too much. . For the new matrix the relevant scaled eigenvector is r1 = [. (Assuming it’s accurate at all!) To judge this. suppose we harvest an additional 20% of the 1-year olds.5 Age 0 4000 0.8 0.8 0. Suppose we want to harvest a certain fraction of this population.7 6000 0. either to control the exponential growth or as a food source (or both). Of course it might be more reasonable to assume the population starts out with a steady-state distribution of ages. How much should we harvest to control the growth without causing extinction? Harvesting will decrease the survival rates. population of each age group 9000 1 fraction of population in each age group 8000 0. giving the graphs shown in Figure 5.3 2000 Age 1 0.3 and initial data having the expected steady-state distribution of ages.983)20 = 2129 and probably our model is not even accurate out this far.5 1000 0. so the survival rate a32 falls from 0.2 1000 Figure 5.3 500 0.3 and initial data v (0) = [1000 1000 1000]′ .7902 0.7 1500 0.9 7000 0.9830.6.6 0. over the 20-year span shown here the total population is predicted to decrease from 3000 to 3000(0. However.

5 (a) Write the model in matrix form to project the ﬁre danger probability from one day to the next. what is the equilibrium probability of being in each state? 2. .5 . what is the probability that we will be in State 2 the day after tomorrow? (c) If the matrix you found is correct.5 . whereas 40% of those who had leased minivans changed to sedans.7832 -0. (b) If we are in State 1 today.0697 0 0 0 0.5.7 Exercises The ﬁre danger during the summer in Mount Baker National Forest is classiﬁed into one of three danger levels.3 state 2 .2 .5 . (b) If 20% of the cars leased to customers last year were minivans. Brown and B.1 0 state 3 .4072 -0.2410 -0. (a) Sketch the transition diagram. 2 =moderate. The rest of the customers repeated the type of vehicle they had before.1813 0.4303 Based on these.4699 -0. F.5988 0 0.7 Exercises 71 5. The probability of daily transitions between these states is given by the following ﬂow diagram: . (Adapted from Ecological Simulation Primer by Swartzman and Kaluzny) state 1 . These are 1 =low.7961 -0.0000 0 0 R = -0. W.7801 0. 3 =high. what percentage of customers will lease minivans this year? (c) Write the Markov chain in matrix form and determine the 2 × 2 transition matrix A. (Adapted from Finite Mathematics by R.4 1. then it has eigenvalues and eigenvectors given by Lambda = 1. Brown) In an automobile-leasing company’s study of its customers who lease a new car every year.5551 0. it is found that 30% of those who leased sedans the preceding year changed to minivans.

What is the probability that he will ever graduate? (b) Mary is a sophmore. where β is some parameter. This might be modeled as a 4-state Markov chain. Comment on whether this is realistic. (Hint: you know what one of the eigenvalues should be. What is the probability that she will ever graduate? (c) Do you think a 4-state Markov chain model really makes sense in this case? Recall that we must assume the transition probabilities depend only on the state a student is currently in. this would lead to a decline in enrollment.72 Markov Chains and Related Models (d) Determine the eigenvalues and eigenvectors of A by hand. A certain 2-year college considers students to be either “freshmen” or “sophmores” each year depending on how many credits they have at the start of the year. How might you improve the model by adding more states? 4.) (c) With this strategy.) (e) What is the equilibrium distribution of sedan and minivan leases? 3. based on the determinant of A−λI. They keep track of how many students graduate in Year n. admitting twice as many new students as there are graduates. while 15% remain sophmores and 5% drop out. while 15% remain freshmen and 25% do not return. They might like to take β = 1 and admit as many new students as graduate. do you expect the enrollment to rise or decline in the long run? (b) Can you ﬁnd a better value of β to use if the goal is to have a steady enrollment? (It’s ok to use trial and error combined with matlab to ﬁnd the “right” value. Each year 80% of sophmores graduate. call this Gn . not their past history. (a) Peter is a freshman. about what percentage of students will be freshmen each year? Note: the matrix you want to look at in this case is similar to a Leslie matrix rather than a Markov chain transition matrix. The administration has determined that 60% of the freshmen each year go on to be sophmores the next year. Continuing the previous problem. . but since not all students ever graduate. suppose the admissions oﬃce is setting a new policy for the number of students to admit. Instead they plan to use this strategy with β = 2. and plan to admit βGn new freshmen in Year n + 1. (a) With this strategy.

Some of the discussion in the original notes which introduces duality theory and sensitivity analysis has been eliminated.edu/~burke/crs/407/HTML/notes. the feasible region is always taken to be a subset of lRn (real n-dimensional space) and the objective function is a function from lRn to lR.Math 381 Notes — University of Washington —Winter 2011 Chapter 6 Linear Programming 6. . . . We further restrict the class of optimization problems that we consider to linear programming problems (or LPs). In this course. . we will review the four basic steps in the development of an LP model: 1. i = 1. .html Other introductions to linear programming can be found in many references. 73 . including [HL95] and [Chv83]. . .1 of the notes is adapted from an introduction to linear programming written by Professor J.1 Introduction Section 6. . the objective has the form c1 x1 + c2 x2 + · · · + cn xn for some ci ∈ lR i = 1. transportation scheduling.math. The function to be minimized or maximized is called the objective function and the set of alternatives is called the feasible region (or constraint region). ﬂight crew scheduling. . . layout. Linear programming is an extremely powerful tool for addressing a wide range of applied optimization problems.1. parameter estimation. that is. 6. warehousing. Burke’s original notes can be found on the webpage http://www. facility location. A short list of application areas is resource allocation. s 6. .1 What is optimization? A mathematical optimization problem is one in which some function is either maximized or minimized relative to a given set of alternatives. production scheduling. . Determine and label the decision variables. . .washington. n. of the form ai1 x1 + ai2 x2 + · · · + ain xn ≤ bi and ai1 x1 + ai2 x2 + · · · + ain xn = bi i = s + 1. Burke for Math 407. . An LP is an optimization problem over lRn wherein the objective function is a linear function.2 An Example To illustrate some of the basic features of LP. and the feasible region is the set of solutions to a ﬁnite number of linear inequality and equality constraints. In modeling this example. .1. m. we begin with a simple two-dimensional example.

The ﬁrst step in our modeling process is to determine the decision variables. Once these variables are correctly determined then the remainder of the modeling process usually goes smoothly. Determine the objective and use the decision variables to write an expression for the objective function. Decision Variables: B = # of cases of beer mugs to be produced daily. Determine the implicit constraints. one can now specify the problem objective. After specifying the decision variables.74 Linear Programming 2. one can ask the question “If I were the plant manager for this factory. About 15 cases of either product can be produced per hour. we need to specify those quantities whose values completely determine a production schedule and its associated proﬁt. of plastic resins to produce while champaign glasses require 12 lbs. We will model the problem of maximizing the proﬁt for this company as an LP. 4. They produce personalized beer mugs and champaign glasses. The cups are manufactured with a machine called a plastic extruder which feeds on plastic resins. Determine the explicit constraints and write a functional expression for each of them. everything is determined once it is known how many cases of beer mugs and champaign glasses are to be produced each day. In the problem under consideration. per case. determine the implicit constraints. there are explicit constraints on the amount of resin and the number of work hours that are available on a daily basis. PLASTIC CUP FACTORY A local family-owned plastic cup manufacturer wants to optimize their production mix in order to maximize their proﬁt. These are the variables that represent the quantiﬁable decisions that must be made in order to determine the daily production schedule. The explicit constraints are those that are explicitly given in the problem statement. determine the explicit constraints. and 2. That is. That is. In order to determine these quantities. Objective Function: Maximize proﬁt where proﬁt = 25B + 20C The next step in the modeling process is to express the feasible region as the solution set of a ﬁnite collection of linear inequality and equality constraints. C = # of cases of champaign glasses to be produced daily. We separate this process into two steps: 1. Explicit Constraints: resin constraint: 20B + 12C ≤ 1800 . 3. The proﬁt on a case of beer mugs is $25 while the proﬁt on a case of champaign glasses is $20. You will soon discover that the most diﬃcult part of any modeling problem is the determination of decision variables. At the moment the family wants to limit their work day to 8 hours. what must I know in order to implement a production schedule?” The best way to determine the decision variables is to put oneself in the shoes of the decision maker and then ask the question “What do I need to know in order to make this thing work?” In the case of the plastic cup factory. one can write an expression for the objective function. Each case of beer mugs requires 20 lbs. The daily supply of plastic resins is limited to at most 1800 pounds.

both B and C must be non-negative quantities. Typically these constraints are associated with “natural” or “common sense” restrictions on the decision variable. x2 ) = c1 c2 . x2 ) = c1 x1 + c2 x2 . since the gradient is constant on the whole space. Step 3: Draw the gradient vector of the objective function. In the case of the cup factory problem B 45 this gives the solution to the LP as = C 75 We now recap the steps followed in the solution procedure given above: Step 1: Graph each of the linear constraints and indicate on which side of the constraint the feasible region must lie. The ﬁrst step toward a graphical solution is to graph the feasible region. to obtain the location of the point at which the objective is maximized we simply set a ruler perpendicular to the gradient and then move the ruler in the direction of the gradient until we reach the last point (or points) at which the line determined by the ruler intersects the feasible region. Then determine on which side of each of these lines the feasible region must lie (don’t forget the implicit constraints!). The next step is to draw in the vector representing the gradient of the objective function at the origin. The set of points of intersection between the straightedge and the feasible region is the set of solutions to the LP. Moreover. the level sets of f associated with diﬀerent function values are given by the lines perpendicular to the gradient. Once the correct side is determined it is helpful to put little arrows on the line to remind yourself of the correct side. Consequently.6. Then shade in the resulting feasible region. the gradient of f is the same at every point in lR2 . Step 4: Place a straightedge perpendicular to the gradient vector and move the straightedge either in the direction of the gradient vector for maximization. In the cup factory problem it is clear that one cannot have negative cases of beer mugs and champaign glasses. . This problem also has other constraints called implicit constraints. Don’t forget the implicit constraints! Step 2: Shade in the feasible region. P : max 25B + 20C subject to 20B + 12C 1 15 B 1 15 C 0 ≤ C. To do this. These are constraints that are not explicitly given in the problem statement but are present nonetheless. Since the objective function has the form f (x1 . ∇f (x1 . C + 0 Since this problem is two dimensional it is possible to provide a graphical solution. or in the opposite direction of the gradient vector for minimization to the last point for which the straightedge intersects the feasible region. That is.1 Introduction 75 work hours constraint: 1 15 B + 1 15 C ≤ 8. Implicit Constraints: 0 ≤ B. ﬁrst graph the line associated with each of the linear inequality constraints. Recall from calculus that the gradient always points in the direction of increasing function values. The entire model for the cup factory problem can now be succinctly stated as ≤ 1800 ≤ 8 ≤ B.

76 Linear Programming C 15 14 13 12 11 10 9 8 20B+12C=1800 objective normal n=[25 20]’ [B C] = [45 75] optimal value = $2625 7 6 5 4 3 2 1 B B/15 + C/15 = 8 feasible region 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Figure 6.1: .

But in the end. we continue to study this 2 dimensional LP to see what else can be revealed about the structure of this problem. there are many ways to choose the decision variables for this problem. . 2. From this information everything else can be determined. This inequality can be transformed to one in standard form by multiplying the inequality through by −1 to get −ai1 x1 − ai2 x2 − · · · − ain xn ≤ −bi . Therefore. For example. we can rewrite this LP as P : maximize subject to cT x Ax ≤ b 0≤x. as you can see there are many ways to model a given problem. ≤ bi for i = 1. they should all point to the same optimal process. . We emphasize that there is not one and only one way to model the Cup Factory problem. Clearly. 2. or any problem for that matter. Using matrix notation. . Transformation to Standard Form Every LP can be transformed to an LP in standard form. linear inequalities If an LP has an inequality constraint of the form ai1 x1 + ai2 x2 + · · · + ain xn ≥ bi . In this section we provide a step-by-step procedure for transforming any LP to one in standard form. the number of cases of beer mugs that get produced is 15 times the number of hours devoted to the production of beer mugs. 6. it is suﬃcient for the shop manager to know how many hours each days should be devoted to the manufacture of beer mugs and how many hours to champaign glasses.1 Introduction 77 The solution procedure described above for two dimensional problems reveals a great deal about the geometric structure of LPs that remains true in n dimensions.1. In particular. Before leaving this section. In this section we propose one ﬁxed formulation for the purposes of developing an algorithmic solution procedure. we make a ﬁnal comment on the modeling process described above.3 LPs in Standard Form Recall that a linear program is a problem of maximization or minimization of a linear function subject to a ﬁnite number of linear inequality and equality constraints. We say that an LP is in standard form if it has the form P : maximize c1 x1 + c2 x2 + · · · + cn xn subject to ai1 x1 + ai2 x2 + · · · + ain xn 0 ≤ xj for j = 1. m where the inequalities Ax ≤ b and 0 ≤ x are to be interpreted componentwise. linear equation . This process usually requires a transformation of variables and occasionally the addition of new variables. We then show that every LP can be recast in this form. . . minimization → maximization To transform a minimization problem to a maximization problem just multiply the objective function by −1. . This general deﬁnition leads to an enormous variety of possible formulations. We will explore this geometric structure more fully as the course evolves.6. n . . But for the moment. .

the bound li ≤ xi is equivalent to the bound 0 ≤ wi . In this case. we put the following LP into standard form. then we get xi = ui − vi with 0 ≤ ui and 0 ≤ vi . To obtain standard form every free variable must be replaced by the diﬀerence of two non-negative variables. −2 ≤ x4 ≤ 2. if xi is free. the bounds li ≤ xi ≤ ui are equivalent to the constraints 0 ≤ wi free variables Sometimes a variable is given without any bounds. The second of these inequalities can be transformed to standard form by multiplying through by −1. variables with upper bounds If a variable xi has an upper bound ui (xi ≤ ui ) one obtains a non-negative variable wi with the substitution xi = ui − wi . In this case. Such variables are called free variables. . variables with lower bounds If a variable xi has lower bound li which is not zero (li ≤ xi ). −1 ≤ x2 . x3 ≤ 5. variables with interval bounds An interval bound of the form li ≤ xi ≤ ui can be transformed into one non-negativity constraint and one linear inequality constraint in standard form by making the substitution xi = wi + li .78 Linear Programming The linear equation ai1 x1 + · · · + ain xn = bi can be written as two linear inequalities ai1 x1 + · · · + ain xn ≤ bi and ai1 x1 + · · · + ain xn ≥ bi . minimize subject to 3x1 −x1 − + x2 6x2 7x2 − x3 x3 + + + x4 x4 x4 ≥ −3 = 5 ≤ 2 and wi ≤ ui − li . one obtains a non-negative variable wi with the substitution xi = wi + li . In this case. the bound xi ≤ ui is equivalent to the bound 0 ≤ wi . To illustrate the ideas given above. That is.

6. Putting it all together we get the following LP in standard form: maximize subject to −3y1 y1 + − 3y2 y2 + − − y3 6y3 7y3 7y3 − − y4 − + − + y5 y5 y5 y5 y5 ≤ ≤ ≤ ≤ ≤ −10 14 −14 −1 4 y4 0 ≤ y1 . The equality constraint is replaced by the two inequality constraints 7x2 −7x2 + x4 − x4 ≤ ≤ 5 −5. . we rewrite the objective as maximize − 3x1 + x2 . so we replace it by x3 = 5 − y4 with 0 ≤ y4 . y3 . so we replace it by x1 = y1 − y2 with 0 ≤ y1 . Observe that the variable x1 is free. y4 .1 Introduction 79 First. y5 . The variable x4 is bounded below and above so we replace it by x4 = y5 − 2 with 0 ≤ y5 and y5 ≤ 4. 0 ≤ y2 . y2 . The variable x2 has a non-zero lower bound so we replace it by x2 = y3 − 1 with 0 ≤ y3 . Next we replace the ﬁrst inequality constraint by the constraint x1 − 6x2 + x3 − x4 ≤ 3. The variable x3 is bounded above.

.Aeq.b.beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq. X=LINPROG(f. The pigs must get at least 200 units of carbohydrates and at least 180 units of protein in their daily diet. 100x1 + 40x2 ≥ 200 (6. Suppose corn and alfalfa each cost 30 cents / kg. since it solves a minimization problem rather than a maximization problem: >> help linprog LINPROG Linear programming.1) c1 x1 + c2 x2 where x1 and x2 are the quantity of corn and alfalfa to use (in kilograms) and c1 and c2 are the cost per kilogram of each. X. How expensive would each of the above diets be? 4. so that the solution is in the range LB <= X <= UB.Aeq. Set LB(i) = -Inf if X(i) is unbounded below. With these prices.A.80 Linear Programming 6. how many kg/day does he need for each pig? 2. If the farmer uses only corn.beq. The farmer knows that: • corn has 100 units/kg of carbohydrate and 30 units/kg of protein • alfalfa has 40 units/kg of carbohydrate and 60 units/kg of protein.LB. x2 ≥ 0. is there some mixture of the two grains that would satisfy the dietary requirements and be cheaper? 5. Linear programming problems can be solved in matlab. If the farmer uses only alfalfa. Multiplying the objective function and constraints by −1 would allow us to put this in the standard form discussed above. which requires a slightly diﬀerent standard form. Use empty matrices for LB and UB if no bounds exist. X=LINPROG(f. how many kg/day does he need for each pig? 3.A.2 A pig farming model Suppose a pig farmer can choose between two diﬀerent kinds of feed for his pigs: corn or alfalfa. He can feed them all of one or the other or some mixture.A.UB) defines a set of lower and upper bounds on the design variables. Here are a few questions the farmer might be faced with in choosing a good diet for the pigs: 1.b.b) solves the linear programming problem: min f’*x x subject to: A*x <= b X=LINPROG(f. What is the best mix if the price of alfalfa goes up to $1/kg? What if it goes up only to 60 cents / kg? The problem of choosing the mix which minimizes cost can be set up as a linear programming problem minimize subject to 30x1 + 60x2 ≥ 180 x1 .

iterations. This option is only available with the active-set algorithm.A. Currently. NOTE: the LargeScale (the default) version of LINPROG uses a primal-dual method.6. MaxIter. [X. .2 A pig farming model 81 set UB(i) = Inf if X(i) is unbounded above. at the solution: LAMBDA.FVAL. LargeScale. < 0 then the problem was infeasible or LINPROG failed. 0 then LINPROG reached the maximum number of iterations without converging.beq. [X. LAMBDA. The primal problem in standard form is min f’*x such that A*x = b.UB. The dual problem is max b’*y such that A’*y + s = f.Aeq.cgiterations. Diagnostics.Aeq. an argument created with the OPTIMSET function.LAMBDA]=LINPROG(f.upper for UB.Beq. X=LINPROG(f. only ’final’ and ’off’ are valid values for the parameter Display when LargeScale is ’off’ (’iter’ is valid when LargeScale is ’on’). LAMBDA. X=LINPROG(f. are given as appropriate. x >= 0.FVAL.OUTPUT] = LINPROG(f.LB.ineqlin for the linear inequalities A.A.FVAL]=LINPROG(f.X0) sets the starting point to X0. Infeasibility messages of either the primal or dual.algorithm.OPTIONS) minimizes with the default optimization parameters replaced by values in the structure OPTIONS.b) returns a structure OUTPUT with the number of iterations taken in OUTPUT.LB.b.b) returns EXITFLAG that describes the exit condition of LINPROG. the number of conjugate gradient iterations (if used) in OUTPUT. or both.lower for LB.EXITFLAG.A. See OPTIMSET for details. [X. The default interior point algorithm will ignore any non-empty starting point. If EXITFLAG is: > 0 then LINPROG converged with a solution X.UB.EXITFLAG] = LINPROG(f.b) returns the set of Lagrangian multipliers LAMBDA. s >= 0.X0.b.b) returns the value of the objective function at X: FVAL = f’*X.A. TolFun. and LAMBDA.A. the type of algorithm used in OUTPUT. Use options are Display.eqlin for the linear equalities Aeq. Both the primal problem and the dual problem must be feasible for convergence. [X.OUTPUT.FVAL.A.EXITFLAG.

>> linprog(f.1 Adding more constraints Suppose we add some new constraints to the problem.00/kg: >> f=[30.[]. -1 -1]. 30 60.2(b). we could easily solve the problem. so that x1 ≤ 3 and x2 ≤ 3. This problem can be solved as follows: >> A = -[100 40.2(a). so the solution is at one of the corners of the feasible set. 6.vub) Optimization terminated successfully.A.b. suppose pigs should eat at most 3 kg of any particular grain. The objective function we are minimizing is also linear. the farmer might want to keep the pigs from getting too fat by requiring that the total daily diet should be at most 5 kg. If we knew which two constraints were binding at the solution. >> b = -[200.b. adding the constraint x1 + x2 ≤ 5. Unfortunately we don’t generally know this in advance.[]. Corners correspond to points where two constraints are satisﬁed simultaneously. 3]. so the feasible set is a convex polygon. ans = 6. 0]. >> linprog(f.2.30]. In class we will study the graphical interpretation of these and other related problems.3.[].82 Linear Programming To solve the optimization problem with costs c1 = c2 = 30.[].2. 180.vlb.0000 1.100].5000 This solution is illustrated graphically in Figure 6.0000 0 This solution is illustrated graphically in Figure 6.[].[]. and can be found by solving a linear system of two equations (from these two constraints) in two unknowns x1 and x2 . .5000 This solution is illustrated graphically in Figure 6. >> vlb = [0.0000 2. -5]. >> linprog(f. we can do the following: >> f = [30. 6. ans = 3.0].vlb) Optimization terminated successfully.b. ans = 1.A. 30 60] >> b = -[200. >> vub = [3. In addition. >> A = -[100 40. For example. >> vlb = [0.180]. If the cost of alfalfa goes up to $1.2 Solving the problem by enumeration Each constraint corresponds to a straight line.vlb) Optimization terminated successfully.A.

Shade in the feasible region! . which is one of the corners of the feasible region. The solid lines show constraints and the dashed line is the iso-cost line passing through the optimal solution. Shade in the feasible region in each ﬁgure! 8 7 6 5 4 3 2 1 0 −1 −1 0 1 2 3 4 5 6 7 8 Figure 6.2 A pig farming model 83 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0 (a) −1 −1 0 1 2 3 4 5 6 7 8 (b) −1 −1 0 1 2 3 4 5 6 7 8 Figure 6. In each case the solid lines show constraints and the dashed line is the iso-cost line passing through the optimal solution. which is one of the corners of the feasible region.3: Graphical solution of the pig-farming problem with more constraints.2: Graphical solution of the pig-farming problems in the x1 -x2 plane.6.

like branch and bound algorithms. • If it is feasible. the best value of the objective function found gives the solution. Loosely speaking. The basic idea is to start by identifying one corner of the feasible region and then systematically move to “adjacent” corners until the optimal value of the objective function is obtained. compute and record the value of the objective function at this point. x3 makes up the slack in the inequality constraint 2x1 + x2 ≤ 12. and again the optimal solution occurs at a corner of the feasible set. To gain a basic understanding of the simplex method. 3 6. Changing the value of the objective function sweeps out another set of planes.2. For each choice of 2 equations we could: • Solve the linear system for these two constraints. x2 The corresponding feasible region is depicted in Figure 6. In this case the pictures are harder to draw. becomes 2x1 + x2 + x3 = 12 with the addition of the slack variable x3 ≥ 0. then the problem is even worse.3 Adding a third decision variable If there are more decision variables. • Check to see if the resulting solution is feasible (it might violate other constraints!).84 Linear Programming One approach to solving the problem would be to consider all possible corners by solving all possible sets of 2 equations selected from the various constraints.4. The problem with this.2. if we consider 3 grains instead of only 2. In this case it is impossible to enumerate and check all possible “corners” of the feasible set. For instance. then we would have three variables x1 . If there are m constraints and two decision variables. and x3 to consider.2) must be reformulated into a system of linear equalities constraints. 2x1 + x2 ≤ 12. is that there is a combinatorial explosion of m possible choices. These corners are points where 3 of the constraints are simultaneously satisﬁed. the system of linear equalities in LP (6. After doing this for all choices of two constraints. the inequality. in the worst case it can still have exponential running time). The simplex method uses a series of elementary matrix operations.2) . For example. The introduction of slack variables facilitates such a conversion of the constraints. Such veriﬁcation is an important part of mathematical modeling. Take a moment and verify that the equality and inequality constraints are equivalent. consider the following LP max subject to 2x1 − 3x2 2x1 + x2 x1 − 3x2 0 ≤ x1 . Each constraint corresponds to a plane in 3-space and the feasible set is a convex body with planar sides.4 The simplex algorithm Practical linear programming problems often require solving for hundreds of variables with thousands of constraints. then there are = 2 1 2 2 m(m − 1) = O(m ) possible choices. so we can ﬁnd all possible corners by considering all choices of 3 constraints from the set of m constraints. for larger problems anyway. The simplex algorithm is a method that has proved exceedingly eﬀective at ﬁnding solutions in much less time (though. ≤ 12 ≤5 (6. 6. x2 . There are now m = O(m3 ) ways to do this. Therefore. which is now an object in a linear space with dimension in the hundreds.

Since the column associated with variable x1 contains more than 1 nonzero value. which are equivalent to matrices.2 A pig farming model 85 12 10 8 2x1 + x2 ≤ 12 x2 6 4 2 x1 − 3x2 ≤ 5 0 0 1 2 3 x1 4 5 6 7 Figure 6. the simplex method forms tableaux. The initial simplex tableau for LP (6. which becomes x1 − 3x2 + x4 = 5. Therefore. each inequality constraint should use a unique slack variable. Traditionally. 3. Note that our linear system (6. which implies that x3 = 12/1 = 12. = 12 =5 (6. In contrast. The constraint x1 − 3x2 ≤ 5 still remains to be converted into a linear equality with a slack variable. 4 Notice.4: Feasible region for LP.2) becomes max subject to z = 2x1 − 3x2 2x1 + x2 + x3 x1 − 3x2 + x4 0 ≤ xi . Having obtained a system of linear equality constraints. which would form the linear equality x1 − 3x2 + x3 ≤ 5. one may be tempted to use x3 again as a slack variable.6.2)! Therefore.3) i = 1. Therefore. x1 = 0. LP (6. the column associated with x3 contains only one nonzero value. our system is prepared for matrix computation. or loosely speaking that there are equal amounts of “slack” in each constraint. In general.2) does not imply that 2x1 + x2 − 12 = x1 − 3x2 − 5. another slack variable x4 ≥ 0 is introduced into the second constraint. In an attempt to reduce decision variables. 2. Forcing the same slack variable in each constraint changes the problem we are solving! Take a moment and consider the implications of using x3 for each constraint in (6.3) is z 1 0 0 x1 -2 2 1 x2 3 1 -3 x3 0 1 0 x4 0 0 1 RHS 0 12 5 ←− (z − 2x1 + 3x3 = 0) ←− (2x1 + x2 + x3 = 12) ←− (x1 − 3x2 + x4 = 5) Solutions can be read directly from the tableau. z equals the optimum of the objective function. this tableau leads to the solution .

z 1 0 0 x1 -2 2 1 x2 3 1 -3 x3 0 1 0 x4 0 0 1 RHS 0 12 5 ←− indicators Basic variables are nonzero variables in the tableau and correspond to a corner point of the feasible region. Check the indicators for possible improvement – If all the indicators are nonnegative. Check for unboundedness – If αi < 0 and no positive element in the column associated with variable xi exists. For x1 to become nonzero. Note that x1 is the entering variable and x4 . we ﬁnd the following: z 1 0 0 x1 x2 -2 3 2 1 1 -3 ↑ pivot column x3 0 1 0 x4 0 0 1 RHS 0 12 5 Quotients 12/2 = 6 5/1 = 1 ⇒ pivot row Verify these results. The entering variable is xi and will be made nonzero. Otherwise. The column associated with xi is called the pivot column. 3. and αi is the most negative indicator. The pivot row is the row corresponding to the smallest quotient. Select the entering variable – The most negative indicator identiﬁes the entering variable. Select the departing variable – This step determines which basic variable will become zero. That is. x3 and x4 are basic variables associated with the corner point (0. Applying these steps to our tableau. the departing variable. we apply elementary row calculations to form the following tableau: z 1 0 0 x1 0 0 1 x2 -3 7 -3 x3 0 1 0 x4 2 -2 1 RHS 10 2 5 . The basic idea of the simplex algorithm is to change which variables in the system are basic variables in order to improve the value of z. The numbers in shaded boxes in the top row of the tableau are called indicators. Find a quotient for each row by dividing each number from the right side of the tableau by the corresponding number from the pivot column. Therefore. ﬁnite improvement in the objective function is possible. In the current tableau.Use elementary row operations on the old tableau so the column associated with xi contains a zero in every entry except for a 1 in the pivot position. go to Step 3. Considering more details of the method requires that we establish additional terminology regarding a simplex tableau. the algorithmic steps for the simplex algorithm as applied to the simplex tableau are as follows: 1. 4. then the solution is optimal else go to Step 2. 2.0) (since x1 = 0 and x2 = 0). select αi such that αi < 0. Let αi denote the indicator in the column associated with variable xi . 5. x4 = 5 and z = 0. The pivot element is the element in the pivot row and pivot column. x3 = 12. Go to Step 1. Pivot creating a new tableau .86 Linear Programming x1 = x2 = 0. then the problem has an unbounded objective value.

3 Complexity and the simplex algorithm 87 This tableau leads to the solution x1 = 5. Fortunately.6. Khachian [Kha79] invented the “ellipsoid” algorithm for ﬁnding solutions to linear programming problems in 1979 that he showed runs in polynomial time. software packages exist that are aimed speciﬁcally at solving linear programming and related problems. which represents that the optimal value z cannot be improved. The step of selecting the departing variable is left to the reader. x3 = 2. the indicators identify that possible improvement still exists by selecting x2 as an entering variable.3 Complexity and the simplex algorithm The simplex algorithm was invented by George Dantzig in 1947 [Dan63]. The indicators are all 7 7 7 positive. Spielman and Teng [ST02] introduced a new complexity class they call polynomial smoothed complexity and they showed that the simplex algorithm is the classic example in this class. x2 = 0. Program packages such as lindo and matlab demonstrate the need for eﬀective computation in the ﬁeld of optimization. the current tableau represents an increase in the optimal value! Yet. which is available on some campus computers. In 2001. x2 = 2 . Refer to Figure 6. the solution has been found. Therefore. 6. The basic variables are indeed x1 and x3 . Moreover. Another more eﬃcient algorithm due to Karmarkar[Kar84] uses an interior-point method instead of walking around the boundary. Elementary row operations produce the following tableau: z 1 0 0 x1 0 0 1 x2 0 1 0 x3 3/7 1/7 3/7 x4 8/7 -2/7 1/7 RHS 76/7 2/7 41/7 This tableau leads to the solution x1 = 41 .4 to verify that our solution is a corner point of the feasible solution. Their work attempts to identify algorithms which are exponential in the worst case but which work eﬃciently in practice like the simplex algorithm. . x3 = x4 = 0 and z = 76 . Implementation of the simplex algorithm requires a great deal of linear algebra and bookkeeping. This algorithm appeared in 1984. Imagine performing elementary row operations on a simplex tableau with thousands of decision variables. One popular package is lindo. x4 = 0 and z = 10. but is very eﬃcient in practice. The matlab function linprog implements the simplex algorithm. Klee (from UW) and Minty [KM72] proved that the simplex algorithm is exponential in the worst case.

88 Linear Programming .

however.1 The lifeboat problem Many linear programming problems arising from real problems have the additional complication that the desired solution must have integer values.6364 (number of vests) = 31. The best integer solution was found to be x1 = 381. of reducing the x2 to the nearest integer and increasing x1 as needed. and the corners can be found by solving an appropriate linear system of equations corresponding to the binding constraints. x2 = 2. x2 = 32 would give an infeasible solution (requiring too much volume). this is usually out of the question. maximizing the number of lifeboats subject to certain constraints. but for problems with many variables and with a large range of possible integer values for each. Figure 7.8182 (number of boats) Trying to round this to x1 = 364. for the values considered in Chapter 1. and we see that the best integer solution is x1 = 8. which is not possible.636 shows the maximum value of the objective function x2 obtained at the LP solution x1 = 3. Instead we must ﬁnd the best integer solution. In this case we can’t even use the approach of Chapter 1. This usually does not work. x2 = 31. For the IP problem typically none of the inequality constraints are exactly binding at the solution.Math 381 Notes — University of Washington —Winter 2011 Chapter 7 Integer Programming 7. 89 . We had to reduce x2 to 31 and then the number of vests x1 went up substantially. The feasible integer points are indicated by the dark dots. Recall that for the LP problem we know the optimum occurs at a corner of the feasible set.1 shows the graphical solution to this problem with a diﬀerent set of parameters: C1 C2 C V1 V2 V = 1 = 2 (capacity of each vest) (capacity of each boat) = 11 (total capacity needed) = 1 (volume of each vest) = 3. For example. in the lifeboat problem considered in Chapter 1.1 (volume of each boat) = 15 (total volume available) The dashed line x2 = 3. For most IP (integer programming) problems. The solution to the resulting linear programming problem will typically require some fractional number of lifeboats and life vests.727.636. For example. we must determine how best to equip a ferry with lifeboats and life vests. the LP solution was x1 x2 = 363. x2 = 3. In simple cases one might be able to enumerate all integer points in the feasible set and check each. Our ﬁrst guess at how to solve such a problem might be to simply round oﬀ the values that come from the LP solution. there is no direct way to ﬁnd the solution.

which is one of the corners of the feasible region. When integer constraints are added.90 Integer Programming 8 7 6 5 4 3 2 1 0 −1 0 2 4 6 8 10 12 14 16 18 20 Figure 7. . The solid lines show constraints and the dashed line is the objective function contour line passing through the optimal solution. only the points marked by dots are feasible.1: Graphical solution of the lifeboat problem in the x1 -x2 plane.

then for each cargo the value xj can take one of two values and so there are 2k possible combinations of cargo. Pj the proﬁt. (E. as discussed in the next section. such as in the example we considered of canneries and warehouses. Often each value represents a yes-no decision that must be made. The ﬁrst scenario gives a linear programming problem with continuous variables. It will be useful to consider problems of the ﬁrst type in the process of solving the integer programming problem.. and is easier to solve. and V = 1000 the total volume available in the cargo plane.4) 0 ≤ xj ≤ 1 xj integer If we drop constraint (7. taking half the cargo yields half the proﬁt. Many practical problems can be put in this form. For this small problem we can easily solve the integer programming problem by simply enumerating all possible choices of cargos. in which linear programming problems are used to obtain bounds.2 Cargo plane loading 91 Some IP problems have a suﬃciently special form that the solution is easily found. certain transportation problems. For example. In the ﬁrst scenario above.1) subject to j Vj xj ≤ V (7.3) (7. and you can choose from the following sets of cargos to load the plane: Cargo 1 2 3 4 5 Volume 410 600 220 450 330 Proﬁt 200 60 20 40 30 For each cargo the volume and proﬁt expected from shipping this cargo are listed. The second scenario is more realistic and gives a zero-one integer programming problem. while for the actual problem each xj must be either 0 or 1. In this case k = 5 and there . have the property that the LP solution can be shown to be integer-valued. with the proﬁt adjusted accordingly.) • We must take all of the cargo or none of it. 7. using the simplex method for example.2) (7.4) then we have the linear programming problem corresponding to the ﬁrst scenario again. If we have k cargos to choose from. xj can take any value between 0 and 1. For other classes of IP problems there are special techniques which are eﬀective. The integer programming problem then has the form maximize j Pj xj (7. For many problems a branch and bound approach must be used. This is illustrated in the next section for a relatively simple class of problems called zero-one integer programming.7. Let Vj be the volume of the j’th cargo.2 Cargo plane loading Suppose you have a cargo plane with 1000 cubic meters of space available.g. There are two possible scenarios: • We are allowed to take any amount of each cargo up to the total volume. Introduce the decision variable xj to represent the fraction of the j’th cargo that we decide to take. since each decision variable is restricted to take only the values 0 or 1.

3.2. . For a larger problem it might not be possible to enumerate and check all possibilities. This suggests building up a binary tree by considering each cargo in turn. Part of such a tree is shown in Figure 7. This is easily illustrated for the cargo plane problem since this is a particularly simple form of integer programming. Moreover the real problem might be more complicated. for example a shipping company might have many planes and diﬀerent sets of cargos and want to choose the optimal way to split them up between planes. Not all will be feasible since the volume constraint may be exceeded. a 0-1 integer program in which each decision variable can take only the values 0 or 1. meaning we should take Cargos 1. Here is the enumeration of all 32 possibilities. and 5.3 Branch and bound algorithm Integer programming problems are often solved using branch and bound algorithms. with the proﬁt arising from each (which is set to zero for infeasible combinations): cargo choice 2 3 4 5 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 profit 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 30 40 70 20 50 60 90 60 90 0 0 80 0 0 0 200 230 240 0 220 250 0 0 0 0 0 0 0 0 0 0 The best combination is 10101. 7.92 Integer Programming are 32 possibilities to consider.

This is more easily solved than the integer programming problem. where the cargo is represented as ????? meaning all ﬁve decision variables are unspeciﬁed. The goal of the branch and bound algorithm is to avoid having to build up the entire tree by obtaining a bound at each branching point for the best possible solution we could ﬁnd down that branch of the tree. It is exactly these selections which lie further down the tree on this branch. Make sure you understand why! For example.7. since the solution there had x1 = 1. . This gives an upper bound UB on the best that could be obtained with this partial cargo. The essential idea is to instead solve a linear programming problem in which the variables still denoted by ? are allowed to take any value between 0 and 1.) The solution is the same as found at the root node. along with any possible choice of zeroes and ones for the cargos which are still denoted by ?.3) and also the constraint x1 = 1. with a proﬁt of 259. by changing the bounds on x1 to 1 ≤ x1 ≤ 1. At the node 1???? we solve the linear programming problem (7.1 cargo: 1???? UB: 259 cargo: ????? UB: 259 cargo: 0???? UB: 96.6 cargo: 1001? UB 252. In this case we want an upper bound on the best proﬁt we could possibly ﬁnd with any selection that contains the pattern of zeroes and ones already speciﬁed.4 cargo: 10??? UB 253.3) only.6 cargo: 101?? UB 253. This isn’t a valid solution to the integer problem.1) with the constraints (7. x2 = 0. Each of these has two branches depending on whether we take or do not take Cargo 2.2) and (7. We want to obtain this bound by doing something simpler than enumerating all possibilities down this branch and computing each proﬁt — this is what we want to avoid.9833 and x3 = x4 = x5 = 0. we obtain an upper bound on the proﬁt of any selection of cargos by solving the linear programming problem (7. in which we either take or do not take Cargo 1. at the root node ?????. At the node 0???? we solve the linear programming problem (7. The solution is x1 = 1. Continuing in this way we could build up a full tree whose leaves would be the 32 possible choices of cargos.3 Branch and bound algorithm 93 cargo: 11??? UB 0 cargo: 1011? UB 0 cargo: 10101 profit 250 cargo: 1010? UB 250 cargo: 10100 profit 220 cargo: 100?? UB 253. dropping all integer constraints.1) with the constraints (7. Again this requires taking fractional cargos and any integer selection with x1 = 0 will be even worse.7 cargo: 10011 profit 0 cargo: 10010 profit 240 cargo: 1000? UB 230 Figure 7. but whatever proﬁt it determines (perhaps with these partial cargos) is an upper bound on the proﬁt that could be obtained with the integer constraints imposed. This gives an upper bound on the proﬁt of only 96.36 if Cargo 1 is not taken.2: Illustration of the branch and bound algorithm for solving the cargo-loading problem. If UB is 0 then this choice is infeasible. From here we branch to 1???? and 0????.2) and (7.3) and also the constraint x1 = 0. The root is to the left.2) and (7. (Or simply reduce the number of decision variables and modify the objective function appropriately. In each box the values of some 0-1 integer values are speciﬁed and then the linear programming problem is solved with the other variables allowed to take any value between 0 and 1. The solution to this problem may involve non-integer values of some xj .1) with the constraints (7. but any valid selection must have a proﬁt that is no larger than this.

N .94 Integer Programming It looks more promising to take Cargo 1 than to not take it.8) with all the other xij = 0. taking Cargo 1 and Cargo 2 and solving a linear programming problem for the other variables. Also. Consider a graph with N vertices labeled 1 through N. but we ﬁnd this is infeasible since Cargo 1 and 2 together require too much volume. so that all such tours are still feasible. One constraint is that we must use exactly one edge out of each node. and so we explore down this branch ﬁrst. (For a graph with N nodes there will be N (N − 1) decision variables. i=1 (7. and let dij be the “distance” from i to j. .9) Note that u1 does not appear in these constraints. This can be done in various ways. . (7. So. . we must add other constraints. One possibility is to introduce a new set of decision variables u1 . N.) Now we wish to N N minimize i=1 j=1 dij xij (7. 7. This is not a single TSP tour. 3. j=1 (7. such as (7.4 Traveling salesman problem We now return to the Traveling Salesman Problem (TSP) of Chapter 2. At this point we can prune oﬀ this branch of the tree since every selection which lies down path must be infeasible. but rather two disconnected tours. 2. for example. To rule out the possibility of subtours. we must use exactly one edge into each node. On the other hand for a set of xij that contain subtours. then it should be possible to choose some values ui so that these constraints are satisﬁed. . . for the constraints would allow things like x12 = x23 = x31 = 1 and x45 = x54 = 1 (7. We next try 11???. Recall that we wish to ﬁnd the path through all nodes which minimizes the total distance traveled.36 is smaller than leaf values we have already found. The idea is that if we have a set of xij which correspond to a valid TSP tour. . .6) for each i = 1.7) for each j = 1. . uN which do not appear in the objective function but which have the (N − 1)2 constraints ui − uj + N xij ≤ N − 1. with N = 5 the path 1 → 2 → 3 → 5 → 4 → 1 would be encoded by setting x12 = x23 = x35 = x54 = x41 = 1 and the other 15 xij values to 0. This is not enough. For example. .9) and hence such sets will no longer be feasible.5) subject to various constraints. let xij be 1 if the edge from Node i to Node j is used and 0 otherwise. .8). . and see how this can be set up as an integer programming problem. To set this up as an IP problem. Continuing this way. so N xij = 1. for i. all of the tree lying beyond the node 0???? can be pruned oﬀ without further work since the upper bound of 96. . 2. . . we eventually work down to a leaf and ﬁnd an actual integer selection with an associated proﬁt. N . From this point on we can also prune oﬀ any branches for which the upper bound on the proﬁt is smaller than the actual proﬁt which can be obtained by a valid selection. . it should be impossible to ﬁnd a set of ui satisfying (7. This leads to N xij = 1. . j = 2. .

In this case (7. 7.9) for i = 4. Node j is visited immediately after Node i. ≤ 4 4 u5 − u4 + 5 ≤ 7.9) reads ui − uj ≤ N − 1 This will certainly be true since all the values we assigned to u variables lie between 0 and N − 1. setting the ui variables according to the order in which we visit the nodes. We can show that this satisﬁes all the constraints in (7. The constraints (7. we wish to show that if we consider a valid TSP tour (with no subtours) then we can ﬁnd an assignment of ui values that satisﬁes all these constraints. j = 5 and for i = 5. So we can compute that ui − uj + N xij = −1 + N = N − 1 and hence the inequality in (7. j) for which xij = 0.9) for each (i. these problems oﬀer fun challenges.5 IP in Recreational Mathematics 95 To see why it is impossible to satisfy these constraints if there are subtours.8). For if there were a polynomial-time algorithm to solve the general integer programming problem. then we could use it to solve the TSP. This is best illustrated with an example: 1→2→5→4→3→1 yields u1 = 0. This can be used to prove that the general integer programming problem is NP-complete.4. The basic idea is to set u1 = 0 and then follow the tour starting at Node 1. A similar argument works more generally: If there are subtours then there is always a subtour that doesn’t involve Node 1 and writing down the constraints (7. j) in this subtour gives a set of equations that. consider (7.5 IP in Recreational Mathematics Integer Programming also applies to problems in recreational math such as the challenging brain teasers found in puzzle books available at bookstores or on the World Wide Web at sites such as rec. j = 4 yield u4 − u5 + 5 Adding these together gives 10 ≤ 8 which cannot be satisﬁed no matter what values we assign to u4 and u5 . Whether tinkering with such problems in your mind or applying IP.puzzles.1 NP-completeness of integer programming We just showed that it is possible to reformulate any traveling salesman problem as an integer programming problem. u3 = 4. u4 = 3. So we cannot possibly satisfy both of the constraints above simultaneously.9) is satisﬁed (with equality in this case). u2 = 1. Next. consider the pairs (i. . and hence have a polynomial-time algorithm for the TSP. First. j) for which xij = 1. when added together. and hence uj = ui + 1. u5 = 2.9). This means we use the edge from Node i to j in our tour. for example. consider the pairs (i.7. leads to cancellation of all the ui values and an inequality that can never be satisﬁed. The second subtour deﬁned by x45 = x54 = 1 does not involve Node 1. On the other hand.

Since the problem serves as a brain teaser. .3: The well-guarded castle puzzle involves 12 guards posted around a square castle.5. we deﬁne 8 decision variables xi for i = 1. Other soldiers are posted along the face of a wall and watch only 1 side of the castle. a soldier on a corner can watch 2 walls while a soldier along a wall watches only 1. In such cases. we must know the number of soldiers positioned along each wall and at each corner. (a) Each guard is depicted with a solid dot.1 (b).5. 2. This leads to the following constraints: southern wall constraint: eastern wall constraint: northern wall constraint: western wall constraint: x1 + x2 + x3 x3 + x4 + x5 x5 + x6 + x7 x7 + x8 + x1 =5 =5 =5 =5 . A.5. and King Husky declares that the castle be guarded on every side. the best way to determine these variables is to put oneself in the shoes of the decision maker and ask the question “What do I need to know in order to fully determine the solution?” In the case of the well-guarded castle. by B. 7. Kordemsky.5. the king declared that 5 soldiers watch each wall. When posted on a corner. this puzzle serves as an illustrative example of formulating such problems as integer programming problems. a soldier can watch 2 walls.2 Formulation of the IP The solution process begins by determining the decision variables. Imagine a much larger problem with thousands of soldiers. Aid King Husky in securing the kingdom by ﬁnding an arrangement of the soldiers such that there are 5 soldiers watching each wall. .1 (a). The positions are numbered counterclockwise beginning at the southwest corner as seen in Figure 7. [Kor92] The Well-guarded Castle There is unrest in the land. Guards will be posted along a wall or at a corner of the square castle. Hence. Therefore. 8. where xi equals the number of soldiers posted at position i. Again. . As in linear programming.96 Integer Programming x x x x x7 x6 x5 North T x castle x x x8 x x1 x2 (b) x3 castle x4 x x (a) x x Figure 7. you may be able to derive an answer by inspection or trial and error. the process needed to ﬁnd a solution could easily move beyond the scope of mental calculation. Recall. 7. Note that 1 soldier is posted at each corner of the castle and 2 soldiers on each wall. . (b) The associated IP can be formulated with 8 decision variables. The king orders 12 of his bravest soldiers to guard the stone fortress and initially plans to post 2 guards on each wall and 1 soldier on each corner as depicted in Figure 7. we begin with a simple brain teaser based on a puzzle from the book. The Moscow Puzzles.1 An Example Puzzle To illustrate IP’s in recreational math.

the choice of an objective function determines which of such solutions is optimal! It should be noted that in some cases. x3 = x7 = 4 and x2 = x4 = x6 = x8 = 1. Still. Yet. The interested reader is encouraged to visit the web site. “Puzzles: Integer Programming in Recreational Mathematics” (http://www. there are several solutions to this puzzle. Yet. and xi is an integer decision variable of Q. posing an IP does not imply that it is solved. Take a moment and consider what objective function could produce this answer. A 0-1 LP (also called a 0-1 IP) is a linear programming problem where decision variables can be considered as boolean variables attaining only the values 0 or 1.6 Recasting an IP as a 0-1 LP 97 What is the puzzle’s objective? That is. Moreover.7. or by symmetry. then additional constraints may be needed. i = 1. what is a suitable objective function? As you may have noticed. Assume Q is an IP. an IP that does not enforce integral solutions is called a relaxed LP. if you desire a particular solution. Suppose King Husky wants the total number of soldiers watching the corners to be at a minimum. Imagine presenting King Husky with the solution that 1. We introduce the decision variables . The entire model for the well-guarded castle puzzle can now be succinctly stated as P : min x1 + x3 + x5 + x7 subject to x1 + x2 + x3 x3 + x4 + x5 x5 + x6 + x7 x7 + x8 + x1 8 i=1 =5 =5 =5 =5 = 12 xi 0 ≤ xi ≤ 5. It is also possible to reduce the number of guards on the corners such that x1 = x5 = x3 = x7 = 2. For instance.chlond. Even a cruel leader would recognize the foolishness of such an action. One constraint remains. P posed the well-guarded castle puzzle as an IP. every IP can be recast as a 0-1 LP.uk/academic/puzzles. .demon. and x2 = x4 = x6 = x8 = 1. Presenting King Husky with a properly posed IP and no solution might result in one’s being thrown over that mighty wall! Solving P as an LP allows variables in the solution to contain nonzero decimal values. which implies 8 i=1 xi = 12. the guards can be positioned such that x1 = x5 = 4 and x2 = x4 = x6 = x8 = 1. Techniques such as branch and bound can be applied to solve a 0-1 LP. Our solution can then be recast from the 0-1 LP back to the integer solution of the corresponding IP. solving an IP as a relaxed LP can yield intuition on the ﬁnal solution. 2. we will look at other puzzles and pose the problems as IP problems. 8 In class.html) to ﬁnd additional puzzles. Then. . Objective functions to maximize x1 + x5 or x3 + x7 could produce these answers.6 Recasting an IP as a 0-1 LP In the last section. 7. Such a solution relaxes the constraint that the decision variables are integers. task. if not impossible. King Husky (like an executive of a company who hires a consulting ﬁrm to model a phenomenon) demands an answer from a model that he can trust. As we see.1 soldiers be posted on the southern wall.co. respectively. our objective function becomes: min x1 + x3 + x5 + x7 . The king issues 12 guards. deciphering the optimal or even near optimal integer solution from a real numbered solution is often a diﬃcult. Hence. . .

Completing similar substitutions for each decision variable in P . . . Therefore. dik and let k xi = j=1 2j−1 dij (7. . . + 2k−1 dik . where dij = 0 or 1. . . Then. For P . 2. In this chapter. 2. 3. . it is suitable to let x1 = d11 + 2d12 + 4d13 . . Similarly. the solution is found. 2. . Note that the decision variables dij can also be viewed as the binary representation of the associated value of xi . 8. .10) = di1 + 2di2 + .98 Integer Programming di1 . Integer programming can model a diverse range of problems. . the bound is incremented by a ﬁxed value (say. j = 1. Such a problem can be solved using a branch and bound algorithm or linear programming software. d12 . What if our problem did not have an upper bound on the decision variables? In such cases. k. and d13 . i = 1. the IP is recast to a 0-1 LP with a large bound on each decision variable. you learned techniques and some of the diﬃculties of solving IP problems. the corresponding 0-1 LP becomes: min d11 + 2d12 + 4d13 + d31 + 2d32 + 4d33 + d51 + 2d52 + 4d73 + d71 + 2d72 + 4d73 subject to 8 i=1 3 j=1 2j−1 dij = 12 =5 =5 =5 =5 d11 + 2d12 + 4d13 + d21 + 2d22 + 4d23 + d31 + 2d32 + 4d33 d31 + 2d32 + 4d33 + d41 + 2d42 + 4d43 + d51 + 2d52 + 4d53 d51 + 2d52 + 4d53 + d61 + 2d62 + 4d63 + d71 + 2d72 + 4d73 d71 + 2d72 + 4d73 + d81 + 2d82 + 4d83 + d11 + 2d12 + 4d13 0 ≤ dij ≤ 1. di2 . . . between 10 and 50) until the solution no longer changes at which point. Note that if d11 = d13 = 1 and d12 = 0 then x1 = 3. j = 1. the decision variables are bounded below by 0 and above by 5. . every possible value of x1 can be determined by a suitable linear combination of d11 .

. Complete the formulation of Model 3: Toll Booth Scheduling.1.7. Jan-Feb 2004. 3. Explain at least one equation on the conventional side and on the composite variables side. Wilson. Barnhart. No. Give a short answer (1-2 sentences) for each question below. Complete the formulation of Model 11: A Packing Problem. pp. (a) What sort of mathematical model is being used in the software they call VOLCANO? (b) How can VOLCANO save money for UPS? (c) What is the biggest factor in this savings? (d) What sort of improvements on the routes did VOLCANO ﬁnd that the planners didn’t see? Why? (e) How has UPS actually used the suggested solution to their problem? (f) Look at the appendix. 34. Read ”UPS Optimizes Its Air Network” by Armacost. (g) Is this a good modeling paper? Why or why not? (h) Which section did you ﬁnd most informative? . Ware.7 Exercises 99 7.7 Exercises 1. 2. 15-25. Interfaces Vol.

100 Integer Programming .

which start with identical sequences for some particular piece of DNA: Species 1 Species 2 TAGCTTCATCGTTGACTTCTACTAAAAGCA TAGCTTCATCGTTGACTTCTACTAAAAGCA 101 . while “CGA” is code for “Arginine”. and only 20 diﬀerent amino acids. the construction of phylogenies. deoxyribonucleic acid. but many are not and will simply be carried down to future generations.) Each gene is a blueprint for creating a particular protein. By comparing similar sequences of DNA from diﬀerent species it is often possible to deduce information about how the species are related and how they evolved. one strand of DNA. let’s ﬁrst look at how some strings might evolve if random perturbations occur in each generation (with small probability). For example. A protein is a complex molecule made by stringing together a diﬀerent set of building blocks. The genetic material of all living organisms consist of molecules of DNA.1 The problem There are many important problems in modern genetics which require statistical and discrete mathematical modeling and algorithms. We start with a very superﬁcial (and not completely accurate) introduction to molecular biology.Math 381 Notes — University of Washington —Winter 2011 Chapter 8 Evolutionary Trees 8. but typically consisting of hundreds or thousands of bases. So a gene can be represented as a sequence of these letters.) Consider two species starting to diverge. are also important but harder to deal with. e. (Actually base pairs. A gene. When DNA is replicated from one generation to the next. Note that there are 43 = 64 distinct strings of 3 bases. To get a feel for this. and guanine (G). that show which species gave rise to others. Other mutations. thymine (T). cytosine (C). The four basic molecules (called bases) are: adenine (A). in various orders.. We will look at just one of these. (One important and diﬃcult problem in this connection is the sequence alignment problem. We consider only one type of mutation. AAG is code for “Lysine”. such as insertion or deletion of several bases in a sequence. mistakes are sometimes made. since DNA is a double strand. so the string AAGCGA codes for these two amino acids. which is important in the replication mechanism. the 20 amino acids.g. Some codons mean diﬀerent things. substitutions in which one base is replaced by another. is a code which tells how to assemble one particular protein. so typically several diﬀerent triples (or codons) code for the same amino acid. Some mutations are fatal for the oﬀspring. For our purposes. like “stop” at the end of a protein. or evolutionary trees. trying to ﬁgure out how sequences from diﬀerent organisms are best matched up. Some random mutations lead to changes which are beneﬁcial and drive the evolutionary process. or double helix. Each set of three bases determines one amino acid. These mutations may cause resulting proteins to not function properly. AAGCGA. these can be viewed simply as long chains of four types of molecules strung together.

though we expect more similarity between 2 and 3 than between these and Species 1. Repeat this process. these have become Species 1 Species 2 CAGCTTCATCGTTGACTTCTACTAAAAGCA TAGCTTCATCGTTGACTTCCACTAAAAGCA Suppose that at this point we split oﬀ another species.m on the webpage). We are looking for the tree with the best value. The basic question in determining phylogenies is the following. can we infer the relationship between the species. replicating Species 2 to obtain Species 3: Species 1 Species 2 Species 3 CAGCTTCATCGTTGACTTCTACTAAAAGCA TAGCTTCATCGTTGACTTCCACTAAAAGCA TAGCTTCATCGTTGACTTCCACTAAAAGCA and now evolve each independently for 100 generations to obtain Species 1 Species 2 Species 3 CAGCTTCATCGTTGACTTCTACTAAAAGTA TAGCTTCATCGTTGACTTGCACTAAAAGCT TAGCTTCATCGTTGACCTCCACTAAAAGCA At this point all three are diﬀerent. The basic idea is that we need a way to compare two possible trees with one another.102 Evolutionary Trees Now suppose these species diverge. suppose we now replicate Species 1 as Species 5. and usually many diﬀerent sequences are used). and say that which one is a better ﬁt to the given data. reproducing independently. After 100 generations with a 0.1% chance of mutation of each base in each generation. This requires some sort of goodness measure that we can apply to a given tree T to obtain a value f (T ).1. and construct the proper evolutionary tree? For a given set of species there are many possible trees. replicating Species 3 as Species 4 and then evolving again to obtain Species Species Species Species 1 2 3 4 CAGCTTCATCGTTGACTTCTACTAAAAGTA TTGCTTCATCGTTGAATTGGACTAAAAGCT TAGATTCATCGTTGACCTCCGCTAAAAGCA TAGCTTCATAGTTGACCTCCTCTAAAAGCA Finally. Species Species Species Species Species 1 2 3 4 5 CAGCTTCATCGTTGACTTCTACTAAAAGTA TTGCTTCATCGTTGAATTGGACTAAAAGCT TAGATTCATCGTTGACCTCCGCTAAAAGCA TAGCTTCATAGTTGACCTCCTCTAAAAGCA CAGCTTCATCGTTGACTTCTACTAAAAGTA and evolve again to obtain Species Species Species Species Species 1 2 3 4 5 CAGCTTAATCGTTCACTTCTACTAAAAGTA TTGCTCCAACGTTGAATTCGACTAAAGGCT TAGATTCATCGTTGACCTCCGCCAAAAGCG TAGCTTCATAGTTGACCTCCTCTTAAAGCA CAGCTTCATCGTTGACTTCTTATATAAGTA The relationship between these ﬁve species is illustrated in Figure 8. so we must solve an optimization problem to maximize this function. Given a set of DNA sequences such as the ones above (but often much longer in practice. . which is deﬁned over all possible trees. This can be modeled by Monte Carlo simulation (see evolution. with some small chance of mutation at each site in each generation.

which deﬁnes the function f (T ). shown for two distinct potential trees. while the tree shown in (b) requires 2 mutations. Using the tree of Figure 8.2: Possible assignments of bases at each branch point for the ﬁrst site. Once we have a speciﬁc measure. we need to eﬃciently search through all possible trees to locate the best. In general we cannot simply enumerate them all because the number of possible trees grows exponentially in the number of species.8. The bases seen can be explained. (b) A distinct tree. suppose we want to compare the two possible trees shown in Figure 8. 3. for example.2. There are three basic steps in ﬁnding the “optimal” tree: 1.2(a). Finally. parsimony and likelihood.2 Maximum parsimony “Maximum parsimony” is a term used by geneticists when constructing ancestral trees. only one mutation is required to explain this data.) .2 Maximum parsimony 103 (a) (b) 1 5 2 3 4 1 2 3 4 5 Figure 8. by the histories indicated in Figure 8. (This is in fact the true history of this site from the simulation that generated these species. Consider the ﬁrst base in each of the ﬁve DNA strings. Decide how we should measure the goodness of one tree against another. 8. and the idea here is to ﬁnd the tree which requires the least number of mutations to explain the given data.1: (a) The true phylogeny for the example in the text. The problem of ﬁnding the best is typically NP-complete. As an example.1(a) and (b) for the ﬁve species above. as shown by the line cutting one branch of the tree. (a) T T T T C T T T (b) 1 C 5 C 2 T 3 T 4 T 1 C 2 T 3 T 4 T 5 C Figure 8. 2. Parsimony means frugality or simplicity. The tree shown in ﬁgure (a) requires only 2 mutations to explain the bases observed at this site. Below we will brieﬂy consider two possibilities. we need an eﬃcient algorithm to compute a goodness value f (T ) for any potential tree T .

The maximum likelihood approach is more closely tied to probability and the statistical nature of the data. Keep a counter called parsimony which is initialized to zero.) We can deﬁne the parsimony of a tree to be the (minimum) total number of mutations required to explain all of the variation seen. (Note all of these trees are fundamentally distinct from the tree in Figure 8.T} 1 {C} 5 {C} 2 {T} 3 {T} 4 {T} 1 {C} 2 {T} 3 {T} 4 {T} 5 {C} Figure 8. the union of the two sets. 8. Note in particular that it does not give any information on the order in which branching occurred. . in the sense that it may converge to the wrong tree as we include more data. by increasing the number of sequences considered from each species. This algorithm will be discussed in class. To compute the parsimony we must consider each base in turn (all 30 of them in our example) and ﬁnd the assignment of bases to branch points which gives the least number of mutations. This same tree could just as well be drawn as shown in Figure 8. it has some limitations. but here’s the basic idea. As far as the parsimony measure is concerned. Using the tree of Figure 8.T} {T} {C. we take the intersection of the two sets whenever this is nonempty. if we assign letters to the branch points diﬀerently. the tree in Figure 8.4(b). since the evolutionary tree must have a mutation along one branch or the other below this node. working our way up from the leaves toward the root. At each node on the tree we determine a set S of possible bases.3 Maximum likelihood Parsimony makes no assumptions about the rates of mutations or about the time elapsed between branching of species.1(a) is equivalent to the tree shown in Figure 8. Fitch’s Algorithm. there must have been at least two mutations to produce this pattern of bases. and hence the parsimony. so maximizing parsimony gives only an unrooted tree. but never less.1. (Perhaps more.3: Illustration of Fitch’s algorithm applied to the ﬁrst site using two diﬀerent trees. The set at each leaf consists simply of the base observed in the corresponding species.T} (b) {T} {T} {T} {C} {C. which has a diﬀerent parsimony.2(b). • If S1 ∩ S2 = ∅. then we set S = S1 ∪ S2 .) Maximizing parsimony can be shown to fail in some cases. on the other hand. At each node further up we compute the set S based on the sets S1 and S2 of its two children according to these rules: • If S1 ∩ S2 = ∅ then we set S = S1 ∩ S2 . we perform the following algorithm.104 Evolutionary Trees (a) {C. All of these trees have the same parsimony. which is illustrated in Figure 8. In other words. While the idea of parsimony seems attractive. or the time scale between branchings.4(a). In particular. There is an eﬃcient algorithm for doing this. More generally it is impossible to say where the root of the tree is.3. For each site in the DNA sequence. In this case we also increment parsimony by 1. it is fallible if diﬀerent sites evolve with diﬀerent mutation rates.

Maximum likelihood estimation is a standard statistical tool used in many ﬁelds.62. The value of p for which L is larger is “more likely” to be correct. the standard technique from calculus. what value of p makes the observed data most likely to have arisen from 100 ﬂips of the coin? In this context L is called the likelihood. evaluating L from (8.1(a) in terms of parsimony.1) for p = 0. but we can’t judge this without comparing it to the likelihood of getting exactly 62 heads if p had some diﬀerent value.62 is more likely than p = 0. that it is about 18 times more likely that p = 0. This is easy to maximize by diﬀerentiating with respect to p and setting the derivative to zero. This gives 38 62 + =0 p 1−p from which we easily ﬁnd p = 0.62. but we don’t know what p is. if the true probability of heads is p. but we can instead consider log L. This “length” depends on the time elapsed (number of generations) as well as the mutation rate per generation. (8. Since log L is a monotonically increasing function of L.082. along with a given tree structure. The basic idea is easily explained by a standard example from coin tossing. To determine the most likely value of p. This may seem like a small likelihood. what value of p will maximize this expression? In other words. Suppose we are given a trick coin which has a probability p of coming up heads. We want to derive this mathematically. We can compare the value of L with one value of p against the value of L with some other value of p.62) = 0. Recall that. for example.5) = 18.62 gives L(0. We ﬂip the coin 100 times and observe 62 heads. so it is much less probable that one would roll exactly 62 heads with a fair coin.3 Maximum likelihood 105 (a) (b) 2 1 3 2 1 5 3 4 5 4 Figure 8. . We can compute.1) We can now ask. Hence the value p = 0. The same idea can be used in comparing two potential phylogenies.5 since L(0.33. For example.5. This seems tough since it’s a complicated expression. we also make some hypothesis about the “length” of the branches in terms of the probability of a mutation at each site between one branching and the next.0045. we must maximize L with respect to p. This is the probability of getting exactly 62 heads in 100 tosses with a coin which is biased to land on heads 62% of the time. then the probability of observing exactly 62 heads in 100 tosses is obtained from the binomial distribution as L(p) = 100 62 p62 (1 − p)38 . not just genetics. Often statisticians look at the likelihood ratio and say. for example.8.5) = 0.62 than that p = 0. Taking the log converts all the products into sums and simpliﬁes things: log L = log 100 62 + 62 log(p) + 38 log(1 − p). In this case.4: Two trees which are equivalent to the tree shown in Figure 8. if we ﬁnd the value of p which maximizes log L then this also maximizes L. What is the “most likely” value of p? The obvious answer is 0. that L(0.62)/L(0.

there are 1 · 3 · 5 · 7 = 105 distinct unrooted trees. This by itself wouldn’t generally lead to the optimal tree. For the example we have considered with 5 species. however. multiplying by the next odd integer each time we add a new species. so it is also necessary to consider “local rearrangements” where the structure of some small piece of the tree is rearranged to try to get a better tree. It may be possible to start with only two species and add species one at a time. . we are still faced with the daunting task of ﬁnding the tree which maximizes this measure. With six species the number grows to 1 · 3 · 5 · 7 · 9 = 945 and this pattern continues. in a recursive manner starting at the leaves and working up to the root. In practice various search strategies are used to try to ﬁnd the optimal tree (or at least a good tree) without enumerating and checking all possible trees. This grows faster than the factorial function. deciding in each case where to insert the additional species in the existing tree (as might naturally be done if a new species is discovered). We now want to choose the tree structure which will maximize this likelihood. Even if we work simply with unrooted trees without trying to also estimate the length of the branches. and so we can view it as a likelihood function for that tree. and have an algorithm for computing this measure for any given tree. and in general the problem of ﬁnding the optimal tree is NP-complete. A branch and bound algorithm can sometimes be developed to determine what paths are worth pursuing. parsimony or likelihood. the number of possible trees grows rapidly with the number of species. since it’s a bit complicated to explain. It can be done. for example. We won’t go into the manner in which this likelihood is computed. 8.4 Searching the forest for the optimal tree Once we have decided on a measure of goodness. then it turns out it is possible to compute the probability that the observed genetic sequences from each species arose via random mutations based on this structure.106 Evolutionary Trees If we hypothesize a particular structure for the tree. This probability depends on the tree. See [Fel81] for more details.

9.1(a) shows a hypothetical map of a forest region divided into sections based on some characteristics such as • The age of the trees. recreation? The reading assignment in Chapter 6 of [Dyk84] explores a relatively simple case. harvesting schedules must often be planned out far in advance. such as harvesting.g. Because trees grow so slowly. due to topography or road access. We will explore just a few of the issues involved in timber harvesting. usually decades at least. is really only used in the ﬁrst decade (or less). • The kind of trees. and a few years down the road a new optimal solution for the future is obtained based on current data and predictions. endangered species habitat. thin. • The diﬃculty of harvesting trees. new trees can be planted and harvested later.Math 381 Notes — University of Washington —Winter 2011 Chapter 9 Forest Resource Management 9. We are all familiar with the patchwork quilt appearance of our forests due to clear cutting. • The type of wildlife habitat provided. 107 . How do timber companies and the forest service decide where and when to cut? How are decisions made about allocating land for diﬀerent uses.. e. so the solution obtained by looking over the next hundred years. The problem is then to maximize proﬁt over a certain time period by harvesting at appropriate times. Of course the models are constantly being updated and rerun. Similar scheduling problems arising in deciding when to replant.1 Timber harvest scheduling Many forest resource management models lead to large-scale linear or integer programming problems. who may impose constraints on when they want them harvested). Figure 9. wildlife habitat. There are typically many constraints that exist on which combinations of trees can be harvested at a given time. Sensitivity analysis is often important: what if we change the parameters or the strategy in some way. along with the value of the trees based on their age. in which only the age of the trees is considered. how much eﬀect does that have? This is especially important because the data available is often incomplete or statistical in nature. so it is interesting to see how sensitive the results are to changes in the data. or fertilize existing trees. Once harvested.2 Spatial structure Even more interesting and challenging problems arise if we also consider the spatial arrangement of the forests available for harvesting. say. • Ownership (a company may have leased rights for harvesting trees on land belonging to others. one particular application of interest in the Northwest.

Note that we have a zero-one IP problem if each decision variable is forced to be 0 or 1. it may be possible to harvest Section 4 or Section 5 in Figure 9.j−1 + h5j + h5. we might have a constraint like h4j + h5. or on what combination of sections can be harvested based on proximity and habitat combinations. This would lead to a linear constraint of the form h4j + h5j ≤ 1 for each decade j. 9. The roads are broken up into pieces which can be built independently. Note that building Road 4 requires a bridge across the river in Section 3. Each section of road also has a cost Cij of building it in Decade j. visibility from highways. A variety of new constraints may be imposed based on the characteristics of the sections or their spatial location. parks.1(a) might correspond to a riparian zone. . or trails. coming oﬀ of an existing road at the bottom of the map. Sections 3 and 8 in Figure 9. since these “riparian zones” may be environmentally delicate. • Other environmental or social issues such as proximity to towns. so C4j is likely to be high. and 4 of these are numbered on the map. Let hij represent the fraction of Section i to be harvested in Decade j. We can then introduce decision variables rij which is 1 if Road i is built in Decade j and 0 otherwise. stating that there must be at least one decade between the two harvest periods. but not both at the same time. This is again a linear constraint which can be used in an IP approach.1: (a) Map of forest sections. For example.1(b) shows the previous map with a set of proposed roads superimposed.108 Forest Resource Management Road 2 10 6 7 Road 3 Existing Road 10 6 7 Road 4 Road 1 8 9 8 9 1 2 3 4 5 1 2 3 4 5 Figure 9. Figure 9. for example.j+1 ≤ 1 for each j. (b) With planned logging roads. • Proximity to streams or rivers. A few examples are: • Government regulations may limit the size of an area that is clearcut at any one time.3 Building roads A new set of interesting constraints arise if we couple the harvesting problem together with the problem of building new logging roads to reach the areas being harvested. • Protecting wildlife habitat may put constraints on what Sections can be harvested. If the time between harvesting adjacent sections is required to be longer.1(a).

and so we require h61 h62 h63 ≤ r21 ≤ r21 + r22 ≤ r21 + r22 + r23 and so on for each decade under consideration. One question is how to split up land into the sorts of sections indicated in the map above. or pieces of a size that one is allowed to harvest at one time. and also leads to other sorts of mathematical modeling: • What is the best cutting pattern to minimize risk of ﬁres? • How sensitive is the harvesting schedule to disruption by ﬁre? If one area is destroyed before being harvested does it throw everything oﬀ? (Sensitivity analysis. It is clear that the number of decision variables and constraints can be very large in scheduling problems of this form.9. weather data. then feed these into a larger-scale model to choose the best combination of these. etc. • Bottom-up approach would look for perhaps the best 20 solutions for each district. There is a set of constraints of this form for each road. Forest ﬁre protection is another consideration. especially if there are large areas that are fairly homogeneous but must be split up into manageable pieces. • This can lead to a solution that can’t be implemented. There is a set of constraints of this form for each section.j Pij hij − Cij rij subject to all of these various constraints. The big model doesn’t see all the low-level constraints. Or conversely there may be lots of small parcels of similar forest scattered around that must be aggregated into a single unit to make a tractable mathematical model. giving a new set of constraints. Real problems for millions of acres can easily involve 10. The problem is now to maximize i. splitting up land into large quantities that should be harvested at one time. fuel levels. Road 2 must be built in order to harvest Section 6.) • Simulation of forest ﬁre events using historic data. • Working upward guarantees implementability but may not give the optimal solution.000 variables and hundreds of thousands of constraints. So we need r21 r22 r23 ≤ r11 r11 + r12 r11 + r12 + r13 ≤ ≤ and so on for each decade.4 Some other issues A number of other issues arise in the context of forest management.4 Some other issues 109 In order to harvest a given section it is necessary to ﬁrst build roads reaching this section. One big question is whether to use a top-down or bottom-up approach to aggregating: • Traditional models start at top. 9. For example. . Must then ﬁnd lots of small parcels of this type of land to actually cut. Another set of constraints arises from the fact that we cannot build Road 2 until we have built Road 1.

where there are still huge areas of forest. • This may feed back into optimization in the woods — what sort of trees should be cut next? Perhaps particular trees must be found depending on order desired. e. saw logs into lumber? Need an algorithm that can decide in real time at the mill how to cut each tree as it arrives. • Trimming models are used in deciding how best to peel logs for veneer. • How to cut trees into logs.) Other sorts of problems arise in milling trees or the manufacturing wood products. (An important question in Canada..110 Forest Resource Management • Where should ﬂeets of tanker planes be stationed to maximize ability to ﬁght a ﬁre anywhere over a large region. and questions about where to optimally locate new mills based on where cutting will be done.g. • This could also tie into crew scheduling problems since diﬀerent cutting crews have diﬀerent specialties. • There are also truck scheduling problems between forests and mills. • Value of each size lumber will change with time as orders are ﬁlled. . for example.

N. John Wiley & Sons. 1981. Addison Wesley Longman. A new polynomial-time algorithm for linear programming. Computers and Intractability: A guide to the Theory of NP-Completeness. Karmarkar. 1993. A. E. The Stanford GraphBase: A Platform for Combinatorial Computing. 1979. Evolutionary trees from dna sequences: a maximum likelihood approach. Bell labs technical report. McGraw-Hill. Appl. Soviet Math. Linear Programming. 1955. 2:294–313. 1995.Math 381 Notes — University of Washington —Winter 2011 Bibliography [BD92] [Chv83] [DA93] [Dan63] [Dyk84] [Fel81] [Gil55] [GJ79] [HL95] [Jef78] [JT98] D. Freeman. Watson.. Minty. N. Aldous. Dolan and J. 1972. H. New York. 1992. Griﬃths. Evol. J. Dykstra. Diaconis. 1979. 1993. Trefethen. L. D. Mathematical Programming for Natural Resource Management.C54. 1984. In Complexity of Computer Computations. D. Dokl. New York. New York. George Dantzig. Garey and D. N. S. University Park Press. Jeﬀers. Plenum. R. Bayer and P. F. Introduction to operations research. Yorktown Heights. . Khachian. 1972.. McGraw-Hill. 20:191–194. Reducibility among combinatorial problems. Linear programming and extensions. Combinatorica. An Introduction to Systems Analysis: with Ecological Applications. In Inequalities III. and G. Networks and Algorithms. Vasek Chvatal. A numerical analyst looks at the ‘cutoﬀ phenomenon’ in o card shuﬄing and other markov chains. pages 150–178. Knuth. R. 1984. Watson Res. 1983. Hillier and G. F.Y. How good is the simplex algorithm. H. Center. Gilbert. Karp.. J´nsson and L. Higham. Ltd.. Chichester. 4:373–395. Ann.. pages 159–172. An Introductory Approach. Freeman. IBM Thomas J. NJ.. Princeton University Press.74. Victor Klee and G. M. Princeton. London. D. eds. W. J. J. M. G. T57. Lieberman. Mol. E. 17:368–376. pages 85–103. Felsenstein. N. J. Proc. Academic Press. R. P. In Numerical Analysis 1997. Johnson. A. J. 111 [Kar72] [Kar84] [Kha79] [KM72] [Knu93] . Prob. 1998. W. Trailing the dovetail shuﬄe to its lair. D. S. 1963. A polynomial algorithm in linear programming. Sympos. AddisonWesley. San Francisco. 1978..

Maki. Sipser. Swartzman and S. Skiena. pages 597–606. Redwood City. life. 1976. Shmoys. A. [ZP97] Z. Fred S. P. Cheeseman. 1990. [Mak73] D. Discrete mathematical models. S. Zhou and W. Smith. Addison-Wesley. A Course in Simulation. Modelling the control of bovine tuberculosis in badgers in england: culling and the release of lactating females. S. 1981. [SCCH97] G. J. Roberts.reseach. Dover. Pan. 1997. New York. Mathematical models and applications. . Prentice-Hall. Lawler. 2002. H. In A008292.com/ njas/sequences. Macmillan Publishing Company. reprint edition. The First Ten Years. J. J.. 7 is winning number. G. In New York Times. C. biological. B. Kolata. Applied combinatorics. Co. In Proceedings of the International Congress of Mathematicians. 1990. The Traveling salesman problem : a guided tour of combinatorial optimization. 1995. Applied Ecology. with applications to social.112 BIBLIOGRAPHY [Kol90] [Kor92] G. Marcel Dekker. S. PWS Pub. 34:363–374. unpublished manuscript. [LLAS90] E. Prentice-Hall. In card shuﬄing. 1990. Prentice-Hall. 1992. P. L. Englewood Cliﬀs. 1973. The on-line encyclopedia of integer sequences. Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. [Sip97] [SK87] [Ski90] [Slo00] [ST02] M. B. 1997. COMAP. Daniel Spielman and Shang-Hua Teng. Fred S. Englewood Cliﬀs. page C1. Neil Sloane. Clifton-Hadley. Optimization algorithms for networks and graphs. L. Applied Ecology. CA.J. MCM. [Min78] [Ree81] [Rob76] [Rob84] [Ros90] E. Analysis of the viability of a giant panda population. 34:1375–1386.J. K. Ecological Simulation Primer. M. and environmental problems. J. New York. and D. Reeds. Introduction to the theory of computation. G. The Moscow Puzzles. Lenstra. C. Roberts. L. Ross. N. Kaluzny. 2000. Englewood Cliﬀs. volume I. 1984. Kordemsky. Rinnooy Kan. with emphasis on the social. MacMillan Publishing. 1990.att. [UMA95] UMAP Journal. Theory of shuﬄing.. Minieka. and management sciences. and R. http://www. N. Chichester. 1997. S. Smoothed analysis of algorithms. special edition of the UMAP Journal. Wiley-Interscience. 1987.. 1978. Boston.

- ASE Assign1summ
- tspreport
- A Survey Paper on Recent Developments of Input-Output Analysis
- 4
- Touring Problems - A Mathematical Approach - Vipul Naik
- Power-Driven Flip-Flop Merging and Relocation p107-Wang
- Ant Colony Optimization.pdf
- TREE.doc
- Comparison and Supply Chain Optimization of Poultry Firm Using Mixed Integer and Linear Fractional Program
- Hamilton Cycles
- 18 Topo Sort
- Weakly Hull Number of a Graph
- An Empirical Comparison of Graph Databases
- capitulos 10 y 11
- Math Suggestion
- LPP formulation & graphical new (2).ppt
- Smu Mba Operations Research Semester2 Questionpaper
- KheeMeng.Koh.Fengming.Dong.Eng.Guan.Tay.Introduction.to.Graph.Theory.H3.Mathematics.pdf
- Bialek J.W. Tracing Based Transmission Pricing of Cross-Border Trades - Fundamentals and Circular Flows
- 61664272 Ant Colony Optimization
- Brun - four Color theorem
- Primal Dual
- MS(syllabus-2009-10)
- Flow Networks
- Base Sep
- MATH 540 Week 11 Final Exam 2
- Final Study Guide
- 4 ABC and TOC
- CHAPTER 1
- Daa notes

Skip carousel

- tmp2EBB.tmp
- tmp7F8
- A Survey
- tmp4359.tmp
- Discrete Mathematics MCQ'S
- Discrete Mathematics MCQ'S
- tmp3C86.tmp
- A NEW S-(a,d) ANTIMAGIC LABELING OF A CLASS OF CIRCULANT GRAPHS
- tmpD746
- tmp30E5.tmp
- tmp20DC.tmp
- Discrete Mathematics
- A New S-(a,d) Antimagic Labelling of a Class of Generalized Petersen Graphs
- tmpC906
- tmp6B5B.tmp
- tmpDE4.tmp
- Network Bucket Testing
- tmpB0E6
- tmp18A.tmp
- tmpBF3
- tmp3C0A.tmp
- tmpE648
- Tmp 1387
- Algorithm and Data Structure
- Analysis and Design Algorithem
- Context Aware Decision Support Mobile Application for Recommendation in the Domain of Hotels
- A Survey on Subgraph Matching Algorithm for Graph Database
- tmp9F1C

Sign up to vote on this title

UsefulNot usefulClose Dialog## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Close Dialog## This title now requires a credit

Use one of your book credits to continue reading from where you left off, or restart the preview.

Loading