Professional Documents
Culture Documents
C H A P T E R
Dynamic Programming
7
B E 1
4
6 4 H
2
3
3 6
4 2
A C F J
4 3
4
3 3 I
4
■ FIGURE 11.1 1
The road system and costs D G 3
for the stagecoach problem. 5
insurance policies were offered to stagecoach passengers. Because the cost of the policy
for taking any given stagecoach run was based on a careful evaluation of the safety of that
run, the safest route should be the one with the cheapest total life insurance policy.
The cost for the standard policy on the stagecoach run from state i to state j, which
will be denoted by cij, is
B C D E F G H I J
A 2 4 3 B 7 4 6 E 1 4 H 3
C 3 2 4 F 6 3 I 4
D 4 1 5 G 3 3
For the stagecoach problem, we start with the smaller problem where the fortune
seeker has nearly completed his journey and has only one more stage (stagecoach run) to
go. The obvious optimal solution for this smaller problem is to go from his current state
(whatever it is) to his ultimate destination (state J). At each subsequent iteration, the prob-
lem is enlarged by increasing by 1 the number of stages left to go to complete the journey.
For this enlarged problem, the optimal solution for where to go next from each possible
state can be found relatively easily from the results obtained at the preceding iteration.
The details involved in implementing this approach follow.
Formulation. Let the decision variables xn (n ! 1, 2, 3, 4) be the immediate destina-
tion on stage n (the nth stagecoach run to be taken). Thus, the route selected is A !
x1 ! x2 ! x3 ! x4, where x4 ! J.
Let fn(s, xn) be the total cost of the best overall policy for the remaining stages, given
that the fortune seeker is in state s, ready to start stage n, and selects xn as the immedi-
ate destination. Given s and n, let xn* denote any value of xn (not necessarily unique) that
minimizes fn(s, xn), and let f n* (s) be the corresponding minimum value of fn(s, xn). Thus,
where
fn(s, xn) ! immediate cost (stage n) " minimum future cost (stages n " 1 onward)
! csxn " f n*"1(xn).
The value of csxn is given by the preceding tables for cij by setting i ! s (the current state)
and j ! xn (the immediate destination). Because the ultimate destination (state J) is reached
at the end of stage 4, f 5* ( J) ! 0.
The objective is to find f 1* (A) and the corresponding route. Dynamic programming
finds it by successively finding f 4*(s), f 3*(s), f 2*(s), for each of the possible states s and
then using f 2*(s) to solve for f 1*(A).3
Solution Procedure. When the fortune seeker has only one more stage to go (n ! 4),
his route thereafter is determined entirely by his current state s (either H or I) and his fi-
nal destination x4 ! J, so the route for this final stagecoach run is s ! J. Therefore, since
f 4*(s) ! f4(s, J) ! cs,J, the immediate solution to the n ! 4 problem is
n ! 4: s f 4*(s) x4*
H 3 J
I 4 J
When the fortune seeker has two more stages to go (n ! 3), the solution procedure
requires a few calculations. For example, suppose that the fortune seeker is in state F.
Then, as depicted below, he must next go to either state H or I at an immediate cost of
cF,H ! 6 or cF,I ! 3, respectively. If he chooses state H, the minimum additional cost af-
ter he reaches there is given in the preceding table as f 4*(H) ! 3, as shown above the H
node in the diagram. Therefore, the total cost for this decision is 6 " 3 ! 9. If he chooses
state I instead, the total cost is 3 " 4 ! 7, which is smaller. Therefore, the optimal choice
is this latter one, x3* ! I, because it gives the minimum cost f 3*(F) ! 7.
3
Because this procedure involves moving backward stage by stage, some writers also count n backward to denote
the number of remaining stages to the destination. We use the more natural forward counting for greater simplicity.
11.1 A PROTOTYPE EXAMPLE FOR DYNAMIC PROGRAMMING 441
3
H
6
3
I
4
Similar calculations need to be made when you start from the other two possible states
s ! E and s ! G with two stages to go. Try it, proceeding both graphically (Fig. 11.1)
and algebraically [combining cij and f 4*(s) values], to verify the following complete re-
sults for the n ! 3 problem.
E 4 8 4 H
F 9 7 7 I
G 6 7 6 H
The solution for the second-stage problem (n ! 2), where there are three stages to
go, is obtained in a similar fashion. In this case, f2(s, x2) ! csx " f 3*(x2). For example,
2
4
E
3
7
2
C F
4
G
6
Making similar calculations when you start from state B or D (try it) yields the fol-
lowing results for the n ! 2 problem:
B 11 11 12 11 E or F
C 7 9 10 7 E
D 8 8 11 8 E or F
In the first and third rows of this table, note that E and F tie as the minimizing value of
x2, so the immediate destination from either state B or D should be x2* ! E or F.
Moving to the first-stage problem (n ! 1), with all four stages to go, we see that the
calculations are similar to those just shown for the second-stage problem (n ! 2), except
now there is just one possible starting state s ! A, as depicted below.
11
B
2
7
4
A C
3
D
8
These calculations are summarized next for the three alternatives for the immediate
destination:
x1 ! B: f1(A, B) ! cA,B " f 2*(B) ! 2 " 11 ! 13.
x1 ! C: f1(A, C) ! cA,C " f 2*(C) ! 4 " 7 ! 11.
x1 ! D: f1(A, D) ! cA,D " f 2*(D) ! 3 " 8 ! 11.
Since 11 is the minimum, f 1*(A) ! 11 and x1* ! C or D, as shown in the following table:
A 13 11 11 11 C or D
An optimal solution for the entire problem can now be identified from the four ta-
bles. Results for the n ! 1 problem indicate that the fortune seeker should go initially to
either state C or state D. Suppose that he chooses x1* ! C. For n ! 2, the result for s ! C
is x2* ! E. This result leads to the n ! 3 problem, which gives x3* ! H for s ! E, and the
n ! 4 problem yields x4* ! J for s ! H. Hence, one optimal route is A ! C ! E !
H ! J. Choosing x1* ! D leads to the other two optimal routes A ! D ! E ! H ! J
and A ! D ! F ! I ! J. They all yield a total cost of f 1*(A) ! 11.
These results of the dynamic programming analysis also are summarized in Fig. 11.2.
Note how the two arrows for stage 1 come from the first and last columns of the
n ! 1 table and the resulting cost comes from the next-to-last column. Each of the other
11.2 CHARACTERISTICS OF DYNAMIC PROGRAMMING PROBLEMS 443
Stage: 1 2 3 4
■ FIGURE 11.2 11 4
Graphical display of the 7
dynamic programming B E 1 3
solution of the stagecoach
4
problem. Each arrow shows H
an optimal policy decision 3 3
(the best immediate 11 7 7
destination) from that state, 4
where the number by the State: A C F J
state is the resulting cost 4 3
from there to the end. 4
Following the boldface 3 1 3
arrows from A to J gives the I
three optimal solutions (the 4
three routes giving the D G
minimum total cost of 11). 8 6
arrows (and the resulting cost) comes from one row in one of the other tables in just
the same way.
You will see in the next section that the special terms describing the particular con-
text of this problem—stage, state, and policy—actually are part of the general terminol-
ogy of dynamic programming with an analogous interpretation in other contexts.