When we use this recursive relationship, the solution procedure moves backward
stage by stage, each time finding the optimal policy for that stage, until it
finds the optimal policy starting at the initial stage.
This backward movement was demonstrated by the stagecoach problem, where
the optimal policy was found successively beginning in each state at stages 4, 3, 2,
and 1, respectively.† For all dynamic programming problems, a table such as the
following one would be obtained for each stage ($n = N, N - 1, \ldots, 1$).
When this table is finally obtained for the initial stage ($n = 1$), the problem of interest
is solved. Because the initial state is known, the initial decision is specified by $x_1^*$ in
this table. The optimal value of the other decision variables is then specified by the
other tables in turn according to the state of the system that results from the preceding
decisions.
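The backward pass and the subsequent read-off of decisions can be sketched in code. This is a minimal sketch, not the text's own procedure: the function names, the zero value assumed after the last stage, the maximization of a sum, and the small two-stage payoff data are all hypothetical choices made for illustration.

```python
def backward_induction(num_stages, states, decisions, contribution, transition):
    """Build one table per stage, mapping state -> (best value, best decision).

    Assumes the objective is to maximize a sum of stage contributions and
    that the value after the last stage is zero (an illustrative assumption).
    """
    tables = {}
    next_value = {s: 0 for s in states}
    for n in range(num_stages, 0, -1):  # stages N, N - 1, ..., 1
        table = {}
        for s in states:
            candidates = (
                (contribution(n, s, x) + next_value[transition(n, s, x)], x)
                for x in decisions(n, s)
            )
            # On ties, max keeps the first decision encountered.
            table[s] = max(candidates, key=lambda pair: pair[0])
        tables[n] = table
        next_value = {s: value for s, (value, _) in table.items()}
    return tables


def recover_policy(tables, initial_state, transition):
    """Read the decisions off the tables in turn, starting from the known initial state."""
    policy, s = [], initial_state
    for n in sorted(tables):
        _, x = tables[n][s]
        policy.append(x)
        s = transition(n, s, x)
    return policy


# Hypothetical data: 2 units of a resource split across 2 stages.
payoff = {1: [0, 4, 5], 2: [0, 2, 5]}
states = [0, 1, 2]
decisions = lambda n, s: range(s + 1)
contribution = lambda n, s, x: payoff[n][x]
transition = lambda n, s, x: s - x

tables = backward_induction(2, states, decisions, contribution, transition)
policy = recover_policy(tables, 2, transition)
print(tables[1][2], policy)
```

The stage 1 table alone fixes the first decision; each later decision is looked up in the next table at whatever state the earlier decisions produce.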
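[Figure: the basic structure linking state $s_n$ at stage $n$ to state $s_{n+1}$ at stage $n + 1$, labeled with the contribution of $x_n$, the value $f_n(s_n, x_n)$, and $f_{n+1}^*(s_{n+1})$.]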
† Actually, for this problem the solution procedure can move either backward or forward. However, for
many problems (especially when the stages correspond to time periods), the solution procedure must move
backward.
One way of categorizing deterministic dynamic programming problems is by
the form of the objective function. For example, the objective might be to minimize
the sum of the contributions from the individual stages (as for the stagecoach problem),
or to maximize such a sum, or to minimize a product of such terms, and so on.
Another categorization is in terms of the nature of the set of states for the respective
stages. In particular, the states $s_n$ might be representable by a discrete state variable
(as for the stagecoach problem), or by a continuous state variable, or perhaps a state
vector (more than one variable) is required.
Several examples are presented to illustrate these various possibilities. More
important, they illustrate that these apparently major differences are actually quite
inconsequential (except in terms of computational difficulty) because the underlying
basic structure shown in Fig. 11.2 always remains the same.
The first new example arises in a much different context from the stagecoach
problem, but it has the same mathematical formulation except that the objective is to
maximize rather than minimize a sum.
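Table 11.1 (only the columns for countries 2 and 3 are recoverable; the country 1 column is missing)

Medical teams   Country 2   Country 3
0               0           0
1               20          50
2               45          70
3               75          80
4               110         100
5               150         130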
stages in a dynamic programming formulation. The decision variables $x_n$ ($n = 1, 2, 3$)
would be the number of teams to allocate to stage (country) $n$.
The identification of the states may not be readily apparent. To determine the
states, we ask questions such as the following. What is it that changes from one stage
to the next? Given that the decisions have been made at the previous stages, how can
the status of the situation at the current stage be described? What information about
the current state of affairs is necessary to determine the optimal policy hereafter? On
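these bases, an appropriate choice for the "state of the system" is

$s_n$ = number of medical teams still available for allocation to the remaining countries ($n, \ldots, 3$).
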
Thus, at stage 1 (country 1), where all three countries remain under consideration for
allocations, $s_1 = 5$. However, at stage 2 or 3 (country 2 or 3), $s_n$ is just 5 minus the
number of teams allocated at preceding stages. With the dynamic programming procedure
of solving backward stage by stage, when we are solving at stage 2 or 3, we
shall not yet have solved for the allocations at the preceding stages. Therefore, we
shall consider every possible state we could be in at stage 2 or 3, namely, $s_n = 0, 1, 2, 3, 4$, or 5.
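Let $p_i(x_i)$ be the measure of performance from allocating $x_i$ medical teams to
country $i$, as given in Table 11.1. Thus the objective is to choose $x_1, x_2, x_3$ so as to

$$\text{Maximize} \quad \sum_{i=1}^{3} p_i(x_i),$$

subject to

$$\sum_{i=1}^{3} x_i = 5.$$

At stage $n$, the allocations still to be made satisfy

$$\sum_{i=n}^{3} x_i = s_n.$$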
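The state definition suggests the following recursion. It is offered as a sketch consistent with the stage 1 calculation $f_1(5, 5) = p_1(5) + f_2^*(0)$ that appears later, not as a formula quoted from the text; the boundary value $f_4^*(s) = 0$ is an assumption introduced here so that stage 3 needs no special case.

$$
\begin{aligned}
f_n(s_n, x_n) &= p_n(x_n) + f_{n+1}^*(s_n - x_n), \\
f_n^*(s_n) &= \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n), \qquad n = 3, 2, 1, \\
f_4^*(s) &= 0.
\end{aligned}
$$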
SOLUTION PROCEDURE: Beginning with the last stage ($n = 3$), we note that the
values of $p_3(x_3)$ are given in the last column of Table 11.1, and that these values keep
increasing as we move down the column. Therefore, with $s_3$ medical teams still
available for allocation to country 3, the maximum of $p_3(x_3)$ is automatically achieved
by allocating all $s_3$ teams, so $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$, as shown in the following
table.
n = 3:

$s_3$   $f_3^*(s_3)$   $x_3^*$
0       0              0
1       50             1
2       70             2
3       80             3
4       100            4
5       130            5
We now move backward to start from the next-to-last stage ($n = 2$). Here,
finding $x_2^*$ requires calculating and comparing $f_2(s_2, x_2)$ for the alternative values of
$x_2$, namely, $x_2 = 0, 1, \ldots, s_2$. To illustrate, we depict this situation when $s_2 = 2$
graphically:
This diagram corresponds to Fig. 11.3 except that all three possible states at stage
3 are shown. Thus, if $x_2 = 0$, the resulting state at stage 3 will be $s_2 - x_2 = 2 - 0 = 2$,
whereas $x_2 = 1$ leads to state 1 and $x_2 = 2$ leads to state 0. The corresponding
values of $p_2(x_2)$ from the country 2 column of Table 11.1 are shown along the links,
and the values of $f_3^*(s_2 - x_2)$ from the $n = 3$ table are given next to the stage 3
nodes. The required calculations for this case of $s_2 = 2$ are summarized below.
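For $s_2 = 2$, the comparison can be written out term by term. This is a sketch that takes $p_2(0) = 0$, $p_2(1) = 20$, $p_2(2) = 45$ and $f_3^*(0) = 0$, $f_3^*(1) = 50$, $f_3^*(2) = 70$, the values implied by the $s_2 = 1$ and $s_2 = 2$ rows of the $n = 2$ table.

$$
\begin{aligned}
x_2 = 0:&\quad f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70, \\
x_2 = 1:&\quad f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70, \\
x_2 = 2:&\quad f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45.
\end{aligned}
$$

The maximum, 70, is attained at both $x_2 = 0$ and $x_2 = 1$, so $f_2^*(2) = 70$ with $x_2^* = 0$ or 1.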
n = 2:

        $f_2(s_2, x_2)$ for $x_2$ =
$s_2$   0     1     2     3     4     5     $f_2^*(s_2)$   $x_2^*$
0       0                                   0              0
1       50    20                            50             0
2       70    70    45                      70             0 or 1
3       80    90    95    75                95             2
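The stage 3 and stage 2 tables can be recomputed with a few lines of code. This is a minimal sketch: the variable and function names are hypothetical, and the lists `p2` and `p3` are the country 2 and country 3 performance values as read off the document's tables, so any misreading there carries over. Country 1 is left out.

```python
# Performance values for countries 2 and 3, indexed by number of teams (0..5).
# Assumed from the document's tables; country 1 is deliberately omitted.
p2 = [0, 20, 45, 75, 110, 150]
p3 = [0, 50, 70, 80, 100, 130]

TOTAL_TEAMS = 5

# Stage 3: p3 keeps increasing, so every remaining team goes to country 3.
f3_star = {s3: p3[s3] for s3 in range(TOTAL_TEAMS + 1)}
x3_star = {s3: s3 for s3 in range(TOTAL_TEAMS + 1)}


def solve_stage_2():
    """For each state s2, compare p2(x2) + f3*(s2 - x2) over x2 = 0, ..., s2."""
    f2_star, x2_star = {}, {}
    for s2 in range(TOTAL_TEAMS + 1):
        values = {x2: p2[x2] + f3_star[s2 - x2] for x2 in range(s2 + 1)}
        best = max(values.values())
        f2_star[s2] = best
        # Keep every maximizing decision so that ties stay visible.
        x2_star[s2] = [x2 for x2, value in values.items() if value == best]
    return f2_star, x2_star


f2_star, x2_star = solve_stage_2()
for s2 in range(TOTAL_TEAMS + 1):
    print(s2, f2_star[s2], x2_star[s2])
```

Returning a list of maximizers instead of a single decision preserves ties such as the one at $s_2 = 2$, where both $x_2 = 0$ and $x_2 = 1$ are optimal.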
We now are ready to move backward to solve the original problem where we
are starting from stage 1 ($n = 1$). In this case, the only state to be considered is the
starting state of $s_1 = 5$, as depicted below.

$x_1 = 5$: $f_1(5, 5) = p_1(5) + f_2^*(0) = 120 + 0 = 120$.