Dynamic Programming
(Theory and Applications)
The dynamic programming approach rests on the following features:
1- The problem can be divided into stages, with a policy decision required at each
stage.
2- Each stage has a number of states associated with it.
3- The effect of the policy decision at each stage is to transform the current state
into a state associated with the next stage (possibly according to a probability
distribution).
4- Given the current state, an optimal policy for the remaining stages is independent
of the policies adopted in previous stages. This feature is known as the
principle of optimality, presented by Richard Bellman [1].
5- The solution procedure begins by finding the optimal policy for each state of the
last stage.
6- A recursive relationship is available that identifies the optimal policy for each
state at stage n, given the optimal policy for each state at stage n + 1.
7- Using this recursive relationship, the solution procedure moves backward stage
by stage, each time finding the optimal policy for each state of that stage, until it
finds the optimal policy when starting at the initial stage.
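In code, the backward solution procedure of points 5-7 can be sketched as follows. The function and variable names are illustrative (not from any particular library), and the decision at each stage is taken to be the next state itself, as in shortest-route problems:

```python
def backward_dp(last_stage, states, choices, step_cost, terminal_value):
    """Generic backward recursion (points 5-7 above): starting from the
    last stage, compute the optimal value f[n][s] for every state s of
    stage n, given that the table f[n + 1] has already been computed."""
    f = {last_stage: {s: terminal_value(s) for s in states(last_stage)}}
    policy = {}
    for n in range(last_stage - 1, 0, -1):
        f[n], policy[n] = {}, {}
        for s in states(n):
            # the decision x is the state of stage n + 1 we move to
            best = min(choices(n, s),
                       key=lambda x: step_cost(n, s, x) + f[n + 1][x])
            policy[n][s] = best
            f[n][s] = step_cost(n, s, best) + f[n + 1][best]
    return f, policy

# A tiny illustrative network: A -> B (1), A -> C (4), B -> D (5), C -> D (1).
arcs = {('A', 'B'): 1, ('A', 'C'): 4, ('B', 'D'): 5, ('C', 'D'): 1}
stage_states = {1: ['A'], 2: ['B', 'C'], 3: ['D']}
f, policy = backward_dp(
    3,
    lambda n: stage_states[n],
    lambda n, s: [t for (a, t) in arcs if a == s],
    lambda n, s, x: arcs[(s, x)],
    lambda s: 0,
)
print(f[1]['A'], policy[1]['A'])  # 5 C
```

Here the myopic choice at A would take the arc of length 1, but the backward recursion correctly prefers A → C → D of total length 5.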
Example 1: A salesman wants to travel from node 1 to node 10 through the network of Fig. 1, in which the distance between each pair of connected nodes is given in Table 1. Find the route with the minimum total distance.

  From \ To    2   3   4
      1        2   4   3

  From \ To    5   6   7
      2        7   4   6
      3        3   2   4
      4        4   1   5

  From \ To    8   9
      5        1   4
      6        6   3
      7        3   3

  From \ To   10
      8        3
      9        4
Table 1

Fig. 1: the network of the problem; the nodes are arranged in stages: node 1, then nodes 2-4, then nodes 5-7, then nodes 8-9, and finally node 10.
Solution:
Let the decision variable x_n (n = 1, 2, 3, 4) be the immediate destination on
stage n. Thus, the route selected would be 1 → x_1 → x_2 → x_3 → x_4, where x_4 = 10.
Let f_n(s, x_n) be the total distance of the best overall policy for the remaining
stages, given that the salesman is in state s, ready to start stage n, and selects x_n as
the immediate destination. Given s and n, let x_n* denote the value of x_n that
minimizes f_n(s, x_n), and let f_n*(s) be the corresponding minimum value
of f_n(s, x_n). Then

    f_n*(s) = min over x_n of f_n(s, x_n) = f_n(s, x_n*),

where

    f_n(s, x_n) = c(s, x_n) + f_{n+1}*(x_n),

c(s, x_n) is the distance from s to x_n (Table 1), and f_5*(10) = 0. The objective is
to find f_1*(1) and the corresponding policy. Dynamic programming does this by
successively finding f_4*(s), f_3*(s), f_2*(s), and then f_1*(1). In this way we obtain
the following Tables 2, 3, 4, and 5, representing the solutions to the one-stage,
two-stage, three-stage, and four-stage problems, respectively.
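The recursion can also be carried out mechanically. A minimal sketch in Python follows; the dictionary encodes the distances of Table 1, and since the decision is simply the next node, the state alone identifies the stage:

```python
from functools import lru_cache

# Distances of Table 1: dist[s][x] is the length of the arc from s to x.
dist = {
    1: {2: 2, 3: 4, 4: 3},
    2: {5: 7, 6: 4, 7: 6},
    3: {5: 3, 6: 2, 7: 4},
    4: {5: 4, 6: 1, 7: 5},
    5: {8: 1, 9: 4},
    6: {8: 6, 9: 3},
    7: {8: 3, 9: 3},
    8: {10: 3},
    9: {10: 4},
}

@lru_cache(maxsize=None)
def f_star(s):
    # Backward recursion: f*(s) = min over x of (c(s, x) + f*(x)), f*(10) = 0.
    if s == 10:
        return 0
    return min(c + f_star(x) for x, c in dist[s].items())

print(f_star(1))  # 11
```

The memoization cache plays the role of the stage tables: each f*(s) is computed once and reused by every earlier stage that needs it.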
  s     f_4*(s)   x_4*
  8        3       10
  9        4       10
Table 2
          f_3(s, x_3) = c(s, x_3) + f_4*(x_3)
  s      x_3 = 8   x_3 = 9     f_3*(s)   x_3*
  5         4         8           4        8
  6         9         7           7        9
  7         6         7           6        8
Table 3
          f_2(s, x_2) = c(s, x_2) + f_3*(x_2)
  s      x_2 = 5   x_2 = 6   x_2 = 7     f_2*(s)   x_2*
  2        11        11        12          11      5, 6
  3         7         9        10           7      5
  4         8         8        11           8      5, 6
Table 4
          f_1(s, x_1) = c(s, x_1) + f_2*(x_1)
  s      x_1 = 2   x_1 = 3   x_1 = 4     f_1*(s)   x_1*
  1        13        11        11          11      3, 4
Table 5

Tracing the optimal decisions back through the tables, the optimal routes are 1 → 3 → 5 → 8 → 10, 1 → 4 → 5 → 8 → 10, and 1 → 4 → 6 → 9 → 10, each of total distance 11.
From the previous example and its solution, the following remarks can be
deduced; they remain valid for any other example that can be solved using the
dynamic programming approach.
Remarks 1:
1) If the decision is taken myopically at each stage starting from node 1 (always
moving to the nearest next node), we get the route 1 → 2 → 6 → 9 → 10 with
total distance 13, which is greater than the optimum value 11. This means that
when decisions must be taken at stages for a given problem, they must be
determined by considering the whole problem at once, and the resulting solution
is then applied stage by stage.
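The myopic rule of this remark can be simulated directly. The short sketch below (distances again taken from Table 1) walks from node 1, always choosing the nearest next-stage node:

```python
# dist[s][x]: distance from node s to node x (Table 1).
dist = {
    1: {2: 2, 3: 4, 4: 3},
    2: {5: 7, 6: 4, 7: 6},
    3: {5: 3, 6: 2, 7: 4},
    4: {5: 4, 6: 1, 7: 5},
    5: {8: 1, 9: 4},
    6: {8: 6, 9: 3},
    7: {8: 3, 9: 3},
    8: {10: 3},
    9: {10: 4},
}

node, total, route = 1, 0, [1]
while node != 10:
    # myopic choice: nearest immediate destination, ignoring later stages
    node, step = min(dist[node].items(), key=lambda kv: kv[1])
    total += step
    route.append(node)

print(route, total)  # [1, 2, 6, 9, 10] 13
```

The cheap first arc 1 → 2 (length 2) forces expensive arcs later, which is exactly why the stage-by-stage greedy choice loses to the dynamic programming optimum of 11.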
2) The method used for the solution is called backward computations. However,
the same problem could be solved by what are called forward computations; in
that case, the computations start from node 1 and proceed until node 10 is
reached.
3) The problem has its own recurrence relation, and every other example has its
own recurrence relation. In general, to gain good experience in defining the
recurrence relation for a given example, several problems must be solved; even
then, it is possible to face difficulties in defining the recurrence relation for a
given problem.
4) The dynamic programming approach for solving any problem has a parametric
feature by nature. For example, in the previous problem, if we want to end at
node 8, the optimal path will be either the route 1 → 3 → 5 → 8 or the route
1 → 4 → 5 → 8, with optimum distance 8.
Also, if it is desired to start from a certain place before node 1, additional tables are
needed, but the old ones are used as they are. In addition, if it is desired to move
beyond node 10, then forward computations are recommended in this case.
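Both points can be illustrated with forward computations: a single pass from node 1 labels every node with its shortest distance from node 1, so the optimal value to any intermediate node (for instance node 8) comes out of the same table. A minimal sketch, with the distances of Table 1:

```python
# dist[s][x]: distance from node s to node x (Table 1).
dist = {
    1: {2: 2, 3: 4, 4: 3},
    2: {5: 7, 6: 4, 7: 6},
    3: {5: 3, 6: 2, 7: 4},
    4: {5: 4, 6: 1, 7: 5},
    5: {8: 1, 9: 4},
    6: {8: 6, 9: 3},
    7: {8: 3, 9: 3},
    8: {10: 3},
    9: {10: 4},
}

g = {1: 0}  # g(s): shortest distance from node 1 to node s
for s in sorted(dist):            # nodes are numbered stage by stage,
    for x, c in dist[s].items():  # so every predecessor of s comes first
        if x not in g or g[s] + c < g[x]:
            g[x] = g[s] + c

print(g[10], g[8])  # 11 8
```

One forward pass yields g(s) for every node at once, which is the parametric information discussed in this remark.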
Solution:
This problem is treated as a two-stage problem; the states are the values of the
slack variables, which are continuous.
i.e.
,
subject to
( ).
The solution is obtained directly as:
( ) , then ( ) ( ),
if ( ) , we get , i.e. ,
Otherwise ( ) if
The first stage problem:
( ) ( ( ))
( ( ))
subject to
i.e.
( ( )
Subject to
Since , therefore,
( )
and the first stage problem takes the form
( )
subject to
i.e.,
( ),
subject to
Therefore, ( ) .
subject to
Solution
This problem is treated as a three-stage problem; the states are the values of the
slack variables, which are continuous.
The third stage problem:
( ) ,
subject to
i.e.
( ),
subject to
( )
The solution is obtained directly as:
( )
( ) ( )
if ( ) , we get , and
otherwise, ( ) when .
Now, we will have two situations:
(i)
In this case
( )
i.e.,
( )
subject to
( )
We will have two cases:
either ( ) , i.e., ,
or ( ) , i.e.,
(a) If , then the second stage problem takes the form:
( )
subject to
The problem is feasible when ( ) .
(b) , then the second stage problem takes the form
( )
subject to
i.e.,
( )
subject to
from which ( ) .
(ii) , the first stage problem takes the form (for ,)
( )
subject to
i.e.,
( ),
subject to
from which ( ) .
For case (i):
with optimum value ( ) which is matching
with
For case (ii):
and this does not match the condition , and
therefore it is rejected. The optimal solution of the problem is then:
( )
Remarks 2:
1) The general linear programming problem which can be solved using the
dynamic programming approach must take the form:
subject to
where
In this case, the problem can be considered as an allocation problem for which
the resources are allocated to the activities .
2) To see the advantage of solving linear programming using the dynamic
programming approach consider the following parametric linear programming
problem:
,
Subject to
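As a concrete illustration of this allocation view, the sketch below solves a small two-variable LP by stages. The data (max 3x1 + 5x2 subject to x1 ≤ 4, 2x2 ≤ 12, 3x1 + 2x2 ≤ 18, x1, x2 ≥ 0, a standard textbook product-mix example [3]) is our own illustration, not taken from the examples above. Each variable is a stage, the state is the vector of remaining right-hand sides, and the variables are restricted to an integer grid only to keep the sketch short:

```python
# Illustrative LP (not from the notes): max 3 x1 + 5 x2
# s.t. x1 <= 4, 2 x2 <= 12, 3 x1 + 2 x2 <= 18, x1, x2 >= 0.
# Stage n fixes x_n; the state is the vector b of remaining resources.

def stage2(b):
    # Last stage: x2 has a positive objective coefficient and uses
    # resources 2 and 3, so the best choice spends whatever remains.
    x2 = min(b[1] // 2, b[2] // 2)
    return 5 * x2, x2

def stage1(b):
    best_z, best_x = -1, None
    for x1 in range(min(b[0], b[2] // 3) + 1):
        remaining = (b[0] - x1, b[1], b[2] - 3 * x1)  # state for stage 2
        z2, x2 = stage2(remaining)
        if 3 * x1 + z2 > best_z:
            best_z, best_x = 3 * x1 + z2, (x1, x2)
    return best_z, best_x

z, x = stage1((4, 12, 18))
print(z, x)  # 36 (2, 6)
```

Here the grid solution (x1, x2) = (2, 6) with z = 36 happens to coincide with the true LP optimum, since the LP has an integer optimal vertex.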
The second stage problem:
( ) ( )
( ( ))
Subject to
i.e.,
( ( ))
subject to
subject to
Solution:
The problem has two stages, and the states are the values of the slack variables,
which are continuous.
The first stage problem:
( ) ( )
subject to
Remarks 3:
1) The general nonlinear programming problem which can be solved using the
dynamic programming approach must take the form:
( )
subject to
subject to
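This separable form lends itself to a one-constraint allocation recursion. The sketch below distributes B = 5 indivisible units among three activities with nonlinear (tabulated) returns; the return tables g are illustrative numbers of our own, not data from the notes:

```python
from functools import lru_cache

# g[j][x]: illustrative nonlinear return of assigning x units to activity j.
g = (
    (0, 5, 9, 12, 14, 15),
    (0, 4, 8, 11, 13, 14),
    (0, 6, 10, 13, 15, 16),
)
B = 5  # total units available

@lru_cache(maxsize=None)
def best(j, b):
    # best(j, b): maximum return from activities j, j+1, ... given b units,
    # via the recursion best(j, b) = max over x of (g[j][x] + best(j+1, b-x)).
    if j == len(g):
        return 0
    return max(g[j][x] + best(j + 1, b - x) for x in range(b + 1))

print(best(0, B))  # 23
```

The stage index j and the remaining budget b are exactly the stage/state pair of the general formulation; no linearity of the returns is needed.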
In order that the solution should be , we must have , and in this
case ( ) .
The first stage problem:
( ) ( )
subject to
.
In order that the solution should be , it is clear that we must have ,
and in this case ( ) . Therefore,
( ) *( ) +.
Example 5: The Transportation Problem [2] (Two Sources and Three Destinations)
A firm has two stores s1 and s2, which contain 3 and 4 units of a certain
commodity, respectively. It receives orders from three customers c1, c2, and c3 for
amounts of 2, 3, and 2 units, respectively (Fig. 2).
Fig. 2: the transportation network: stores s1 (3 units) and s2 (4 units) on one side;
customers c1 (demand 2), c2 (demand 3), and c3 (demand 2) on the other.
The distances between the stores and the customers, in kilometres, are as shown in
Table 6. If the cost of transporting one unit for 1 km is a fixed amount α, use the
dynamic programming approach to find the optimal distribution of the commodity
between stores and customers.
  Stores \ Customers    c1     c2     c3
         s1            100    150    200
         s2            300    200    100
Table 6
Solution:
The mathematical model for this problem takes the form

    min z = α (100 x11 + 150 x12 + 200 x13 + 300 x21 + 200 x22 + 100 x23)

subject to

    x11 + x12 + x13 = 3,
    x21 + x22 + x23 = 4,
    x11 + x21 = 2,   x12 + x22 = 3,   x13 + x23 = 2,
    xij >= 0,  i = 1, 2;  j = 1, 2, 3,

where xij is the amount sent from store si to customer cj and α is the fixed cost of
transporting one unit for 1 km. Using the dynamic programming approach, the
problem has three stages (one for each customer) and the states are the
availabilities at the stores, which are discrete. Ignoring the constant factor α, we
get the following three-stage problem:

    min z' = 100 x11 + 150 x12 + 200 x13 + 300 x21 + 200 x22 + 100 x23

subject to the same constraints.
Table 7: solution of the third-stage problem (supplying customer c3).
Fig. 3: the third-stage problem: customer c3 (demand 2) supplied from stores s1 and s2.
Fig. 4: the two-stage problem: customers c2 (demand 3) and c3 (demand 2) supplied
from the stores, whose remaining availabilities (5 units in total) are described by the
state parameter η.

Table 8: solution of the two-stage problem as a function of the state η.
Table 9: solution of the complete three-stage problem.
Fig. 5: the optimal distribution: x11 = 2, x12 = 1, x22 = 2, x23 = 2, with total
cost 950 α.
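The stage computations of Tables 7–9 can be checked with a short script. A minimal sketch follows, where the state is the number of units still available at s1 (since total supply equals total demand, the availability at s2 is implied):

```python
from functools import lru_cache

supply = (3, 4)           # units available at stores s1 and s2
demand = (2, 3, 2)        # orders of customers c1, c2, and c3
dist = ((100, 150, 200),  # s1 -> c1, c2, c3   (Table 6)
        (300, 200, 100))  # s2 -> c1, c2, c3

@lru_cache(maxsize=None)
def best(j, r1):
    # Minimum distance-weighted shipment for customers j, j+1, ...,
    # when r1 units remain at s1 (s2 holds the rest of the demand).
    if j == len(demand):
        return 0, ()
    r2 = sum(demand[j:]) - r1
    d = demand[j]
    plans = []
    for x1 in range(max(0, d - r2), min(d, r1) + 1):
        cost = x1 * dist[0][j] + (d - x1) * dist[1][j]
        sub_cost, sub_plan = best(j + 1, r1 - x1)
        plans.append((cost + sub_cost, ((x1, d - x1),) + sub_plan))
    return min(plans)

cost, plan = best(0, supply[0])
print(cost)  # 950 -- multiply by alpha to obtain the money cost
print(plan)  # ((2, 0), (1, 2), (0, 2)) = (x11, x21), (x12, x22), (x13, x23)
```

Calling best(0, r1) with a different r1 evaluates an alternative split of the 7 units between the two stores, which is the parametric feature exploited in the remarks below.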
Remarks 4:
1) This problem could be extended to any number of destinations. It is also
possible to handle the case of three sources, in which case two state
parameters η1, η2 are used; extending the problem to a larger number of
sources, however, becomes extremely difficult.
2) If the firm may redistribute the 7 available units between its two stores, we can
determine the best choice by adding several rows to the first-stage problem, as
shown in Table 10.

Table 10: the first-stage computations for the alternative splits of the 7 units
between s1 and s2.

It is clear that the best situation is when s1 contains 5 units and s2 contains 2
units; the solution in this case is x11 = 2, x12 = 3, x23 = 2, with total cost 850 α.
References
[1] Bellman R. and Dreyfus S.: Applied Dynamic Programming, Princeton
University Press, Princeton, New Jersey, 1962.
[2] Hadley G.: Nonlinear and Dynamic Programming, Addison-Wesley Publishing
Company, Inc., Reading, Mass., 1964.
[3] Hillier F. S. and Lieberman G. J.: Introduction to Operations Research, 9th ed.,
McGraw-Hill Companies, Inc., 2010.