
ENGSCI 760 Dynamic Programming: Infinite-horizon dynamic programming

Example 1: Infinite-horizon shortest path problems

Consider a shortest path problem in which we seek a never-ending path from each node j through the nodes of a directed graph. We assume that every node has an arc leaving it, that it takes one time period to traverse an arc, and that the arc costs c(j, k) are discounted according to the time at which we enter the arc. For each node i we seek a path starting at i that has minimum discounted cost. Since there are only finitely many nodes and arcs, the path must eventually return to some node that it has already visited, forming a circuit. Assuming that an optimal solution to the problem is stationary, the optimal path from i travels to a node on the circuit and then repeats the circuit ad infinitum.
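The cost of repeating a circuit has a simple closed form. If α = 1/(1+r) denotes the one-period discount factor and the circuit uses m arcs whose costs, discounted to the time the circuit is entered, sum to C, then traversing the circuit forever costs
\[
C + \alpha^m C + \alpha^{2m} C + \cdots = \frac{C}{1 - \alpha^m},
\]
since each further lap delays all of its costs by another m periods.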


For example, find the least-cost discounted infinite shortest path in Figure 1 below, where the discount rate r = 30% (so α = 1/1.3). The numbers next to the arcs are the costs c(j, y).

[Figure 1: a directed graph on nodes 1, 2, 3 and 4, with the cost of each arc written next to it; the arcs and their costs are tabulated as c(i, j) below.]

Find the infinite discounted shortest path from each node.


We can formulate this problem as a Markov decision process by choosing the action y ∈ {1, 2, . . . , n} in state j to be the next node after j on the infinite path. The return R_j(y) is then c(j, y). We then assign the transition probabilities p_jk(y) = 1 if k = y, and 0 otherwise. Then we have the dynamic programming recursion:
\[
V_j = \min_{y=1,2,\dots,n} \Big\{ c(j,y) + \alpha \sum_{k=1}^{n} p_{jk}(y) V_k \Big\}.
\]

Since p_jk(y) = 1 only when k = y, this gives, for each j,
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \}.
\]
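For instance, reading the costs from the table below, node 4 has a single finite-cost outgoing arc 4 -> 1 with c(4, 1) = 1, so its equation reduces to V_4 = 1 + αV_1; similarly node 3 gives V_3 = 1 + αV_2.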


Solution by value iteration

Suppose we let V_j be the present value of an infinite path starting from node j (with discount factor α = 1/(1+r)); then this satisfies the DP recursion
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \}, \qquad j = 1, 2, \dots, n.
\]
Value iteration is defined by the following algorithm: for τ = 1, 2, . . ., let
\[
V_j^{\tau+1} = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y^{\tau} \}, \qquad j = 1, 2, \dots, n.
\]
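As an illustration (not part of the original notes), a minimal Python sketch of value iteration on this example follows; the cost matrix, with 100 standing in for a missing arc, and r = 30% are taken from the tables below.

```python
# Value iteration for the infinite-horizon discounted shortest path example.
# Indices 0..3 stand for nodes 1..4; a cost of 100 marks a missing arc.
c = [
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
]
alpha = 1 / 1.3  # discount factor for r = 30%
n = len(c)

V = [0.0] * n  # start from V^0 = 0
for tau in range(200):
    # One Bellman update: V_j <- min_y { c(j,y) + alpha * V_y }.
    V_new = [min(c[j][y] + alpha * V[y] for y in range(n)) for j in range(n)]
    if max(abs(a - b) for a, b in zip(V, V_new)) < 1e-9:
        break
    V = V_new

print([round(v, 2) for v in V])  # converges to [9.23, 11.1, 9.54, 8.1]
```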


The iterations can be carried out in a spreadsheet. With 1 + r = 1.3, the costs c(i, j) are (100 marks a missing arc):

 c(i,j)   j=1   j=2   j=3   j=4
  i=1     100     4   100     3
  i=2       4   100     5   100
  i=3     100     1   100   100
  i=4       1   100   100   100

Starting from V^0 = (0, 0, 0, 0), each iteration computes c(j, y) + V_y/(1+r) for every arc and takes the minimum over y to obtain the new V_j:

 tau     V_1     V_2     V_3     V_4
  0     0.00    0.00    0.00    0.00
  1     3.00    4.00    1.00    1.00
  2     3.77    5.77    4.08    3.31
  3     5.54    6.90    5.44    3.90
  4     6.00    8.26    6.31    5.26
  ...
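Continuing in this way, V^τ converges to V = (9.23, 11.10, 9.54, 8.10), the values found below by policy iteration and linear programming. Since the value iteration map is a contraction with modulus α = 1/1.3 ≈ 0.77, the error shrinks by at least that factor at every iteration.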


Solution by policy iteration


A policy is defined in this example by the vector y, whose component y_j gives the next node to visit from node j. Policy iteration seeks the best vector y using an improvement algorithm like the following:

1. Set τ = 1 and select an initial policy y^τ defining the next node to visit from every node.

2. For each node j compute V_j^τ, defined as the present value of using policy y^τ.

3. Compute a new policy using
\[
y_j^{\tau+1} = \arg\min_{y} \{ c(j,y) + \alpha V_y^{\tau} \}.
\]

4. If y^{τ+1} is not equal to y^τ, then update the candidate policy (set τ = τ + 1) and go to step 2; otherwise stop.
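A minimal Python sketch of this algorithm follows (an illustration, not from the notes). Policy evaluation is done exactly by solving the linear system V = c_y + αP_yV, where P_y is the 0-1 transition matrix of the policy:

```python
import numpy as np

# Costs c[i][j] for the example; 100 marks a missing arc.
c = np.array([
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
], dtype=float)
alpha = 1 / 1.3
n = len(c)

policy = np.array([1, 0, 1, 0])  # initial policy y = (2, 1, 2, 1), 0-based

while True:
    # Policy evaluation: solve (I - alpha * P) V = c_y exactly.
    P = np.zeros((n, n))
    P[np.arange(n), policy] = 1.0
    V = np.linalg.solve(np.eye(n) - alpha * P, c[np.arange(n), policy])
    # Policy improvement: one-step greedy lookahead.
    new_policy = np.argmin(c + alpha * V[None, :], axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy + 1, np.round(V, 2))  # -> [4 1 2 1] [ 9.23 11.1   9.54  8.1 ]
```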


Start with policy y^1 = (2, 1, 2, 1), i.e. node 1 goes next to node 2, node 2 to node 1, node 3 to node 2, and node 4 to node 1. With 1 + r = 1.3, evaluating this policy gives

 j   y_j     V_j
 1    2     17.33
 2    1     17.33
 3    2     14.33
 4    1     14.33

(for example, nodes 1 and 2 repeat the circuit 1 -> 2 -> 1, so V_1 = 4 + 4/1.3 + 4/1.3^2 + ... = 17.33, and V_3 = 1 + V_2/1.3 = 14.33).

The improvement step computes c(j, y) + V_y/(1+r) for every arc, using V/(1+r) = (13.33, 13.33, 11.03, 11.03):

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           113.33    17.33   111.03    14.03    14.03     4
 j=2            17.33   113.33    16.03   111.03    16.03     3
 j=3           113.33    14.33   111.03   111.03    14.33     2
 j=4            14.33   113.33   111.03   111.03    14.33     1

giving the new policy y^2 = (4, 3, 2, 1). Evaluating y^2 gives

 j   y_j     V_j
 1    4      9.23
 2    3     14.13
 3    2     11.87
 4    1      8.10

and the improvement step, with V/(1+r) = (7.10, 10.87, 9.13, 6.23), gives

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           107.10    14.87   109.13     9.23     9.23     4
 j=2            11.10   110.87    14.13   106.23    11.10     1
 j=3           107.10    11.87   109.13   106.23    11.87     2
 j=4             8.10   110.87   109.13   106.23     8.10     1

so y^3 = (4, 1, 2, 1). Evaluating y^3 gives

 j   y_j     V_j
 1    4      9.23
 2    1     11.10
 3    2      9.54
 4    1      8.10

and the improvement step, with V/(1+r) = (7.10, 8.54, 7.34, 6.23), gives

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           107.10    12.54   107.34     9.23     9.23     4
 j=2            11.10   108.54    12.34   106.23    11.10     1
 j=3           107.10     9.54   107.34   106.23     9.54     2
 j=4             8.10   108.54   107.34   106.23     8.10     1

The new policy equals y^3 = (4, 1, 2, 1), so the algorithm stops. The optimal policy sends 1 -> 4, 2 -> 1, 3 -> 2 and 4 -> 1, with values V = (9.23, 11.10, 9.54, 8.10).
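Note that policy iteration reached the optimum after evaluating only three policies, whereas value iteration only approaches V in the limit; since there are finitely many policies and each improvement step improves the values until the policy repeats, policy iteration always terminates after finitely many steps.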


Solution by linear programming


Recall that for each j,
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \},
\]
whence
\[
V_j \le c(j,y) + \alpha V_y, \qquad j, y = 1, 2, \dots, n,
\]
with equality at some y for each j. Thus the recursion can be solved by maximizing V_1 + V_2 + ⋯ + V_n subject to these inequalities:
\[
\text{P:} \quad \max \; V_1 + V_2 + \cdots + V_n \quad \text{s.t.} \quad V_j \le c(j,y) + \alpha V_y, \;\; j, y = 1, 2, \dots, n.
\]
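To see why maximizing recovers the recursion: any feasible V lies componentwise below the fixed point of the recursion, and the fixed point itself is feasible, so pushing each V_j up until one of its constraints binds drives the inequality to equality at the minimizing y. The optimal solution of P is therefore exactly the vector of values V_j.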


The dual is
\[
\text{D:} \quad \min \; \sum_{j=1}^{n} \sum_{y=1}^{n} c(j,y)\, x(j,y) \quad \text{s.t.} \quad \sum_{y=1}^{n} x(j,y) - \alpha \sum_{y=1}^{n} x(y,j) = 1, \;\; j = 1, 2, \dots, n, \qquad x(j,y) \ge 0.
\]
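Each constraint of P corresponds to a dual variable x(j, y), which can be interpreted (a standard fact about such duals, not spelled out in these notes) as the total discounted number of times arc (j, y) is traversed when a path is started once from every node; an optimal basic solution then has x(j, y) > 0 only on the arcs used by an optimal policy.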


The linear program to solve in the example is thus
\[
\text{D:} \quad \min \; \sum_{j=1}^{4} \sum_{y=1}^{4} c(j,y)\, x(j,y) \quad \text{s.t.} \quad \sum_{y=1}^{4} x(j,y) - \frac{1}{1.3} \sum_{y=1}^{4} x(y,j) = 1, \;\; j = 1, 2, 3, 4, \qquad x(j,y) \ge 0,
\]
with the costs c(j, y) taken from the table above. The solution in Excel is as follows. [The Excel screenshot is not reproduced here.]
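The same problem can also be solved outside Excel. The following is a minimal Python sketch (an illustration, not part of the notes) that solves the primal P with scipy.optimize.linprog, so the optimal values V_j are returned directly:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
], dtype=float)
alpha = 1 / 1.3
n = len(c)

# Primal P: maximize sum(V) subject to V_j - alpha * V_y <= c(j, y).
A_ub, b_ub = [], []
for j in range(n):
    for y in range(n):
        row = np.zeros(n)
        row[j] += 1.0
        row[y] -= alpha  # j == y gives coefficient 1 - alpha
        A_ub.append(row)
        b_ub.append(c[j, y])

# linprog minimizes, so negate the objective; the V_j are free variables.
res = linprog(-np.ones(n), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * n)
print(np.round(res.x, 2))  # -> [ 9.23 11.1   9.54  8.1 ]
```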


The solution is given by

 j   y_j
 1    4
 2    1
 3    2
 4    1

which agrees with the policy found by policy iteration. The Sensitivity Report yields the shadow prices of the equality constraints, which are the values V_j.


The objective for each j is given by

 j    V_j
 1    9.23
 2   11.10
 3    9.54
 4    8.10
