
ENGSCI 760 Dynamic Programming: Infinite-horizon dynamic programming

Example 1: Infinite-horizon shortest path problems

Consider a shortest path problem in which we seek a never-ending path from each node j through the nodes of a directed graph. We assume that every node has an arc leaving it, that it takes one time period to traverse an arc, and that the arc costs c(j, k) are discounted according to the time at which we enter the arc. For each node i we seek a path starting at i that has minimum discounted cost. Since there are only finitely many nodes and arcs, the path must eventually return to some node that it has already visited, forming a circuit. Assuming that an optimal solution to the problem is stationary, the optimal path from i travels to a node on the circuit and then repeats the circuit ad infinitum.
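The cost of repeating a circuit has a simple closed form. If α = 1/(1+r) denotes the one-period discount factor and the circuit uses m arcs whose costs, discounted to the time the circuit is entered, sum to C, then traversing the circuit forever costs
\[
C + \alpha^m C + \alpha^{2m} C + \cdots = \frac{C}{1 - \alpha^m},
\]
since each further lap delays all of its costs by another m periods.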


For example, find the least-cost discounted infinite shortest path in Figure 1 below, where the discount rate r = 30% (so α = 1/1.3). The numbers next to the arcs are the costs c(j, y).

[Figure 1: a directed graph on nodes 1, 2, 3 and 4, with the cost of each arc written next to it; the arcs and their costs are tabulated as c(i, j) below.]

Find the infinite discounted shortest path from each node.


We can formulate this problem as a Markov decision process by choosing the action y ∈ {1, 2, . . . , n} in state j to be the next node after j on the infinite path. The return R_j(y) is then c(j, y). We then assign the transition probabilities p_jk(y) = 1 if k = y, and 0 otherwise. Then we have the dynamic programming recursion:
\[
V_j = \min_{y=1,2,\dots,n} \Big\{ c(j,y) + \alpha \sum_{k=1}^{n} p_{jk}(y) V_k \Big\}.
\]

Since p_jk(y) = 1 only when k = y, this gives, for each j,
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \}.
\]
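For instance, reading the costs from the table below, node 4 has a single finite-cost outgoing arc 4 -> 1 with c(4, 1) = 1, so its equation reduces to V_4 = 1 + αV_1; similarly node 3 gives V_3 = 1 + αV_2.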


Solution by value iteration

Suppose we let V_j be the present value of an infinite path starting from node j (with discount factor α = 1/(1+r)); then this satisfies the DP recursion
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \}, \qquad j = 1, 2, \dots, n.
\]
Value iteration is defined by the following algorithm: for τ = 1, 2, . . ., let
\[
V_j^{\tau+1} = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y^{\tau} \}, \qquad j = 1, 2, \dots, n.
\]
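As an illustration (not part of the original notes), a minimal Python sketch of value iteration on this example follows; the cost matrix, with 100 standing in for a missing arc, and r = 30% are taken from the tables below.

```python
# Value iteration for the infinite-horizon discounted shortest path example.
# Indices 0..3 stand for nodes 1..4; a cost of 100 marks a missing arc.
c = [
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
]
alpha = 1 / 1.3  # discount factor for r = 30%
n = len(c)

V = [0.0] * n  # start from V^0 = 0
for tau in range(200):
    # One Bellman update: V_j <- min_y { c(j,y) + alpha * V_y }.
    V_new = [min(c[j][y] + alpha * V[y] for y in range(n)) for j in range(n)]
    if max(abs(a - b) for a, b in zip(V, V_new)) < 1e-9:
        break
    V = V_new

print([round(v, 2) for v in V])  # converges to [9.23, 11.1, 9.54, 8.1]
```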


The iterations can be carried out in a spreadsheet. With 1 + r = 1.3, the costs c(i, j) are (100 marks a missing arc):

 c(i,j)   j=1   j=2   j=3   j=4
  i=1     100     4   100     3
  i=2       4   100     5   100
  i=3     100     1   100   100
  i=4       1   100   100   100

Starting from V^0 = (0, 0, 0, 0), each iteration computes c(j, y) + V_y/(1+r) for every arc and takes the minimum over y to obtain the new V_j:

 tau     V_1     V_2     V_3     V_4
  0     0.00    0.00    0.00    0.00
  1     3.00    4.00    1.00    1.00
  2     3.77    5.77    4.08    3.31
  3     5.54    6.90    5.44    3.90
  4     6.00    8.26    6.31    5.26
  ...
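Continuing in this way, V^τ converges to V = (9.23, 11.10, 9.54, 8.10), the values found below by policy iteration and linear programming. Since the value iteration map is a contraction with modulus α = 1/1.3 ≈ 0.77, the error shrinks by at least that factor at every iteration.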


Solution by policy iteration


A policy is defined in this example by the vector y, whose component y_j gives the next node to visit from node j. Policy iteration seeks the best vector y using an improvement algorithm like the following:

1. Set τ = 1 and select an initial policy y^τ defining the next node to visit from every node.

2. For each node j compute V_j^τ, defined as the present value of using policy y^τ.

3. Compute a new policy using
\[
y_j^{\tau+1} = \arg\min_{y} \{ c(j,y) + \alpha V_y^{\tau} \}.
\]

4. If y^{τ+1} is not equal to y^τ, then update the candidate policy (set τ = τ + 1) and go to step 2; otherwise stop.
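A minimal Python sketch of this algorithm follows (an illustration, not from the notes). Policy evaluation is done exactly by solving the linear system V = c_y + αP_yV, where P_y is the 0-1 transition matrix of the policy:

```python
import numpy as np

# Costs c[i][j] for the example; 100 marks a missing arc.
c = np.array([
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
], dtype=float)
alpha = 1 / 1.3
n = len(c)

policy = np.array([1, 0, 1, 0])  # initial policy y = (2, 1, 2, 1), 0-based

while True:
    # Policy evaluation: solve (I - alpha * P) V = c_y exactly.
    P = np.zeros((n, n))
    P[np.arange(n), policy] = 1.0
    V = np.linalg.solve(np.eye(n) - alpha * P, c[np.arange(n), policy])
    # Policy improvement: one-step greedy lookahead.
    new_policy = np.argmin(c + alpha * V[None, :], axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy + 1, np.round(V, 2))  # -> [4 1 2 1] [ 9.23 11.1   9.54  8.1 ]
```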


Start with policy y^1 = (2, 1, 2, 1), i.e. node 1 goes next to node 2, node 2 to node 1, node 3 to node 2, and node 4 to node 1. With 1 + r = 1.3, evaluating this policy gives

 j   y_j     V_j
 1    2     17.33
 2    1     17.33
 3    2     14.33
 4    1     14.33

(for example, nodes 1 and 2 repeat the circuit 1 -> 2 -> 1, so V_1 = 4 + 4/1.3 + 4/1.3^2 + ... = 17.33, and V_3 = 1 + V_2/1.3 = 14.33).

The improvement step computes c(j, y) + V_y/(1+r) for every arc, using V/(1+r) = (13.33, 13.33, 11.03, 11.03):

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           113.33    17.33   111.03    14.03    14.03     4
 j=2            17.33   113.33    16.03   111.03    16.03     3
 j=3           113.33    14.33   111.03   111.03    14.33     2
 j=4            14.33   113.33   111.03   111.03    14.33     1

giving the new policy y^2 = (4, 3, 2, 1). Evaluating y^2 gives

 j   y_j     V_j
 1    4      9.23
 2    3     14.13
 3    2     11.87
 4    1      8.10

and the improvement step, with V/(1+r) = (7.10, 10.87, 9.13, 6.23), gives

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           107.10    14.87   109.13     9.23     9.23     4
 j=2            11.10   110.87    14.13   106.23    11.10     1
 j=3           107.10    11.87   109.13   106.23    11.87     2
 j=4             8.10   110.87   109.13   106.23     8.10     1

so y^3 = (4, 1, 2, 1). Evaluating y^3 gives

 j   y_j     V_j
 1    4      9.23
 2    1     11.10
 3    2      9.54
 4    1      8.10

and the improvement step, with V/(1+r) = (7.10, 8.54, 7.34, 6.23), gives

 c + V/(1+r)    y=1      y=2      y=3      y=4       min   new y_j
 j=1           107.10    12.54   107.34     9.23     9.23     4
 j=2            11.10   108.54    12.34   106.23    11.10     1
 j=3           107.10     9.54   107.34   106.23     9.54     2
 j=4             8.10   108.54   107.34   106.23     8.10     1

The new policy equals y^3 = (4, 1, 2, 1), so the algorithm stops. The optimal policy sends 1 -> 4, 2 -> 1, 3 -> 2 and 4 -> 1, with values V = (9.23, 11.10, 9.54, 8.10).
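Note that policy iteration reached the optimum after evaluating only three policies, whereas value iteration only approaches V in the limit; since there are finitely many policies and each improvement step improves the values until the policy repeats, policy iteration always terminates after finitely many steps.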


Solution by linear programming


Recall that for each j,
\[
V_j = \min_{y=1,2,\dots,n} \{ c(j,y) + \alpha V_y \},
\]
whence
\[
V_j \le c(j,y) + \alpha V_y, \qquad j, y = 1, 2, \dots, n,
\]
with equality at some y for each j. Thus the recursion can be solved by maximizing V_1 + V_2 + ⋯ + V_n subject to these inequalities:
\[
\text{P:} \quad \max \; V_1 + V_2 + \cdots + V_n \quad \text{s.t.} \quad V_j \le c(j,y) + \alpha V_y, \;\; j, y = 1, 2, \dots, n.
\]
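To see why maximizing recovers the recursion: any feasible V lies componentwise below the fixed point of the recursion, and the fixed point itself is feasible, so pushing each V_j up until one of its constraints binds drives the inequality to equality at the minimizing y. The optimal solution of P is therefore exactly the vector of values V_j.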


The dual is
\[
\text{D:} \quad \min \; \sum_{j=1}^{n} \sum_{y=1}^{n} c(j,y)\, x(j,y) \quad \text{s.t.} \quad \sum_{y=1}^{n} x(j,y) - \alpha \sum_{y=1}^{n} x(y,j) = 1, \;\; j = 1, 2, \dots, n, \qquad x(j,y) \ge 0.
\]
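Each constraint of P corresponds to a dual variable x(j, y), which can be interpreted (a standard fact about such duals, not spelled out in these notes) as the total discounted number of times arc (j, y) is traversed when a path is started once from every node; an optimal basic solution then has x(j, y) > 0 only on the arcs used by an optimal policy.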


The linear program to solve in the example is thus
\[
\text{D:} \quad \min \; \sum_{j=1}^{4} \sum_{y=1}^{4} c(j,y)\, x(j,y) \quad \text{s.t.} \quad \sum_{y=1}^{4} x(j,y) - \frac{1}{1.3} \sum_{y=1}^{4} x(y,j) = 1, \;\; j = 1, 2, 3, 4, \qquad x(j,y) \ge 0,
\]
with the costs c(j, y) taken from the table above. The solution in Excel is as follows. [The Excel screenshot is not reproduced here.]
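The same problem can also be solved outside Excel. The following is a minimal Python sketch (an illustration, not part of the notes) that solves the primal P with scipy.optimize.linprog, so the optimal values V_j are returned directly:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([
    [100,   4, 100,   3],
    [  4, 100,   5, 100],
    [100,   1, 100, 100],
    [  1, 100, 100, 100],
], dtype=float)
alpha = 1 / 1.3
n = len(c)

# Primal P: maximize sum(V) subject to V_j - alpha * V_y <= c(j, y).
A_ub, b_ub = [], []
for j in range(n):
    for y in range(n):
        row = np.zeros(n)
        row[j] += 1.0
        row[y] -= alpha  # j == y gives coefficient 1 - alpha
        A_ub.append(row)
        b_ub.append(c[j, y])

# linprog minimizes, so negate the objective; the V_j are free variables.
res = linprog(-np.ones(n), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * n)
print(np.round(res.x, 2))  # -> [ 9.23 11.1   9.54  8.1 ]
```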


The solution is given by

 j   y_j
 1    4
 2    1
 3    2
 4    1

which agrees with the policy found by policy iteration. The Sensitivity Report yields the shadow prices of the equality constraints, which are the values V_j.


The objective for each j is given by

 j    V_j
 1    9.23
 2   11.10
 3    9.54
 4    8.10
