Session 16:
Numerical Dynamic Programming
Agenda
Discrete-time dynamic programming.
Continuous-time dynamic programming.
Methods for finite-state problems:
Value function iteration.
Policy function iteration.
Gaussian acceleration methods.
Methods for continuous-state problems:
Discretization.
Parametric approximation methods.
Projection methods.
Discrete-Time Dynamic Programming
The objective is to maximize the expected NPV of payoffs
$$E\left[\sum_{t=0}^{T} \pi(x_t, u_t, t) + W(x_{T+1})\right].$$
In the autonomous case the period payoff is $\beta^t \pi(x, u)$, where $\beta \in [0, 1)$ is the
discount factor, and neither $F(\cdot)$ nor $D(\cdot)$ depends explicitly on $t$.
The value function $V(x)$ satisfies the Bellman equation
$$V(x) = \max_{u \in D(x)} \pi(x, u) + \beta\, E\left[V(x^+) \mid x, u\right].$$
Continuous-Time Dynamic Programming: Deterministic Case
The state at time $t$ is $x(t) \in X \subseteq \mathbb{R}^n$: continuous states.
The objective is to maximize the NPV of payoffs
$$\int_0^T e^{-\rho t}\, \pi(x, u, t)\, dt + W(x(T))$$
subject to the law of motion
$$\dot{x} = f(x, u, t), \quad x(0) = x_0.$$
The Bellman equation is
$$\rho V(x, t) - V_t(x, t) = \max_{u \in D(x, t)} \pi(x, u, t) + \sum_{i=1}^{n} V_{x_i}(x, t)\, f_i(x, u, t)$$
with terminal condition $V(x, T) = W(x)$.
In the autonomous infinite-horizon case the Bellman equation becomes
$$\rho V(x) = \max_{u \in D(x)} \pi(x, u) + \sum_{i=1}^{n} V_{x_i}(x)\, f_i(x, u).$$
Continuous-Time Dynamic Programming: Stochastic Case
Continuous states. The shocks follow a Brownian motion $z$.
The objective is to maximize the expected NPV of payoffs
$$E\left[\int_0^T e^{-\rho t}\, \pi(x, u, t)\, dt + W(x(T))\right]$$
subject to the law of motion
$$dx = f(x, u, t)\, dt + \sigma(x, u, t)\, dz, \quad x(0) = x_0.$$
The Bellman equation is
$$\rho V(x, t) - V_t(x, t) = \max_{u \in D(x, t)} \pi(x, u, t) + \sum_{i=1}^{n} V_{x_i}(x, t)\, f_i(x, u, t) + \frac{1}{2} \operatorname{tr}\!\left(\sigma(x, u, t)\, \sigma(x, u, t)^{\top} V_{xx}(x, t)\right)$$
with terminal condition $V(x, T) = W(x)$, where $\operatorname{tr}(A)$ is the trace of the matrix $A$.
In the autonomous infinite-horizon case the Bellman equation becomes
$$\rho V(x) = \max_{u \in D(x)} \pi(x, u) + \sum_{i=1}^{n} V_{x_i}(x)\, f_i(x, u) + \frac{1}{2} \operatorname{tr}\!\left(\sigma(x, u)\, \sigma(x, u)^{\top} V_{xx}(x)\right).$$
Finite-State Problems
The set of states is $X = \{x_1, x_2, \ldots, x_n\}$. Time is discrete.
The law of motion is a controlled discrete-time, finite-state, first-order
Markov process, where $q_{ij}^t(u)$ is the probability that the state transits
from $x_i$ to $x_j$ if the control is $u$ at time $t$.
Finite-horizon case: Let $V_i^t = V(x_i, t)$, $i = 1, \ldots, n$, $t = 0, \ldots, T+1$. The
Bellman equation is
$$V_i^t = \max_{u \in D(x_i, t)} \pi(x_i, u, t) + \beta \sum_{j=1}^{n} q_{ij}^t(u)\, V_j^{t+1}$$
with terminal condition $V_i^{T+1} = W(x_i)$.
This is a recursive system of nonlinear equations. Solve backwards from $t = T+1$
to $t = 0$ for $V_i^t$, $i = 1, \ldots, n$.
Infinite-horizon case: Let $V_i = V(x_i)$, $i = 1, \ldots, n$. The Bellman equation
is
$$V_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j.$$
This is a system of nonlinear equations. The contraction mapping theorem
ensures existence and uniqueness of a solution.
Finite-State Problems: Value Function Iteration
Define the operator $T$ pointwise by
$$(TV)_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j, \quad i = 1, \ldots, n.$$
Value function iteration:
Initialization: Choose initial guess $V^0$ and stopping criterion $\epsilon$.
Step 1: Compute $V^{l+1} = TV^l$.
Step 2: If $\|V^{l+1} - V^l\| < \epsilon$, stop; otherwise, go to step 1.
The sequence $\{V^l\}_{l=0}^{\infty}$ converges linearly at rate $\beta$ to $V^*$, and
$\|V^{l+1} - V^*\| \le \beta\, \|V^l - V^*\|$. Hence,
$$\|V^l - V^*\| \le \frac{\|V^{l+1} - V^l\|}{1 - \beta}.$$
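As a concrete illustration, here is a minimal sketch of value function iteration in Python; the two-state payoffs, transition probabilities, and discount factor below are invented for the example, not taken from the text:

```python
import numpy as np

beta = 0.9                                    # discount factor (illustrative)
pi = np.array([[1.0, 0.5],                    # pi[i, u]: payoff in state i under control u
               [0.0, 2.0]])
q = np.array([[[0.9, 0.1], [0.2, 0.8]],       # q[i, u, j]: transition probability i -> j under u
              [[0.6, 0.4], [0.1, 0.9]]])

def T(V):
    """Bellman operator: (TV)_i = max_u pi(x_i, u) + beta * sum_j q_ij(u) V_j."""
    return np.max(pi + beta * np.einsum('iuj,j->iu', q, V), axis=1)

V = np.zeros(2)                               # initial guess V^0
eps = 1e-8
for _ in range(1000):
    V_new = T(V)
    done = np.max(np.abs(V_new - V)) < eps * (1 - beta) / beta
    V = V_new
    if done:                                  # this criterion guarantees ||V - V*|| < eps
        break
```

The stopping criterion uses the factor $(1-\beta)/\beta$ so that the error bound on $\|V^l - V^*\|$ translates the observed step size into a guarantee on the true error.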
To ensure $\|V^{l+1} - V^*\| < \epsilon$, use the stopping criterion
$\|V^{l+1} - V^l\| < \epsilon (1 - \beta)/\beta$.
Finite-State Problems: Policy Function Iteration
Define the policy operator $U$ pointwise by
$$(UV)_i = \arg\max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j, \quad i = 1, \ldots, n.$$
Let $U_i = U(x_i)$, $i = 1, \ldots, n$, $Q_U = (q_{ij}(U_i))_{i,j}$, and $\Pi_U = (\pi(x_i, U_i))_i$. Then
the value $V_U$ of following policy $U$ forever satisfies the system of linear
equations
$$V_U = \Pi_U + \beta Q_U V_U \iff V_U = (I - \beta Q_U)^{-1} \Pi_U.$$
Policy function iteration (a.k.a. Howard improvement):
Initialization: Choose initial guess $V^0$ and stopping criterion $\epsilon$. (Or:
Choose $U^0$ instead of $V^0$ and go to step 2.)
Step 1: Compute $U^{l+1} = UV^l$.
Step 2: Solve $(I - \beta Q_{U^{l+1}})\, V^{l+1} = \Pi_{U^{l+1}}$ to obtain $V^{l+1}$.
Step 3: If $\|V^{l+1} - V^l\| < \epsilon$, stop; otherwise, go to step 1.
Step 2 computes the value of following policy $U^{l+1}$ forever.
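A minimal sketch of policy function iteration on an invented two-state, two-action problem (all numbers illustrative); step 2 is an exact linear solve:

```python
import numpy as np

beta = 0.9
pi = np.array([[1.0, 0.5], [0.0, 2.0]])       # pi[i, u] (illustrative)
q = np.array([[[0.9, 0.1], [0.2, 0.8]],       # q[i, u, j] (illustrative)
              [[0.6, 0.4], [0.1, 0.9]]])
n = 2

V = np.zeros(n)
for _ in range(100):
    # Step 1: policy improvement, U^{l+1}(i) = argmax_u pi(x_i,u) + beta sum_j q_ij(u) V_j
    U = np.argmax(pi + beta * np.einsum('iuj,j->iu', q, V), axis=1)
    # Step 2: policy evaluation, solve (I - beta Q_U) V = Pi_U exactly
    Q_U = q[np.arange(n), U]                  # transition matrix under policy U
    Pi_U = pi[np.arange(n), U]                # payoff vector under policy U
    V_new = np.linalg.solve(np.eye(n) - beta * Q_U, Pi_U)
    done = np.max(np.abs(V_new - V)) < 1e-10  # Step 3
    V = V_new
    if done:
        break
```

On finite-state problems the iteration typically terminates in a handful of outer steps, since the policy space is finite.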
Finite-State Problems: Policy Function Iteration
Modified policy iteration with $k$ steps: Replace step 2 by
Step 2a: Set $W^0 = V^l$.
Step 2b: Compute $W^{j+1} = \Pi_{U^{l+1}} + \beta Q_{U^{l+1}} W^j$, $j = 0, \ldots, k$.
Step 2c: Set $V^{l+1} = W^{k+1}$.
Step 2 now computes the value of following policy $U^{l+1}$ for $k+1$ periods.
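The k-step evaluation can be sketched as follows (same kind of invented two-state problem; k and all numbers illustrative):

```python
import numpy as np

beta, k = 0.9, 20
pi = np.array([[1.0, 0.5], [0.0, 2.0]])       # illustrative data
q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
n = 2

V = np.zeros(n)
for _ in range(200):
    U = np.argmax(pi + beta * np.einsum('iuj,j->iu', q, V), axis=1)
    Q_U, Pi_U = q[np.arange(n), U], pi[np.arange(n), U]
    W = V.copy()                              # Step 2a: W^0 = V^l
    for _ in range(k + 1):                    # Step 2b: W^{j+1} = Pi_U + beta Q_U W^j
        W = Pi_U + beta * Q_U @ W
    done = np.max(np.abs(W - V)) < 1e-10
    V = W                                     # Step 2c: V^{l+1} = W^{k+1}
    if done:
        break
```

Each inner step is a cheap matrix-vector product, so this trades the exact linear solve of step 2 for k+1 applications of the policy's expected-value operator.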
The sequence $\{V^l\}_{l=0}^{\infty}$ converges linearly to $V^*$ and
$$\|V^{l+1} - V^*\| \le \min\left\{\beta,\; \frac{1 - \beta^k}{1 - \beta}\, \|U^l - U^*\| + \beta^{k+1}\right\} \|V^l - V^*\|.$$
The rate approaches $\beta^{k+1}$ as $U^l$ approaches $U^*$: accelerated convergence.
Finite-State Problems: Gaussian Acceleration Methods
Idea: The Bellman equation is a system of nonlinear equations. Treat it
as such!
Pre-Gauss-Jacobi iteration (a.k.a. value function iteration):
$$V_i^{l+1} = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j^l, \quad i = 1, \ldots, n.$$
Gauss-Jacobi iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \frac{\pi(x_i, u) + \beta \sum_{j \ne i} q_{ij}(u)\, V_j^l}{1 - \beta\, q_{ii}(u)}, \quad i = 1, \ldots, n.$$
Pre-Gauss-Seidel iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j < i} q_{ij}(u)\, V_j^{l+1} + \beta \sum_{j \ge i} q_{ij}(u)\, V_j^l, \quad i = 1, \ldots, n.$$
Gauss-Seidel iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \frac{\pi(x_i, u) + \beta \sum_{j < i} q_{ij}(u)\, V_j^{l+1} + \beta \sum_{j > i} q_{ij}(u)\, V_j^l}{1 - \beta\, q_{ii}(u)}, \quad i = 1, \ldots, n.$$
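A minimal sketch of the Gauss-Seidel iteration on an invented two-state, two-action problem: within each sweep, already-updated values are reused for $j < i$, and the self-transition $q_{ii}(u)$ is absorbed into the denominator.

```python
import numpy as np

beta = 0.9
pi = np.array([[1.0, 0.5], [0.0, 2.0]])       # pi[i, u] (illustrative)
q = np.array([[[0.9, 0.1], [0.2, 0.8]],       # q[i, u, j] (illustrative)
              [[0.6, 0.4], [0.1, 0.9]]])
n, n_controls = 2, 2

V = np.zeros(n)
for _ in range(500):
    V_old = V.copy()
    for i in range(n):                        # sweep the states in increasing order
        vals = []
        for u in range(n_controls):
            other = sum(q[i, u, j] * V[j] for j in range(n) if j != i)
            vals.append((pi[i, u] + beta * other) / (1 - beta * q[i, u, i]))
        V[i] = max(vals)                      # V[j] for j < i was already updated this sweep
    if np.max(np.abs(V - V_old)) < 1e-10:
        break
```

The fixed point is the same $V^*$ as for value function iteration; only the path to it (and typically the number of sweeps) changes.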
Finite-State Problems: Gaussian Acceleration Methods
Idea: The Gauss-Seidel methods depend on the ordering of states. Exploit it!
Figure: Downwind (solid) and upwind (dashed) directions. Source: Judd, K. (1998), Figure 12.1.
Upwind Gauss-Seidel: In iteration $l$, first order the state space such that
$q_{i,i+1}(U_i^l) \ge q_{i+1,i}(U_{i+1}^l)$, $i = 1, \ldots, n$. Then traverse the state space in
decreasing order.
Simulated upwind Gauss-Seidel: In iteration $l$, first simulate the Markov
process under $U^l$. Then traverse the simulated states in decreasing order.
Alternating sweep Gauss-Seidel: In iteration $l$, traverse the state space
in increasing (decreasing) order if $l$ is odd (even).
Continuous-State Problems: Discretization
Specify a finite-state problem that is similar to the continuous-state problem under consideration.
Example: Optimal growth. The Bellman equation is
$$V(k) = \max_{c \in [0,\, k + f(k)]} u(c) + \beta V(k + f(k) - c).$$
Replace the set of states $[0, \infty)$ by $K = \{k_1, k_2, \ldots, k_n\}$. Choose $K$ large
enough so that the initial and the steady state are contained in it.
To ensure landing on a point in $K$, take the control to be next period's
state and rewrite the Bellman equation as
$$V(k) = \max_{k^+ \in K} u(k + f(k) - k^+) + \beta V(k^+).$$
Remarks:
Easy and robust.
Sometimes requires reformulating the problem and/or altering the set
of states and controls.
Requires a large number of points, particularly if the state space is
multidimensional.
Inefficient approximation to smooth problems.
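A minimal sketch of the discretized growth problem, assuming a hypothetical net production function f(k) = k^0.3 - 0.1k and log utility (both chosen only for illustration):

```python
import numpy as np

beta = 0.95
f = lambda k: k**0.3 - 0.1 * k                # production net of depreciation (hypothetical)
u = lambda c: np.log(c)                       # utility (hypothetical)
K = np.linspace(0.1, 4.0, 200)                # grid K replacing the state space [0, inf)

# Consumption implied by each (k, k+) pair; infeasible choices get -inf payoff
C = K[:, None] + f(K)[:, None] - K[None, :]
R = np.where(C > 0, u(np.maximum(C, 1e-12)), -np.inf)

V = np.zeros(len(K))
for _ in range(2000):
    # V(k) = max_{k+ in K} u(k + f(k) - k+) + beta V(k+), vectorized over the grid
    V_new = np.max(R + beta * V[None, :], axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
policy = K[np.argmax(R + beta * V[None, :], axis=1)]  # next period's capital k+
```

Because the control is next period's state, every iterate lands exactly on a grid point; no interpolation is needed.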
Continuous-State Problems: Parametric Approximation Methods
Approximate the value function using the family of functions $\hat{V}(x; a)$
and use methods for finite-state problems to choose the parameters $a$.
Parametric dynamic programming with value function iteration:
Initialization: Choose a functional form for $\hat{V}(x; a)$, where $a \in \mathbb{R}^m$,
and a set of points $X = \{x_i\}_{i=1}^n$, where $n \ge m$. Choose initial guess
$a^0$ and stopping criterion $\epsilon$.
Step 1 (maximization step): Compute
$$v_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \int \hat{V}(x^+; a^l)\, dF(x^+, x_i, u), \quad i = 1, \ldots, n.$$
Step 2 (fitting step): Compute $a^{l+1}$ such that $\hat{V}(x; a^{l+1})$ approximates
the Lagrange data $\{(x_i, v_i)\}_{i=1}^n$.
Step 3: If $\|\hat{V}(x; a^{l+1}) - \hat{V}(x; a^l)\| < \epsilon$, stop; otherwise, go to step 1.
Three interconnected components:
Numerical integration.
Maximization.
Function approximation (CompEcon toolbox: help cetools).
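A minimal sketch of parametric dynamic programming, assuming a degree-5 polynomial family for $\hat{V}$ and a hypothetical deterministic growth model (so the integral in step 1 drops out); the fitting step is an ordinary least-squares polynomial fit:

```python
import numpy as np

beta = 0.95
f = lambda k: k**0.3                          # production (hypothetical)
u = lambda c: np.log(c)                       # utility (hypothetical)
grid = np.linspace(0.2, 3.0, 15)              # n = 15 fit points
kp = np.linspace(0.2, 3.0, 200)               # choice grid for the next state
Vhat = lambda k, a: np.polyval(a, k)          # polynomial family, m = 6 coefficients

a = np.zeros(6)                               # initial guess a^0
for _ in range(1000):
    # Maximization step: v_i = max_{k+} u(k + f(k) - k+) + beta * Vhat(k+; a^l)
    C = grid[:, None] + f(grid)[:, None] - kp[None, :]
    vals = np.where(C > 0,
                    u(np.maximum(C, 1e-12)) + beta * Vhat(kp, a)[None, :],
                    -np.inf)
    v = np.max(vals, axis=1)
    # Fitting step: least-squares polynomial through the data {(k_i, v_i)}
    a_new = np.polyfit(grid, v, 5)
    done = np.max(np.abs(Vhat(grid, a_new) - Vhat(grid, a))) < 1e-8
    a = a_new
    if done:
        break
```

Here n = 15 points pin down m = 6 coefficients by least squares; with n = m the fitting step would be exact interpolation instead.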
Continuous-State Problems: Parametric Approximation Methods
The computable approximation $\hat{T}$ to the contraction mapping $T$ may be
neither contractive nor monotonic.
Shape-preserving methods:
Figure: Dynamic programming and the shape of the value function. Source: Judd, K. (1998), Figure 12.2.
Linear spline ($C^0$).
Schumaker shape-preserving spline ($C^1$; use the envelope theorem to obtain
the Hermite data $\{(x_i, v_i, v_i')\}_{i=1}^n$).
Bilinear and simplicial interpolation ($C^0$).
Could use policy function iteration instead of value function iteration,
but Gauss-Seidel methods are harder to adapt.
Continuous-State Problems: Projection Methods
The Bellman equation is a functional equation.
Approximate the value function using the family of functions $\hat{V}(x; a)$
and choose the parameters $a$ such that $\hat{V}(x; a)$ almost satisfies the
Bellman equation.
The residual function is defined pointwise by
$$R(x; a) = -\hat{V}(x; a) + \max_{u \in D(x)} \pi(x, u) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, u).$$
Special case: Suppose the FOC ensures optimality. Then the value
function $V(x)$ and the optimal policy function $U(x)$ satisfy
$$V(x) = \pi(x, U(x)) + \beta \int V(x^+)\, dF(x^+, x, U(x)),$$
$$0 = \pi_u(x, U(x)) + \beta \int V(x^+)\, dF_u(x^+, x, U(x)).$$
Approximate the value function using the family of functions $\hat{V}(x; a)$ and
the optimal policy function using $\hat{U}(x; b)$.
Continuous-State Problems: Projection Methods
The residual function is defined pointwise by
$$R(x; a, b) = \begin{pmatrix} R_1(x; a, b) \\ R_2(x; a, b) \end{pmatrix} = \begin{pmatrix} -\hat{V}(x; a) + \pi(x, \hat{U}(x; b)) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, \hat{U}(x; b)) \\ \pi_u(x, \hat{U}(x; b)) + \beta \int \hat{V}(x^+; a)\, dF_u(x^+, x, \hat{U}(x; b)) \end{pmatrix}.$$
Even more special case: Suppose the FOC ensures optimality and can
be solved in closed form for $U(x)$. Then the value function $V(x)$ satisfies
$$V(x) = \pi(x, U(x)) + \beta \int V(x^+)\, dF(x^+, x, U(x)).$$
Approximate the value function using the family of functions $\hat{V}(x; a)$.
The residual function is defined pointwise by
$$R(x; a) = -\hat{V}(x; a) + \pi(x, U(x)) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, U(x)).$$
Projection methods are natural for continuous-time problems.
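A minimal sketch of this last case via collocation, assuming the textbook log-utility/Cobb-Douglas/full-depreciation growth model, whose FOC yields the policy in closed form, U(k) = alpha*beta*k^alpha (the model, nodes, and basis are assumptions for illustration). Because $\hat{V}$ is linear in $a$, setting $R = 0$ at the collocation nodes reduces to a linear system:

```python
import numpy as np

# Log utility, Cobb-Douglas production, full depreciation: the FOC gives
# the optimal policy in closed form, U(k) = alpha*beta*k^alpha.
alpha, beta = 0.3, 0.95
U = lambda k: alpha * beta * k**alpha
pi = lambda k: np.log(k**alpha - U(k))        # period payoff pi(k, U(k))

# Approximate V with a polynomial in log k; collocation uses n = m = 3 nodes
nodes = np.array([0.2, 0.5, 1.0])
basis = lambda k: np.column_stack([np.ones_like(k), np.log(k), np.log(k)**2])

# R(k; a) = -Vhat(k; a) + pi(k) + beta * Vhat(U(k); a) is linear in a,
# so R = 0 at the nodes is the linear system A a = -pi(nodes)
A = -basis(nodes) + beta * basis(U(nodes))
a = np.linalg.solve(A, -pi(nodes))
```

For this model the exact value function is $c_0 + \frac{\alpha}{1 - \alpha\beta} \ln k$, which lies in the span of the basis, so the collocation solution recovers it and the quadratic coefficient comes out numerically zero.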