
Computational Economics

Session 16:
Numerical Dynamic Programming
Agenda
Discrete-time dynamic programming.
Continuous-time dynamic programming.
Methods for finite-state problems:
Value function iteration.
Policy function iteration.
Gaussian acceleration methods.
Methods for continuous-state problems:
Discretization.
Parametric approximation methods.
Projection methods.
Discrete-Time Dynamic Programming
The objective is to maximize the expected NPV of payoffs
$$E\left[\sum_{t=0}^{T} \pi(x_t, u_t, t) + W(x_{T+1})\right]$$
subject to the law of motion
$$\Pr(x_{t+1} \le x \mid x_t, u_t, t) = F(x, x_t, u_t, t), \qquad x_0 \text{ given.}$$
Notation:
$\pi$ is the per-period payoff.
$W$ is the terminal payoff.
$x_t \in X$ is the state; $X$ is the set of states.
$u_t \in D(x_t, t)$ is the control; $D(x_t, t)$ is the nonempty set of feasible controls in state $x_t$ at time $t$.
Discrete-Time Dynamic Programming
The value function $V(x_t, t)$ is the maximum expected NPV of payoffs from time $t$ onward if the state at time $t$ is $x_t$.
The value function $V(x_t, t)$ satisfies the Bellman equation
$$V(x_t, t) = \max_{u_t \in D(x_t, t)} \pi(x_t, u_t, t) + E_t\{V(x_{t+1}, t+1) \mid x_t, u_t\}$$
with terminal condition $V(x_{T+1}, T+1) = W(x_{T+1})$.
The optimal policy function $U(x_t, t)$ satisfies
$$U(x_t, t) \in \arg\max_{u_t \in D(x_t, t)} \pi(x_t, u_t, t) + E_t\{V(x_{t+1}, t+1) \mid x_t, u_t\}.$$
In the autonomous, discounted, infinite-horizon case $\pi(x, u, t)$ is replaced by $\beta^t \pi(x, u)$, where $\beta \in [0, 1)$ is the discount factor, and neither $F(\cdot)$ nor $D(\cdot)$ depends explicitly on $t$.
The value function $V(x)$ satisfies the Bellman equation
$$V(x) = \max_{u \in D(x)} \pi(x, u) + \beta E\left\{V(x^+) \mid x, u\right\}$$
and the optimal policy function $U(x)$ satisfies
$$U(x) \in \arg\max_{u \in D(x)} \pi(x, u) + \beta E\left\{V(x^+) \mid x, u\right\}.$$
Continuous-Time Dynamic Programming: Deterministic Case
The state at time $t$ is $x(t) \in X \subseteq \mathbb{R}^n$ (continuous states).
The objective is to maximize the NPV of payoffs
$$\int_0^T e^{-\rho t} \pi(x, u, t)\, dt + W(x(T))$$
subject to the law of motion
$$\dot{x} = f(x, u, t), \qquad x(0) = x_0.$$
The Bellman equation is
$$\rho V(x, t) - V_t(x, t) = \max_{u \in D(x, t)} \pi(x, u, t) + \sum_{i=1}^{n} V_{x_i}(x, t)\, f_i(x, u, t)$$
with terminal condition $V(x, T) = W(x)$.
In the autonomous infinite-horizon case the Bellman equation becomes
$$\rho V(x) = \max_{u \in D(x)} \pi(x, u) + \sum_{i=1}^{n} V_{x_i}(x)\, f_i(x, u).$$
Continuous-Time Dynamic Programming: Stochastic Case
Continuous states. Brownian motion.
The objective is to maximize the expected NPV of payoffs
$$E\left[\int_0^T e^{-\rho t} \pi(x, u, t)\, dt + W(x(T))\right]$$
subject to the law of motion
$$dx = f(x, u, t)\, dt + \sigma(x, u, t)\, dz, \qquad x(0) = x_0,$$
where
$f(x, u, t)$ is the $n \times 1$ vector of instantaneous drifts;
$\sigma(x, u, t)$ is the $n \times n$ matrix of instantaneous standard deviations;
$dz$ is white noise.
The Bellman equation is
$$\rho V(x, t) - V_t(x, t) = \max_{u \in D(x, t)} \pi(x, u, t) + \sum_{i=1}^{n} V_{x_i}(x, t)\, f_i(x, u, t) + \frac{1}{2} \operatorname{tr}\left(\sigma(x, u, t)\, \sigma(x, u, t)^\top V_{xx}(x, t)\right)$$
with terminal condition $V(x, T) = W(x)$, where $\operatorname{tr}(A)$ is the trace of the matrix $A$.
In the autonomous infinite-horizon case the Bellman equation becomes
$$\rho V(x) = \max_{u \in D(x)} \pi(x, u) + \sum_{i=1}^{n} V_{x_i}(x)\, f_i(x, u) + \frac{1}{2} \operatorname{tr}\left(\sigma(x, u)\, \sigma(x, u)^\top V_{xx}(x)\right).$$
Finite-State Problems
The set of states is $X = \{x_1, x_2, \ldots, x_n\}$. Time is discrete.
The law of motion is a controlled discrete-time, finite-state, first-order Markov process, where $q_{ij}^t(u)$ is the probability that the state transits from $x_i$ to $x_j$ if the control is $u$ at time $t$.
Finite-horizon case: Let $V_i^t = V(x_i, t)$, $i = 1, \ldots, n$, $t = 0, \ldots, T+1$. The Bellman equation is
$$V_i^t = \max_{u \in D(x_i, t)} \pi(x_i, u, t) + \sum_{j=1}^{n} q_{ij}^t(u)\, V_j^{t+1}$$
with terminal condition $V_i^{T+1} = W(x_i)$.
Recursive system of nonlinear equations. Solve backwards from $t = T+1$ to $t = 0$ for $V_i^t$, $i = 1, \ldots, n$.
Infinite-horizon case: Let $V_i = V(x_i)$, $i = 1, \ldots, n$. The Bellman equation is
$$V_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j.$$
System of nonlinear equations. The contraction mapping theorem ensures existence and uniqueness of a solution.
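For concreteness, the sketches on the following slides use a small hypothetical finite-state problem coded in Python; the arrays and numbers below are purely illustrative.

```python
import numpy as np

# Hypothetical finite-state problem: 3 states, 2 controls.
# pi[i, a] is the payoff pi(x_i, u_a); q[a, i, j] is the transition
# probability q_{ij}(u_a); beta is the discount factor.
n, m = 3, 2
pi = np.array([[1.0, 0.5],
               [0.8, 1.2],
               [0.2, 0.9]])
q = np.array([[[0.9, 0.1, 0.0],     # transitions under control u_1
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]],
              [[0.5, 0.5, 0.0],     # transitions under control u_2
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]]])
beta = 0.95
```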
Finite-State Problems: Value Function Iteration
Define the operator $T$ pointwise by
$$(TV)_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j, \qquad i = 1, \ldots, n.$$
Value function iteration:
Initialization: Choose initial guess $V^0$ and stopping criterion $\varepsilon$.
Step 1: Compute $V^{l+1} = TV^l$.
Step 2: If $\|V^{l+1} - V^l\| < \varepsilon$, stop; otherwise, go to step 1.
The sequence $\{V^l\}_{l=0}^{\infty}$ converges linearly at rate $\beta$ to $V^*$, and $\|V^{l+1} - V^*\| \le \beta \|V^l - V^*\|$. Hence,
$$\|V^l - V^*\| \le \frac{\|V^{l+1} - V^l\|}{1 - \beta}.$$
To ensure $\|V^{l+1} - V^*\| < \varepsilon$, stop if $\|V^{l+1} - V^l\| \le \varepsilon(1 - \beta)$.
The maximization step is the costliest. Exploit special structure of the objective (e.g., concavity, monotonicity) whenever possible.
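Below is a minimal Python sketch of value function iteration for the hypothetical arrays pi, q, and beta defined earlier; the stopping rule mirrors the bound above.

```python
import numpy as np

def value_function_iteration(pi, q, beta, eps=1e-8, max_iter=10_000):
    """pi[i, a]: payoff in state i under control a.
    q[a, i, j]: transition probability from i to j under control a.
    Stops when ||V^{l+1} - V^l|| <= eps*(1 - beta), so ||V^{l+1} - V*|| <= beta*eps < eps."""
    n, _ = pi.shape
    V = np.zeros(n)                                        # initial guess V^0
    for _ in range(max_iter):
        Q = pi + beta * np.einsum('aij,j->ia', q, V)       # continuation values
        V_new = Q.max(axis=1)                              # Bellman operator (TV)_i
        done = np.max(np.abs(V_new - V)) <= eps * (1.0 - beta)
        V = V_new
        if done:
            break
    U = (pi + beta * np.einsum('aij,j->ia', q, V)).argmax(axis=1)  # greedy policy
    return V, U

# Example usage with the illustrative arrays defined earlier:
# V_star, U_star = value_function_iteration(pi, q, beta)
```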
Finite-State Problems: Policy Function Iteration
Define the operator $U$ pointwise by
$$(UV)_i \in \arg\max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j, \qquad i = 1, \ldots, n.$$
Let $U_i = U(x_i)$, $i = 1, \ldots, n$, $Q_U = (q_{ij}(U_i))_{i,j}$, and $\Pi_U = (\pi(x_i, U_i))_i$. Then the value $V_U$ of following policy $U$ forever satisfies the system of linear equations
$$V_U = \Pi_U + \beta Q_U V_U \iff V_U = (I - \beta Q_U)^{-1} \Pi_U.$$
Policy function iteration (a.k.a. Howard improvement):
Initialization: Choose initial guess $V^0$ and stopping criterion $\varepsilon$. (Or: Choose $U^0$ instead of $V^0$ and go to step 2.)
Step 1: Compute $U^{l+1} = UV^l$.
Step 2: Solve $(I - \beta Q_{U^{l+1}})\, V^{l+1} = \Pi_{U^{l+1}}$ to obtain $V^{l+1}$.
Step 3: If $\|V^{l+1} - V^l\| < \varepsilon$, stop; otherwise, go to step 1.
Step 2 computes the value of following policy $U^{l+1}$ forever.
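A minimal Python sketch of policy function iteration for the same hypothetical arrays pi, q, and beta:

```python
import numpy as np

def policy_function_iteration(pi, q, beta, eps=1e-8, max_iter=1_000):
    n, _ = pi.shape
    V = np.zeros(n)                                        # initial guess V^0
    U = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Step 1: policy improvement, U^{l+1} = (U V^l)
        U = (pi + beta * np.einsum('aij,j->ia', q, V)).argmax(axis=1)
        # Step 2: policy evaluation, solve (I - beta*Q_U) V = Pi_U
        Q_U = q[U, np.arange(n), :]                        # rows q_{ij}(U_i)
        Pi_U = pi[np.arange(n), U]                         # payoffs pi(x_i, U_i)
        V_new = np.linalg.solve(np.eye(n) - beta * Q_U, Pi_U)
        # Step 3: stopping rule
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, U
        V = V_new
    return V, U
```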
Finite-State Problems: Policy Function Iteration
Modified policy iteration with $k$ steps: Replace step 2 by
Step 2a: Set $W^0 = V^l$.
Step 2b: Compute $W^{j+1} = \Pi_{U^{l+1}} + \beta Q_{U^{l+1}} W^j$, $j = 0, \ldots, k$.
Step 2c: Set $V^{l+1} = W^{k+1}$.
Step 2 computes the value of following policy $U^{l+1}$ for $k+1$ periods.
The sequence $\{V^l\}_{l=0}^{\infty}$ converges linearly to $V^*$ and
$$\|V^{l+1} - V^*\| \le \min\left\{\beta,\; \frac{1 - \beta^k}{1 - \beta}\,\|U^l - U^*\| + \beta^{k+1}\right\} \|V^l - V^*\|.$$
The rate approaches $\beta^{k+1}$ as $U^l$ approaches $U^*$: accelerated convergence.
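A minimal sketch of the modified evaluation step (steps 2a-2c) in Python; it can replace the np.linalg.solve call in the policy iteration sketch above, with Pi_U and Q_U the policy-specific payoff vector and transition matrix.

```python
import numpy as np

def modified_evaluation(V, Pi_U, Q_U, beta, k):
    W = V.copy()                         # step 2a: W^0 = V^l
    for _ in range(k + 1):               # step 2b: j = 0, ..., k
        W = Pi_U + beta * Q_U @ W
    return W                             # step 2c: V^{l+1} = W^{k+1}
```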
Finite-State Problems: Gaussian Acceleration Methods
Idea: The Bellman equation is a system of nonlinear equations. Treat it
as such!
Pre-Gauss-Jacobi iteration (a.k.a. value function iteration):
$$V_i^{l+1} = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j=1}^{n} q_{ij}(u)\, V_j^l, \qquad i = 1, \ldots, n.$$
Gauss-Jacobi iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \frac{\pi(x_i, u) + \beta \sum_{j \ne i} q_{ij}(u)\, V_j^l}{1 - \beta q_{ii}(u)}, \qquad i = 1, \ldots, n.$$
Pre-Gauss-Seidel iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \sum_{j < i} q_{ij}(u)\, V_j^{l+1} + \beta \sum_{j \ge i} q_{ij}(u)\, V_j^l, \qquad i = 1, \ldots, n.$$
Gauss-Seidel iteration:
$$V_i^{l+1} = \max_{u \in D(x_i)} \frac{\pi(x_i, u) + \beta \sum_{j < i} q_{ij}(u)\, V_j^{l+1} + \beta \sum_{j > i} q_{ij}(u)\, V_j^l}{1 - \beta q_{ii}(u)}, \qquad i = 1, \ldots, n.$$
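A minimal Python sketch of one Gauss-Seidel sweep for the hypothetical arrays pi, q, and beta, updating V in place so that already-updated components feed into later maximizations (the order argument is used by the alternating-sweep variant sketched further below):

```python
import numpy as np

def gauss_seidel_sweep(V, pi, q, beta, order=None):
    """One Gauss-Seidel sweep over the states; V is overwritten in place."""
    n, m = pi.shape
    for i in (order if order is not None else range(n)):
        best = -np.inf
        for a in range(m):
            # sum over j != i; already-visited states enter with updated values
            cont = beta * (q[a, i, :] @ V - q[a, i, i] * V[i])
            best = max(best, (pi[i, a] + cont) / (1.0 - beta * q[a, i, i]))
        V[i] = best
    return V
```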
Finite-State Problems: Gaussian Acceleration Methods
Idea: The Gauss-Seidel methods depend on the ordering of the states. Exploit it!
[Figure: Downwind (solid) and upwind (dashed) directions. Source: Judd, K. (1998), Figure 12.1.]
Upwind Gauss-Seidel: In iteration $l$, first order the state space such that $q_{i,i+1}(U_i^l) \ge q_{i+1,i}(U_{i+1}^l)$, $i = 1, \ldots, n-1$. Then traverse the state space in decreasing order.
Simulated upwind Gauss-Seidel: In iteration $l$, first simulate the Markov process under $U^l$. Then traverse the simulated states in decreasing order.
Alternating sweep Gauss-Seidel: In iteration l, traverse the state space
in increasing (decreasing) order if l is odd (even).
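A minimal sketch of the alternating-sweep variant in Python, reusing the gauss_seidel_sweep helper above; the iteration count is illustrative.

```python
def alternating_sweeps(V, pi, q, beta, num_iter=100):
    """Traverse the states in increasing order on odd iterations l and in
    decreasing order on even ones."""
    n = len(V)
    for l in range(1, num_iter + 1):
        order = range(n) if l % 2 == 1 else range(n - 1, -1, -1)
        V = gauss_seidel_sweep(V, pi, q, beta, order=order)
    return V
```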
Continuous-State Problems: Discretization
Specify a finite-state problem that is similar to the continuous-state problem under consideration.
Example: Optimal growth. The Bellman equation is
$$V(k) = \max_{c \in [0,\, k + f(k)]} u(c) + \beta V(k + f(k) - c).$$
Replace the set of states $[0, \infty)$ by $K = \{k_1, k_2, \ldots, k_n\}$. Choose $K$ large enough so that the initial and the steady state are contained in it.
To ensure landing on a point in $K$, take the control to be next period's state and rewrite the Bellman equation as
$$V(k) = \max_{k^+ \in K} u(k + f(k) - k^+) + \beta V(k^+).$$
Remarks:
Easy and robust.
Sometimes requires reformulating the problem and/or altering the set of states and controls.
Requires a large number of points, particularly if the state space is multidimensional.
Inefficient approximation to smooth problems.
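A minimal Python sketch of the discretized growth problem, with hypothetical functional forms u(c) = log(c) and f(k) = k^alpha - delta*k and illustrative parameter values:

```python
import numpy as np

alpha, beta, delta = 0.3, 0.95, 0.1              # hypothetical parameters
K = np.linspace(0.1, 10.0, 300)                  # grid replacing [0, infinity)
f = lambda k: k**alpha - delta * k               # hypothetical net production
u = lambda c: np.log(c)                          # hypothetical utility

# Consumption implied by choosing next period's state: c = k + f(k) - k_plus
C = K[:, None] + f(K)[:, None] - K[None, :]
payoff = np.where(C > 0, u(np.maximum(C, 1e-12)), -np.inf)   # infeasible -> -inf

V = np.zeros(len(K))
for _ in range(2_000):                           # value function iteration
    V_new = (payoff + beta * V[None, :]).max(axis=1)
    done = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if done:
        break
policy = (payoff + beta * V[None, :]).argmax(axis=1)  # index of optimal k_plus in K
```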
Continuous-State Problems: Parametric Approximation Methods
Approximate the value function using the family of functions $\hat{V}(x; a)$ and use methods for finite-state problems to choose the parameters $a$.
Parametric dynamic programming with value function iteration:
Initialization: Choose a functional form for $\hat{V}(x; a)$, where $a \in \mathbb{R}^m$, and a set of points $X = \{x_i\}_{i=1}^{n}$, where $n \ge m$. Choose initial guess $a^0$ and stopping criterion $\varepsilon$.
Step 1 (maximization step): Compute
$$v_i = \max_{u \in D(x_i)} \pi(x_i, u) + \beta \int \hat{V}(x^+; a^l)\, dF(x^+, x_i, u), \qquad i = 1, \ldots, n.$$
Step 2 (fitting step): Compute $a^{l+1}$ such that $\hat{V}(x; a^{l+1})$ approximates the Lagrange data $\{(x_i, v_i)\}_{i=1}^{n}$.
Step 3: If $\|\hat{V}(x; a^{l+1}) - \hat{V}(x; a^l)\| < \varepsilon$, stop; otherwise, go to step 1.
Three interconnected components:
Numerical integration.
Maximization.
Function approximation (CompEcon toolbox: help cetools).
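A minimal Python sketch of parametric dynamic programming with value function iteration, using a Chebyshev polynomial for V_hat(x; a) and a hypothetical deterministic growth law of motion, so the integral against dF collapses to evaluating V_hat at the known next state; the maximization is done on a fine grid of candidate controls:

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

beta = 0.95
f = lambda k: k**0.3 - 0.1 * k                   # hypothetical net production
u = lambda c: np.log(c)                          # hypothetical utility

lo, hi = 0.5, 6.0                                # approximation interval
z = lambda k: 2 * (k - lo) / (hi - lo) - 1       # map [lo, hi] -> [-1, 1]
nodes = (lo + hi) / 2 + (hi - lo) / 2 * np.cos(np.pi * (np.arange(12) + 0.5) / 12)
controls = np.linspace(lo, hi, 500)              # candidate next-period states k+

a = np.zeros(8)                                  # initial Chebyshev coefficients a^0
for _ in range(500):
    # Maximization step: v_i = max_{k+} u(k_i + f(k_i) - k+) + beta*V_hat(k+; a^l)
    cont = beta * cheb.chebval(z(controls), a)
    C = nodes[:, None] + f(nodes)[:, None] - controls[None, :]
    vals = np.where(C > 0, u(np.maximum(C, 1e-12)), -np.inf) + cont[None, :]
    v = vals.max(axis=1)
    # Fitting step: least-squares Chebyshev fit to the Lagrange data {(x_i, v_i)}
    a_new = cheb.chebfit(z(nodes), v, deg=len(a) - 1)
    done = np.max(np.abs(a_new - a)) < 1e-9
    a = a_new
    if done:
        break
```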
Continuous-State Problems: Parametric Approximation Methods
The computable approximation $\hat{T}$ to the contraction mapping $T$ may be neither contractive nor monotonic.
Shape-preserving methods.
[Figure: Dynamic programming and the shape of the value function. Source: Judd, K. (1998), Figure 12.2.]
Linear spline ($C^0$).
Schumaker shape-preserving spline ($C^1$; use the envelope theorem to obtain Hermite data $\{(x_i, v_i, v_i')\}_{i=1}^{n}$).
Bilinear and simplicial interpolation ($C^0$).
Could use policy function iteration instead of value function iteration,
but Gauss-Seidel methods are harder to adapt.
Continuous-State Problems: Projection Methods
The Bellman equation is a functional equation.
Approximate the value function using the family of functions $\hat{V}(x; a)$ and choose the parameters $a$ such that $\hat{V}(x; a)$ almost satisfies the Bellman equation.
The residual function is defined pointwise by
$$R(x; a) = -\hat{V}(x; a) + \max_{u \in D(x)} \pi(x, u) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, u).$$
Special case: Suppose the FOC ensures optimality. Then the value function $V(x)$ and the optimal policy function $U(x)$ satisfy
$$V(x) = \pi(x, U(x)) + \beta \int V(x^+)\, dF(x^+, x, U(x)),$$
$$0 = \pi_u(x, U(x)) + \beta \int V(x^+)\, dF_u(x^+, x, U(x)).$$
Approximate the value function using the family of functions $\hat{V}(x; a)$ and the optimal policy function using $\hat{U}(x; b)$.
Continuous-State Problems: Projection Methods
The residual function is defined pointwise by
$$R(x; a, b) = \begin{pmatrix} R_1(x; a, b) \\ R_2(x; a, b) \end{pmatrix} = \begin{pmatrix} -\hat{V}(x; a) + \pi(x, \hat{U}(x; b)) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, \hat{U}(x; b)) \\ \pi_u(x, \hat{U}(x; b)) + \beta \int \hat{V}(x^+; a)\, dF_u(x^+, x, \hat{U}(x; b)) \end{pmatrix}.$$
Even more special case: Suppose the FOC ensures optimality and can be solved in closed form for $U(x)$. Then the value function $V(x)$ satisfies
$$V(x) = \pi(x, U(x)) + \beta \int V(x^+)\, dF(x^+, x, U(x)).$$
Approximate the value function using the family of functions $\hat{V}(x; a)$.
The residual function is defined pointwise by
$$R(x; a) = -\hat{V}(x; a) + \pi(x, U(x)) + \beta \int \hat{V}(x^+; a)\, dF(x^+, x, U(x)).$$
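A minimal Python sketch of a collocation-style projection method for this case, assuming the hypothetical log-utility growth model with full depreciation, where the policy U(k) = (1 - alpha*beta)*k^alpha is known in closed form and the deterministic law of motion makes the integral collapse to an evaluation at the next state; the residual is then linear in a, so imposing R(x_i; a) = 0 at the collocation nodes is a linear system:

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

alpha, beta = 0.3, 0.95
U = lambda k: (1 - alpha * beta) * k**alpha      # hypothetical closed-form policy
g = lambda k: alpha * beta * k**alpha            # implied next state k+

lo, hi = 0.05, 0.5                               # approximation interval
m = 8                                            # coefficients = collocation nodes
nodes = (lo + hi) / 2 + (hi - lo) / 2 * np.cos(np.pi * (np.arange(m) + 0.5) / m)
z = lambda k: 2 * (k - lo) / (hi - lo) - 1       # map [lo, hi] -> [-1, 1]

# Collocation: R(x_i; a) = -V_hat(x_i; a) + pi(x_i, U(x_i)) + beta*V_hat(x_i^+; a) = 0
Phi = lambda k: cheb.chebvander(z(k), m - 1)     # Chebyshev basis matrix
A = Phi(nodes) - beta * Phi(g(nodes))
b = np.log(U(nodes))                             # per-period payoff u(c) = log(c)
a = np.linalg.solve(A, b)                        # coefficients of V_hat(x; a)
```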
Projection methods are natural for continuous-time problems.
