
Lecture Notes:

Numerical methods for function approximation


Derek Lemoine

Econ 696V
University of Arizona
First version: 2012
Last updated: February 9, 2015

1 Functionals
An infinite-horizon dynamic programming equation

V(S_t) = max_{w_t} u(w_t) + β V(S_{t+1}(w_t))

is an example of a functional fixed-point problem. If we know V , then the maximization


problem is usually relatively straightforward. We can start from any initial conditions S0
and simulate the optimal policy path or, with stochasticity, any number of policy paths
corresponding to random draws. The challenge lies in determining the function V . This is
not just an unknown vector but an entire function. If T is an operator that maps candidate value functions V into new functions T V , we in essence want to find the fixed point V such that

V =TV .

This type of functional equation occurs in many contexts in economics, including strategic
settings, differential equations, and dynamic optimization. After describing some other op-
tions, these notes then outline how we might solve the problem using the collocation method.
We focus on infinite-horizon continuous-state discrete-time dynamic programming, and we
remain agnostic about whether the problem is stochastic.1
1. These notes draw on Miranda and Fackler (2002), who also describe methods for continuous-time problems and for discrete state spaces.


1.1 Approximate as a linear-quadratic problem


We have already discussed how linear-quadratic settings are amenable to analytic solutions.
One attractive option therefore might be to find a certainty-equivalent steady state and
approximate the objective around it by a second-order Taylor expansion while approximating
the transition equations by a first-order Taylor expansion. However, this approach is less
than ideal for several reasons. First, these second- and first-order approximations may
not adequately represent the objective and transition functions over their entire domains.
Second, in a model with stochasticity, random shocks can take the state variable beyond the
domain of adequate approximation. Third, linear-quadratic approximation has to ignore any
possible constraints, which worsens the approximation if the constraints do in fact become
relevant. Fourth, in many interesting applications (e.g., climate change), the initial condition
may not be close to the steady state or there may not even be a steady state. These cases
frustrate methods that use local approximations around a steady state.

1.2 Impose a finite horizon


Another approach is to turn the model into a finite horizon problem and use standard
nonlinear optimization methods or even backward recursion.2 This requires specifying some
terminal period with fixed utility thereafter, and constructing the model so that this terminal
period does not seem to be substantially influencing the problem in earlier periods. However,
the finite horizon approach can create problems for models that do not inherently have such
horizons. First, stochastic models without certainty equivalence would require tracing out
the full tree of possible outcomes in advance so as to solve for all these nodes at once (or,
comparably, by backwards recursion). Second, the result is often a single optimal policy path rather than a policy function that can be subjected to marginal analysis or used to rapidly simulate multiple paths.3 Finally, it can be hard to make the relevant policy paths fully independent of the
terminal condition.

1.3 Collocation
The rest of these notes describe various components of collocation methods for value function
iteration.4 These assume that you can bound the state space. Even if a state variable has no natural bounds, there may still be a region that the optimally controlled state does not escape once it enters. If the interesting initial conditions lie inside this region, then the bounded problem does not lose anything. If the variable does not remain inside a well-defined region, then there may be a way of transforming it into, for instance, the [0, 1] interval.
2. See Kelly and Kolstad (2001) for a discussion and critique.
3. However, you could approximate the value function in each step of the backward recursion.
4. Related approaches iterate over the policy function or over the Euler equation.


The basic idea behind collocation methods is to approximate the value function as a linear
combination of basis functions. In each iteration, we begin with a vector of coefficients for
these basis functions (whether as an initial guess or from the previous iteration) and use this
value function approximant on the right-hand side of the functional equation. We then solve
the right-hand side at a set of pre-specified, carefully chosen nodes. The resulting values
generate a new approximant. We continue iterating until the coefficients cease to change by
more than a predefined tolerance. Finally, we examine our results to gain confidence that
we have a reasonable solution and not a numerical artifact.
The crucial choices in implementing this strategy are the family of basis functions and
the scheme for selecting the collocation nodes. While the optimal scheme depends on the
problem at hand and the curvature of the value function, Chebychev polynomials and nodes
work well in many economic applications. We solve the functional equation exactly at each
node and use the basis functions to interpolate between the nodes in the next round of
maximizations.
Collocation gives us a means of numerically solving complex dynamic programming prob-
lems so that we can include learning and uncertainty, do many simulations, get a complete
description of the value and policy functions, and maintain an infinite horizon.

2 Interpolation
The goal in interpolation is to approximate a function based on knowledge of its behavior or
value at a few points. The three main choices are the family of functions that we will use in
the approximation, the nodes to which we will calibrate the approximant, and the function’s
properties that we will match at those nodes. We want the approximant to be capable of
capturing the main features of the unknown function, we want to be able to calculate it
quickly, and we want to be able to manipulate it in many interesting ways once we have it.

2.1 Basis functions


Let f be the function we wish to approximate with some f̂. We construct f̂ as a linear combination of n linearly independent basis functions:5

f̂(x) = Σ_{j=1}^{n} c_j φ_j(x) .

Each φ_j(x) is a basis function, and the coefficients c_j determine how they are combined to give the approximant f̂. The number n of these basis functions is the degree of interpolation.
5. These are called projection methods because we are projecting f into the space formed by linear combinations of the basis functions.


We will use facts about f at each node in order to estimate the coefficients. If we have n facts
(e.g., n nodes with the value of f at each node), then only one combination of coefficients
solves the relation. If we have more than n facts, then we can find the best combination of
coefficients by, for instance, using least squares to minimize squared deviations.
Spectral methods use basis functions that are nonzero over the entire domain of f. These basis functions are commonly polynomials. The simplest basis, the monomial basis, is just the power functions 1, x, x², x³, .... However, this basis is not computationally convenient: its interpolation matrix is a Vandermonde matrix, which becomes badly ill-conditioned as the degree of interpolation grows. A better basis, whose interpolation equation can be solved more accurately and more efficiently, uses Chebychev polynomials.6 These remain well behaved even for interpolations of very high degree. Chebychev polynomials are excellent at capturing smooth functions, but kinks or regions of very high curvature can throw them off. These problems can manifest themselves as waviness in parts of the approximant.
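As an aside, here is a minimal sketch (not drawn from these notes) of how one might evaluate such a Chebychev basis on an interval [a, b] in Matlab, using the standard three-term recurrence T_0(z) = 1, T_1(z) = z, T_{k+1}(z) = 2z T_k(z) − T_{k−1}(z) after mapping the state into [−1, 1]; the helper name chebbasis and its arguments are illustrative.

    function Phi = chebbasis(x, n, a, b)
    % Evaluate the first n Chebychev polynomials at the points in the column vector x,
    % where x lies in [a,b]. Returns a length(x)-by-n matrix whose columns are the basis functions.
    z = 2*(x - a)./(b - a) - 1;      % map [a,b] into [-1,1]
    Phi = ones(length(x), n);        % first column: T_0(z) = 1
    if n > 1
        Phi(:,2) = z;                % second column: T_1(z) = z
    end
    for j = 3:n
        Phi(:,j) = 2*z.*Phi(:,j-1) - Phi(:,j-2);   % three-term recurrence
    end
    end

The sketches below reuse this illustrative helper.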
Finite element methods use basis functions that are nonzero only over subintervals of
the domain of f . For instance, (piecewise polynomial) splines are segments that are spliced
together at prespecified breakpoints. The higher the order of the polynomial pieces, the higher the order of derivatives that remain continuous at the breakpoints. A first-order (or linear) spline produces an approximant that is a continuous function. However, its first derivative is a discontinuous step function (with the steps at the breakpoints) and all higher derivatives are zero almost everywhere. In many economic applications, the derivative itself is of interest and might even be needed when optimizing inside of, for instance, a dynamic programming equation. A third-order (or cubic) spline forms an approximant with continuous first and second derivatives. It
can do better than Chebychev polynomials at capturing high curvature when its breakpoints
are concentrated in that region, and it can even capture a kink if its breakpoints are stacked
at the kink in order to allow discontinuous derivatives.
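For intuition about the difference, here is a minimal sketch using Matlab's built-in interp1 (rather than any particular toolbox) to compare a linear and a cubic spline fitted to the same breakpoints; the interpolated function is purely illustrative.

    xb = linspace(0, 1, 7)';                 % breakpoints
    fb = exp(xb);                            % illustrative function values at the breakpoints
    xf = linspace(0, 1, 201)';               % fine grid for evaluating the approximants
    f_lin = interp1(xb, fb, xf, 'linear');   % continuous, but kinked at the breakpoints
    f_cub = interp1(xb, fb, xf, 'spline');   % continuous first and second derivatives
    max(abs(f_lin - exp(xf)))                % linear spline error on the fine grid
    max(abs(f_cub - exp(xf)))                % cubic spline error on the fine grid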

2.2 Interpolation nodes


The approximant will be constructed by evaluating the function or its derivatives at prede-
fined interpolation nodes. These nodes are specific values in the domain of f . For instance,
if we want a degree-n interpolation to match the value of the function, then we need at least n nodes. With exactly n nodes, we have
Σ_{j=1}^{n} c_j φ_j(x_i) = f(x_i)   ∀ i = 1, 2, ..., n .   (interpolation conditions)

Note that we could instead have more nodes than basis functions (and so coefficients), as
discussed above. We can write this problem in matrix notation as

Φc = y , (interpolation equation)
where y is the column vector of f(x_i), c is the column vector of coefficients, and Φ is the interpolation matrix formed from the basis functions (columns) evaluated at the interpolation nodes (rows). This form should look familiar from Ordinary Least Squares applications.

6. See Judd (1998) for more on the choice of basis functions.
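To make the matrix form concrete, here is a minimal sketch (reusing the illustrative chebbasis helper from above) that builds Φ and solves the interpolation equation; with more nodes than basis functions, the same backslash call instead returns the least-squares coefficients.

    n = 10;  a = 0;  b = 2;                  % degree of interpolation and domain
    xi = linspace(a, b, n)';                 % interpolation nodes (evenly spaced here only for simplicity)
    y  = log(1 + xi);                        % the "facts": f evaluated at the nodes
    Phi = chebbasis(xi, n, a, b);            % interpolation matrix: basis functions at the nodes
    c = Phi \ y;                             % solve Phi*c = y (least squares if Phi has more rows than columns)
    fhat = @(x) chebbasis(x, n, a, b) * c;   % the approximant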
You might think that the best way to select the interpolation nodes is to space them evenly throughout the state space (i.e., the domain of f̂). However, this is generally suboptimal. Evenly spaced nodes often do not produce an accurate approximant and can even lead to worse approximations as the degree of interpolation increases (Runge's phenomenon). Intuitively, if the function f
is sufficiently smooth, then we might want to focus on pinning it down at its edges: pinning
it down only in the middle may lead to strange behavior at the edges where there are few
nodes, but pinning it down well along the edges should still stretch the approximant properly
over the middle of the domain.
In fact, it can be shown that a particular set of nodes is very nearly optimal for poly-
nomial approximants. These nodes, called Chebychev nodes, do in fact tend to cluster near
the end points of the interpolation interval and place fewer nodes in the middle. Further,
the approximation error goes to 0 as we increase the number of nodes and the degree of
interpolation. Chebychev nodes are not to be confused with Chebychev polynomials, as the
former could be used with other spectral methods and the latter could be used with evenly
spaced nodes. However, the two work well in combination to approximate a wide array of
smooth functions with significant computational convenience.
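A minimal sketch of computing Chebychev nodes on an interval [a, b] in Matlab, using the standard formula for the roots of the degree-n Chebychev polynomial; the clustering near the endpoints is visible when the result is compared with evenly spaced nodes.

    n = 9;  a = 0;  b = 1;
    i = (1:n)';
    z = cos((2*i - 1)*pi/(2*n));             % roots of the degree-n Chebychev polynomial on [-1,1]
    xcheb = (a + b)/2 + (b - a)/2 * z;       % Chebychev nodes mapped to [a,b]
    xeven = linspace(a, b, n)';              % evenly spaced nodes, for comparison
    [sort(xcheb) xeven]                      % Chebychev nodes cluster near a and b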

2.3 Properties to match


I wrote the interpolation conditions above under the assumption that we wanted to match
the value of the function. We could instead want to match derivatives at the nodes, or maybe
the value and also some derivatives. These would provide additional interpolation conditions,
and we merely need at least as many total conditions as we have basis functions. The ability
to match derivatives could come in handy when trying to approximate the solution to a
differential equation.
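As a small worked example of mixing conditions, suppose we use the monomial basis {1, x, x²} and require the approximant to match f at two nodes and f′ at the first node, giving three conditions for three coefficients. The sketch below is illustrative and uses the monomial basis only because its derivatives are easy to write out by hand.

    f  = @(x) exp(x);                        % illustrative function ...
    fp = @(x) exp(x);                        % ... and its derivative
    x1 = 0;  x2 = 1;                         % interpolation nodes
    A = [1  x1  x1^2;                        % row 1: match the value at x1
         1  x2  x2^2;                        % row 2: match the value at x2
         0  1   2*x1];                       % row 3: match the derivative at x1
    bvec = [f(x1); f(x2); fp(x1)];
    c = A \ bvec;                            % coefficients of c1 + c2*x + c3*x^2
    fhat = @(x) c(1) + c(2)*x + c(3)*x.^2;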

2.4 Curse of dimensionality


These same methods apply to multidimensional problems. For instance, the set of basis
functions becomes the tensor product of each dimension’s basis functions. However, note that
having n basis functions in each of k dimensions implies n^k total coefficients and, as always,
at least that many interpolation conditions. Not only do the additional coefficients require
more storage and computational effort, but if we are iterating over a Bellman equation in the
collocation method, then at each iteration we now have to solve the dynamic programming
equation at each of these many nodes. That optimization can be quite time-consuming,
even when it can be parallelized. Additional dimensions therefore often require us to use
fewer degrees of interpolation in each dimension, find clever ways of reducing computational


time and effort, and/or figure out which dimensions call for more coefficients and nodes (for
instance, because of greater curvature of the value function in that dimension).7
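A minimal sketch of the tensor-product construction in two dimensions, again reusing the illustrative chebbasis helper: with n basis functions in each of two dimensions, the combined interpolation matrix has n² columns, one per coefficient.

    n = 5;                                               % basis functions per dimension
    x1 = linspace(0, 1, n)';  x2 = linspace(0, 2, n)';   % nodes in each dimension (evenly spaced only for illustration)
    Phi1 = chebbasis(x1, n, 0, 1);
    Phi2 = chebbasis(x2, n, 0, 2);
    Phi = kron(Phi2, Phi1);                              % tensor-product basis evaluated at all n^2 node combinations
    size(Phi)                                            % n^2-by-n^2: coefficients and nodes grow as n^k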

3 Collocation
The collocation method combines interpolation with optimization to iterate over the un-
known value function in the Bellman equation. Assume that we have a well-defined Bellman
equation and can find a bounded region of the state space over which we will solve it. An
outline of the collocation method is as follows.
1. Choose the family of basis functions for the value function approximant. A common
choice is Chebychev polynomials. Also choose the order of approximation (i.e., of
interpolation).
2. Choose the collocation nodes, which are just the interpolation nodes described above.
A common choice is Chebychev nodes.
3. Select an initial guess for the value function by assigning values to the basis function
coefficients. Zero is a common starting guess, but you might have information from
previous experiments that can help you.
4. Iterate until convergence, typically defined as the basis functions’ coefficients changing
by less than a predefined amount (the tolerance):
(a) Solve the right-hand side of the Bellman equation, using the approximant in place
of the unknown continuation value.8 This step results in a value of the maximized
right-hand side at each collocation node, conditional on the approximant.
(b) Approximate the value function using the vector of values calculated in the previ-
ous step. If we have as many nodes as basis functions, the new approximant will
match the values exactly at the collocation nodes and will interpolate in between
them.
(c) Update the best guess for the value function with the new approximant.
5. Verify your final approximation: Check the value function, policy rules, and residuals
on a finer grid of nodes.9
7. Krueger and Kubler (2004) and Malin et al. (2011) describe sparse (Smolyak) grids that can speed up computation in high-dimensional problems.
8. Stochasticity would affect us here, as we would use the approximant to calculate an expected value in the course of the solution.
9. If you used the same grid for checking residuals as you used in forming the approximant, the residuals would be zero by definition if you have as many nodes as coefficients (and really small otherwise). We are really interested in how the value function might still be changing at points in the state space in between our previously chosen nodes.


Miranda and Fackler (2002) have developed a set of Matlab tools meant to help out with
some of the more routine computations often undertaken in economic applications.10 Here
is an outline of the steps one might take to implement the collocation method in Matlab; a minimal code sketch follows the outline and its footnotes:
1. Code an optimization routine that solves the right-hand side of the Bellman equa-
tion.11 You may eventually call it using knitro or one of the optimization tools (such
as “fmincon”) native to Matlab.12

2. Define the domain of the state variables over which the approximation will be under-
taken.

3. Call “fundefn” to define a function space for approximation.

4. Call “funnode” to define the collocation nodes. You can then use “gridmake” to turn the resulting cell array into a matrix of node combinations.

5. Define an initial guess, which may have each basis coefficient equaling 0.

6. Loop until convergence:

(a) Step through the collocation nodes one by one, optimizing at each step: At the
values of the state variables represented by a given node, perform the optimization
using the most recent value function approximant. At the end of this inner loop,
we have each collocation node’s optimal value given the most recent approximant.
(b) Use “funfitxy” to obtain the basis function coefficients given the results of the
optimization routine. You will pass funfitxy the function space from fundefn, the
nodes, and the maximized values.
(c) Stop when the coefficients change by less than the predefined tolerance (where
the relevant change may be defined using the sup norm), when the coefficients
change by some predefined large amount that indicates a problem, or when some
predefined maximum number of iterations is reached.

7. In simulating some policy path or testing the final approximation, “funeval” will eval-
uate the approximant or its derivatives at any desired node (i.e., at any combination of
state variables). You will pass it the coefficients, the function space, the state vectors
of interest, and a vector indicating whether you want the value of the approximant or
its derivatives.
10. These are freely available as the compecon toolbox at http://www4.ncsu.edu/~pfackler/compecon/toolbox.html.
11. With stochasticity, we might have to discretize an integral representing an expected value and loop over that discretization.
12. A free, finite-horizon academic license is available for knitro: http://www.ziena.com/knitro.htm.
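Putting the outline together, here is a minimal sketch of the main loop for a one-dimensional problem. It assumes the compecon toolbox is on the path (using the fundefn, funnode, and funfitxy calls described above) and that you have written a problem-specific routine, here called maximize_rhs, that is not part of the toolbox: it solves the right-hand side of the Bellman equation at a single node given the current coefficients, calling funeval inside to evaluate the continuation value.

    smin = 0;  smax = 10;  n = 20;              % state bounds and degree of interpolation
    fspace = fundefn('cheb', n, smin, smax);    % Chebychev function space
    snodes = funnode(fspace);                   % collocation nodes
    c = zeros(n, 1);                            % initial guess for the basis coefficients
    tol = 1e-8;  maxit = 500;
    for it = 1:maxit
        v = zeros(size(snodes, 1), 1);
        for i = 1:size(snodes, 1)
            % problem-specific: max over w of u(w) + beta*funeval(c, fspace, snext(w))
            v(i) = maximize_rhs(snodes(i, :), c, fspace);
        end
        cnew = funfitxy(fspace, snodes, v);     % refit the approximant to the maximized values
        if max(abs(cnew - c)) < tol             % convergence in the sup norm of the coefficients
            c = cnew;
            break
        end
        c = cnew;
    end

In a multidimensional problem, funnode instead returns a cell array of nodes in each dimension; gridmake expands it into the full matrix of node combinations (step 4 of the outline), and the inner loop then runs over the rows of that matrix.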


3.1 Some tricks and tips to be aware of


• You may want to step up the degree of approximation/interpolation over many runs so
as to solve “smaller”, faster problems first and use those solutions as starting guesses
for larger ones. A “multigrid” approach is one example.

• You may want to iterate over the approximant several times using the same set of
“optimal” controls if computing these is costly and they do not change by much on
each iteration. We have been describing “value function iteration,” but this technique
of using the same policy function is a modified form of “policy function iteration.”
Convergence in value functions can be slow; using the information in the policy function
can speed it up. Imagine you are solving for the value function after having run k
periods with your policy function.

• Check whether the optimal choices push the next-period state outside the state space’s bounds.13 Such excursions could pose convergence problems because the optimization routine is then evaluating the approximant at points beyond its fitted domain.

• Keep the optimization routine as light as possible, as it may be called thousands of times.

• Figure out which dimensions contain most of the curvature.

• Work towards a narrower state space so as to better approximate the region you care
about.

• If you know the curvature of the value function (e.g., you might know it is concave), you can use “shape-preserving” methods to ensure that no approximant violates this knowledge and potentially sends the iteration in the wrong direction.

• In a multidimensional problem, you can reduce the number of basis functions (and
nodes) by using a basis of complete polynomials. Intuitively, you drop the highest-
order interactions from the tensor product to obtain a smaller set of basis functions
that performs as well asymptotically.

• You can use time as a state variable to keep track of all exogenously-evolving variables.
To do this, map the infinite time interval [0, ∞) into the [0, 1) interval. For instance,
you can define a new state variable τ as a function of time t, where τ(t) = 1 − e^{-zt} for some z > 0. Note that τ(0) = 0, lim_{t→∞} τ(t) = 1, and τ′(t) = z e^{-zt} > 0. The parameter z controls how t ≥ 0 is stretched over the [0, 1) interval, which determines where the “artificial time” (τ) nodes are placed in units of “real time” (t); a small sketch of this mapping follows this list.

13. A random variable with full support can pose problems here because some possible shock could always jump you out. This will often occur even once the distribution has been discretized for numerical use. However, jumping out with less-than-likely shocks may not cause problems for convergence because their effects on total value will fall out once expectations are taken.

• There are alternate types of node schemes and basis functions which can be powerful
in certain applications. The Smolyak scheme is one example.

• Maliar and Maliar (2014) provide a review which may spark further ideas.
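As referenced in the time-as-state bullet above, here is a minimal sketch of the artificial-time mapping: given nodes for τ in [0, 1), the inverse map t = −ln(1 − τ)/z recovers the real times at which those nodes sit, so you can check how your choice of z spaces them.

    z = 0.05;                                % stretching parameter (illustrative value)
    tau = (0:0.1:0.9)';                      % artificial-time nodes in [0, 1)
    t = -log(1 - tau) / z;                   % corresponding real times, since tau = 1 - exp(-z*t)
    [tau t]                                  % larger z maps the same tau nodes to earlier real times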

References
Judd, Kenneth L. (1998) Numerical Methods in Economics, Cambridge, Mass.: MIT Press.

Kelly, David L. and Charles D. Kolstad (2001) “Solving infinite horizon growth models with
an environmental sector,” Computational Economics, Vol. 18, No. 2, pp. 217–231, DOI:
10.1023/A:1021018417052.

Krueger, Dirk and Felix Kubler (2004) “Computing equilibrium in OLG models with stochas-
tic production,” Journal of Economic Dynamics and Control, Vol. 28, No. 7, pp. 1411–
1436, DOI: 10.1016/S0165-1889(03)00111-8.

Maliar, Lilia and Serguei Maliar (2014) “Numerical methods for large-scale dynamic eco-
nomic models,” in Karl Schmedders and Kenneth L. Judd eds. Handbook of Computational
Economics, Vol. 3: Elsevier, pp. 325–477, DOI: 10.1016/B978-0-444-52980-0.00007-4.

Malin, Benjamin A., Dirk Krueger, and Felix Kubler (2011) “Solving the multi-country real
business cycle model using a Smolyak-collocation method,” Journal of Economic Dynamics
and Control, Vol. 35, No. 2, pp. 229–239, DOI: 10.1016/j.jedc.2010.09.015.

Miranda, Mario J. and Paul L. Fackler (2002) Applied Computational Economics and Fi-
nance, Cambridge, Massachusetts: MIT Press.
