
Matlab Optimization and Integration

Paul Schrimpf

January 14, 2009


This lecture focuses on two ubiquitous numerical techniques:

1. Optimization and equation solving
   - Agents maximize utility / profits
   - Estimation
2. Integration
   - Expectations of the future or over unobserved variables


Optimization

Want to solve a minimization problem:

    min_x f(x)

Two basic approaches:
1. Heuristic methods search over x in some systematic way
2. Model based approaches use an easily minimized approximation to f() to guide their search

First, x ∈ R for intuition
Then, x ∈ R^n


Section Search

Form a bracket x1 < x2 < x3 such that f(x2) < f(x1) and f(x2) < f(x3)
Try a new x ∈ (x1, x3), update the bracket
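
A minimal golden-section variant of this bracketing idea, as a sketch (the helper name goldenSection and the update rule are illustrative, not fminbnd's actual code; save as goldenSection.m):

function xmin = goldenSection(f, x1, x3, tol)
% shrink the bracket [x1, x3] until it is narrower than tol
phi = (sqrt(5)-1)/2;            % inverse golden ratio, about 0.618
x2 = x3 - phi*(x3 - x1);        % two interior points
x4 = x1 + phi*(x3 - x1);
while (x3 - x1) > tol
  if f(x2) < f(x4)              % minimum is bracketed by [x1, x4]
    x3 = x4;
  else                          % minimum is bracketed by [x2, x3]
    x1 = x2;
  end
  x2 = x3 - phi*(x3 - x1);
  x4 = x1 + phi*(x3 - x1);
end
xmin = (x1 + x3)/2;
end

For example, goldenSection(@(x) (x-2).^2, 0, 5, 1e-6) returns a value close to 2.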


Quadratic Interpolation

More general interpolation methods are possible
  - e.g. Matlab uses both quadratic and cubic interpolation for line search


Newton-Raphson and Quasi-Newton

- Newton: use f'(x) and f''(x) to construct a parabola

      x_{n+1} = x_n - f'(x_n) / f''(x_n)

- Quasi-Newton: approximate f''(x_n) with (f'(x_n) - f'(x_{n-1})) / (x_n - x_{n-1})
- BHHH: approximate f''(x) with (1/N) Σ_{i=1}^N f_i'(x) f_i'(x)'
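
A hedged sketch of both updates on a specific one-dimensional example (the function f(x) = x^4 - 3x^2 + x, its hand-written derivatives, and the starting points are made up for illustration):

% Newton iteration: x <- x - f'(x)/f''(x)
fp  = @(x) 4*x.^3 - 6*x + 1;    % f'(x)
fpp = @(x) 12*x.^2 - 6;         % f''(x)
x = 2;
for it = 1:10
  x = x - fp(x)/fpp(x);
end
% Quasi-Newton (secant) version: approximate f'' with a difference of gradients
xold = 2; x = 1.9;
for it = 1:20
  fppApprox = (fp(x) - fp(xold))/(x - xold);
  xnew = x - fp(x)/fppApprox;
  xold = x; x = xnew;
end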

Rates of Convergence
Newton's method converges quadratically, i.e. in a neighborhood of the solution,

    lim_{n→∞} ||x_{n+1} - x*|| / ||x_n - x*||^2 = C

Parabolic interpolation and quasi-Newton methods also achieve better than linear rates of convergence, but (usually) less than quadratic, i.e.

    lim_{n→∞} ||x_{n+1} - x*|| / ||x_n - x*||^r = C

for some r ∈ (1, 2]

- Can achieve faster than quadratic convergence by using more information
- Usually, we are happy with any rate better than 1

Trust Regions

- Problem: for a function that is not globally convex, quadratic interpolation and Newton methods might prescribe an upward step and can fail to converge
- Solution: combine them with a sectional search, or more generally, a trust region
  - Region R where we trust our quadratic approximation, f(x) ≈ f̂(x)

        x_{n+1} = argmin_{x ∈ R} f̂_n(x)

  - Shrink or expand R based on how much better f(x_{n+1}) is than f(x_n)
- Brent's method combines quadratic interpolation with sectional search


Matlab Implementation
- fminbnd() uses Brent's method
- No uni-dimensional implementation of any type of Newton method, but could use the multi-dimensional versions
- We used fminbnd() in lecture 1:

optimset('fminbnd') % this returns the default options for fminbnd
% change some options
opt = optimset('Display','iter', ... % or 'off','final','notify'
               'MaxFunEvals',1000,'MaxIter',1000, ...
               'TolX',1e-8);
% maximize the expected value
[c ev] = fminbnd(@(c) obj(c,x,parm), cLo, cHi, opt);

- For all optimization functions, can also set options through a graphical interface by using optimtool()

Multi-Dimensional Optimization with Quadratic Approximation

- Basic idea is the same:
  - Construct a quadratic approximation to the objective
  - Minimize the approximation over some region
- Complicated by the difficulty of constructing the region
  - Cannot bracket a minimum in R^n; would need an n-1 dimensional manifold
  - Two approaches:
    - Use an n dimensional trust region
    - Break the problem into a sequence of smaller-dimensional minimizations


Directional Search Methods

1. Choose a direction
   - Naive approach: use basis or steepest descent directions
     - Very inefficient in the worst case
     - Try new directions, keep good ones: Powell's method or conjugate gradients
   - Use a Newton or quasi-Newton direction
     - Generally the fastest method
2. Do univariate minimization along that direction; this step is called a line search
   - Exact: find the minimum along the direction
   - Approximate: just find a point that is enough of an improvement
3. Choose a different direction and repeat


Trust Region Methods

- Same as in one dimension:
  - Region R where we trust our quadratic approximation, f(x) ≈ f̂(x)

        x_{n+1} = argmin_{x ∈ R} f̂_n(x)

  - Shrink or expand R based on how much better f(x_{n+1}) is than f(x_n)
- Hybrid method: combine a directional search and a trust region
  1. Use one of the approaches from the previous slide to choose an m < n dimensional subspace
  2. Use the trust-region method to minimize f within the subspace
  3. Choose a new subspace and repeat


Matlab Implementation
- fminunc() offers two algorithms:
  1. optimset('LargeScale','off'): quasi-Newton with an approximate line search
     - Needs the gradient; will compute a finite difference approximation if the gradient is not supplied
     - The Hessian approximation is underdetermined by f'(x_n) and f'(x_{n-1}) in dimension > 1
     - Build the Hessian approximation with a recurrence relation: optimset('HessUpdate','bfgs') (usually better) or optimset('HessUpdate','dfp')
  2. optimset('LargeScale','on'): hybrid method, a 2-d trust region with conjugate gradients
     - Needs the Hessian; will compute a finite difference approximation if the Hessian is not supplied
     - Can exploit the sparsity pattern of the Hessian: optimset('HessPattern',sparse(kron(eye(K),ones(J))))
- Generally, these algorithms perform much better with user-supplied derivatives than with finite-difference approximations to derivatives (a small example follows below)
  - optimset('GradObj','on','DerivativeCheck','on')
  - optimset('Hessian','on')
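
As a small hedged example of supplying a gradient (the Rosenbrock test function and the file name rosenWithGrad.m are illustrative, not part of the course code):

% rosenWithGrad.m: objective and gradient returned together
function [f,g] = rosenWithGrad(x)
f = 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
g = [-400*(x(2)-x(1)^2)*x(1) - 2*(1-x(1));
      200*(x(2)-x(1)^2)];
end

% usage, e.g. in a script:
opt = optimset('LargeScale','off','GradObj','on', ...
               'DerivativeCheck','on','Display','iter');
[x,fval] = fminunc(@rosenWithGrad,[-1;2],opt);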


Set Based Methods


- Idea:
  1. Evaluate the function on a set of points
  2. Use the current points and function values to generate candidate new points
  3. Replace points in the set with new points that have lower function values
  Repeat until the set collapses to a single point
- Examples: grid search, Nelder-Mead simplex, pattern search, genetic algorithms, simulated annealing
- Pros: simple, robust
- Cons: inefficient; an interpolation based method will usually do better


Nelder-Mead Simplex Algorithm

- fminsearch() uses it (a usage sketch follows below)
- N + 1 points in N dimensions form a polyhedron; move the polyhedron by:
  1. Reflecting the worst point across the center, expanding if there is an improvement
  2. Shrinking, e.g. toward the best point; other variations are possible
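
A minimal usage sketch (the Rosenbrock test function and starting point are illustrative):

banana = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
opt = optimset('Display','iter','TolX',1e-8,'TolFun',1e-8);
[x,fval] = fminsearch(banana,[-1.2; 1],opt);  % Nelder-Mead from [-1.2, 1]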


Pattern Search

- Uses a set of N + 1 or more directions, {d_k}
- Each iteration, with step size Δ:
  - Evaluate f(x_i + Δ d_k)
  - If f(x_i + Δ d_k) < f(x_i), set x_{i+1} = x_i + Δ d_k and increase Δ
  - If min_k f(x_i + Δ d_k) > f(x_i), set x_{i+1} = x_i and decrease Δ
- In Matlab, [x fval] = patternsearch(@f,x0)
  - Requires the Genetic Algorithm and Direct Search Toolbox


Genetic Algorithm
- Can find the global optimum (but I do not know whether this has been formally proven)
- Begin with a random population of points, {x_n^0}, then
  1. Compute the fitness of each point, f(x_n^i)
  2. Select more fit points as parents
  3. Produce children by mutation, x_n^{i+1} = x_n^i + ε, crossover, x_n^{i+1} = λ x_n^i + (1-λ) x_m^i, and elites, x_n^{i+1} = x_n^i
  Repeat until the function has not improved for many iterations
- In Matlab, [x fval] = ga(@f,nvars,options) (a usage sketch follows below)
  - Requires the Genetic Algorithm and Direct Search Toolbox
  - Many variations and options
    - Options can affect whether you converge to a local or global optimum
    - Read the documentation and/or use optimtool
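
A hedged usage sketch (the Rosenbrock test function and the option values are made up; requires the toolbox):

rosen = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
opts = gaoptimset('PopulationSize',50,'Generations',200,'Display','iter');
[x,fval] = ga(rosen,2,opts);   % 2 = number of variables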


Simulated Annealing and Threshold Acceptance


- Can find the global optimum, and under certain conditions, which are difficult to check, finds the global optimum with probability 1
- Algorithm:
  1. Draw a random candidate point, x̃ = x_i + ε
  2. Accept, x_{i+1} = x̃, if f(x̃) < f(x_i) or
     - simulated annealing: with probability 1 / (1 + e^{(f(x̃) - f(x_i))/τ})
     - threshold acceptance: if f(x̃) < f(x_i) + T
  3. Lower the temperature, τ, and the threshold, T
- In Matlab, [x,fval] = simulannealbnd(@objfun,x0) and [x,fval] = threshacceptbnd(@objfun,x0)
  - Requires the Genetic Algorithm and Direct Search Toolbox
  - Many variations and options
    - Options can affect whether you converge to a local or global optimum
    - Read the documentation and/or use optimtool


Constrained Optimization
    min_x f(x)   s.t.   g(x) = 0

- Quasi-Newton with directional search: sequential quadratic programming
  - Choose the direction of search both to minimize the function and to relax the constraints
- fmincon() (a small sketch follows below)
  - 'LargeScale','off' does sequential quadratic programming
  - 'LargeScale','on' only works when the constraints are simple bounds on x; it is then the same as large-scale fminunc
- Matlab's pattern search and genetic algorithm work for constrained problems
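
A minimal hedged sketch of calling fmincon with a nonlinear equality constraint (the problem, the starting point, and the file name circleCon.m are illustrative):

% circleCon.m: nonlinear constraints, c(x) <= 0 and ceq(x) = 0
function [c,ceq] = circleCon(x)
c   = [];                      % no inequality constraints
ceq = x(1)^2 + x(2)^2 - 1;     % g(x) = 0: stay on the unit circle
end

% usage: minimize x1*x2 on the unit circle
opt = optimset('LargeScale','off','Display','iter');
[x,fval] = fmincon(@(x) x(1)*x(2),[1;0],[],[],[],[],[],[],@circleCon,opt);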

Solving Systems of Nonlinear Equations


- Very similar to optimization:

      F(x) = 0  <=>  min_x F(x)'F(x)

- Large-scale fsolve = large-scale fmincon applied to the least-squares problem
- Medium-scale fsolve = medium-scale lsqnonlin
  - Gauss-Newton: replace F by a first order expansion

        x_{n+1} - x_n = -(J'(x_n)J(x_n))^{-1} J'(x_n) F(x_n)

  - Levenberg-Marquardt: add dampening to Gauss-Newton to improve performance when the first order approximation is bad

        x_{n+1} - x_n = -(J'(x_n)J(x_n) + λ_n I)^{-1} J'(x_n) F(x_n)
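
A small hedged fsolve sketch (the system and starting point are made up for illustration):

F = @(x) [x(1)^2 + x(2)^2 - 1;   % unit circle
          x(2) - x(1)^3];        % cubic curve
opt = optimset('Display','iter');    % Jacobian approximated by finite differences
[x,fval,exitflag] = fsolve(F,[0.5;0.5],opt);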


Derivatives
- The best minimization algorithms require derivatives
- Can use a finite difference approximation (a small sketch follows this list)
  - In theory: still get a better than linear rate of convergence
  - In practice: can be inaccurate
  - Takes n function evaluations; a user written gradient typically takes 2-5 times the work of a single function evaluation
- Easier analytic derivatives:
  - Use a symbolic math program to get formulas, e.g. Matlab Symbolic Math Toolbox / Maple, Mathematica, Maxima
  - Use automatic differentiation
    - In Matlab: INTLAB, ADMAT, MAD, ADiMat, or a version that we will create in the next lecture
    - Switch to a language with native automatic differentiation: AMPL, GAMS
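
A minimal central finite-difference gradient, as a sketch (the helper name fdGrad and the default step size are illustrative):

function g = fdGrad(f, x, h)
% central finite-difference approximation to the gradient of f at x
if nargin < 3, h = 1e-6; end
n = length(x);
g = zeros(n,1);
for i = 1:n
  e = zeros(n,1); e(i) = h;
  g(i) = (f(x+e) - f(x-e))/(2*h);   % one pair of evaluations per coordinate
end
end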


Simple MLE Example: Binary Choice

% Script for estimating a binary choice model
% Paul Schrimpf, May 27, 2007
clear;
% set the parameters
data.N = 1000;  % number of observations
data.nX = 2;    % number of x's
parm.beta = ones(data.nX,1);
% note use of function handles for the distribution
% estimation assumes that the distribution is known
parm.dist.rand = @(m,n) random('norm',0,1,m,n);
parm.dist.pdf  = @(x) pdf('norm',x,0,1);
parm.dist.dpdf = @(x) -pdf('norm',x,0,1).*x; % derivative of the pdf
parm.dist.cdf  = @(x) cdf('norm',x,0,1);


% create some data
data = simulateBinChoice(data,parm);
% set optimization options
opt = optimset('LargeScale','off', ...
               'HessUpdate','bfgs', ...
               'GradObj','on', ...
               'DerivativeCheck','on', ...
               'Display','iter', ...
               'OutputFcn',@binChoicePlot);
b0 = zeros(data.nX,1);
[parm.beta like] = fminunc(@(b) binChoiceLike(b,parm,data), ...
                           b0,opt);
% display results
fprintf('likelihood = %g\n',like);
for i=1:length(parm.beta)
  fprintf('beta(%d) = %g\n',i,parm.beta(i));
end


simulateBinChoice()

function data=simulateBinChoice(dataIn,parm)
% ... comments and error checking omitted ...
data = dataIn;
data.x = randn(data.N,data.nX);
epsilon = parm.dist.rand(data.N,1); % draw the shocks from the assumed distribution
data.y = (data.x*parm.beta + epsilon > 0);
end


function [like grad hess g_i] = binChoiceLike(b,parm,data)
% returns the negative loglikelihood of 'data' for a binary choice model
% the model is y = (x*b + eps > 0)
% ... more comments omitted
xb = data.x*b;
% l_i will be N by 1, the likelihood for each person
l_i = parm.dist.cdf(xb);
l_i(data.y) = 1 - l_i(data.y);
if any(l_i==0)
  warning('likelihood = 0 for %d observations\n',sum(l_i==0));
  l_i(l_i==0) = realmin; % don't take log(0)!
end
like = -sum(log(l_i));


Gradient for binChoiceLike()

% gradient of l_i
g_i = (parm.dist.pdf(xb)*ones(1,length(b))).*data.x;
g_i(data.y,:) = -g_i(data.y,:);
% change to the gradient of the negative loglike
grad = -sum(g_i./(l_i*ones(1,length(b))),1)';


Hessian for binChoiceLike()

% calculate the hessian
h_i = zeros(length(xb),length(b),length(b));
for i=1:length(xb)
  h_i(i,:,:) = (parm.dist.dpdf(xb(i))* ...
                data.x(i,:)'*data.x(i,:)) ...
                / l_i(i); % first piece of the hessian of the loglike
end
h_i(data.y,:,:) = -h_i(data.y,:,:);
% hessian of the negative loglike
hess = -(squeeze(sum(h_i,1)) - g_i'*(g_i./(l_i.^2*ones(1,length(b)))));


binChoicePlot.m
function stop = binChoicePlot(x,optimvalues,state)
if (length(x)==2)
  if (optimvalues.iteration==0)
    hold off;
  end
  grad = optimvalues.gradient;
  f = optimvalues.fval;
  plot3(f,x(1),x(2),'k*');
  if (optimvalues.iteration==0)
    hold on;
  end
  quiver3(f,x(1),x(2),0,grad(1),grad(2));
  drawnow;
  pause(0.5);
end
stop = false;
end


Numerical Integration
Want:

    F = ∫_a^b f(x) π(x) dx

Approximate:

    F ≈ Σ_{i=1}^n f(x_i) w_i

- Different methods are different ways to choose n, x_i, and w_i
  - Quadrature: choose w_i so the approximation is exact for a set of n basis elements
  - Monte Carlo: set w_i = 1/n, choose x_i randomly
  - Adaptive: refine n, x_i, and w_i until the approximation error is small

Quadrature
- Suppose x_i and n are given; we need to choose w_i
- Let {e_j(x)} be a basis for the space of functions such that ∫_a^b f(x)π(x)dx < ∞
  - ∫_a^b e_j(x)π(x)dx should be known for all j
- {w_i}_{i=1}^n solve (a small numerical sketch follows below)

      Σ_{i=1}^n w_i e_j(x_i) = ∫_a^b e_j(x)π(x)dx   for j = 1..n        (4)

- Example: Newton-Cotes
  - basis = polynomials
  - π(x) = 1
  - The resulting rule is exact for all polynomials of degree less than or equal to n
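
A small sketch of solving (4) for the weights with a monomial basis and π(x) = 1 on [a, b] (the equally spaced nodes are chosen only for illustration):

a = 0; b = 1;
x = linspace(a,b,5)';            % n = 5 given nodes
n = length(x);
E = zeros(n,n); m = zeros(n,1);
for j = 1:n
  E(j,:) = (x').^(j-1);          % e_j(x_i) = x_i^(j-1)
  m(j)   = (b^j - a^j)/j;        % integral of x^(j-1) over [a,b]
end
w = E\m;                         % quadrature weights
approx = w'*(x.^3);              % exact for polynomials up to degree n-1
exact  = (b^4 - a^4)/4;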


Gaussian Quadrature
- Now suppose n is given, but we can choose both w_i and x_i
- Same idea: {w_i, x_i}_{i=1}^n solve

      Σ_{i=1}^n w_i e_j(x_i) = ∫_a^b e_j(x)π(x)dx   for j = 1..2n-1        (5)

- Exact for functions in the space spanned by {e_j}_{j=1}^{2n-1}
- If {e_j} is an orthogonal polynomial basis, then {x_i}_{i=1}^n will be the roots of e_n
- Different Gaussian quadrature rules correspond to different values of a, b, and π(x)


Common Forms of Gaussian Quadrature

Interval     π(x)                   Name
[-1, 1]      1                      Legendre
(-1, 1)      (1-x)^α (1+x)^β        Jacobi
(-1, 1)      1/√(1-x²)              Chebyshev (first kind)
[-1, 1]      √(1-x²)                Chebyshev (second kind)
[0, ∞)       e^(-x)                 Laguerre
(-∞, ∞)      e^(-x²)                Hermite


Quadrature in Matlab

- Not built in
- Many implementations are available at the Mathworks file exchange
- Great for single dimension integration
- Multiple dimension integration is harder
  - Σ_{i_1=1}^n ... Σ_{i_m=1}^n f(x_{i_1},..,x_{i_m}) w_{i_1}···w_{i_m} works, but needs n^m function evaluations (a two-dimensional sketch follows below)
  - More sophisticated methods exist, called cubature, sparse grids, or complete polynomials; see e.g. the Encyclopedia of Cubature Formulas
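
A hedged sketch of the tensor-product approach in two dimensions, reusing one-dimensional Gauss-Hermite nodes and weights (this assumes the same gaussHermite() helper used in the integration example later in these slides, returning fields .x and .w):

int = gaussHermite(10);              % 1-d nodes int.x and weights int.w
[X1,X2] = ndgrid(int.x, int.x);      % all n^2 pairs of nodes
[W1,W2] = ndgrid(int.w, int.w);
f = @(x1,x2) x1.^2 + x1.*x2;         % example integrand
approx = sum(sum(W1.*W2.*f(X1,X2))); % double integral of f against exp(-x1^2 - x2^2)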


Adaptive Integration

    F = ∫_a^b f(x) π(x) dx

- Idea: subdivide (a, b) into smaller intervals, use a simple quadrature rule on each interval, and repeat until convergence
  - e.g. trapezoid rule, Simpson's rule
  - trapz(), quad(), quadl()
- Pro: computes the integral to a known accuracy
- Con: care must be used when it is part of an objective function
  - Makes the objective function discontinuous at points where the solution goes from k to k+1 subdivisions, so integration accuracy must be set much higher than the convergence criteria of the optimization


Monte Carlo Integration

- Randomly draw x_i from a distribution p(x) ∝ π(x), set w_i = (1/n) ∫ π(x)dx
- Many methods for sampling from p(x): inverse cdf, acceptance sampling, importance sampling, Gibbs sampling, Metropolis-Hastings
- Pros: simple to understand, easy to implement, scales well, requires little a priori knowledge of f(x)
- Cons: inefficient; for a fixed n, the right quadrature rule will do much better
  - But when computing something like Σ_i g(Σ_s f(y_i, x_{i,s}) w_s), errors in g(Σ_s f(y_i, x_{i,s}) w_s) for different i can offset one another


Useful Matlab functions for Monte Carlo Integration

x = rand(2,3);          % 2 by 3 matrix of x ~ U[0,1]
y = randn(10);          % 10 by 10 matrix of y ~ N(0,1)
% more generally
t = random('t',3,2,1);  % 2 by 1, t-dist with 3 df
% many other distributions possible


Integration Example

clear;
% some integration experiments

% Example 1: E[p(x)], x ~ N(0,1), p(x) polynomial
degree = 10;
p = rand(1,degree+1); % make some polynomial
fprintf('adaptive quad, tol %.1g = %.10g\n', ...
        1e-10, quad(@(x) polyval(p,x).*normpdf(x), ...
        -30,30,1e-10));


Integration Example

fprintf(' gauss-hermite quadrature\n');
for n=1:((degree+1)/2+4)
  % gauss-hermite will be exact for n >= (degree+1)/2
  int = gaussHermite(n);
  % notice the change of variables
  ep = polyval(p,int.x*sqrt(2))'*int.w/sqrt(pi);
  if (n==round((degree+1)/2))
    fprintf(' the rest should be exact \n');
  end
  fprintf('n=%2d E[p(x)] = %.10g\n',n,ep);
end


Integration Example

fprintf('\n monte carlo integration\n')
for n=1:6
  fprintf('n=10^%d E[p(x)] = %.10g\n',n, ...
          mean(polyval(p,randn(10^n,1))));
end


MCMC

- For nonstandard distributions, it is often impossible to sample from p(x) directly
- MCMC constructs a Markov chain, x_t ~ p(x_t | x_{t-1}), with stationary distribution p(x) and a transition kernel, p(x_t | x_{t-1}), that can be sampled from
- Very common in Bayesian statistics


Metropolis-Hastings

- General method for constructing a Markov chain
- Algorithm to draw from p(x): beginning with some x_0,
  1. Draw y ~ q(x_t, ·)
  2. Compute α(x_t, y) = ( p(y) q(y, x_t) ) / ( p(x_t) q(x_t, y) )
  3. Draw u ~ U[0,1]
  4. If u < α(x_t, y), set x_{t+1} = y; otherwise set x_{t+1} = x_t
- The choice of candidate density, q, affects the behavior of the chain
  - If q is too disperse, will not accept many draws
  - If q is too concentrated, will accept lots of draws, but they'll be close together
- Example: metropolisHastings.m (a similar sketch follows below)
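
A hedged random-walk Metropolis-Hastings sketch (this is not the course's metropolisHastings.m; the target density and the proposal scale are made up, and the symmetric proposal means the q terms cancel in α):

logp = @(x) -abs(x).^1.5;             % log of an unnormalized target density
sig  = 1;                             % proposal standard deviation
T = 10000;
x = zeros(T,1);                       % the chain
for t = 2:T
  y = x(t-1) + sig*randn;             % candidate draw from q(x_t, .)
  alpha = min(1, exp(logp(y) - logp(x(t-1))));
  if rand < alpha
    x(t) = y;                         % accept
  else
    x(t) = x(t-1);                    % reject: keep the current point
  end
end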


Exercises

1. If you have written any code that involves optimization or integration, try modifying it to use a different method.

2. Modify the dynamic programming code from lecture 1 to allow for a continuous distribution for income. If you are clever, you will be able to evaluate E[v(x', y')] exactly, even if E[v(x', y')] does not have an analytic solution.

3. We know that value function iteration converges linearly. Gauss-Newton and other quadratic approximation based methods converge at faster rates. Change the dynamic programming code from lecture 1 to use one of these methods instead of value function iteration.

4. Change the binary choice model into a multinomial choice model. Allow for correlation between the shocks. Try to preserve the generality of the binary model, but feel free to limit the choice of distribution if it helps.


More Exercises
1. (hard) I don't know much about cubature, but I'd like to learn more. Read about a few methods. Find or write some Matlab code for one of them. Explore the accuracy of the method. To maintain a desired level of accuracy, how does the number of points grow with the number of dimensions? Compare it to Monte Carlo integration.

2. (hard) Matlab lacks an implementation of an optimization algorithm that uses interpolation in multiple dimensions. Remedy this situation. Find or develop an algorithm and implement it.

3. (hard, but not as hard) Write a function for computing arbitrary Gaussian quadrature rules with polynomials as a basis. Given integration limits, a weight function, π(x), and the number of points, n, your function should return the integration points and weights. You might want to use the following facts, taken from Numerical Recipes. Let <f|g> = ∫_a^b f(x)g(x)π(x)dx denote the inner product. Then the following recurrence relation will construct a set of orthogonal polynomials:

       p_{-1}(x) ≡ 0
       p_0(x) ≡ 1
       p_{j+1}(x) = ( x - <x p_j | p_j> / <p_j | p_j> ) p_j(x) - ( <p_j | p_j> / <p_{j-1} | p_{j-1}> ) p_{j-1}(x)

   Recall that the roots of the degree n polynomial will be the integration points. If you have the roots, {x_j}_{j=1}^n, then the weights are given by

       w_j = <p_{n-1} | p_{n-1}> / ( p_{n-1}(x_j) p_n'(x_j) )
