Numerical Methods
Lecture 5: Machine Learning Methods

Zachary R. Stangebye

University of Notre Dame

Spring 2022

Machine Learning

• Ill-defined but commonly-used term

• Connotes complex “thinking”-like operations done by a computer, e.g.,
  • Identifying/categorizing images
  • Predicting consumer behavior based on patterns and trends in the online sphere
  • Detecting spam in incoming e-mail messages
  • Web search engines
  • ...

• Rapid adoption in “big tech” and other private industries

• Gradual, recent adoption by academics/economists


Machine Learning: Two Kinds

1. Supervised: Maps input data to known output points

2. Unsupervised: No known outputs
  • Goal is to infer natural structures present in input data
  • Pattern recognition, clustering, etc.

• We’ll focus on the former since it’s more useful for economists of all stripes

Supervised Machine Learning: What is it really?

• Approximate solution to a functional equation

• Not so different from what we’ve been doing all semester

• Difference: Bigger scope
  • Large domain/input space (hundreds/thousands of dimensions)
  • Capable of extreme non-linearities/high-order interactions

• Example: Image recognition
  • Each pixel is a dimension/input
  • No restrictions on which inputs matter or how they interact

Machine Learning: Image Recognition Example

• “Is this a picture of Sylvester Stallone? Yes or No?”

• ML finds a function f : Ω → {0, 1}
  • Ω is the high-dimensional pixel space

Key Innovation

• Machine learning is thus an umbrella term for a collection of functional-approximation algorithms that break the curse of dimensionality

• Estimating/parameterizing f is often referred to as “training”

• Several algorithms fall under this category
  1. Neural networks
  2. Random forest models
  3. Gaussian process regressions
  4. ...

• In each case, solution time scales roughly linearly with the number of dimensions
  • Why?... Nobody really understands completely (yet)
  • Likely why academics got on the train more slowly

In Economics...

• Economists finally starting to adopt these methods
  • Actually not difficult to implement
  • Just esoteric to understand
  • Potential absolutely tremendous

1. Bagherpour (2017): Predict mortgage loan defaults better than existing methods
2. Albanesi and Vamossy (2019): Predict consumer default better than FICO scores with the same data
3. Fernandez-Villaverde et al. (2019): Solve a heterogeneous-agents model with no “shortcuts”
4. ...

For us...

• Approximate value/policy functions

• Follow Scheidegger and Bilionis (2017), who suggest a combination of
  • Gaussian Process Regression (GPR)
  • Active Subspace (AS)
  • Value function iteration (VFI)

• Can easily globally solve a DSGE model of 500 dimensions with significant non-linearities
  • Closest competitor is the Smolyak projection algorithm (maxes out around 20 dimensions when pushed really hard)
  • Has other benefits over Smolyak, e.g., for some non-linear models it often works where Smolyak would fail

Scheidegger and Bilionis (2017)

• Basic idea
  1. Use Gaussian Process Regression (GPR) and (possibly) Active Subspace (AS) to approximate the value function
  2. Update approximations with a VFI until convergence

• Hybrid dynamic-programming/projection method
  • The VFI part is easy (we know it already)
  • We’ll need to carefully break down the GPR and AS parts

Big-Picture Idea

• In search of a function V : R^D → R

• Rather than treat V as a deterministic function, form beliefs over the set of possible f ’s and update with Bayes’ rule
  1. Start with prior beliefs over V
  2. Update the prior on V using the Bellman equation + Bayes’ rule
  3. The posterior is the updated guess in a VFI setting
  4. Restart the next iteration without updating the prior

• The Bellman equation itself does all the updating we’ll need

Gaussian Process: Interpolation Strategy

(Figure: GP posterior interpolating a handful of observed points)
• Shaded area = 2× standard deviation
• Black lines: draws from the GP
• Zero posterior volatility at observed points

Gaussian Process Regression I

• Assume we wish to evaluate the function at N input points (“grid points” or “training points”): X = {x_1, . . . , x_N} ⊂ R^D

• Before evaluating equilibrium conditions, encode our knowledge about f(·) by assigning a Gaussian Process (GP) prior

• Let θ be the hyper-parameters of the model...

$$ f \mid X, \theta \sim \mathcal{N}(f \mid m, K) $$

where m ∈ R^N and K ∈ R^{N×N} is positive definite


Gaussian Process Regression II

• Mean, m, given by a function

$$ m = m(X; \theta) = \begin{pmatrix} m(x_1; \theta) \\ \vdots \\ m(x_N; \theta) \end{pmatrix} $$

• Similarly, covariance given by a function: K = K(X, X; θ)

• Comes from the more general cross-covariance matrix

$$ K(X, \hat{X}; \theta) = \begin{pmatrix} k(x_1, \hat{x}_1; \theta) & \cdots & k(x_1, \hat{x}_{\hat{N}}; \theta) \\ \vdots & \ddots & \vdots \\ k(x_N, \hat{x}_1; \theta) & \cdots & k(x_N, \hat{x}_{\hat{N}}; \theta) \end{pmatrix} \in \mathbb{R}^{N \times \hat{N}} $$

for an arbitrary set of N̂ inputs X̂ = {x̂_1, . . . , x̂_N̂} ⊂ R^D


Gaussian Process Regression III

• Parameterize m(·; θ) with our initial pointwise guess

• Parameterize k(·, ·; θ) with a kernel function

• Most common is the squared exponential (SE) covariance function

$$ k_{SE}(x, x'; \theta) = s^2 \exp\left( -\frac{1}{2} \sum_{i=1}^{D} \frac{(x_i - x_i')^2}{l_i^2} \right) $$

• Implies θ = {s, l_1, . . . , l_D}
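
To fix ideas, here is a minimal hand-coded sketch of the SE kernel and the covariance matrix K(X, X̂; θ) in Julia (the function names and toy inputs are illustrative, not from the slides or from GaussianProcesses.jl):

```julia
using LinearAlgebra

# Squared-exponential kernel: s is the signal strength, l a vector of lengthscales
k_se(x, xp, s, l) = s^2 * exp(-0.5 * sum(((x .- xp) ./ l) .^ 2))

# Cross-covariance matrix K(X, X̂; θ) for inputs stored as columns of X (D × N)
# and Xhat (D × N̂)
function cov_matrix(X, Xhat, s, l)
    N, Nhat = size(X, 2), size(Xhat, 2)
    return [k_se(X[:, i], Xhat[:, j], s, l) for i in 1:N, j in 1:Nhat]
end

# Tiny usage example with illustrative values
D, N = 2, 5
X = rand(D, N)
K = cov_matrix(X, X, 1.0, fill(0.5, D))   # N × N prior covariance, diagonal = s²
```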

Gaussian Process Regression IV

$$ k_{SE}(x, x'; \theta) = s^2 \exp\left( -\frac{1}{2} \sum_{i=1}^{D} \frac{(x_i - x_i')^2}{l_i^2} \right) $$

• For the SE covariance function...
  • s > 0 is called the signal strength
  • l_i > 0 is called the lengthscale of the i-th input/dimension

• Notice
  • K has diagonal entries s²
  • l_i governs the contribution of input/dimension i to the correlation across points
  • As x → x′, the correlation approaches 1. Implies continuously differentiable functions

GPR Measurement

• Evaluate the equilibrium conditions at our guess/prior

$$ t_i = f(x_i), \quad \text{for } i = 1, \ldots, N $$

• Helps the approximation to assume that we observe the output with some i.i.d. measurement error with noise variance s_n²

• Gives rise to t

$$ t_i \mid f(x_i), s_n \sim \mathcal{N}(t_i \mid f(x_i), s_n^2) $$

where s_n² is another hyperparameter (we’ll keep it small)

• Independence in measurement noise across observations implies

$$ \text{Likelihood:} \quad t \mid X, \theta, s_n \sim \mathcal{N}(t \mid m, K + s_n^2 I_N) $$


Choosing the Hyperparameters

• Done by simple maximum (log) likelihood (MLE)

$$ \theta^*, s_n^* = \arg\max_{\theta, s_n} \; \log p(t \mid X, \theta, s_n) $$

• Solve with a simple gradient-based method (BFGS) from multiple starting points for robustness
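
For the Gaussian likelihood above, the objective has a closed form. A sketch of how it might be coded (plain Julia, reusing the hypothetical k_se/cov_matrix helpers from the earlier sketch):

```julia
using LinearAlgebra

# Log marginal likelihood log p(t | X, θ, s_n) for the GP with prior mean m at X
function log_marginal_likelihood(t, m, X, s, l, sn)
    N  = length(t)
    Ky = cov_matrix(X, X, s, l) + sn^2 * I     # K + s_n² I_N
    C  = cholesky(Symmetric(Ky))               # stable solve and log-determinant
    r  = t .- m
    return -0.5 * dot(r, C \ r) - 0.5 * logdet(C) - 0.5 * N * log(2π)
end

# In practice this is maximized over (s, l₁, …, l_D, s_n) with BFGS (e.g. via Optim.jl)
# from several starting points, typically optimizing log-parameters to keep them positive.
```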

GPR Posterior

• Combine the prior GP with the likelihood to get the posterior GP

$$ f(\cdot) \mid X, t, \theta^*, s_n^* \sim \mathcal{N}\big( f(\cdot) \mid \tilde{m}(\cdot), \tilde{k}(\cdot, \cdot) \big) $$

where

$$ \tilde{m}(x) = m(x; \theta^*) + \underbrace{K(x, X; \theta^*)}_{1 \times N} \, \underbrace{\big[K + (s_n^*)^2 I_N\big]^{-1}}_{N \times N} \, \underbrace{(t - m)}_{N \times 1} $$

and

$$ \tilde{k}(x, x') = k(x, x'; \theta^*) - \underbrace{K(x, X; \theta^*)}_{1 \times N} \, \underbrace{\big[K + (s_n^*)^2 I_N\big]^{-1}}_{N \times N} \, \underbrace{K(X, x; \theta^*)}_{N \times 1} $$
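
A from-scratch sketch of evaluating the posterior mean and variance at a new point, again reusing the hypothetical helpers above (an illustration of the formulas, not the GaussianProcesses.jl interface):

```julia
using LinearAlgebra

# Posterior mean m̃(x*) and variance k̃(x*, x*) at a new point xstar (length-D vector).
# X is D × N training inputs, t the targets, m the prior mean at X, m0 a prior mean
# function, and (s, l, sn) the fitted hyperparameters; k_se/cov_matrix as before.
function gp_posterior(xstar, X, t, m, m0, s, l, sn)
    Ky = cov_matrix(X, X, s, l) + sn^2 * I               # K + (s_n*)² I_N
    kx = vec(cov_matrix(reshape(xstar, :, 1), X, s, l))  # the 1 × N slice K(x*, X)
    μ  = m0(xstar) + dot(kx, Ky \ (t .- m))              # posterior mean
    σ2 = k_se(xstar, xstar, s, l) - dot(kx, Ky \ kx)     # posterior variance
    return μ, σ2
end
```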

Relation to Projection Methods

• Posterior mean ⇐⇒ linear combination of SE basis functions

$$ \tilde{m}(x) = m(x; \theta^*) + \sum_{i=1}^{N} a_i \, k(x, x_i; \theta^*), \qquad a = \big[K + (s_n^*)^2 I_N\big]^{-1} (t - m) $$

• Typically m(x; θ*) = 0, but it works for any prior


General Description: DSGE Model

• Dynamic, stochastic, discrete-time, infinite-horizon economy with D dimensions
  • Exogenous/endogenous states x ∈ R^D
  • Control variables, c, chosen from C(x)
  • Law of motion: x′ ∼ F(·|x, c)

• Bellman equation

$$ TV(x) = \max_{c \in C(x)} \; \big[ u(c, x) + \beta \, \mathbb{E}_{F|x,c} V(\tilde{x}') \big] $$

• Equilibrium given by V = TV
• Optimal policy function: p(x) ∈ C(x)

Full Algorithm: DSGE Model

• Start from an initial guess V⁰. Then at each iteration step s...

1. Generate (randomly) a relatively small training set (grid points) of size n_s, x^s_{1:n_s}. Evaluate the Bellman operator to get

$$ t_i^s = TV^{s-1}(x_i^s) \quad \text{for } i \in 1, \ldots, n_s $$

2. Apply GPR to “learn” the update, V_surrogate, which is the posterior mean. Set V^s = V_surrogate

3. Repeat until V^s converges to V^{s-1} at 10,000 random, non-training-input points

A schematic of this loop is sketched below.
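
A high-level sketch of the loop in Julia. The helpers draw_training_inputs, bellman_update, and fit_gp are hypothetical placeholders for the model-specific Bellman evaluation and the GPR fit described above, and the domain is assumed scaled so that states live in [0, 1]^D:

```julia
# Schematic GPR-VFI loop (the three helper functions are hypothetical placeholders)
function gpr_vfi(D; ns = 10 * D, tol = 1e-6, maxiter = 500)
    Vs = x -> 0.0                                     # initial guess V⁰ (zero function)
    for s in 1:maxiter
        Xs = draw_training_inputs(ns, D)              # D × ns matrix of grid points
        ts = [bellman_update(Vs, Xs[:, i]) for i in 1:ns]   # tᵢˢ = T V^{s-1}(xᵢˢ)
        Vnew = fit_gp(Xs, ts)                         # surrogate Vˢ = GP posterior mean
        Xtest = rand(D, 10_000)                       # random non-training points
        err = maximum(abs(Vnew(Xtest[:, i]) - Vs(Xtest[:, i])) for i in 1:size(Xtest, 2))
        Vs = Vnew
        err < tol && return Vs                        # converged: Vˢ ≈ V^{s-1}
    end
    return Vs
end
```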

Full Algorithm: Notes

• Note we “reset” the prior variance and mean for each iteration
  • Early priors don’t filter down through later iterations
  • Not a true Bayesian approach, but much more efficient

• Once done (or in each iteration), “learn” an approximation for the policy function too

• While GPR is easy to code yourself, I recommend the GaussianProcesses.jl package
  • Treats Gaussian Processes as their own class of variables
  • Much faster built-in training
  • Less code development time/room for errors


Training Inputs

• In each iteration, select (5 to 10) × D training inputs (grid points)
  • This is where the curse of dimensionality blatantly disappears!

• Typically drawn uniformly from [x, x̄]^D for well-behaved problems
  • Not always efficient for complex/irregular state spaces
  • Here, ergodic distributions are more concentrated


Examples

1. Simple consumption-saving model with uncertainty
  • Endowment economy (y) follows an AR(1)

$$ V(y, b) = \max_{b'} \; u\!\left( y - b + \frac{b'}{1+r} \right) + \beta \, \mathbb{E}_{\tilde{y}'|y} \big[ V(\tilde{y}', b') \big] $$

2. Production economy with adjustment costs (see previous slides)

A sketch of the Bellman evaluation for the first example appears below.
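
A concrete sketch of what the hypothetical bellman_update from the earlier loop might look like for the consumption-saving example. The CRRA utility, the discretized income nodes/weights, and all parameter values are illustrative assumptions; a proper implementation would condition the income quadrature/Tauchen nodes on the current y given the AR(1):

```julia
u(c; γ = 2.0) = c > 0 ? c^(1 - γ) / (1 - γ) : -1e10   # CRRA, large penalty if c ≤ 0

# One Bellman evaluation T V(y, b) at a single training point, maximizing over b′
# by brute-force grid search. Vguess(y, b) is the current surrogate (e.g., GP mean).
function bellman_update(Vguess, y, b; β = 0.95, r = 0.02,
                        ynodes = [0.9, 1.0, 1.1], ywgts = [0.25, 0.5, 0.25],
                        bgrid = range(-1.0, 1.0, length = 201))
    best = -Inf
    for b′ in bgrid
        c  = y - b + b′ / (1 + r)
        EV = sum(ywgts[k] * Vguess(ynodes[k], b′) for k in eachindex(ynodes))
        best = max(best, u(c) + β * EV)
    end
    return best
end
```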

Ergodic Training Inputs

• If the state space is irregular, then begin with uniform draws in early iterations, but in later iterations...

1. Simulate the (non-equilibrium) law of motion for n periods to derive an estimate of the ergodic set

$$ X_{ergodic} = \{ x_i : 1 \le i \le n \} $$

2. Fit a histogram/Gaussian mixture model to the ergodic distribution. Call this density ρ_estimated(x)

3. Draw N training inputs from ρ_estimated(x) rather than [x, x̄]^D

• Gets it right “where it counts” (a minimal sampling sketch follows)
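
One minimal way to implement this is to resample directly from the simulated cloud of states rather than fitting a full mixture; simulate_step is a hypothetical placeholder for the model's (non-equilibrium) law of motion:

```julia
# Simulate the law of motion and draw N training inputs from the resulting cloud,
# which stands in for draws from the estimated ergodic density ρ_estimated(x).
function ergodic_training_inputs(x0, N; nsim = 100_000, burnin = 1_000)
    x = copy(x0)
    cloud = Vector{typeof(x0)}()
    for t in 1:nsim
        x = simulate_step(x)                      # hypothetical model-specific step
        t > burnin && push!(cloud, copy(x))
    end
    return [cloud[rand(1:length(cloud))] for _ in 1:N]
end
```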


Ergodic Training Inputs: My Thoughts/Experience

• Really crucial to get reliable convergence

• Two useful tips
  1. After “enough” iterations drawing from the ergodic distribution, fix the training inputs for later iterations
     • Continue to check convergence with updated ergodic distributions
  2. Draw training inputs from the “typical set” rather than the straight ergodic distribution
     • Ensures we’re getting a ‘disperse’ or ‘representative’ draw
     • Fills the relevant space more efficiently


Typical Sets

• Information-theoretic concept

• Given N i.i.d. draws from a distribution, what does a ‘typical’ sequence look like?
  • Distribution of realized draws somehow matches the underlying distribution
  • Not the most likely sequence, e.g., repeated mode

• Defined for a level of ‘typicality’ 1 − ε
  • Typical set A_N(ε) ⊂ X^N for domain X


Atypical Set: Example

(Figure: a density on [−4, 4] illustrating an atypical set of draws)

Typical Set: Example

(Figure: the same density on [−4, 4], now illustrating a typical set of draws)

Typical Set: Formal Definition

• {x_1, x_2, . . . , x_N} ∈ A_N(ε) iff

$$ \left| -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i) - H(\tilde{x}) \right| < \varepsilon $$

where H(x̃) is the entropy of the random variable x̃, i.e., if the domain of the random variable is X, then

$$ H(\tilde{x}) = -\sum_{x \in X} p(x) \log p(x) $$
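
A small sketch of this check for a candidate batch of training inputs, assuming we can evaluate log p(x) and have a (Monte Carlo) estimate of the entropy H; the names are illustrative:

```julia
using Statistics

# Is the sample {x₁, …, x_N} ε-typical?  logp(x) returns log p(x); H is the entropy.
function is_typical(sample, logp, H; ε = 0.05)
    avg_surprisal = -mean(logp.(sample))      # −(1/N) Σᵢ log p(xᵢ)
    return abs(avg_surprisal - H) < ε
end

# H can itself be estimated as -mean(logp.(big_sample)) from a long simulation.
# To draw training inputs from the typical set, repeatedly draw candidate batches
# from the (estimated) ergodic density and keep the first batch that passes is_typical.
```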

Irregular State Space Example: Arellano (2008)

• Sovereign default model with short-term bonds (full description in earlier lectures)

$$ V_R(y, b) = \max_{b'} \; u\big( y - b + q(y, b')\, b' \big) + \beta \, \mathbb{E}\big[ \max\{ V_R(\tilde{y}', b'), V_D(\tilde{y}') \} \big] $$

$$ V_D(y) = u(y_{def}(y)) + \beta \, \mathbb{E}\big[ \pi V_R(\tilde{y}', 0) + (1 - \pi) V_D(\tilde{y}') \big] $$

$$ q(y, b') = \frac{1}{1+r} \, \mathbb{E}\big[ \mathbf{1}\{ V_R(\tilde{y}', b') \ge V_D(\tilde{y}') \} \big] $$

Irregular State Space Example: Arellano (2008)

(Figure: state-space plot for the Arellano (2008) example; horizontal axis roughly 0–0.3, vertical axis roughly 0.8–1.2)

Really Big State Spaces

• GPR loses accuracy for LARGE dimensionality (D >> 20 or so)
  • Happens because it relies on Euclidean distance to define input-space correlations (less informative as D grows)

• In these cases, rely on active subspaces
  • Translate the action from the large state space to a lower-dimensional one that captures its essential features

• Assume the function can be well-approximated by

$$ f(x) \approx h(W^T x) $$

for some W ∈ R^{D×d} that projects the high-dimensional input space into a lower-dimensional active subspace, R^d
  • h : R^d → R is called the link function
  • Repeat the whole previous algorithm on h

Choosing W

1. Define a D × D matrix

$$ C = \int (\nabla f(x)) (\nabla f(x))^T \rho(x) \, dx $$

where ρ is uniform over the state space

2. Since C is positive definite, decompose

$$ C = V \Lambda V^T $$

where Λ is a diagonal matrix with the eigenvalues of C in decreasing order and V is an orthonormal matrix of corresponding eigenvectors

3. W = V_1, the eigenvectors corresponding to the d largest eigenvalues

In practice

• Set d to something a bit less than 20, e.g., 15

• Cannot construct C analytically. Instead (a code sketch follows this slide)
  1. Draw a large number N of points uniformly from the state space
  2. Estimate the numerical gradient g^i at each point
  3. Define

$$ C_N = \frac{1}{N} \sum_{i=1}^{N} g^i (g^i)^T $$

  4. Perform a singular value decomposition (SVD) on C_N to get Λ and V
  5. Back out W from V as before

• Using this, Scheidegger and Bilionis (2017) accurately and globally solve a DSGE model with 500 continuous states! By far the most powerful tool in your toolbox
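
A compact sketch of the Monte Carlo construction of W in plain Julia; grad_f is a hypothetical placeholder for a numerical-gradient routine (e.g., finite differences) for the function being approximated:

```julia
using LinearAlgebra

# Active-subspace construction: returns the D × d projection matrix W.
# lo and hi are the bounds of the rectangular state space.
function active_subspace(grad_f, lo, hi, d; N = 10_000)
    D = length(lo)
    C = zeros(D, D)
    for _ in 1:N
        x = lo .+ rand(D) .* (hi .- lo)   # uniform draw from the state space
        g = grad_f(x)
        C .+= g * g'                       # accumulate ∇f(x) ∇f(x)ᵀ
    end
    C ./= N                                # C_N = (1/N) Σᵢ gⁱ (gⁱ)ᵀ
    F = svd(C)                             # SVD of the (symmetric) matrix C_N
    return F.U[:, 1:d]                     # eigenvectors of the d largest eigenvalues
end

# The link function h is then fit by GPR, as before, on the reduced inputs Wᵀx.
```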

In practice

• Most interesting structural models have dimensionality significantly less than 20
  • Might not be true of more empirically oriented frameworks

• But you could use this in the following way. If you want to choose parameters to match moments/calibrate:
  1. Add each parameter of interest as a dimension with a deterministic, constant law of motion
  2. Solve the model once
  3. Select parameters as points in the state space. Model dynamics respond immediately
  4. Can calibrate dozens of parameters while solving the model only once

Alternate Algorithms

• Many other machine learning algorithms

• Tend to share the feature that they kill the curse of dimensionality

• Most popular alternative in economics: Neural Networks
  • Approximate the function with layers of simple, non-linear functions (“neurons/nodes”)

Neural Networks Visually

(Figure: feed-forward network diagram with input layer, hidden layer(s), and output node)


Neural Networks in Broad Terms (I)

• A node/neuron is often called a “perceptron” and is a simple (basis) function referred to as an “activation function,” e.g.,

$$ \text{(ReLU)} \quad \phi(x) = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases} $$

$$ \text{(Sigmoid)} \quad \phi(x) = \frac{1}{1 + e^{-x}} $$

$$ \text{(ArcTan)} \quad \phi(x) = \tan^{-1}(x) $$

...
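
These are one-liners in Julia (a generic sketch; the names are not tied to any package):

```julia
relu(x)       = max(zero(x), x)      # φ(x) = 0 for x ≤ 0, x for x > 0
sigmoid(x)    = 1 / (1 + exp(-x))    # φ(x) = 1 / (1 + e^{-x})
arctan_act(x) = atan(x)              # φ(x) = tan⁻¹(x)
```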

Neural Networks in Broad Terms (II)

• The impulse of a perceptron in a layer l is described by a set of coefficients, θ_l, for that layer

• Denote the number of perceptrons in a layer by N_l

• Every perceptron i in layer l has weights θ^W_{l,i} ∈ R^{N_{l−1}}, a constant bias θ^b_{l,i}, and functional form for its output

$$ o_{l,i}(x) = \phi\left( \theta^b_{l,i} + \sum_{j=1}^{N_{l-1}} \theta^W_{l,i,j} \, o_{l-1,j}(x) \right) $$
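
In matrix-vector form a whole layer can be written in a couple of lines; a from-scratch sketch (names are illustrative):

```julia
# Output of layer l: o_l = φ.(b + W * o_prev), with W of size N_l × N_{l-1}
# and b a length-N_l bias vector; φ is applied elementwise.
layer_output(ϕ, W, b, o_prev) = ϕ.(b .+ W * o_prev)

# Stacking M such layers (a vector of (W, b) tuples) gives the full network,
# with a possibly different activation ϕ_out in the final layer.
function mlp(x, layers, ϕ, ϕ_out)
    o = x
    for (W, b) in layers[1:end-1]
        o = layer_output(ϕ, W, b, o)
    end
    W, b = layers[end]
    return layer_output(ϕ_out, W, b, o)    # final layer, e.g. size 1 for a function
end
```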

Neural Networks in Broad Terms (III)

• “Training” the neural network is the process of estimating {θ_l}, l = 1, . . . , M, for a network of M layers
  • Typically done with a gradient-based algorithm

• Similar in spirit to projection methods
  • Use a non-linear, iterated combination of simple activation functions rather than a simple linear combination of orthogonal basis functions
  • Solved in a very similar way (black-box solver over coefficients)

• Much more difficult to demonstrate results theoretically/derive confidence metrics, but tends to work really well in practice!

Neural Networks in a VFI

• Based on Fernandez-Villaverde et al. (2020), which I’ll call FHN, and my own experience

• Use neural networks to approximate equilibrium objects (value/policy/pricing functions, etc.)

• Simplest working neural network is an MLP (multi-layer perceptron): at least one hidden layer
  1. Initial layer always the same size as the dimension (D)
  2. Middle layer size (N) a choice (FHN set N = 16)
     • Can increase to deal with more non-linearities
  3. Final layer size = 1 (since it’s a function)

• FHN find that a single-layer MLP appears sufficient even for complex problems in structural economics

• Same issues with the choice of training inputs as GPR

• Less sensitive to changing parameter choices than GPR, since neurons do not correspond to training inputs
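
A minimal sketch of such an MLP in Flux.jl; the state dimension and activation choices here are illustrative (tanh hidden layer, sigmoid output for a problem whose range is scaled into [0, 1]):

```julia
using Flux

D = 4                                  # number of state dimensions (illustrative)
# Input of size D, one hidden layer of 16 neurons, scalar output
model = Chain(
    Dense(D => 16, tanh),
    Dense(16 => 1, sigmoid),
)

Vhat(x) = model(Float32.(x))[1]        # surrogate value function at a state vector x
```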

Neural Networks in a VFI II

• Choice of activation function in the final layer depends on the output
  • Bounded problem: sigmoid or tanh
  • Unbounded problem: softmax or ReLU

• I have found that the approximation works best when the domain is scaled to [0, 1]^D and the range is scaled into [0, 1]

Training Neural Networks

• Most routines are written for big data
  1. Start with a parameter guess: θ_i
  2. Randomly sample the domain of training inputs: a “batch”
  3. Update to θ_{i+1} with a gradient-descent step (not a Newton step) with a fixed constant
  4. Repeat with a new batch from the current θ_{i+1}: an “epoch”

• Our data requirements are much smaller and our need for accuracy in iterations is higher. Better to
  1. Use the entire training set as one “batch”
  2. Use a better optimizer than gradient descent, i.e., BFGS, Nelder-Mead, etc.
     • I recommend using the FluxOptTools package, as such algorithms are not built into Flux
