LECTURE NOTES ON
ECONOMIC DYNAMICS
Peter N. Ireland
Department of Economics
Boston College
irelandp@bc.edu
http://www2.bc.edu/~irelandp/ec720.html
Copyright (c) 2008 by Peter N. Ireland. Redistribution is permitted for educational and research
purposes, so long as no changes are made. All copies must be provided free of charge and must include
this copyright notice.
Two Useful Theorems
Two theorems will prove quite useful in all of our discussions of dynamic optimization:
the Kuhn-Tucker Theorem and the Envelope Theorem. Let’s consider each of these in turn.
Consider the following constrained optimization problem:
max_x F(x) subject to c ≥ G(x)
where
x ∈ R is the choice variable,
F : R → R is the objective function, continuously differentiable, and
c ≥ G(x) is the constraint, with c ∈ R and G : R → R also continuously differentiable.
Probably the easiest way to solve this problem is via the method of Lagrange multipliers.
The mathematical foundations that allow for the application of this method are given
to us by Lagrange’s Theorem or, in its most general form, the Kuhn-Tucker Theorem.
The Lagrangian for this problem is
L(x, λ) = F(x) + λ[c − G(x)],
where λ is the Lagrange multiplier on the constraint c ≥ G(x).
Theorem (Kuhn-Tucker) Suppose that x∗ maximizes F(x) subject to c ≥ G(x), where F and G are both continuously differentiable, and suppose that G′(x∗) ≠ 0. Then there exists a value λ∗ such that, together, x∗ and λ∗ satisfy
L1 (x∗ , λ∗ ) = F′(x∗) − λ∗G′(x∗) = 0, (1)
L2 (x∗ , λ∗ ) = c − G(x∗ ) ≥ 0, (2)
λ∗ ≥ 0, (3)
and
λ∗ [c − G(x∗ )] = 0. (4)
Proof Consider two possible cases, depending on whether or not the constraint is binding
at x∗ .
If c > G(x∗ ), then let λ∗ = 0. Clearly, (2)-(4) are satisfied, so it only remains to show
that (1) must hold. With λ∗ = 0, (1) holds if and only if
F′(x∗) = 0. (5)
We can show that (5) must hold using a proof by contradiction. Suppose that
instead of (5), it turns out that
F′(x∗) < 0.
Then, by the continuity of F and G, there must exist an ε > 0 such that
F(x∗ − ε) > F(x∗) and c > G(x∗ − ε).
But this result contradicts the assumption that x∗ maximizes F (x) subject to
c ≥ G(x). Similarly, if it turns out that
F′(x∗) > 0,
then by the continuity of F and G there must exist an ε > 0 such that
F(x∗ + ε) > F(x∗) and c > G(x∗ + ε).
But, again, this result contradicts the assumption that x∗ maximizes F (x) subject
to c ≥ G(x). This establishes that (5) must hold, completing the proof for case 1.
If c = G(x∗), then let λ∗ = F′(x∗)/G′(x∗). This is possible, given the assumption that G′(x∗) ≠ 0. Clearly, (1), (2), and (4) are satisfied, so it only remains to show that (3) must hold. With λ∗ = F′(x∗)/G′(x∗), (3) holds if and only if
F′(x∗)/G′(x∗) ≥ 0. (6)
We can show that (6) must hold using a proof by contradiction. Suppose that instead of (6), it turns out that
F′(x∗)/G′(x∗) < 0.
One way that this can happen is if F′(x∗) > 0 and G′(x∗) < 0. But if these
conditions hold, then the continuity of F and G implies the existence of an ε > 0
such that
F (x∗ + ε) > F (x∗ ) and c = G(x∗ ) > G(x∗ + ε),
which contradicts the assumption that x∗ maximizes F (x) subject to c ≥ G(x).
If, instead, F′(x∗)/G′(x∗) < 0 because F′(x∗) < 0 and G′(x∗) > 0, then the continuity of F and G implies the existence of an ε > 0 such that
F(x∗ − ε) > F(x∗) and c = G(x∗) > G(x∗ − ε),
which again contradicts the assumption that x∗ maximizes F(x) subject to c ≥ G(x). This establishes that (6), and hence (3), must hold, completing the proof for case 2.
Notes:
a) The theorem can be extended to handle cases with more than one choice variable
and more than one constraint: see Dixit or Simon-Blume.
b) Equations (1)-(4) are necessary conditions: If x∗ is a solution to the optimization
problem, then there exists a λ∗ such that (1)-(4) must hold. But (1)-(4) are not
sufficient conditions: if x∗ and λ∗ satisfy (1)-(4), it does not follow automatically
that x∗ is a solution to the optimization problem.
Despite point (b) listed above, the Kuhn-Tucker theorem is extremely useful in practice.
Suppose that we are looking for the solution x∗ to the constrained optimization problem
max_x F(x) subject to c ≥ G(x).
The Kuhn-Tucker theorem tells us that if we set up the Lagrangian
L(x, λ) = F(x) + λ[c − G(x)],
then x∗ and the associated λ∗ must satisfy the first-order condition (FOC) obtained by differentiating L by x and setting the result equal to zero:
L1(x∗, λ∗) = F′(x∗) − λ∗G′(x∗) = 0. (1)
In addition, we know that the constraint must be satisfied at the optimum:
c ≥ G(x∗). (2)
And we know that the multiplier must be nonnegative:
λ∗ ≥ 0. (3)
And finally, we know that the complementary slackness condition
λ∗ [c − G(x∗ )] = 0, (4)
must hold: If λ∗ > 0, then the constraint must bind; if the constraint does not bind,
then λ∗ = 0.
In searching for the value of x that solves the constrained optimization problem, we only
need to consider values of x∗ that satisfy (1)-(4).
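As a concrete illustration (not part of the original notes), the short Python sketch below checks conditions (1)-(4) for the hypothetical problem of maximizing F(x) = ln(x) subject to 1 ≥ x. Because F′(x) = 1/x > 0 everywhere, the constraint binds, so x∗ = 1 and λ∗ = F′(x∗)/G′(x∗) = 1.

# Numerical check of (1)-(4) for the illustrative problem max ln(x) s.t. 1 >= x
import numpy as np

c = 1.0
F = np.log                      # objective
dF = lambda x: 1.0 / x          # F'(x)
G = lambda x: x                 # constraint function
dG = lambda x: 1.0              # G'(x)

x_star = 1.0
lam_star = dF(x_star) / dG(x_star)

print("(1) F'(x*) - lam*G'(x*) =", dF(x_star) - lam_star * dG(x_star))  # 0
print("(2) c - G(x*)           =", c - G(x_star))                       # >= 0
print("(3) lam*                =", lam_star)                            # >= 0
print("(4) lam*[c - G(x*)]     =", lam_star * (c - G(x_star)))          # 0

# Brute-force grid search confirms that x* = 1 maximizes F on the feasible set
grid = np.linspace(0.01, 1.0, 1000)
print("grid maximizer:", grid[np.argmax(F(grid))])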
Two pieces of terminology:
The function L(x, λ) = F(x) + λ[c − G(x)] is called the Lagrangian for the problem.
The variable λ is called the Lagrange multiplier.
Thus, in solving the problem in this way, we are using the Lagrangian to turn a constrained
optimization problem into an unconstrained optimization problem, where we seek to
maximize L(x, λ) rather than simply F (x).
One final note:
Our general constraint, c ≥ G(x), nests as a special case the nonnegativity constraint
x ≥ 0, obtained by setting c = 0 and G(x) = −x.
So nonnegativity constraints can be introduced into the Lagrangian in the same way
as all other constraints. If we consider, for example, the extended problem
max_x F(x) subject to c ≥ G(x) and x ≥ 0,
Of course, in (1′), µ∗ ≥ 0 in general and µ∗ = 0 if x∗ > 0. So a close inspection reveals
that these two approaches to handling nonnegativity constraints lead in the end
to the same results.
References:
Dixit, Chapter 5.
Simon-Blume, Chapter 19.
Now, let’s generalize the problem by allowing the functions F and G to depend on a
parameter θ ∈ R. The problem can now be stated as
max_x F(x, θ) subject to c ≥ G(x, θ).
Associated with this problem, define the maximum value function V(θ). To evaluate V for any given value of θ, use the following two-step procedure.
First, given θ, find the value of x∗ that solves the constrained optimization problem.
Second, substitute this value of x∗ , together with the given value of θ, into the objec-
tive function to obtain
V (θ) = F (x∗ , θ)
Now suppose that we want to investigate the properties of this function V . Suppose, in
particular, that we want to take the derivative of V with respect to its argument θ.
As the first step in evaluating V 0 (θ), consider solving the constrained optimization problem
for any given value of θ by setting up the Lagrangian
L(x, λ) = F(x, θ) + λ[c − G(x, θ)].
We know from the Kuhn-Tucker theorem that the solution x∗ to the optimization problem
and the associated value of the multiplier λ∗ must satisfy the complementary slackness
condition:
λ∗ [c − G(x∗ , θ)] = 0
Use this last result to rewrite the expression for V as
V (θ) = F (x∗ , θ) = F (x∗ , θ) + λ∗ [c − G(x∗ , θ)]
So suppose that we tried to calculate V 0 (θ) simply by differentiating both sides of this
equation with respect to θ:
V 0 (θ) = F2 (x∗ , θ) − λ∗ G2 (x∗ , θ).
In principle, this formula may not be correct. The reason is that x∗ and λ∗ will themselves
depend on the parameter θ, and we must take this dependence into account when
differentiating V with respect to θ.
However, the envelope theorem tells us that our formula for V 0 (θ) is, in fact, correct. That
is, the envelope theorem tells us that we can ignore the dependence of x∗ and λ∗ on θ
in calculating V 0 (θ).
To see why, for any θ, let x∗ (θ) denote the solution to the problem: max F (x, θ) subject to
c ≥ G(x, θ), and let λ∗ (θ) be the associated Lagrange multiplier.
Theorem (Envelope) Let F and G be continuously differentiable functions of x and θ.
For any given θ, let x∗ (θ) maximize F (x, θ) subject to c ≥ G(x, θ), and let λ∗ (θ) be
the value of the associated Lagrange multiplier. Suppose, further, that x∗ (θ) and λ∗ (θ)
are also continuously differentiable functions, and that the constraint qualification
G1 [x∗ (θ), θ] ≠ 0 holds for all values of θ. Then the maximum value function defined by
V (θ) = max F (x, θ) subject to c ≥ G(x, θ)
x
satisfies
V 0 (θ) = F2 [x∗ (θ), θ] − λ∗ (θ)G2 [x∗ (θ), θ]. (7)
Proof The Kuhn-Tucker theorem tells us that for any given value of θ, x∗ (θ) and λ∗ (θ)
must satisfy
L1 [x∗ (θ), λ∗ (θ)] = F1 [x∗ (θ), θ] − λ∗ (θ)G1 [x∗ (θ), θ] = 0, (1)
and
λ∗ (θ){c − G[x∗ (θ), θ]} = 0. (4)
In light of (4),
V (θ) = F [x∗ (θ), θ] = F [x∗ (θ), θ] + λ∗ (θ){c − G[x∗ (θ), θ]}
Differentiating both sides of this expression with respect to θ yields
V 0 (θ) = F1 [x∗ (θ), θ]x∗0 (θ) + F2 [x∗ (θ), θ]
+λ∗0 (θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G1 [x∗ (θ), θ]x∗0 (θ)
−λ∗ (θ)G2 [x∗ (θ), θ]
which shows that, in principle, we must take the dependence of x∗ and λ∗ on θ into
account when calculating V 0 (θ).
Note, however, that
V 0 (θ) = {F1 [x∗ (θ), θ] − λ∗ (θ)G1 [x∗ (θ), θ]}x∗0 (θ)
+F2 [x∗ (θ), θ] + λ∗0 (θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G2 [x∗ (θ), θ],
which by (1) reduces to
V′(θ) = F2 [x∗ (θ), θ] + λ∗′(θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G2 [x∗ (θ), θ]. (8)
Clearly, (8) implies that (7) holds for any θ such that the constraint is binding.
For θ such that the constraint is not binding, (4) implies that λ∗ (θ) must equal zero.
Furthermore, by the continuity of G and x∗ , if the constraint does not bind at θ, there
exists a ε∗ > 0 such that the constraint does not bind for all θ + ε with ε∗ > |ε|. Hence,
(4) also implies that λ∗ (θ + ε) = 0 for all ε∗ > |ε|. Using the definition of the derivative,
λ∗′(θ) = lim_{ε→0} [λ∗(θ + ε) − λ∗(θ)]/ε = lim_{ε→0} 0/ε = 0,
so the middle term in (8) again vanishes, and (7) holds in this case as well. This completes the proof.
Now consider the simpler, unconstrained problem
max_x F(x, θ),
with maximum value function
V(θ) = max_x F(x, θ).
To evaluate V for any given value of θ, use the same two-step procedure as before. First,
find the value x∗ (θ) that solves the unconstrained maximization problem for that value
of θ. Second, substitute that value of x back into the objective function to obtain
V (θ) = F [x∗ (θ), θ].
Now differentiate both sides of this expression with respect to θ, carefully taking the dependence of x∗ on θ into account:
V′(θ) = F1 [x∗ (θ), θ]x∗′(θ) + F2 [x∗ (θ), θ].
But, if x∗ (θ) is the value of x that maximizes F given θ, we know that x∗ (θ) must be a
critical value of F :
F1 [x∗ (θ), θ] = 0.
Hence, for the unconstrained problem, the envelope theorem implies that
V′(θ) = F2 [x∗ (θ), θ].
The envelope theorem for this constrained problem tells us that we can also ignore the
dependence of x∗ on θ when differentiating V with respect to θ, but only if we start by
adding the complementary slackness condition to the maximized objective function to
first obtain
V (θ) = F [x∗ (θ), θ] + λ∗ (θ){c − G[x∗ (θ), θ]}.
In taking this first step, we are actually evaluating the entire Lagrangian at the optimum,
instead of just the objective function. We need to take this first step because for the
constrained problem, the Kuhn-Tucker condition (1) tells us that x∗ (θ) is a critical
point, not of the objective function by itself, but of the entire Lagrangian formed by
adding the product of the multiplier and the constraint to the objective function.
And what gives the envelope theorem its name? The “envelope” theorem refers to a
geometrical presentation of the same result that we’ve just worked through.
To see where that geometrical interpretation comes from, consider again the simpler, un-
constrained optimization problem:
Following along with our previous notation, let x∗ (θ) denote the solution to this problem
for any given value of θ, so that the function x∗ (θ) tells us how the optimal choice of
x depends on the parameter θ.
Also, continue to define the maximum value function V in the same way as before:
V(θ) = max_x F(x, θ).
Now let θ1 denote a particular value of θ, and let x1 denote the optimal value of x associated
with this particular value θ1 . That is, let
x1 = x∗ (θ1 ).
After substituting this value of x1 into the function F , we can think about how F (x1 , θ)
varies as θ varies–that is, we can think about F (x1 , θ) as a function of θ, holding x1
fixed.
In the same way, let θ2 denote another particular value of θ, with θ2 > θ1 let’s say. And
following the same steps as above, let x2 denote the optimal value of x associated with
this particular value θ2 , so that
x2 = x∗ (θ2 ).
Once again, we can hold x2 fixed and consider F (x2 , θ) as a function of θ.
The geometrical presentation of the envelope theorem can be derived by thinking about the
properties of these three functions of θ: V (θ), F (x1 , θ), and F (x2 , θ).
One thing that we know about these three functions is that for θ = θ1:
V(θ1) = F(x1, θ1) ≥ F(x2, θ1),
where the first equality and the second inequality both follow from the fact that, by
definition, x1 maximizes F (x, θ1 ) by choice of x.
Another thing that we know about these three functions is that for θ = θ2:
V(θ2) = F(x2, θ2) ≥ F(x1, θ2).
At θ1 , V (θ) coincides with F (x1 , θ), which lies above F (x2 , θ).
At θ2 , V (θ) coincides with F (x2 , θ), which lies above F (x1 , θ).
And we could find more and more values of V by repeating this procedure for more
and more specific values of θi , i = 1, 2, 3, ....
In other words:
[Figure: The Envelope Theorem. V(θ), F(x1, θ), and F(x2, θ) plotted against θ, with V(θ) tangent to F(x1, θ) at θ1 and to F(x2, θ) at θ2.]
V (θ) traces out the “upper envelope” of the collection of functions F (xi , θ), formed
by holding xi = x∗ (θi ) fixed and varying θ.
Moreover, V(θ) is tangent to each individual function F(xi, θ) at the value θi of θ for which xi is optimal, or equivalently:
V′(θi) = F2(xi, θi),
which is the same analytical result that we derived earlier for the unconstrained
optimization problem.
To generalize these arguments so that they apply to the constrained optimization problem
simply use the fact that in most cases (where the appropriate second-order conditions
hold) the value x∗ (θ) that solves the constrained optimization problem for any given
value of θ also maximizes the Lagrangian function
L(x, λ, θ) = F(x, θ) + λ[c − G(x, θ)],
so that
V(θ) = max_x L(x, λ∗(θ), θ) = L[x∗(θ), λ∗(θ), θ].
Now just replace the function F with the function L in working through the arguments
from above to conclude that
V 0 (θ) = L3 [x∗ (θ), λ∗ (θ), θ] = F2 [x∗ (θ), θ] − λ∗ (θ)G2 [x∗ (θ), θ],
3 Two Examples
3.1 Utility Maximization
A consumer has a utility function defined over consumption of two goods: U (c1 , c2 )
Prices: p1 and p2
Income: I
Budget constraint: I ≥ p1 c1 + p2 c2 = G(c1 , c2 )
The consumer's problem is:
max_{c1, c2} U(c1, c2) subject to I ≥ p1 c1 + p2 c2
The Kuhn-Tucker theorem tells us that if we set up the Lagrangian:
L(c1 , c2 , λ) = U (c1 , c2 ) + λ(I − p1 c1 − p2 c2 )
Then the optimal consumptions c∗1 and c∗2 and the associated multiplier λ∗ must satisfy the
FOC:
L1 (c∗1 , c∗2 , λ∗ ) = U1 (c∗1 , c∗2 ) − λ∗ p1 = 0
and
L2 (c∗1 , c∗2 , λ∗ ) = U2 (c∗1 , c∗2 ) − λ∗ p2 = 0
Move the terms with minus signs to the other side, and divide the first of these FOC by
the second to obtain
U1(c∗1, c∗2)/U2(c∗1, c∗2) = p1/p2,
which is just the familiar condition that says that the optimizing consumer should set
the slope of his or her indifference curve, the marginal rate of substitution, equal to
the slope of his or her budget constraint, the ratio of prices.
Now consider I as one of the model’s parameters, and let the functions c∗1 (I), c∗2 (I), and
λ∗ (I) describe how the optimal choices c∗1 and c∗2 and the associated value λ∗ of the
multiplier depend on I.
In addition, define the maximum value function as
V(I) = max_{c1, c2} U(c1, c2) subject to I ≥ p1 c1 + p2 c2
The envelope theorem tells us that we can ignore the dependence of c∗1 and c∗2 on I in
calculating
V 0 (I) = λ∗ (I),
which gives us an interpretation of the multiplier λ∗ as the marginal utility of income.
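A quick numerical check (an added illustration, not part of the notes) confirms that V′(I) = λ∗(I) for the hypothetical Cobb-Douglas case U(c1, c2) = a ln(c1) + (1 − a) ln(c2), whose demands are c∗1 = aI/p1 and c∗2 = (1 − a)I/p2:

# Verify V'(I) = lambda*(I) for an assumed Cobb-Douglas utility function
import numpy as np

a, p1, p2 = 0.4, 2.0, 3.0

def V(I):
    c1, c2 = a * I / p1, (1 - a) * I / p2        # optimal demands
    return a * np.log(c1) + (1 - a) * np.log(c2)

def lam(I):
    c1 = a * I / p1
    return a / (c1 * p1)                          # lambda* = U1(c1*, c2*)/p1

I0, h = 10.0, 1e-5
V_prime = (V(I0 + h) - V(I0 - h)) / (2 * h)       # numerical derivative of V
print(V_prime, lam(I0))                           # both approximately 0.1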
3.2 Cost Minimization
A firm produces output y with capital k and labor l according to the production function y = f(k, l).
r = rental rate for capital
w = wage rate
Suppose that the firm takes its output y as given, and chooses inputs k and l to minimize
costs. Then the firm solves
min_{k,l} rk + wl subject to f(k, l) ≥ y.
If we set up the Lagrangian
L(k, l, λ) = rk + wl − λ[f(k, l) − y],
where the term involving the multiplier λ is subtracted rather than added in the case of a minimization problem, the Kuhn-Tucker conditions (1)-(4) continue to apply, exactly as before.
Thus, according to the Kuhn-Tucker theorem, the optimal choices k∗ and l∗ and the asso-
ciated multiplier λ∗ must satisfy the FOC:
L1 (k ∗ , l∗ , λ∗ ) = r − λ∗ f1 (k∗ , l∗ ) = 0 (9)
and
L2 (k∗ , l∗ , λ∗ ) = w − λ∗ f2 (k∗ , l∗ ) = 0 (10)
Move the terms with minus signs over to the other side, and divide the first FOC by the
second to obtain
f1(k∗, l∗)/f2(k∗, l∗) = r/w,
which is another familiar condition that says that the optimizing firm chooses factor
inputs so that the marginal rate of substitution between inputs in production equals
the ratio of factor prices.
y = f (k∗ , l∗ ) (11)
Then (9)-(11) represent 3 equations that determine the three unknowns k ∗ , l∗ , and λ∗ as
functions of the model’s parameters r, w, and y. In particular, we can think of the
functions
k∗ = k∗ (r, w, y)
and
l∗ = l∗ (r, w, y)
as demand curves for capital and labor: strictly speaking, they are conditional (on y)
factor demand functions.
Now define the minimum cost function as
C(r, w, y) = min_{k,l} rk + wl subject to f(k, l) ≥ y.
The envelope theorem tells us that in calculating the derivatives of the cost function, we
can ignore the dependence of k∗ , l∗ , and λ∗ on r, w, and y.
Hence:
C1 (r, w, y) = k∗ (r, w, y),
C2 (r, w, y) = l∗ (r, w, y),
and
C3 (r, w, y) = λ∗ (r, w, y).
The first two of these equations are statements of Shephard’s lemma; they tell us that
the derivatives of the cost function with respect to factor prices coincide with the
conditional factor demand curves. The third equation gives us an interpretation of the
multiplier λ∗ as a measure of the marginal cost of increasing output.
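The following sketch (an added illustration, not from the notes) verifies Shephard's lemma numerically for the hypothetical Cobb-Douglas technology f(k, l) = k^a l^(1−a), using its known conditional factor demands:

# Check that C1 = k* and C2 = l* for an assumed Cobb-Douglas technology
import numpy as np

a = 0.3

def k_star(r, w, y):   # conditional capital demand
    return y * (a * w / ((1 - a) * r)) ** (1 - a)

def l_star(r, w, y):   # conditional labor demand
    return y * ((1 - a) * r / (a * w)) ** a

def C(r, w, y):        # minimum cost function
    return r * k_star(r, w, y) + w * l_star(r, w, y)

r0, w0, y0, h = 1.5, 2.0, 4.0, 1e-6
C1 = (C(r0 + h, w0, y0) - C(r0 - h, w0, y0)) / (2 * h)
C2 = (C(r0, w0 + h, y0) - C(r0, w0 - h, y0)) / (2 * h)
print(C1, k_star(r0, w0, y0))   # C1 = k*, as Shephard's lemma predicts
print(C2, l_star(r0, w0, y0))   # C2 = l*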
Thus, our two examples illustrate how we can apply the Kuhn-Tucker and envelope theorems
in specific economic problems.
The two examples also show how, in the context of specific economic problems, it is often
possible to attach an economic interpretation to the multiplier λ∗ .
The Maximum Principle
Here, we will explore the connections between two popular ways of solving dynamic
optimization problems, that is, problems that involve optimization over time. The first
solution method is just a straightforward application of the Kuhn-Tucker theorem; the second
solution method relies on a result known as the maximum principle.
We'll begin by briefly noting the basic features that set dynamic optimization problems
apart from purely static ones. Then we'll go on to consider the connections between the Kuhn-
Tucker theorem and the maximum principle in both discrete and continuous time.
Reference:
a) We need to index the variables that enter into the problem by t, in order to keep track
of changes in those variables that occur over time.
c) We need to introduce constraints that describe the evolution of stock variables over time:
e.g., larger flows of savings or investment today will lead to larger stocks of wealth or
capital tomorrow.
yt = stock variable
zt = flow variable
Objective function:
∑_{t=0}^{T} β^t F(yt, zt; t)
Following Dixit, we can allow for a wider range of possibilities by letting the functions as
well as the variables depend on the time index t.
Q(yt , zt ; t) ≥ yt+1 − yt
or
yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, ..., T
c ≥ G(yt , zt ; t)
y0 given
yT +1 ≥ y ∗
The dynamic optimization problem can now be stated as: choose sequences {zt}_{t=0}^{T} and {yt}_{t=1}^{T+1} to maximize the objective function subject to all of the constraints.
Notes:
a) It is important for the application of the maximum principle that the problem
be additively time separable: that is, the values of F , Q, and G at time t must
depend on the values of yt and zt only at time t.
b) Although the constraints describing the evolution of the stock variable and applying
to the variables within each period can each be written in the form of a single
equation, it must be emphasized that these constraints must hold for all t =
0, 1, ..., T . That is, each of these equations actually describes T + 1 constraints.
2.2 The Kuhn-Tucker Formulation
Let’s begin by applying the Kuhn-Tucker Theorem to solve this problem. That is, let’s set
up the Lagrangian and take first-order conditions.
Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, ..., T :
L = ∑_{t=0}^{T} β^t F(yt, zt; t) + ∑_{t=0}^{T} π_{t+1}[yt + Q(yt, zt; t) − y_{t+1}] + ∑_{t=0}^{T} λt[c − G(yt, zt; t)] + φ(y_{T+1} − y∗)
The Kuhn-Tucker theorem tells us that the solution to this problem must satisfy the FOC
for the choice variables zt for t = 0, 1, ..., T and yt for t = 1, 2, ..., T + 1.
FOC for zt, t = 0, 1, ..., T:
β^t Fz(yt, zt; t) + π_{t+1} Qz(yt, zt; t) − λt Gz(yt, zt; t) = 0 (1)
FOC for yt, t = 1, 2, ..., T:
β^t Fy(yt, zt; t) + π_{t+1}[1 + Qy(yt, zt; t)] − π_t − λt Gy(yt, zt; t) = 0
or
π t+1 − π t = −[β t Fy (yt , zt ; t) + π t+1 Qy (yt , zt ; t) − λt Gy (yt , zt ; t)] (2)
for all t = 1, 2, ..., T .
FOC for yT +1 :
−π T +1 + φ = 0
Let’s assume that the problem is such that the constraint governing the evolution of the
stock variable always holds with equality, as will typically be the case in economic
applications. Then another condition describing the solution to the problem is
y_{t+1} = yt + Q(yt, zt; t) (3)
for all t = 0, 1, ..., T.
Finally, let’s write down the initial condition for the stock variable and the complementary
slackness condition for the constraint on the terminal value of the stock:
y0 given (4)
φ(yT +1 − y ∗ ) = 0
or, using the FOC for yT +1 :
π T +1 (yT +1 − y ∗ ) = 0 (5)
Notes:
a) Together with the complementary slackness condition
λt [c − G(yt , zt ; t)] = 0,
which implies either
λt = 0 or c = G(yt , zt ; t),
we can think of (1)-(3) as forming a system of four equations in four unknowns
yt , zt , πt , λt . This system of equations determines the problem’s solution.
b) Equations (2) and (3), linking the values of yt and π t at adjacent points in time, are
examples of difference equations. They must be solved subject to two boundary
conditions:
The initial condition (4).
The terminal, or transversality, condition (5).
c) The analysis can also be applied to the case of an infinite time horizon, where
T = ∞. In this case, (1) must hold for all t = 0, 1, 2, ..., (2) must hold for all
t = 1, 2, 3, ..., (3) must hold for all t = 0, 1, 2, ..., and (5) becomes a condition on
the limiting behavior of πt and yt :
lim_{T→∞} π_{T+1}(y_{T+1} − y∗) = 0. (6)
Now notice the following:
a) The first-order condition for the static optimization problem on the right-hand side
of (7):
β t Fz (yt , zt ; t) + π t+1 Qz (yt , zt ; t) − λt Gz (yt , zt ; t) = 0 (10)
for all t = 0, 1, ..., T .
b) The pair of difference equations:
π t+1 − π t = −Hy (yt , π t+1 ; t) (11)
for all t = 1, 2, ..., T and
yt+1 − yt = Hπ (yt , π t+1 ; t) (12)
for all t = 0, 1, ..., T , where the derivatives of H can be calculated using the
envelope theorem.
c) The initial condition
y0 given (4)
d) The terminal, or transversality, condition
π T +1 (yT +1 − y ∗ ) = 0 (5)
According to the maximum principle, there are two ways of solving discrete time dynamic
optimization problems, both of which lead to the same answer:
a) Set up the Lagrangian for the dynamic optimization problem and take first-order
conditions for all t = 0, 1, ..., T .
b) Set up the Hamiltonian for the problem and derive the first-order and envelope con-
ditions (10)-(12) for the static optimization problem that appears in the definition
of the Hamiltonian.
Accordingly, suppose now that the variable t, instead of taking on discrete values t =
0, 1, ..., T , takes on continuous values t ∈ [0, T ], where as before, T can be finite or
infinite.
ρ ≥ 0 = discount rate
Example:
β = 0.95
ρ = 0.05
β t for t = 1 is 0.95
e−ρt for t = 1 is 0.951, or approximately 0.95
Consider next the constraint describing the evolution of the stock variable.
In the discrete time case, the interval between time periods is just ∆t = 1.
With a more general interval ∆t between periods, the constraint becomes
∆t · Q(y(t), z(t); t) ≥ y(t + ∆t) − y(t)
or
Q(y(t), z(t); t) ≥ [y(t + ∆t) − y(t)]/∆t.
In the limit as the interval ∆t goes to zero, this last expression simplifies to
Q(y(t), z(t); t) ≥ ẏ(t)
for all t ∈ [0, T], where ẏ(t) denotes the derivative of y(t) with respect to t.
The constraint applying to variables at a given point in time remains the same:
c ≥ G(y(t), z(t); t)
Note once again that these constraints must hold for all t ∈ [0, T ]. Thus, each of the two
equations from above actually represents an entire continuum of constraints.
Finally, the initial and terminal constraints for the stock variable remain unchanged:
y(0) given
y(T ) ≥ y ∗
The dynamic optimization problem can now be stated as: choose functions z(t) for t ∈
[0, T ] and y(t) for t ∈ (0, T ] to maximize the objective function subject to all of the
constraints.
3.2 The Kuhn-Tucker Formulation
Once again, let’s begin by setting up the Lagrangian and taking first-order conditions:
L = ∫_0^T e^{−ρt} F(y(t), z(t); t) dt + ∫_0^T π(t)[Q(y(t), z(t); t) − ẏ(t)] dt + ∫_0^T λ(t)[c − G(y(t), z(t); t)] dt + φ[y(T) − y∗]
Now we are faced with a problem: y(t) is a choice variable for all t ∈ [0, T ], but ẏ(t) appears
in the Lagrangian.
To solve this problem, use integration by parts:
∫_0^T (d/dt)[π(t)y(t)] dt = ∫_0^T π̇(t)y(t) dt + ∫_0^T π(t)ẏ(t) dt
π(T)y(T) − π(0)y(0) = ∫_0^T π̇(t)y(t) dt + ∫_0^T π(t)ẏ(t) dt
−∫_0^T π(t)ẏ(t) dt = ∫_0^T π̇(t)y(t) dt + π(0)y(0) − π(T)y(T)
Before taking first-order conditions, note that the multipliers π(t) and λ(t) are functions of t
and that the corresponding constraints appear in the form of integrals. These features
of the Lagrangian reflect the fact that the constraints must hold for all t ∈ [0, T ].
FOC for z(t), t ∈ [0, T ]:
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (13)
for all t ∈ [0, T ]
FOC for y(t), t ∈ (0, T ):
e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) + π̇(t) − λ(t)Gy (y(t), z(t); t) = 0
or
π̇(t) = −[e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) − λ(t)Gy (y(t), z(t); t)]
for all t ∈ (0, T ).
If we require all functions of t to be continuously differentiable, then this last equation will
also hold for t = 0 and t = T , so that we can write
π̇(t) = −[e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) − λ(t)Gy (y(t), z(t); t)] (14)
Assume, as before, that the constraint governing ẏ(t) holds with equality:
ẏ(t) = Q(y(t), z(t); t) (15)
for all t ∈ [0, T].
The initial condition for the stock variable is
y(0) given (16)
and the complementary slackness condition for the terminal constraint is
φ[y(T) − y∗] = 0
or, using the FOC for y(T), which implies φ = π(T):
π(T)[y(T) − y∗] = 0 (17)
or, in the infinite-horizon case,
lim_{T→∞} π(T)[y(T) − y∗] = 0. (18)
Notes:
3.3 An Alternative Formulation
As before, define the Hamiltonian for this problem as
H(y(t), π(t); t) = max_{z(t)} e^{−ρt} F(y(t), z(t); t) + π(t)Q(y(t), z(t); t)
subject to
c ≥ G(y(t), z(t); t). (19)
As before, the Hamiltonian is a maximum value function. And as before, the maximization problem on the right-hand side is a static one, so, letting λ(t) denote the multiplier on the constraint, the Kuhn-Tucker theorem allows us to write
H(y(t), π(t); t) = max_{z(t)} e^{−ρt} F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) + λ(t)[c − G(y(t), z(t); t)]
By the envelope theorem:
Hy (y(t), π(t); t) = e−ρt Fy (y(t), z(t); t)+π(t)Qy (y(t), z(t); t)−λ(t)Gy (y(t), z(t); t) (20)
and
Hπ (y(t), π(t); t) = Q(y(t), z(t); t) (21)
where z(t) solves the optimization problem on the right-hand side of (19) and must
therefore satisfy the FOC:
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0. (22)
Hence, conditions (14) and (15) can be rewritten more compactly as
π̇(t) = −Hy(y(t), π(t); t) (23)
and
ẏ(t) = Hπ(y(t), π(t); t). (24)
To summarize: the continuous-time problem is to choose functions z(t) and y(t) to maximize
∫_0^T e^{−ρt} F(y(t), z(t); t) dt
subject to
Q(y(t), z(t); t) ≥ ẏ(t)
for all t ∈ [0, T],
c ≥ G(y(t), z(t); t)
for all t ∈ [0, T],
y(0) given,
and
y(T ) ≥ y ∗ .
Associated with this problem, define the Hamiltonian
H(y(t), π(t); t) = max e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)
z(t)
According to the maximum principle, the solution to this problem is characterized by:
a) The first-order condition for the static optimization problem on the right-hand side
of (19):
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (22)
for all t ∈ [0, T ].
b) The pair of differential equations
π̇(t) = −Hy (y(t), π(t); t) (23)
and
ẏ(t) = Hπ (y(t), π(t); t) (24)
for all t ∈ [0, T ], where the derivatives of H can be calculated using the envelope
theorem.
c) The initial condition
y(0) given. (16)
d) The terminal, or transversality, condition
π(T )[y(T ) − y ∗ ] = 0 (17)
in the case where T < ∞ or
lim π(T )[y(T ) − y ∗ ] = 0. (18)
T →∞
Once again, according to the maximum principle, there are two ways of solving continuous
time dynamic optimization problems, both of which lead to the same answer:
a) Set up the Lagrangian for the dynamic optimization problem and take first-order
conditions for all t ∈ [0, T ].
b) Set up the Hamiltonian for the problem and derive the first-order and envelope con-
ditions (22)-(24) for the static optimization problem that appears in the definition
of the Hamiltonian.
4 Two Examples
4.1 Life-Cycle Saving
Consider a consumer who is employed for T + 1 years: t = 0, 1, ..., T .
w = constant annual labor income
kt = stock of assets at the beginning of period t = 0, 1, ..., T + 1
k0 = 0
kt can be negative for t = 1, 2, ..., T, so that the consumer is allowed to borrow.
However, kT +1 must satisfy
kT +1 ≥ k∗ > 0
where k∗ denotes saving required for retirement.
r = constant interest rate
total income during period t = w + rkt
ct = consumption
Hence,
kt+1 = kt + w + rkt − ct
or equivalently,
kt + Q(kt , ct ; t) ≥ kt+1
where
Q(kt , ct ; t) = Q(kt , ct ) = w + rkt − ct
for all t = 0, 1, ..., T
Utility function:
∑_{t=0}^{T} β^t ln(ct)
The consumer's problem: choose sequences {ct}_{t=0}^{T} and {kt}_{t=1}^{T+1} to maximize the utility function subject to all of the constraints.
For this problem:
kt = stock variable
ct = flow variable
To solve the problem using the maximum principle, define the Hamiltonian
H(kt, π_{t+1}; t) = max_{ct} β^t ln(ct) + π_{t+1}(w + rkt − ct)
FOC for ct :
β^t/ct = π_{t+1} (25)
Difference equations for π t and kt :
π t+1 − π t = −Hk (kt , π t+1 ; t) = −π t+1 r (26)
and
kt+1 − kt = Hπ (kt , π t+1 ; t) = w + rkt − ct (27)
Equations (25)-(27) represent a system of three equations in the three unknowns ct , π t , and
kt . They must be solved subject to the boundary conditions
k0 = 0 given (28)
and
πT +1 (kT +1 − k∗ ) = 0 (29)
We can use (25)-(29) to deduce some key properties of the solution even without solving
the system completely.
Note first that (25) implies that
π_{T+1} = β^T/c_T > 0.
Hence, it follows from (29) that
kT +1 = k ∗ .
Thus, the consumer saves just enough for retirement and no more.
Next, note that (26) implies
π t+1 − π t = −π t+1 r
(1 + r)π t+1 = π t (30)
Use (25) to obtain
π_{t+1} = β^t/c_t  and  π_t = β^{t−1}/c_{t−1}
and substitute these expressions into (30) to obtain
(1 + r)(β^t/c_t) = β^{t−1}/c_{t−1}
(1 + r)(β/c_t) = 1/c_{t−1}
c_t/c_{t−1} = β(1 + r) (31)
Equation (31) reveals that the optimal growth rate of consumption is constant, and is faster for a more patient consumer, with a higher value of β, and for a consumer who faces a higher interest rate r.
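To make the solution concrete, the Python sketch below (an added illustration with purely hypothetical parameter values) simulates the system: consumption grows at the constant rate β(1 + r) from (31), and c0 is found by bisection so that the terminal condition k_{T+1} = k∗ from (29) holds.

# Life-cycle saving: find c0 so that terminal assets hit the retirement target
beta, r, w, T, k_target = 0.96, 0.05, 1.0, 40, 2.0

def terminal_assets(c0):
    k, c = 0.0, c0
    for t in range(T + 1):              # t = 0, 1, ..., T
        k = k + w + r * k - c           # k_{t+1} = k_t + w + r*k_t - c_t
        c = beta * (1 + r) * c          # c_{t+1} = beta*(1+r)*c_t
    return k                            # k_{T+1}

lo, hi = 0.0, w * (T + 1)               # bracket for bisection on c0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if terminal_assets(mid) > k_target:
        lo = mid                        # too much terminal saving: raise c0
    else:
        hi = mid
print("c0 =", round(lo, 4), "k_{T+1} =", round(terminal_assets(lo), 4))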
4.2 Optimal Growth
Consider an economy in which output is produced with capital according to the production
function
F (kt ) = ktα ,
where 0 < α < 1.
ct = consumption
δ = depreciation rate
Capital accumulation:
kt+1 = kt + kt^α − δkt − ct
or
kt+1 − kt = kt^α − δkt − ct
Our first example had a finite horizon and was cast in discrete time. So for the sake of
variety, make this second example have an infinite horizon in continuous time.
The continuous time analog to the capital accumulation constraint shown above is just
k(t)^α − δk(t) − c(t) ≥ k̇(t)
or
Q(k(t), c(t); t) ≥ k̇(t),
where
Q(k(t), c(t); t) = Q(k(t), c(t)) = k(t)α − δk(t) − c(t)
for all t ∈ [0, ∞)
Initial condition:
k(0) given
The problem: choose continuously differentiable functions c(t) and k(t) for t ∈ [0, ∞) to
maximize utility subject to all of the constraints.
To solve this problem, set up the Hamiltonian:
and
k̇(t) = Hπ (k(t), π(t); t) = k(t)α − δk(t) − c(t). (34)
Equations (32)-(34) form a system of three equations in the three unknowns c(t), π(t), and
k(t). How can we solve them?
αk(t)α−1 − δ − ρ = 0
or
k(t) = [(δ + ρ)/α]^{1/(α−1)} = k∗
And since α − 1 < 0, (35) also implies that ċ(t) < 0 when k(t) > k∗ and ċ(t) > 0 when
k(t) < k∗ .
Equation (34) implies that k̇(t) = 0 when
k(t)^α − δk(t) − c(t) = 0
or
c(t) = k(t)^α − δk(t).
We can illustrate these conditions graphically using a phase diagram, which reveals that:
[Figure: Phase diagram for the optimal growth model. The ċ = 0 isocline or locus (k = k∗) and the k̇ = 0 isocline or locus (c = k^α − δk) intersect at the steady state (k∗, c∗); the saddle path, or stable manifold, runs through the steady state.]
For each value of k0, there is a unique value of c0 that leads the system to converge to the steady state.
In the discrete-time case, we defined the Hamiltonian as in (7) and used this definition to derive the optimality conditions (10)-(12) and either (5) or (6), depending on whether the horizon is finite or infinite.
The Hamiltonian, when defined as above, is often called the present-value Hamiltonian, because β^t F(yt, zt; t) measures the present value at time t = 0 of the payoff F(yt, zt; t) received at time t > 0.
The present-value Hamiltonian stands in contrast to the current-value Hamiltonian, defined by multiplying both sides of (7) by β^{−t}:
where the last line states the definition of the current-value Hamiltonian H̃(yt , θt+1 ; t)
and where
θt+1 = β −t π t+1 ⇒ π t+1 = β t θt+1
and
μt = β −t λt ⇒ λt = β t μt
Let’s consider rewriting the optimality conditions (10)-(12) and (5) in terms of the current
value Hamiltonian H̃(yt , θt+1 ; t).
To do this, note first that by definition
H(yt, π_{t+1}; t) = β^t H̃(yt, β^{−t}π_{t+1}; t).
Hence
Hy (yt , π t+1 ; t) = β t H̃y (yt , θt+1 ; t)
and
Hπ(yt, π_{t+1}; t) = ∂/∂π_{t+1} [β^t H̃(yt, β^{−t}π_{t+1}; t)]
= β^t β^{−t} H̃θ(yt, β^{−t}π_{t+1}; t)
= H̃θ(yt, θ_{t+1}; t)
(5) can be rewritten
π_{T+1}(y_{T+1} − y∗) = 0 (5)
β^T θ_{T+1}(y_{T+1} − y∗) = 0 (5′)
Thus, when the maximum principle in discrete time is stated in terms of the current-value Hamiltonian instead of the present-value Hamiltonian, (10)-(12) and (5) or (6) are replaced by (10′)-(12′) and (5′) or (6′).
We can use the same types of transformations in the case of continuous time, where the present-value Hamiltonian is defined by
H(y(t), π(t); t) = max_{z(t)} e^{−ρt} F(y(t), z(t); t) + π(t)Q(y(t), z(t); t)
subject to
c ≥ G(y(t), z(t); t),
and the current-value Hamiltonian H̃(y(t), θ(t); t) is defined by multiplying both sides by e^{ρt}, where
θ(t) = e^{ρt} π(t) ⇒ π(t) = e^{−ρt} θ(t)
and
μ(t) = e^{ρt} λ(t) ⇒ λ(t) = e^{−ρt} μ(t).
In the case of continuous time, the optimality conditions derived from (19) are (22)-(24)
and either (17) or (18). Let’s rewrite these conditions in terms of the current-value
Hamiltonian H̃(y(t), θ(t); t).
To do this, note first that by definition
H(y(t), π(t); t) = e^{−ρt} H̃(y(t), e^{ρt}π(t); t).
Hence
Hy (y(t), π(t); t) = e−ρt H̃y (y(t), θ(t); t)
and
Hπ(y(t), π(t); t) = ∂/∂π(t) [e^{−ρt} H̃(y(t), e^{ρt}π(t); t)]
= e^{−ρt} e^{ρt} H̃θ(y(t), e^{ρt}π(t); t)
= H̃θ(y(t), θ(t); t)
and, finally,
π̇(t) = ∂/∂t [e^{−ρt}θ(t)] = −ρe^{−ρt}θ(t) + e^{−ρt}θ̇(t)
In light of these results, (22) can be rewritten
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (22)
Fz (y(t), z(t); t) + eρt π(t)Qz (y(t), z(t); t) − eρt λ(t)Gz (y(t), z(t); t) = 0
Fz (y(t), z(t); t) + θ(t)Qz (y(t), z(t); t) − μ(t)Gz (y(t), z(t); t) = 0 (22′)
Thus, when the maximum principle in continuous time is stated in terms of the current-value Hamiltonian instead of the present-value Hamiltonian, (22)-(24) and (17) or (18) are replaced by (22′)-(24′) and (17′) or (18′).
Dynamic Programming
We have now studied two ways of solving dynamic optimization problems, one based
on the Kuhn-Tucker theorem and the other based on the maximum principle. These two
methods both lead us to the same sets of optimality conditions; they differ only in terms of
how those optimality conditions are derived.
Here, we will consider a third way of solving dynamic optimization problems: the method
of dynamic programming. We will see, once again, that dynamic programming leads us to the
same set of optimality conditions that the Kuhn-Tucker theorem does; once again, this new
method differs from the others only in terms of how the optimality conditions are derived.
While the maximum principle lends itself equally well to dynamic optimization problems
set in both discrete time and continuous time, dynamic programming is easiest to apply in
discrete time settings. On the other hand, dynamic programming, unlike the Kuhn-Tucker
theorem and the maximum principle, can be used quite easily to solve problems in which
optimal decisions must be made under conditions of uncertainty.
Thus, in our discussion of dynamic programming, we will begin by considering dynamic
programming under certainty; later, we will move on to consider stochastic dynamic pro-
gramming.
Reference:
1 > β > 0 discount factor
Q(yt , zt ; t) ≥ yt+1 − yt
or
yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, 2, ...
c ≥ G(yt , zt ; t)
y0 given
Notes:
Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, 2, ...:
L = ∑_{t=0}^∞ β^t F(yt, zt; t) + ∑_{t=0}^∞ μ̃_{t+1}[yt + Q(yt, zt; t) − y_{t+1}] + ∑_{t=0}^∞ λ̃t[c − G(yt, zt; t)]
It will be convenient to define
μ_{t+1} = β^{−(t+1)} μ̃_{t+1} ⟹ μ̃_{t+1} = β^{t+1} μ_{t+1}
and
λt = β^{−t} λ̃t ⟹ λ̃t = β^t λt
and to rewrite the Lagrangian in terms of μ_{t+1} and λt instead of μ̃_{t+1} and λ̃t:
L = ∑_{t=0}^∞ β^t F(yt, zt; t) + ∑_{t=0}^∞ β^{t+1} μ_{t+1}[yt + Q(yt, zt; t) − y_{t+1}] + ∑_{t=0}^∞ β^t λt[c − G(yt, zt; t)]
Now, let’s suppose that somehow we could solve for μt as a function of the state variable
yt :
μt = W (yt ; t)
μt+1 = W (yt+1 ; t + 1) = W [yt + Q(yt , zt ; t); t + 1]
W (yt ; t) = F1 (yt , zt ; t) + βW [yt + Q(yt , zt ; t); t + 1][1 + Q1 (yt , zt ; t)] − λt G1 (yt , zt ; t) (2)
Together with the binding constraint
y_{t+1} = yt + Q(yt, zt; t) (3)
and the complementary slackness condition
λt[c − G(yt, zt; t)] = 0,
we can think of (1) and (2) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function W(:, t). This system of equations determines the problem's solution.
Note that since (3) is in the form of a difference equation, finding the problem’s solution
involves solving a difference equation.
1.3 An Alternative Formulation
Now let’s consider the same problem in a slightly different way.
For any given value of the initial state variable y0 , define the value function
v(y0; 0) = max_{{zt}_{t=0}^∞, {yt}_{t=1}^∞} ∑_{t=0}^∞ β^t F(yt, zt; t)
subject to
y0 given
yt + Q(yt , zt ; t) ≥ yt+1 for all t = 0, 1, 2, ...
c ≥ G(yt , zt ; t) for all t = 0, 1, 2, ...
More generally, for any period t and any value of yt, define the value function
v(yt; t) = max_{{z_{t+j}}_{j=0}^∞, {y_{t+j}}_{j=1}^∞} ∑_{j=0}^∞ β^j F(y_{t+j}, z_{t+j}; t + j)
subject to
yt given
yt+j + Q(yt+j , zt+j ; t + j) ≥ yt+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt+j , zt+j ; t + j) for all j = 0, 1, 2, ...
Now consider expanding the definition of the value function by separating out the time t
components:
v(yt; t) = max_{zt, y_{t+1}} [F(yt, zt; t) + max_{{z_{t+j}}_{j=1}^∞, {y_{t+j}}_{j=2}^∞} ∑_{j=1}^∞ β^j F(y_{t+j}, z_{t+j}; t + j)]
subject to
yt given
yt + Q(yt , zt ; t) ≥ yt+1
yt+j + Q(yt+j , zt+j ; t + j) ≥ yt+j+1 for all j = 1, 2, 3, ...
c ≥ G(yt , zt ; t)
c ≥ G(yt+j , zt+j ; t + j) for all j = 1, 2, 3, ...
Next, relabel the time indices:
v(yt; t) = max_{zt, y_{t+1}} [F(yt, zt; t) + β max_{{z_{t+1+j}}_{j=0}^∞, {y_{t+1+j}}_{j=1}^∞} ∑_{j=0}^∞ β^j F(y_{t+1+j}, z_{t+1+j}; t + 1 + j)]
subject to
yt given
yt + Q(yt , zt ; t) ≥ yt+1
yt+j+1 + Q(yt+1+j , zt+1+j ; t + 1 + j) ≥ yt+1+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt , zt ; t)
c ≥ G(yt+1+j , zt+1+j ; t + 1 + j) for all j = 0, 1, 2, ...
Now notice that together, the components for t + 1 + j, j = 0, 1, 2, ... define v(yt+1; t + 1), enabling us to simplify the statement considerably:
v(yt; t) = max_{zt, y_{t+1}} [F(yt, zt; t) + βv(y_{t+1}; t + 1)]
subject to
yt given
yt + Q(yt, zt; t) ≥ yt+1
c ≥ G(yt, zt; t)
Assuming, as before, that the constraint governing the evolution of the stock variable holds with equality, so that y_{t+1} = yt + Q(yt, zt; t), this becomes
v(yt; t) = max_{zt} {F(yt, zt; t) + βv[yt + Q(yt, zt; t); t + 1]} (5)
subject to
yt given
c ≥ G(yt, zt; t)
Equation (5) is called the Bellman equation for this problem, and lies at the heart of the
dynamic programming approach.
Note that the maximization on the right-hand side of (5) is a static optimization problem,
involving no dynamic elements.
The FOC for zt is
F2(yt, zt; t) + βv′[yt + Q(yt, zt; t); t + 1]Q2(yt, zt; t) − λt G2(yt, zt; t) = 0, (6)
where λt is the multiplier on the constraint c ≥ G(yt, zt; t). And by the envelope theorem:
v 0 (yt ; t) = F1 (yt , zt ; t) + βv 0 [yt + Q(yt , zt ; t); t + 1][1 + Q1 (yt , zt ; t)] − λt G1 (yt , zt ; t) (7)
Together with the binding constraint
y_{t+1} = yt + Q(yt, zt; t)
and the complementary slackness condition
λt[c − G(yt, zt; t)] = 0,
we can think of (6) and (7) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function v(:, t). This system of equations determines the problem's solution.
Note once again that since (3) is in the form of a difference equation, finding the problem’s
solution involves solving a difference equation.
But more important, notice that (6) and (7) are equivalent to (1) and (2) with
v′(yt; t) = W(yt; t).
Thus, we have two ways of solving this discrete time dynamic optimization problem, both
of which lead us to the same set of optimality conditions:
a) Set up the Lagrangian for the dynamic optimization problem and take first order
conditions for zt , t = 0, 1, 2, ... and yt , t = 1, 2, 3, ....
b) Set up the Bellman equation and take the first order condition for zt and then
derive the envelope condition for yt .
One question remains: How, in practice, can we solve for the unknown value functions
v(:, t)?
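One common numerical answer is value function iteration: start from a guess for v, apply the Bellman operator repeatedly, and stop when the iterates converge. The Python sketch below (an added illustration, not part of the notes) applies this idea to the special optimal growth case studied next, where the exact policy kt+1 = αβkt^α is available as a check; the grid bounds are arbitrary choices.

# Value function iteration for ln utility and k_{t+1} = k_t**alpha - c_t
import numpy as np

alpha, beta = 0.33, 0.99
k_grid = np.linspace(0.05, 0.5, 200)            # grid for the state variable
v = np.zeros_like(k_grid)                       # initial guess for v

for _ in range(1000):
    # consumption implied by each (k today, k' tomorrow) pair
    c = k_grid[:, None] ** alpha - k_grid[None, :]
    util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)
    v_new = np.max(util + beta * v[None, :], axis=1)   # Bellman operator
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

policy = k_grid[np.argmax(util + beta * v[None, :], axis=1)]   # chosen k'
# For this special case the exact policy is k' = alpha*beta*k**alpha.
print(np.max(np.abs(policy - alpha * beta * k_grid ** alpha)))  # small, grid-limited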
2 Example 1: Optimal Growth
Here, we will modify the optimal growth example that we solved earlier using the maximum
principle in two ways:
a) We will switch to discrete time in order to facilitate the use of dynamic program-
ming.
b) Set the depreciation rate for capital equal to δ = 1 in order to obtain a very special
case in which an explicit solution for the value function can be found.
Production function:
F (kt ) = ktα
where 0 < α < 1
Initial condition:
k0 given
kt = state variable
ct = control variable
Now guess that the value function takes the time-invariant form
v(kt; t) = v(kt) = E + F ln(kt),
where E and F are constants to be determined.
Using the guess for v and the binding constraint kt+1 = kt^α − ct, the Bellman equation becomes
E + F ln(kt) = max_{ct} [ln(ct) + βE + βF ln(kt^α − ct)] (8)
FOC for ct :
1/ct − βF/(kt^α − ct) = 0 (9)
Envelope condition for kt :
F/kt = αβF kt^{α−1}/(kt^α − ct) (10)
Together with the binding constraint
k_{t+1} = kt^α − ct,
the FOC (9) implies that
ct = kt^α/(1 + βF). (11)
Cross-multiplying in the envelope condition (10) gives F(kt^α − ct) = αβF kt^α, and substituting (11) into this expression yields
F kt^α − F kt^α [1/(1 + βF)] = αβF kt^α
1 − 1/(1 + βF) = αβ
Hence
1/(1 + βF) = 1 − αβ (12)
Or, equivalently,
1 + βF = 1/(1 − αβ)
βF = 1/(1 − αβ) − 1 = αβ/(1 − αβ)
F = α/(1 − αβ) (13)
[Figure: Numerical solutions to the optimal growth model with complete depreciation, generated using equations (14) and (15). Each example sets α = 0.33 and β = 0.99; Example 2 sets k(0) = 1. In both examples, c(t) converges to its steady-state value of 0.388 and k(t) converges to its steady-state value of 0.188.]
Substitute (12) into (11) to obtain
ct = (1 − αβ)ktα (14)
which shows that it is optimal to consume the fixed fraction 1 − αβ of output.
Evolution of capital:
kt+1 = ktα − ct = ktα − (1 − αβ)ktα = αβktα (15)
which is in the form of a difference equation for kt .
Equations (14) and (15) show how the optimal values of ct and kt+1 depend on the state
variable kt and the parameters α and β. Given a value for k0 , these two equations can
be used to construct the optimal sequences {ct }∞ ∞
t=0 and {kt }t=1 .
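The Python sketch below (an added illustration) simply iterates on (14) and (15), using the same values α = 0.33, β = 0.99, and k(0) = 1 as in the figure:

# Iterate the optimal policies (14) and (15) forward from k(0) = 1
alpha, beta = 0.33, 0.99
k = 1.0
for t in range(20):
    c = (1 - alpha * beta) * k ** alpha  # equation (14)
    k = alpha * beta * k ** alpha        # equation (15)
print(round(c, 3), round(k, 3))          # approach c* = 0.388, k* = 0.188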
For the sake of completeness, substitute (14) and (15) back into (8) to solve for E:
At = beginning-of-period assets
ct = consumption
saving = st = At + yt − ct
Evolution of assets:
At+1 = (1 + r)st = (1 + r)(At + yt − ct )
Note:
At + yt − ct = [1/(1 + r)] A_{t+1}
At = [1/(1 + r)] A_{t+1} + ct − yt
Similarly,
A_{t+1} = [1/(1 + r)] A_{t+2} + c_{t+1} − y_{t+1}
Combining these last two equalities yields
At = [1/(1 + r)]² A_{t+2} + [1/(1 + r)](c_{t+1} − y_{t+1}) + (ct − yt)
Repeating this substitution period after period, and assuming that the term involving A_{t+T} vanishes in the limit as T → ∞, yields
At + ∑_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = ∑_{j=0}^∞ [1/(1 + r)]^j c_{t+j}. (16)
Equation (16) takes the form of an infinite horizon budget constraint, indicating that over
the infinite horizon beginning at any period t, the consumer’s sources of funds include
assets At and the present value of current and future labor income, while the consumer’s
use of funds is summarized by the present value of current and future consumption.
The consumer's problem: choose the sequences {st}_{t=0}^∞ and {At}_{t=1}^∞ to maximize the utility function
∑_{t=0}^∞ β^t u(ct) = ∑_{t=0}^∞ β^t u(At + yt − st)
subject to the constraints
A0 given
and
(1 + r)st ≥ At+1
for all t = 0, 1, 2, ...
To solve the problem via dynamic programming, note first that
At = state variable
st = control
FOC for st:
−u′(At + yt − st) + β(1 + r)v′[(1 + r)st; t + 1] = 0
and envelope condition for At:
v′(At; t) = u′(ct) (18)
Since (18) must hold for all t = 0, 1, 2, ...,
v′(A_{t+1}; t + 1) = u′(c_{t+1}),
so the FOC can be rewritten as
u′(ct) = β(1 + r)u′(c_{t+1}). (19)
And since the example assumes that β(1 + r) = 1, (19) simplifies to
u′(ct) = u′(c_{t+1})
or
ct = c_{t+1}
And since this last equation must hold for all t = 0, 1, 2, ..., it implies
c_{t+j} = ct for all j = 0, 1, 2, ....
Now, return to (16):
At + ∑_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = ∑_{j=0}^∞ [1/(1 + r)]^j c_{t+j}. (16)
Since consumption is constant and β = 1/(1 + r), (16) becomes
At + ∑_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = ct ∑_{j=0}^∞ β^j (20)
Note that
(1 − β) ∑_{j=0}^∞ β^j = (1 + β + β² + ...) − (β + β² + β³ + ...) = 1,
so that ∑_{j=0}^∞ β^j = 1/(1 − β).
or
ct = (1 − β)[At + ∑_{j=0}^∞ [1/(1 + r)]^j y_{t+j}] (21)
Equation (21) indicates that it is optimal to consume a fixed fraction 1 − β of wealth at each date t, where wealth consists of the value of current asset holdings and the present discounted value of current and future labor income. Thus, (21) describes a version of the permanent income hypothesis.
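The Python sketch below (an added illustration, with hypothetical values for β, A0, and a constant income path) computes the permanent-income rule (21) and confirms that, with β(1 + r) = 1, assets stay constant when the consumer follows it:

# Permanent income rule (21) with beta*(1+r) = 1 and constant labor income
beta = 0.96
r = 1 / beta - 1                 # so that beta*(1+r) = 1
A0, y = 5.0, 1.0

# wealth = current assets + present value of current and future labor income
pv_income = sum((1 / (1 + r)) ** j * y for j in range(2000))   # ~ y*(1+r)/r
c = (1 - beta) * (A0 + pv_income)                              # equation (21)
print(round(c, 4))               # consumption is constant at this level forever

# check: with this constant c, assets follow A' = (1+r)(A + y - c) and stay flat
A = A0
for t in range(5):
    A = (1 + r) * (A + y - c)
print(round(A, 4))               # approximately A0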
yt = state variable
zt = control variable
εt+1 = random shock, which is observed at the beginning of t + 1
εt is known ...
... but εt+1 is still viewed as random.
The shock εt+1 may be serially correlated, but will be assumed to have the Markov property
(i.e., to be generated by a Markov process): the distribution of εt+1 depends on εt , but
not on εt−1 , εt−2 , εt−3 , ....
Now, the full state of the economy at the beginning of each period is described jointly by
the pair of values for yt and εt , since the value for εt is relevant for forecasting, that is,
forming expectations of, future values of εt+j , j = 1, 2, 3, ....
Objective function:
E0 ∑_{t=0}^∞ β^t F(yt, zt, εt)
E0 = expected value as of t = 0
Constraint:
yt + Q(yt, zt, ε_{t+1}) ≥ y_{t+1} for all t = 0, 1, 2, ... and all possible realizations of ε_{t+1}.
Thus, the value of yt+1 does not become known until εt+1 is observed at the beginning of t + 1.
y0 given
Notes:
First, we have added the shock εt to the objective function for period t and the
shock εt+1 to the constraint linking periods t and t + 1.
And second, we have assumed that the planner cares about the expected value
of the objective function.
b) For simplicity, the functions F and Q are now assumed to be time-invariant,
although now they depend on the shock as well as on the state and control variable.
c) For simplicity, we have also dropped the second set of constraints, c ≥ G(yt , zt ).
Adding them back is straightforward, but complicates the algebra.
d) In the presence of uncertainty, the constraint
yt + Q(yt, zt, ε_{t+1}) ≥ y_{t+1}
must hold, not only for all t = 0, 1, 2, ..., but for all possible realizations of εt+1
as well. Thus, this single equation can actually represent a very large number of
constraints.
e) The Kuhn-Tucker theorem can still be used to solve problems that feature uncer-
tainty. But because problems with uncertainty can have a very large number of
constraints, the Kuhn-Tucker theorem can become very difficult to apply in prac-
tice, since one may have to introduce a very large number of Lagrange multipliers.
Dynamic programming, therefore, can be an easier and more convenient way to
solve dynamic stochastic optimization problems.
For any given values of y0 and ε0, define the value function
v(y0, ε0) = max_{{zt}_{t=0}^∞, {yt}_{t=1}^∞} E0 ∑_{t=0}^∞ β^t F(yt, zt, εt)
subject to
y0 and ε0 given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all t = 0, 1, 2, ... and all εt+1
More generally, for any period t and any values of yt and εt , define
v(yt, εt) = max_{{z_{t+j}}_{j=0}^∞, {y_{t+j}}_{j=1}^∞} Et ∑_{j=0}^∞ β^j F(y_{t+j}, z_{t+j}, ε_{t+j})
subject to
yt and εt given
yt+j + Q(yt+j , zt+j , εt+j+1 ) ≥ yt+j+1 for all j = 0, 1, 2, ... and all εt+j+1
Note once again that the value function is a maximum value function.
Now separate out the time t components:
v(yt, εt) = max_{zt, y_{t+1}} [F(yt, zt, εt) + max_{{z_{t+j}}_{j=1}^∞, {y_{t+j}}_{j=2}^∞} Et ∑_{j=1}^∞ β^j F(y_{t+j}, z_{t+j}, ε_{t+j})]
subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j + Q(yt+j , zt+j , εt+j+1 ) ≥ yt+j+1 for all j = 1, 2, 3, ... and all εt+j+1
subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j+1 + Q(yt+1+j , zt+1+j , εt+1+j+1 ) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1
FACT (Law of Iterated Expectations): For any random variable Xt+j , realized at time t+j,
j = 0, 1, 2, ...:
Et Et+1 Xt+j = Et Xt+j .
To see why this fact holds true, consider the following example. Suppose that the shock follows the first-order autoregressive process
ε_{t+1} = ρεt + η_{t+1}, with Et η_{t+1} = 0.
Hence
ε_{t+2} = ρε_{t+1} + η_{t+2}, with E_{t+1} η_{t+2} = 0
or
ε_{t+2} = ρ²εt + ρη_{t+1} + η_{t+2}.
It follows that
E_{t+1} ε_{t+2} = ρε_{t+1} = ρ²εt + ρη_{t+1}
and therefore
Et Et+1 εt+2 = Et (ρ2 εt + ρηt+1 ) = ρ2 εt .
It also follows that
Et εt+2 = Et (ρ2 εt + ρηt+1 + η t+2 ) = ρ2 εt .
So that, in this case as in general,
Et E_{t+1} ε_{t+2} = Et ε_{t+2}.
subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j+1 + Q(yt+1+j , zt+1+j , εt+1+j+1 ) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1
subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
Note that the maximization on the right-hand side of (22) is a static optimization problem,
involving no dynamic elements.
Note also that by substituting the constraints into the value function, we are left with
an unconstrained problem. Unlike the Kuhn-Tucker approach, which requires many
constraints and many multipliers, dynamic programming in this case has no constraints
and no multipliers.
The FOC for zt is
F2 (yt , zt , εt ) + βEt {v1 [yt + Q(yt , zt , εt+1 ), εt+1 ]Q2 (yt , zt , εt+1 )} = 0 (23)
The envelope condition for yt is:
v1 (yt , εt ) = F1 (yt , zt , εt ) + βEt {v1 [yt + Q(yt , zt , εt+1 ), εt+1 ][1 + Q1 (yt , zt , εt+1 )]} (24)
Equations (23)-(24) coincide exactly with the first-order conditions for zt and yt that we
would have derived through a direct application of the Kuhn-Tucker theorem to the
original, dynamic stochastic optimization problem.
Together with the binding constraint
y_{t+1} = yt + Q(yt, zt, ε_{t+1}), (25)
we can think of (23) and (24) as forming a system of three equations in two unknown
variables yt and zt and one unknown function v. This system of equations determines
the problem’s solution, given the behavior of the exogenous shocks εt .
Note that (25) is in the form of a difference equation; once again, solving a dynamic
optimization problem involves solving a difference equation.
The final example extends example 2 in two ways:
a) Introducing n ≥ 1 assets
b) Allowing returns on each asset to be random
As in example 2, we will not be able to solve explicitly for the value function, but we will
be able to learn enough about its properties to derive some useful economic results.
Since we are extending the example in two ways, assume for simplicity that the consumer
receives no labor income, and therefore must finance all of his or her consumption by
investing.
At = beginning-of-period financial wealth
ct = consumption
sit = savings allocated to asset i = 1, 2, ..., n
Hence,
At = ct + ∑_{i=1}^{n} s_{it}
... but Rit+1 is still viewed as random.
Hence
A_{t+1} = ∑_{i=1}^{n} R_{it+1} s_{it}
does not become known until the beginning of t + 1, even though the s_{it} must be chosen during t.
Utility:
E0 ∑_{t=0}^∞ β^t u(ct) = E0 ∑_{t=0}^∞ β^t u(At − ∑_{i=1}^{n} s_{it})
The problem can now be stated as: choose contingency plans for sit for all i = 1, 2, ..., n
and t = 0, 1, 2, ... and At for all t = 0, 1, 2, ... to maximize
E0 ∑_{t=0}^∞ β^t u(At − ∑_{i=1}^{n} s_{it})
subject to
A0 given
and
∑_{i=1}^{n} R_{it+1} s_{it} ≥ A_{t+1}
for all t = 0, 1, 2, ... and all possible realizations of Rit+1 for each i = 1, 2, ..., n.
As in the general case, the returns can be serially correlated, but must have the Markov
property.
At = state variable
sit , i = 1, 2, ...n = control variables
Rt = [R1t , R2t , ...Rnt ] = vector of random returns
FOC:
−u′(At − ∑_{i=1}^{n} s_{it}) + βEt[R_{it+1} v1(∑_{i=1}^{n} R_{it+1} s_{it}, R_{t+1})] = 0
Envelope condition:
v1(At, Rt) = u′(At − ∑_{i=1}^{n} s_{it})
Use the constraints to rewrite the FOC and envelope conditions more simply as
u′(ct) = βEt[R_{it+1} v1(A_{t+1}, R_{t+1})]
and
v1(At, Rt) = u′(ct).
Since the envelope condition must hold for all t = 0, 1, 2, ..., it implies
v1(A_{t+1}, R_{t+1}) = u′(c_{t+1}),
so the FOC becomes
u′(ct) = βEt[R_{it+1} u′(c_{t+1})] (26)
or, dividing through by u′(ct) and defining the intertemporal marginal rate of substitution m_{t+1} = βu′(c_{t+1})/u′(ct),
1 = Et[m_{t+1} R_{it+1}]. (27)
Equation (26) generalizes (19) to the case where there is more than one asset and where
the asset returns are random. It must hold for all assets i = 1, 2, ..., n, even though
each asset may pay a different return ex-post.
Keeping in mind that (27) must hold for all assets, suppose that there is a risk-free asset, with return R^f_{t+1} that is known during period t. Then R^f_{t+1} must satisfy
1 = R^f_{t+1} Et m_{t+1}
or
Et m_{t+1} = 1/R^f_{t+1}. (28)
FACT: For any two random variables x and y,
cov(x, y) = E[(x − μx )(y − μy )], where μx = E(x) and μy = E(y).
Hence,
cov(x, y) = E[xy − μx y − xμy + μx μy ]
= E(xy) − μx μy − μx μy + μx μy
= E(xy) − μx μy
= E(xy) − E(x)E(y)
Or, by rearranging,
E(xy) = E(x)E(y) + cov(x, y)
Applying this fact to (27) and using (28) yields
Et R_{it+1} − R^f_{t+1} = −R^f_{t+1} cov_t(R_{it+1}, m_{t+1}). (29)
Equation (29) indicates that the expected return on asset i exceeds the risk-free rate only
if Rit+1 is negatively correlated with mt+1 .
Does this make sense?
Consider that an asset that acts like insurance pays a high return Rit+1 during bad
economic times, when consumption ct+1 is low. Therefore, for this asset:
cov_t(R_{it+1}, c_{t+1}) < 0 ⇒ cov_t[R_{it+1}, u′(c_{t+1})] > 0 ⇒ cov_t(R_{it+1}, m_{t+1}) > 0 ⇒ Et R_{it+1} < R^f_{t+1}.
This implication seems reasonable: assets that work like insurance often have
expected returns below the risk-free return.
Consider that common stocks tend to pay a high return Rit+1 during good economic
times, when consumption ct+1 is high. Therefore, for stocks:
cov_t(R_{it+1}, c_{t+1}) > 0 ⇒ cov_t[R_{it+1}, u′(c_{t+1})] < 0 ⇒ cov_t(R_{it+1}, m_{t+1}) < 0 ⇒ Et R_{it+1} > R^f_{t+1}.
This implication also seems to hold true: historically, stocks have had expected
returns above the risk-free return.
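A small numerical example (an added illustration with invented states and payoffs) shows how (28) and (29) fit together: any return constructed so that 1 = Et[m_{t+1}R_{it+1}] holds satisfies the covariance formula exactly.

# Two-state check of E[R] - Rf = -Rf * cov(R, m) for an asset priced by 1 = E[mR]
import numpy as np

prob = np.array([0.5, 0.5])            # two equally likely states
m = np.array([1.05, 0.93])             # stochastic discount factor by state
x = np.array([0.8, 1.3])               # a risky payoff: high in the low-m state

price = np.sum(prob * m * x)           # p = E[m x]
R = x / price                          # gross return, so E[m R] = 1 by construction
Rf = 1 / np.sum(prob * m)              # risk-free rate from (28)

ER = np.sum(prob * R)
cov_Rm = np.sum(prob * R * m) - ER * np.sum(prob * m)
print(ER - Rf, -Rf * cov_Rm)           # the two sides of (29) coincide
print(ER > Rf)                         # True: R covaries negatively with m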
Recalling once more that (29) must hold for all assets, consider in particular the asset whose
return happens to coincide exactly with the representative consumer’s intertemporal
marginal rate of substitution:
R^m_{t+1} = m_{t+1}.
For this asset, equation (29) implies
Et R^m_{t+1} − R^f_{t+1} = −R^f_{t+1} cov_t(R^m_{t+1}, m_{t+1})
Et m_{t+1} − R^f_{t+1} = −R^f_{t+1} cov_t(m_{t+1}, m_{t+1}) = −R^f_{t+1} var_t(m_{t+1})
or
−R^f_{t+1} = [Et m_{t+1} − R^f_{t+1}] / var_t(m_{t+1}) (30)
Eigenvalues and Eigenvectors
Reference:
Recall that an eigenvalue of the square matrix A is a number r such that the matrix A − rI is singular. Since a matrix is singular if and only if its determinant is zero, we can calculate the eigenvalues of A by solving the characteristic equation
det(A − rI) = 0, (1)
where I is the identity matrix and det(A − rI) is the characteristic polynomial of A.
so that the characteristic polynomial is a second order polynomial. Thus, the charac-
teristic equation (1) can be solved using the quadratic formula:
r = {a11 + a22 ± [(a11 + a22)² − 4(a11a22 − a12a21)]^{1/2}} / 2.
This example reveals that a 2 × 2 matrix has two eigenvalues. More generally, an n × n
matrix has n eigenvalues.
Recall, next, that a square matrix B is singular if and only if there exists a nonzero vector
x such that Bx = 0.
This fact tells us that if r is an eigenvalue of A, so that A − rI is singular, then there exists
a nonzero vector v such that
(A − rI)v = 0. (2)
This vector v is called an eigenvector of A corresponding to the eigenvalue r. Note that (2)
is equivalent to
Av = rv, (3)
so that an eigenvector must also satisfy (3).
Example: Consider the 2 × 2 matrix
A = [ −1  3 ; 2  0 ].
The characteristic equation is
0 = det[ −1 − r  3 ; 2  −r ] = r(1 + r) − 6 = r² + r − 6 = (r + 3)(r − 2),
so that the eigenvalues of A are r1 = −3 and r2 = 2.
The eigenvector v1 corresponding to the eigenvalue r1 = −3 must satisfy
[ −1 + 3  3 ; 2  3 ][ v11 ; v12 ] = [ 0 ; 0 ]
or
2v11 + 3v12 = 0
v12 = −(2/3)v11
In fact, we might have seen this earlier by examining equation (2), defining an eigenvector:
(A − rI)v = 0. (2)
Clearly, if a vector v1 satisfies (2), then so does the vector αv1 for any α ≠ 0.
as is
[ 1 ; 1 ].
Let A be an n × n matrix, and consider the problem of finding a nonsingular matrix P such
that
P −1 AP = D, (4)
where D is a diagonal matrix.
To solve this problem, calculate the n eigenvalues of A, r1 , r2 , ..., rn , along with the cor-
responding eigenvectors v1 , v2 , ..., vn . Then form the matrix P using the eigenvectors
as its columns:
P = [ v1  v2  ...  vn ].
And form the matrix D by placing the eigenvalues on the diagonal and zeros everywhere else:
D = [ r1  0  ...  0 ; 0  r2  ...  0 ; ...  ...  ...  ... ; 0  0  ...  rn ].
Since the ri ’s are eigenvalues and the vi ’s are eigenvectors, the definition (3) tells us that
(5) must hold.
Note: An n × n matrix that does not have n linearly independent eigenvectors is called
nondiagonalizable or defective.
Let’s check that (4) holds for the matrix A that we considered in one of our earlier examples:
A = [ −1  3 ; 2  0 ].
For this choice of A, we've already found the eigenvalues r1 = −3 and r2 = 2, as well as the corresponding eigenvectors
v1 = [ 3 ; −2 ]  and  v2 = [ 1 ; 1 ].
Form the matrix P using these eigenvectors as its columns:
P = [ 3  1 ; −2  1 ],  so that  P⁻¹ = (1/5)[ 1  −1 ; 2  3 ].
Now calculate
P⁻¹AP = (1/5)[ 1  −1 ; 2  3 ][ −1  3 ; 2  0 ][ 3  1 ; −2  1 ]
= (1/5)[ −3  3 ; 4  6 ][ 3  1 ; −2  1 ]
= (1/5)[ −15  0 ; 0  10 ]
= [ −3  0 ; 0  2 ]
= [ r1  0 ; 0  r2 ]
= D,
exactly as required by (4).
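The same calculation can be verified with numpy (an added illustration, not part of the notes):

# Verify P^{-1} A P = D for the example matrix above
import numpy as np

A = np.array([[-1.0, 3.0],
              [ 2.0, 0.0]])
P = np.array([[ 3.0, 1.0],
              [-2.0, 1.0]])            # eigenvectors v1 and v2 as columns
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))                 # diag(-3, 2), matching r1 and r2

# numpy's own routine returns the same eigenvalues (eigenvectors are rescaled)
print(np.linalg.eigvals(A))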
3 Complex Eigenvalues
All of these results carry over to the case where the square matrix A has complex eigenvalues,
as the following example illustrates.
Let
A = [ 1  −2 ; 2  1 ].
For this choice of A, the characteristic polynomial is
det[ 1 − r  −2 ; 2  1 − r ] = (1 − r)² + 4 = 1 − 2r + r² + 4 = r² − 2r + 5.
Applying the quadratic formula to the characteristic equation r² − 2r + 5 = 0 gives the complex eigenvalues r1 = 1 + 2i and r2 = 1 − 2i. The eigenvector v1 corresponding to r1 will be of the general form
v1 = [ α11 + β11 i ; α12 + β12 i ].
Together r1 and v1 must satisfy
(A − r1 I)v1 = 0
{[ 1  −2 ; 2  1 ] − [ 1 + 2i  0 ; 0  1 + 2i ]}[ α11 + β11 i ; α12 + β12 i ] = [ 0 ; 0 ]
[ −2i  −2 ; 2  −2i ][ α11 + β11 i ; α12 + β12 i ] = [ 0 ; 0 ].
Hence, in particular,
−2i(α11 + β 11 i) − 2(α12 + β 12 i) = 0
−2iα11 + 2β 11 − 2α12 − 2β 12 i = 0 + 0i,
which requires that
2(β 11 − α12 ) = 0 or α12 = β 11
and
−2(α11 + β 12 )i = 0i or β 12 = −α11 .
Evidently, v1 takes the more specific form
v1 = [ α11 + β11 i ; β11 − α11 i ].
Now let’s find the eigenvector v2 corresponding to the eigenvalue r2 . This second eigenvector
will be of the general form
v2 = [ α21 + β21 i ; α22 + β22 i ].
Together r2 and v2 must satisfy
(A − r2 I)v2 = 0
{[ 1  −2 ; 2  1 ] − [ 1 − 2i  0 ; 0  1 − 2i ]}[ α21 + β21 i ; α22 + β22 i ] = [ 0 ; 0 ]
[ 2i  −2 ; 2  2i ][ α21 + β21 i ; α22 + β22 i ] = [ 0 ; 0 ].
Hence, in particular,
2i(α21 + β 21 i) − 2(α22 + β 22 i) = 0
2iα21 − 2β 21 − 2α22 − 2β 22 i = 0 + 0i,
which requires that
−2(β 21 + α22 ) = 0 or α22 = −β 21
and
2(α21 − β 22 )i = 0i or β 22 = α21 .
6
Evidently, v2 takes the more specific form
v2 = [ α21 + β21 i ; −β21 + α21 i ].
Based on these results, form the matrix P using v1 and v2 as its columns:
P = [ 1 + i   1 + i ; 1 − i   −1 + i ].
Differential Equations
Regardless of whether we use the Kuhn-Tucker theorem or the maximum principle to solve
a dynamic optimization problem in continuous time, we must ultimately solve a system of
differential equations. Thus, we will now consider differential equations and their solutions
in more detail. We will begin by considering the solution to a single differential equation
and then go on to consider the solution to systems of multiple differential equations.
1.1 Introduction
Let’s start by considering an example: a bank account with continuous compounding of
interest.
Let y(t) denote the balance in the account at date t and let r denote the instantaneous interest rate, so that the balance evolves according to the differential equation
ẏ(t) = ry(t). (1)
Any function of the form
y(t) = ke^{rt},
where k is a constant, satisfies (1). Thus, in general, (1) has many solutions.
Why isn’t the solution determined uniquely in this example? To see why, just note
that in order to know the amount of funds in a bank account, one needs to know
more than just the rate of interest; one also needs to know the amount that was
initially deposited in the account.
Let y0 denote the amount initially deposited. Then y(t) must satisfy both (1) and
the initial condition
y(0) = y0 . (2)
This initial condition determines a specific value for k:
y(t) = kert =⇒ y(0) = k =⇒ k = y0 .
Hence, the unique function that satisfies both (1) and (2) is
y(t) = y0 ert .
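As an added illustration (not part of the notes), the initial value problem (1)-(2) can also be solved numerically and compared with y(t) = y0 e^{rt}, here with the hypothetical values r = 0.05 and y0 = 100:

# Solve y'(t) = r*y(t), y(0) = y0 numerically and compare with the exact solution
import numpy as np
from scipy.integrate import solve_ivp

r, y_init = 0.05, 100.0
sol = solve_ivp(lambda t, y: r * y, (0, 10), [y_init], dense_output=True)

t = np.linspace(0, 10, 5)
print(sol.sol(t)[0])              # numerical solution
print(y_init * np.exp(r * t))     # exact solution: the two agree closely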
Notes:
a) Obviously, knowing the value of y(t) at any date t would allow one to find the
exact value of k. But problems like finding a function y(t) that satisfies both (1)
and the initial condition (2) are so common that they have a special name: they
are called initial value problems.
b) The many solutions to (1) and the particular solution to (1) and (2) are all functions
y(t). This fact makes solving a differential equation more complicated than solving
a simple algebraic equation that has as its solution an unknown variable.
a) The definition describes first order differential equations that involve only the
first derivative of the unknown function. More generally, an ith order differential
equation involves derivatives up to and including the ith derivative of y(t).
b) The definition describes ordinary differential equations that relate the value of a
function of a single variable to the value of its derivative. Differential equations
that link the value of a function of several variables to the values of its partial
derivatives are called partial differential equations.
c) Again, it is important to emphasize that solving a differential equation involves
finding an unknown function y(t).
d) Sometimes the solution to a differential equation will be a constant function of the
form y(t) = c, where c is a constant. These constant solutions are called steady
states, stationary solutions, stationary points, or equilibria. Since ẏ(t) = 0 when
y(t) = c, the steady states of a differential equation such as
ẏ(t) = F[y(t)]
are the constants c that solve
0 = F(c).
e) As in our first example, the full set of solutions y(t) can often be indexed by a
parameter k, and written y(t, k). If every solution to the differential equation can
be achieved by letting k take on different values, then y(t, k) is called a general
solution of the differential equation.
For example, the linear differential equation
ẏ(t) = ay(t)
has the general solution
y(t) = ke^{at},
where k is an arbitrary constant. Similarly, the linear differential equation with a constant term,
ẏ(t) = ay(t) + b,
has the general solution y(t) = ke^{at} − b/a when a ≠ 0.
These two examples are among the few for which explicit solutions are available. Since these
examples are both ones in which y(t) enters the equation linearly, they are examples
of linear differential equations.
Fortunately, there is a general result on the existence and uniqueness of solutions to initial
value problems that is very easy to apply.
Theorem (Fundamental Theorem of Differential Equations) Consider the initial
value problem described by the differential equation
ẏ(t) = F [y(t), t]
and the initial condition
y(0) = y0 .
If F is continuous at (y0 , 0), then there exists a continuously differentiable function
y(t) defined on an interval I = (−a, a) for some a > 0 such that y(0) = y0 and
ẏ(t) = F [y(t), t] for all t ∈ I. That is, the function y(t) solves the problem on I.
Moreover, if F is continuously differentiable at (y0 , 0), then the solution y(t) is unique.
In some cases, the number a > 0 referred to in the theorem can be arbitrarily large. For
example, consider the initial value problem:
ẏ(t) = ry(t) and y(0) = y0 .
We know that the solution is
y(t) = y0 e^{rt}
and this solution clearly applies for all t ∈ (−∞, ∞).
But in other cases, a > 0 must be finite. For example, consider the initial value problem
ẏ(t) = 1/(t² − 1) and y(0) = 0.
A solution to this problem is given by
y(t) = (1/2) ln[(1 − t)/(1 + t)],
since
y(0) = (1/2) ln(1) = 0
and
ẏ(t) = (1/2)[(1 + t)/(1 − t)]{[(1 + t)(−1) − (1 − t)]/(1 + t)²}
     = (1/2)[1/(1 − t)][−2/(1 + t)]
     = −1/[(1 − t)(1 + t)]
     = −1/(1 − t²)
     = 1/(t² − 1).
But the solution
y(t) = (1/2) ln[(1 − t)/(1 + t)]
is only defined for t ∈ (−1, 1), so in the theorem, a must be less than or equal to 1.
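A rough numerical check of this example is easy to carry out. The Python sketch below (an illustrative addition; the step size and evaluation point are arbitrary choices) integrates the equation by Euler's method from t = 0 and compares the result with the closed-form solution at t = 0.9:

    import numpy as np

    # Right-hand side of the initial value problem and its closed-form solution.
    def f(t):
        return 1.0 / (t**2 - 1.0)

    def y_exact(t):
        return 0.5 * np.log((1.0 - t) / (1.0 + t))

    # Simple Euler integration forward from t = 0 on part of the interval (-1, 1).
    dt = 1e-4
    t, y = 0.0, 0.0
    while t < 0.9:
        y += dt * f(t)
        t += dt

    print("numerical y(0.9):", y)
    print("exact     y(0.9):", y_exact(0.9))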
Now, let’s get a better feel for how phase diagrams work by considering some simpler
examples. In the earlier, optimal growth example, we had a system of two differential
equations, so that the phase diagram had to have two dimensions. In simpler examples,
with only a single differential equation, the phase diagram has just one dimension.
Consider, for example, the differential equation
ẏ(t) = y(t)[2 − y(t)]. (3)
Its stationary solutions y(t) = c must satisfy
0 = c(2 − c),
so that c = 0 and c = 2 are the two steady states.
(3) also implies that ẏ(t) > 0 whenever
y(t) < 0 and 2 < y(t), but these two conditions cannot hold simultaneously.
Thus, we know that (3) requires
ẏ(t) = 0 whenever y(t) = 0 or y(t) = 2
ẏ(t) > 0 whenever 2 > y(t) > 0
ẏ(t) < 0 whenever y(t) > 2 or 0 > y(t)
These results can be illustrated graphically using a one-dimensional phase diagram.
The phase diagram reveals that:
If y(0) = y0 < 0, then y(t) decreases forever.
If y(0) = y0 = 0, then y(t) remains at 0 forever.
If y(0) = y0 > 0, then y(t) converges to 2.
In this example, the stationary solution y(t) = 2 is asymptotically stable, since for all
values of y0 close to 2, the solution converges to the steady state.
On the other hand, the stationary solution y(t) = 0 is unstable, since y(t) moves away
from y0 6= 0, even when y0 is arbitrarily close to 0.
[Figure: one-dimensional phase diagram for Example 1, ẏ(t) = y(t)[2 − y(t)], showing the steady states y = 0 and y = 2 on the real line.]
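The behavior summarized by the phase diagram can also be illustrated by simulation. The Python sketch below (an illustrative addition; the horizon and step size are arbitrary) applies Euler's method to ẏ(t) = y(t)[2 − y(t)] from several nonnegative starting values; solutions starting at y0 > 0 approach 2, while y0 = 0 stays put. Starting values below zero, which decrease without bound, are omitted.

    # Euler simulation of dy/dt = y(2 - y) from several starting points.
    def simulate(y0, T=5.0, dt=0.001):
        y = y0
        for _ in range(int(T / dt)):
            y += dt * y * (2.0 - y)
        return y

    for y0 in [0.0, 0.1, 1.0, 3.0]:
        print(f"y(0) = {y0:4.1f}  ->  y(5) = {simulate(y0):6.3f}")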
2 Systems of Differential Equations
Reference:
2.1 Introduction
Typically, the process of solving a dynamic optimization problem in continuous time involves
solving a system of differential equations. The statement of the maximum principle,
for example, involves two differential equations: one describing the evolution of the
stock variable and the other describing the behavior of a Lagrange multiplier. So we
need to move on now to consider the solution of systems of differential equations.
A general system of two differential equations can be written as
ẋ(t) = F[x(t), y(t), t]
ẏ(t) = G[x(t), y(t), t] (4)
A solution to this system is a pair of functions x(t) and y(t) that satisfy (4) for all t.
Notes:
a) The system (4) is a first order system because it involves only the first derivatives
of the unknown functions. The system is autonomous if the functions F and G
do not specifically involve t; otherwise, the system is nonautonomous.
b) A solution to (4) will typically involve two parameters, k1 and k2 , such that by
varying the values of these parameters, one can obtain every solution to (4). In
this case, the solution written as x(t, k1 , k2 ) and y(t, k1 , k2 ) is the general solution
to (4).
c) The general solution to a system of n first order differential equations will contain
n parameters k1 , k2 , ..., kn .
d) The problem of finding a particular solution to (4) that also satisfies the initial
conditions x(0) = x0 and y(0) = y0 is called an initial value problem.
e) In our optimal growth example, we had a system of two differential equations and
we were trying to find the two unknown functions, y(t) and π(t), that solved that
system of equations. There, we had only one initial condition, y(0) given. But
the terminal, or transversality condition, gave us the other boundary condition
that we need to identify the particular solution.
Fact 1) Every second order differential equation can be written as a system of two
first order equations as follows. Starting from the second order equation
ÿ(t) = F[ẏ(t), y(t), t], (5)
define the new function
v(t) = ẏ(t).
Then
v̇(t) = ÿ(t),
so that (5) is equivalent to
ẏ(t) = v(t)
v̇(t) = F [v(t), y(t), t]
Thus, it is without loss of generality that we’ve focused mainly on first order
differential equations.
Fact 2) Every nonautonomous differential equation can be written as a system of two
autonomous equations as follows. Starting from the nonautonomous equation
ẏ(t) = F[y(t), t],
define the new function v(t) = t, so that v(0) = 0 and the equation can be rewritten as the autonomous system
v̇(t) = 1
ẏ(t) = F[y(t), v(t)]
Thus, it is also without loss of generality that we’ve focused mainly on autonomous
differential equations.
Fact 3) The existence and uniqueness results for scalar differential equations also hold
for systems of differential equations.
Consider, then, a system of n linear differential equations with constant coefficients, written in matrix form as
ẋ(t) = Ax(t), (7)
where
x(t) = [x1(t); x2(t); ...; xn(t)],
ẋ(t) = [ẋ1(t); ẋ2(t); ...; ẋn(t)],
and A is the n × n matrix
A = [a11, a12, ..., a1n; a21, a22, ..., a2n; ...; an1, an2, ..., ann].
In the simplest case, A is diagonal, so that (7) is just a system of n self-contained equations,
each of which takes the form
ẋi(t) = aii xi(t)
and therefore has the familiar solution
xi(t) = ki e^{aii t}
for i = 1, 2, ..., n.
But even in the more general case where A is not diagonal, we can often solve the linear
system (7) almost as easily by drawing on our results having to do with eigenvalues,
eigenvectors, and diagonalizable matrices.
Begin by calculating the eigenvalues r1 , r2 , ..., rn of A, together with the associated eigen-
vectors v1 , v2 , ..., vn . Then form the matrix
P = [ v1 v2 ... vn ],
whose columns are the eigenvectors, and the diagonal matrix D with the eigenvalues along
the diagonal and zeros everywhere else. If the eigenvectors are linearly independent, then we
know from before that
P⁻¹AP = D.
Next, define a new vector of functions
z(t) = P −1 x(t),
so that
x(t) = P z(t),
ż(t) = P −1 ẋ(t),
and
ẋ(t) = P ż(t).
Substituting these expressions into (7) yields
P ż(t) = ẋ(t) = Ax(t) = AP z(t)
or, premultiplying both sides by P⁻¹,
ż(t) = P⁻¹AP z(t) = Dz(t). (8)
And since the matrix D is diagonal, (8) is just a system of n self-contained equations, each
of which takes the form
żi (t) = ri zi (t)
and therefore has the familiar solution
zi (t) = ki eri t .
Finally, with these solutions for the zi (t) in hand, undo the transformation to find the
solutions for the xi (t):
x(t) = P z(t)
     = [ v1 v2 ... vn ] [z1(t); z2(t); ...; zn(t)]
     = [ v1 v2 ... vn ] [k1 e^{r1 t}; k2 e^{r2 t}; ...; kn e^{rn t}]
     = k1 e^{r1 t} v1 + k2 e^{r2 t} v2 + ... + kn e^{rn t} vn.
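The same recipe is easy to carry out numerically. The Python sketch below (an illustrative addition; the matrix A and initial condition are arbitrary choices, not taken from the notes) computes the eigenvalues and eigenvectors, picks the constants ki to match x(0), and evaluates x(t) = k1 e^{r1 t}v1 + ... + kn e^{rn t}vn:

    import numpy as np

    # Solve x'(t) = A x(t), x(0) = x0, by diagonalizing A, as described above.
    A  = np.array([[-1.0, 0.5],
                   [ 0.2, -2.0]])
    x0 = np.array([1.0, 1.0])

    r, P = np.linalg.eig(A)            # eigenvalues r_i and eigenvectors (columns of P)
    k = np.linalg.solve(P, x0)         # constants k_i chosen so that x(0) = x0

    def x(t):
        # x(t) = sum_i k_i * e^{r_i t} * v_i
        return P @ (k * np.exp(r * t))

    for t in [0.0, 1.0, 5.0]:
        print(t, x(t))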
Suppose now that the eigenvalues r1 , r2 , ..., rn are real numbers. If all of the ri are negative,
then
lim_{t→∞} e^{ri t} = 0
for all i = 1, 2, ..., n, which implies that for any choice of k1 , k2 , ..., kn corresponding
to any set of initial conditions x1(0), x2(0), ..., xn(0),
lim_{t→∞} x(t) = 0.
Thus, if all of the eigenvalues of A are real and negative, then the stationary solution
x(t) = 0 of (7) is asymptotically stable.
On the other hand, even if just one of the eigenvalues ri is positive, so that
lim_{t→∞} e^{ri t} = ∞,
then x(t) diverges whenever the corresponding constant ki differs from zero, and the
stationary solution x(t) = 0 is unstable.
As a simple illustration, consider the linear system
ẋ(t) = −x(t) (9a)
ẏ(t) = −y(t) (9b)
Both eigenvalues of this system equal −1, so that for any choices of k1 and k2 corresponding
to any initial conditions x(0) = x0 and y(0) = y0,
lim_{t→∞} x(t) = 0 and lim_{t→∞} y(t) = 0.
But let’s draw the phase diagram anyway, to make sure that it illustrates these
results.
Equation (9a) implies
ẋ(t) = 0 whenever x(t) = 0
ẋ(t) > 0 whenever x(t) < 0
ẋ(t) < 0 whenever x(t) > 0
Equation (9b) implies
ẏ(t) = 0 whenever y(t) = 0
ẏ(t) > 0 whenever y(t) < 0
ẏ(t) < 0 whenever y(t) > 0
These conditions can be illustrated using a two-dimensional phase diagram, which
shows that the stationary solution with x(t) = 0 and y(t) = 0 is asymptotically
stable.
As a second example, consider the system
ẋ(t) = y(t) − x(t)² (10a)
ẏ(t) = −y(t) (10b)
This system is nonlinear, and is difficult to solve explicitly. But we can still characterize the
solution using a phase diagram.
Begin by considering stationary solutions of the form x(t) = x and y(t) = y. Equations
(10a) and (10b) imply that these solutions can be found by solving the equations
0 = y − x2
0 = −y.
Clearly, the only stationary solution has x(t) = 0 and y(t) = 0.
Next, note that (10a) implies
ẋ(t) = 0 whenever y(t) = x(t)²
ẋ(t) > 0 whenever y(t) > x(t)²
ẋ(t) < 0 whenever y(t) < x(t)²
Equation (10b) implies
ẏ(t) = 0 whenever y(t) = 0
ẏ(t) > 0 whenever y(t) < 0
ẏ(t) < 0 whenever y(t) > 0
The phase diagram reveals that the stationary solution with x(t) = 0 and y(t) = 0
is unstable: although there are some initial conditions, such as those labelled
as points A, B, and C, from which the system converges, there are other initial
conditions, such as those labelled as D and E, which are close to the steady state
but which do not lead to convergence.
3 A Linearized System
Before finishing with our discussion of differential equations, let’s return once again to the
system of differential equations that we derived when using the maximum principle to
solve the optimal growth example in continuous time.
[Figure: two-dimensional phase diagram for Example 1, ẋ(t) = −x(t), ẏ(t) = −y(t), showing the loci ẋ = 0 (x = 0) and ẏ = 0 (y = 0); all trajectories converge to the origin.]
[Figure: two-dimensional phase diagram for Example 2, ẋ(t) = y(t) − x(t)², ẏ(t) = −y(t), showing the loci ẋ = 0 (y = x²) and ẏ = 0 (y = 0) and initial conditions labelled A through E.]
That system consisted of two nonlinear differential equations: one for the capital stock,
k̇(t) = k(t)^α − δk(t) − c(t),
and one for consumption,
ċ(t) = c(t)[αk(t)^{α−1} − δ − ρ].
Earlier, we used a phase diagram to characterize the solution to this system. The diagram
showed us that for each possible value of k(0), there exists a unique value of c(0) such
that the system converges to a steady state, with
lim_{t→∞} k(t) = k* = [(δ + ρ)/α]^{1/(α−1)}
and
lim_{t→∞} c(t) = c* = k*^α − δk*.
There is an alternative way of analyzing this system that relies on algebra rather than
geometry.
This alternative method involves taking a first order Taylor approximation around the
steady state (k∗ , c∗ ) to the expressions on the right hand side of each of the two equa-
tions, thereby approximating the nonlinear system by a linear system for which an
explicit solution exists.
The approximation to the equation for the capital stock is
k̇(t) ≈ [αk*^{α−1} − δ][k(t) − k*] − [c(t) − c*] = ρ[k(t) − k*] − [c(t) − c*],
since
αk*^{α−1} − δ = α[(δ + ρ)/α] − δ = ρ.
The approximation to the equation for consumption is
ċ(t) ≈ θ[k(t) − k*],
where
θ = α(α − 1)c*k*^{α−2} < 0.
Now define the new variables
x(t) = k(t) − k∗
and
y(t) = c(t) − c∗ ,
so that x(t) is the deviation of k(t) from its steady state level and y(t) is the deviation
of c(t) from its steady state level. Note that these definitions imply that
ẋ(t) = k̇(t)
and
ẏ(t) = ċ(t).
If we let
z(t) = [x(t); y(t)],
then these two equations can be written in matrix form as
ż(t) = [ẋ(t); ẏ(t)] = [ρ, −1; θ, 0][x(t); y(t)] = Az(t).
We know that this system of linear differential equations has the general solution
z(t) = q1 e^{r1 t} v1 + q2 e^{r2 t} v2,
where r1 and r2 are the eigenvalues of A and v1 and v2 are the corresponding eigenvectors.
These eigenvalues satisfy
0 = det(A − rI) = det[ρ − r, −1; θ, −r] = r² − ρr + θ.
The quadratic formula then implies that the eigenvalues are
r1 = {ρ − [ρ² − 4θ]^{1/2}}/2
and
r2 = {ρ + [ρ² − 4θ]^{1/2}}/2.
Since ρ > 0 and θ < 0,
ρ² − 4θ > ρ²,
so that
r1 = {ρ − [ρ² − 4θ]^{1/2}}/2 < [ρ − (ρ²)^{1/2}]/2 = 0,
while
r2 = {ρ + [ρ² − 4θ]^{1/2}}/2 > [ρ + (ρ²)^{1/2}]/2 = ρ > 0.
We now know that the general solution takes the form
z(t) = q1 e^{r1 t} v1 + q2 e^{r2 t} v2,
where r1 < 0 and r2 > 0. Thus, the requirement that the system converge to the
steady state, so that
lim_{t→∞} z(t) = lim_{t→∞} [x(t); y(t)] = lim_{t→∞} [k(t) − k*; c(t) − c*] = 0,
can be satisfied only if q2 = 0, since e^{r2 t} grows without bound as t → ∞.
Now consider
z(0) = [k(0) − k*; c(0) − c*] = q1 v1.
This equation shows that the constant q1 can be chosen to satisfy the initial condition
k(0) given. This value of q1 , in turn, determines the unique value of c(0) that puts the
system on the saddle path towards (k∗ , c∗ ).
Once again, we can conclude that for each possible value of k(0), there exists a unique value
of c(0) such that the system converges to a steady state.
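The eigenvalue calculations for this example can also be checked numerically. The Python sketch below (an illustrative addition; the values chosen for α, δ, and ρ are arbitrary, not taken from the notes) builds the matrix A = [ρ, −1; θ, 0] and confirms that it has one negative and one positive eigenvalue:

    import numpy as np

    # Linearized optimal growth model: z'(t) = A z(t) with A = [rho, -1; theta, 0].
    alpha, delta, rho = 0.33, 0.10, 0.05

    kstar = ((delta + rho) / alpha) ** (1.0 / (alpha - 1.0))
    cstar = kstar**alpha - delta * kstar
    theta = alpha * (alpha - 1.0) * cstar * kstar**(alpha - 2.0)

    A = np.array([[rho, -1.0],
                  [theta, 0.0]])
    r, V = np.linalg.eig(A)
    print("eigenvalues:", np.sort(r))   # one negative, one positive: a saddle point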
It turns out that many economic applications share the general structure of this example.
That is, a dynamic optimization problem with economic content will often give rise to
a system of n nonlinear differential equations that can be approximated by a linear
system of the form
ż(t) = Az(t),
where the n × 1 vector
z(t) = [x(t); y(t)]
is made up of the n1 × 1 vector x(t) of predetermined variables, whose initial values
x(0) are given, and the n2 × 1 vector y(t) of nonpredetermined, or jump, variables,
whose initial values can adjust to place the system on the saddle path towards the
steady state, where n1 + n2 = n.
Order the eigenvalues of A so that r1 < r2 < ... < rn , and write the general solution to the
linear system as
z(t) = q1 e^{r1 t} v1 + q2 e^{r2 t} v2 + ... + qn e^{rn t} vn.
If the first n1 eigenvalues are negative and the remaining n2 eigenvalues of A are positive,
then the requirement that the system converge to the steady state, so that
lim_{t→∞} z(t) = 0,
can be met only by setting q_{n1+1} = q_{n1+2} = ... = qn = 0, eliminating the terms
associated with the positive eigenvalues.
Using
z(0) = [x(0); y(0)] = q1 v1 + q2 v2 + ... + q_{n1} v_{n1},
the n1 constants q1 , q2 , ..., qn1 can be chosen to satisfy the n1 initial conditions given
by x(0). These values of q1 , q2 , ..., qn1 then determine the unique values in y(0) that
place the system on the saddle path towards the steady state.
Thus, in general, it is often said that an economic model has a unique solution if and only
if the number of negative eigenvalues is exactly equal to the number of predetermined
variables.
Difference Equations
Reference:
Let’s start by recasting our familiar bank account example in discrete time. If the account
pays interest at rate r each period, then the balance evolves according to the difference equation
yt+1 = (1 + r)yt (1)
Iterating forward from the initial deposit y0:
y1 = (1 + r)y0
y2 = (1 + r)y1 = (1 + r)²y0
y3 = (1 + r)y2 = (1 + r)³y0
Hence, in general,
yt = (1 + r)^t y0
Another way of deriving this particular solution is to note that the general solution to (1)
takes the form
yt = k(1 + r)^t,
since this solution satisfies
yt+1 = k(1 + r)^{t+1} = (1 + r)k(1 + r)^t = (1 + r)yt.
A particular solution, that is, a specific value for k, can be found if we know a specific value
of yt at some date t. For example, if we have the initial condition y0 given, then the
general solution requires
y0 = k(1 + r)^0 = k,
so that the particular solution is found once again to be
yt = (1 + r)^t y0
Equation (1) is an example of a first order difference equation, since only one past value of
y, namely yt, appears on the right-hand side of the equation for yt+1.
Next, consider a second order difference equation, with two past values of y on the
right-hand side:
yt+1 = a1 yt + a2 yt−1 (2)
This equation can be rewritten as a first order system by stacking the two most recent
values of y into a vector:
[yt+1; yt] = [a1, a2; 1, 0][yt; yt−1].
Thus, we can always rewrite a second order difference equation as a system of two first
order difference equations.
More generally, an nth order difference equation
yt+1 = a1 yt + a2 yt−1 + ... + an yt−n+1
can be rewritten as a system of n first order difference equations by defining the vector
zt+1 = [yt+1; yt; ...; yt−n+2]
and writing
zt+1 = [yt+1; yt; ...; yt−n+2]
     = [a1, a2, ..., a_{n−1}, an;
        1, 0, ..., 0, 0;
        ...;
        0, 0, ..., 1, 0] [yt; yt−1; ...; yt−n+1]
     = Azt
Thus, as long as we are willing to consider systems of difference equations we can, without
any loss of generality, confine our attention to first order difference equations.
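The conversion of a higher order difference equation into a first order system is also easy to implement. The Python sketch below (an illustrative addition with arbitrary coefficients) builds the companion matrix A for a third order equation and iterates z_{t+1} = Az_t:

    import numpy as np

    # Rewrite y_{t+1} = a1*y_t + a2*y_{t-1} + ... + an*y_{t-n+1} as z_{t+1} = A z_t,
    # where z_t stacks the n most recent values of y.
    a = np.array([0.5, 0.3, -0.1])            # a1, a2, a3
    n = len(a)

    A = np.zeros((n, n))
    A[0, :] = a                               # first row holds the coefficients
    A[1:, :-1] = np.eye(n - 1)                # ones below shift y_t into the y_{t-1} slot

    z = np.array([1.0, 0.8, 0.5])             # z_0 = (y_0, y_{-1}, y_{-2})
    for t in range(5):
        z = A @ z
        print(t + 1, z[0])                    # z[0] tracks y_1, y_2, ...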
A general system of linear difference equations can be written as
xt+1 = Axt (3)
where
xt+1 = [x1t+1; x2t+1; ...; xnt+1],
xt = [x1t; x2t; ...; xnt],
and
A = [a11, a12, ..., a1n; a21, a22, ..., a2n; ...; an1, an2, ..., ann].
In the simplest case, A is diagonal, so that (3) is just a system of n self-contained equations,
each of which takes the form
xit+1 = aii xit
and therefore has the general solution
xit = ki aii^t
for all i = 1, 2, ..., n, where particular values for the constants ki, i = 1, 2, ..., n, can be
found if, for example, the initial conditions x10, x20, ..., xn0 are given.
But even in the more general case where A is not diagonal, we can solve (3) almost as easily
by drawing on our results having to do with eigenvalues, eigenvectors, and diagonaliz-
able matrices.
Begin, as before, by calculating the eigenvalues r1 , r2 , ..., rn of the matrix A, together with
the corresponding eigenvectors v1 , v2 , ..., vn . Then form the matrix
P = [ v1 v2 ... vn ],
whose columns are the eigenvectors, and the diagonal matrix D with the eigenvalues along
the diagonal and zeros everywhere else. If the eigenvectors are linearly independent, then we
know from before that
P⁻¹AP = D.
Next, define the new vector zt = P⁻¹xt, so that xt = Pzt and
zt+1 = P⁻¹xt+1 = P⁻¹Axt = P⁻¹APzt = Dzt. (4)
And since D is diagonal, (4) is a system of n self-contained equations, each of which takes
the form
zit+1 = ri zit
and therefore has the general solution
zit = ki ri^t
Finally, with these solutions for the zit ’s in hand, undo the transformation to solve for the
xit ’s:
xt = P zt
   = [ v1 v2 ... vn ] [z1t; z2t; ...; znt]
   = [ v1 v2 ... vn ] [k1 r1^t; k2 r2^t; ...; kn rn^t]
or
xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn (5)
where particular values for the constants ki , i = 1, 2, ..., n can be pinned down, for
example, by initial conditions x10 , x20 , ..., xn0 and the implied values of z10 , z20 , ..., zn0
There is another way of solving systems of linear difference equations that also exploits the
fact that
P −1 AP = D,
where D is diagonal. Note that this equality can be restated as
A = P DP −1 .
Hence,
x1 = Ax0 = PDP⁻¹x0,
x2 = Ax1 = PDP⁻¹PDP⁻¹x0 = PD²P⁻¹x0,
and, in general,
xt = PD^tP⁻¹x0,
which suggests that (3) has the general solution
xt = PD^tP⁻¹q, (6)
where particular values of the constants in the vector
q = [q1; q2; ...; qn]
can be pinned down, for example, if the initial conditions
x0 = [x10; x20; ...; xn0]
are given, since in that case
x0 = PD⁰P⁻¹q = PIP⁻¹q = q.
Moreover, (6) does satisfy (3), since
xt+1 = PD^{t+1}P⁻¹q
     = PDD^tP⁻¹q
     = PDID^tP⁻¹q
     = PDP⁻¹PD^tP⁻¹q
     = PDP⁻¹xt
     = Axt
Now, in general, it is not the case that for an arbitrary square matrix B, B t can be calculated
by raising each element of B to the tth power. However, in the special case of a diagonal
matrix, it does turn out that this is true.
FACT: For the diagonal matrix D,
D^t = [r1^t, 0, ..., 0; 0, r2^t, ..., 0; ...; 0, 0, ..., rn^t].
Using this fact, (6),
xt = PD^tP⁻¹q,
can be written as
xt = [ v1 v2 ... vn ] [r1^t, 0, ..., 0; 0, r2^t, ..., 0; ...; 0, 0, ..., rn^t] [k1; k2; ...; kn],
where
[k1; k2; ...; kn] = P⁻¹q.
Thus, (6) is equivalent to
xt = [ v1 v2 ... vn ] [k1 r1^t; k2 r2^t; ...; kn rn^t]
or
xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn, (5)
which confirms that the two approaches lead us to the same general solution.
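This equivalence can be confirmed numerically. The Python sketch below (an illustrative addition; the matrix and initial condition are arbitrary choices) computes xt both by iterating x_{t+1} = Ax_t directly and by using the eigenvector formula (5), and checks that the two answers agree:

    import numpy as np

    A  = np.array([[0.9, 0.2],
                   [0.1, 0.5]])
    x0 = np.array([1.0, 2.0])

    r, P = np.linalg.eig(A)
    k = np.linalg.solve(P, x0)         # k = P^{-1} x0

    t = 10
    x_iter = x0.copy()
    for _ in range(t):
        x_iter = A @ x_iter            # direct iteration of x_{t+1} = A x_t

    x_eig = P @ (k * r**t)             # eigenvector formula (5)
    print(x_iter, x_eig)               # the two should agree up to rounding error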
Before moving on, note that a stationary, or steady state, solution to
xt+1 = Axt (3)
is a solution of the form xt = x for all t = 0, 1, 2, ..., where the vector of constants x
must satisfy
x = Ax.
If A − I is nonsingular, this requires that
x = 0̄,
where 0̄ is an n × 1 vector of zeros.
These calculations reveal that (3) has the steady state solution
xt = 0̄
The general solution
xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn, (5)
reveals that the steady state xt = 0̄ will be asymptotically stable if
|ri| < 1
for all i = 1, 2, ..., n, for in this case
lim_{t→∞} ri^t = 0.
2 Lag Operators
Many economic applications, including some dynamic optimization problems, give rise to
difference equations of the slightly more general form
yt = ayt−1 + xt , (7)
In these cases, we often want to obtain solutions that express yt in terms of current, past,
or future values of xt .
Suppose, therefore, that (7) holds for negative as well as positive values of t. And suppose
that the time horizon extends into the infinite past as well as into the infinite future.
Then t = ..., −2, −1, 0, 1, 2, ...
Substituting yt−1 = ayt−2 + xt−1 into (7) yields
yt = a²yt−2 + axt−1 + xt.
And since
yt−2 = ayt−3 + xt−2,
we can also write
yt = a³yt−3 + a²xt−2 + axt−1 + xt
or, more generally,
yt = a^T yt−T + Σ_{j=0}^{T−1} a^j xt−j (8)
Suppose that |a| < 1 and that the sequences {yt} and {xt} are bounded. Then we might
repeat our backward substitution an infinite number of times or, equivalently, take the
limit in (8) as T → ∞. Since |a| < 1 and {yt} is bounded,
lim_{T→∞} a^T yt−T = 0.
And since |a| < 1 and {xt} is bounded,
lim_{T→∞} Σ_{j=0}^{T−1} a^j xt−j = Σ_{j=0}^∞ a^j xt−j < ∞.
Taking the limit in (8) as T → ∞ therefore yields the solution
yt = Σ_{j=0}^∞ a^j xt−j (9)
Alternatively, when |a| > 1, (7) can be solved forwards. Rewrite (7) as
yt = (1/a)yt+1 − (1/a)xt+1.
Substituting yt+1 = (1/a)yt+2 − (1/a)xt+2 into this expression gives
yt = (1/a)²yt+2 − (1/a)²xt+2 − (1/a)xt+1.
And since
yt+2 = (1/a)yt+3 − (1/a)xt+3,
we can also write
yt = (1/a)³yt+3 − (1/a)³xt+3 − (1/a)²xt+2 − (1/a)xt+1
or, more generally,
yt = (1/a)^T yt+T − Σ_{j=1}^T (1/a)^j xt+j. (10)
Since |a| > 1 and {yt} is bounded,
lim_{T→∞} (1/a)^T yt+T = 0.
And since |a| > 1 and {xt} is bounded,
lim_{T→∞} Σ_{j=1}^T (1/a)^j xt+j = Σ_{j=1}^∞ (1/a)^j xt+j < ∞.
Thus, if we repeat our forward substitution an infinite number of times or, equivalently,
take the limit in (10) as T → ∞, we obtain the solution
yt = −Σ_{j=1}^∞ (1/a)^j xt+j (11)
These solutions can also be derived using the lag operator L, defined so that Lyt = yt−1
and, more generally, L^j yt = yt−j. With this notation, (7) can be rewritten as
yt = aLyt + xt
or
(1 − aL)yt = xt. (12)
Now suppose that |a| < 1, and recall the following fact: for any number z with |z| < 1,
(1 − z)⁻¹ = Σ_{j=0}^∞ z^j.
It turns out that we can apply this fact to (12) as well, and write
yt = (1 − aL)⁻¹xt
   = Σ_{j=0}^∞ (aL)^j xt
   = Σ_{j=0}^∞ a^j L^j xt
or
yt = Σ_{j=0}^∞ a^j xt−j (9)
which is, of course, the same solution that we obtained by repeated backward substitution.
Alternatively, if |a| > 1, we can use the lag operator to rewrite (7),
yt = ayt−1 + xt, (7)
as
(1 − aL)yt = xt
and, noting that 1 − aL = −aL[1 − (aL)⁻¹], as
yt = −(aL)⁻¹[1 − (aL)⁻¹]⁻¹xt. (13)
Since |a| > 1 implies |1/a| < 1, applying the geometric series fact to (13) yields
yt = −(aL)⁻¹ Σ_{j=0}^∞ [(aL)⁻¹]^j xt
   = −(aL)⁻¹ Σ_{j=0}^∞ (1/a)^j L^{−j} xt
   = −Σ_{j=0}^∞ (1/a)^{j+1} L^{−j−1} xt
   = −Σ_{j=1}^∞ (1/a)^j L^{−j} xt
or
yt = −Σ_{j=1}^∞ (1/a)^j xt+j (11)
which is, of course, the same solution that we obtained using repeated forward substi-
tution.
Note: Consider forming a matrix A with the parameter a from (7) as its only element:
A = [ a ].
Then A has a as its single eigenvalue: r1 = a.
Next, recall that for the linear system
xt+1 = Axt ,
the steady state xt = 0̄ is asymptotically stable if the eigenvalues of A are all less than
one in absolute value and unstable if one or more of the eigenvalues of A is greater
than one in absolute value.
By analogy, the difference equation
yt = ayt−1 + xt (7)
is said to be stable if |a| < 1 and unstable if |a| > 1.
Since, when |a| < 1, we can use backward substitution to obtain
yt = Σ_{j=0}^∞ a^j xt−j (9)
while, when |a| > 1, we must use forward substitution to obtain (11),
it is often said that stable equations are solved backwards and unstable equations are
solved forwards.
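The backward and forward solutions (9) and (11) can be approximated numerically by truncating the infinite sums. The Python sketch below (an illustrative addition; the values of a and the driving sequence xt are arbitrary choices) computes both:

    import numpy as np

    np.random.seed(0)
    x = np.random.randn(2001)            # a bounded driving sequence x_t, t = 0, ..., 2000
    t = 1000

    # Stable case |a| < 1: solve backwards, y_t = sum_{j>=0} a^j x_{t-j} (truncated here).
    a_stable = 0.9
    y_back = sum(a_stable**j * x[t - j] for j in range(500))

    # Unstable case |a| > 1: solve forwards, y_t = -sum_{j>=1} (1/a)^j x_{t+j} (truncated).
    a_unstable = 1.25
    y_fwd = -sum((1.0 / a_unstable)**j * x[t + j] for j in range(1, 500))

    print("backward solution y_t:", y_back)
    print("forward  solution y_t:", y_fwd)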
3 Four Examples
3.1 Example 1: Optimal Growth
Production function:
F(kt) = kt^α
where 0 < α < 1
Utility:
Σ_{t=0}^∞ β^t ln(ct)
Earlier, we solved this problem via dynamic programming, by guessing that the value
function takes the form
v(kt ) = E + F ln(kt ).
After deriving the first order and envelope conditions and solving for the unknown
constants E and F , we concluded that the optimal capital stock follows the difference
equation
kt+1 = αβkt^α (14)
So far, we have only discussed linear difference equations. But notice that if we take logs
on both sides of (14), we obtain
ln(kt+1) = (1 − α)[ln(αβ)/(1 − α)] + α ln(kt),
which can be rearranged to read
ln(kt+1) − ln(αβ)/(1 − α) = α[ln(kt) − ln(αβ)/(1 − α)].
Consider, therefore, defining the new variable
zt = ln(kt) − ln(αβ)/(1 − α)
and rewriting the difference equation more simply as
zt+1 = αzt
Now we have a linear difference equation, which we know has the general solution
zt = qα^t
Given the initial condition k0, we can calculate the initial condition
z0 = ln(k0) − ln(αβ)/(1 − α)
and thereby determine the particular solution
zt = α^t z0 = α^t [ln(k0) − ln(αβ)/(1 − α)].
Since |α| < 1, this solution tells us that for any value of k0,
lim_{t→∞} ln(kt) = ln(αβ)/(1 − α) = ln[(αβ)^{1/(1−α)}].
Hence, starting from any initial capital stock k0, the capital stock converges to a steady
state level:
lim_{t→∞} kt = (αβ)^{1/(1−α)}
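This convergence result is easy to confirm by simulation. The Python sketch below (an illustrative addition; the values of α, β, and the initial capital stock are arbitrary choices) iterates (14) and compares the result with (αβ)^{1/(1−α)}:

    # Iterate k_{t+1} = alpha*beta*k_t^alpha and compare with the steady state.
    alpha, beta = 0.33, 0.95
    k_star = (alpha * beta) ** (1.0 / (1.0 - alpha))

    k = 5.0                       # an arbitrary initial capital stock
    for t in range(40):
        k = alpha * beta * k**alpha

    print("k after 40 periods:", k)
    print("steady state k*   :", k_star)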
Production function:
F(kt, zt) = zt kt^α
where 0 < α < 1 and zt is random with Et ln(zt) = 0 for all t = 0, 1, 2, ...
Expected utility:
E0 Σ_{t=0}^∞ β^t ln(ct)
Problem set 3 asked you to solve this problem via dynamic programming and to show that
the optimal capital stock follows the first order autoregressive process
ln(kt+1 ) = ln(αβ) + α ln(kt ) + ln(zt ) (15)
As in the previous example, define
xt = ln(kt) − ln(αβ)/(1 − α)
and let εt = ln(zt), so that (15) becomes
xt+1 = αxt + εt.
Since |α| < 1, we can use the lag operator to solve for xt in terms of past values of εt:
xt = αxt−1 + εt−1
xt = αLxt + Lεt
(1 − αL)xt = Lεt
xt = L(1 − αL)⁻¹εt
xt = L Σ_{j=0}^∞ (αL)^j εt
xt = L Σ_{j=0}^∞ α^j L^j εt
xt = Σ_{j=0}^∞ α^j L^{j+1} εt
xt = Σ_{j=0}^∞ α^j εt−j−1
Now undo the transformations to find the solution for kt in terms of past zt :
ln(kt) − ln(αβ)/(1 − α) = xt = Σ_{j=0}^∞ α^j εt−j−1
or
ln(kt) = ln(αβ)/(1 − α) + Σ_{j=0}^∞ α^j ln(zt−j−1)
Use this solution to characterize the behavior of output:
yt = zt kt^α
implies
ln(yt) = ln(zt) + α ln(kt)
or
ln(yt) = ln(zt) + [α/(1 − α)] ln(αβ) + α Σ_{j=0}^∞ α^j ln(zt−j−1) (16)
Equation (16) reveals that yt will be serially correlated even if zt is serially uncorrelated.
Thus, the process of optimal capital accumulation can transform serially uncorrelated
shocks to productivity into serially correlated movements in output like those that
occur over the business cycle.
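This mechanism can be illustrated by simulation. The Python sketch below (an illustrative addition; the parameter values and the normal distribution for ln(zt) are arbitrary choices) generates serially uncorrelated shocks, iterates (15), and compares the first order autocorrelations of ln(zt) and ln(yt):

    import numpy as np

    np.random.seed(1)
    alpha, beta, T = 0.33, 0.95, 10000

    ln_z = np.random.normal(0.0, 0.1, T)             # serially uncorrelated shocks
    ln_k = np.zeros(T)
    ln_k[0] = np.log(alpha * beta) / (1.0 - alpha)   # start at the nonstochastic steady state
    for t in range(T - 1):
        ln_k[t + 1] = np.log(alpha * beta) + alpha * ln_k[t] + ln_z[t]

    ln_y = ln_z + alpha * ln_k                       # ln(y_t) = ln(z_t) + alpha*ln(k_t)

    def autocorr(x):
        x = x - x.mean()
        return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

    print("autocorrelation of ln(z):", round(autocorr(ln_z), 3))   # approximately zero
    print("autocorrelation of ln(y):", round(autocorr(ln_y), 3))   # positive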
For this third example, consider a more general nonlinear difference equation of the form
xt+1 = g(xt )
and suppose that the function g does not allow us to rewrite this equation as a linear
difference equation.
In this more general case, we might not be able to find an explicit solution.
As in continuous time, however, we might still be able to characterize the solution graphi-
cally.
In case one, illustrated below, the graph of g(x) intersects the 45-degree line at x = 0 and
x = x∗ , revealing that there are two steady states.
But starting from x0, which can be arbitrarily close to zero, the graph reveals that
lim_{t→∞} xt = x*.
In case two, once again, there are two steady states: xt = 0 and xt = x∗ .
[Figure: cobweb diagrams for xt+1 = g(xt), showing the 45-degree line xt+1 = xt and the graph of g. In case one, the iterates x0, x1, x2, ... increase toward x*; in case two, they decrease toward 0.]
Example: During any period, an individual worker might be employed (in state 1) or
unemployed (in state 2).
mij = probability that an agent in state j during period t will be in state i during
period t + 1
mij = fraction of the agents in state j during period t who will move to state i during
period t + 1
The random, or stochastic process that allocates individual agents to individual states in
this example is called a Markov process or a Markov chain.
The defining characteristic of a Markov process is that only the immediate past matters:
the probability that an agent will be in state i during period t + 1 depends only on the
state j that the agent is in during period t.
The probabilities mij are called transition probabilities. Since, in this example, the mij do
not depend on time, the Markov process is stationary.
Suppose we collect all of the transition probabilities into a matrix:
M = [m11, m12, ..., m1n; m21, m22, ..., m2n; ...; mn1, mn2, ..., mnn]
Then M is called a Markov matrix; it has the special property that the entries in each of its
columns sum to one. In addition, every transition probability satisfies
mij ≥ 0.
If the stronger condition
mij > 0
holds for all i = 1, 2, ..., n and j = 1, 2, ..., n, then the Markov matrix is said to be
regular.
Note that if we know all of the fractions xjt , j = 1, 2, ..., n, we can calculate the fractions
xit+1 using
xit+1 = Σ_{j=1}^n mij xjt (17)
Alternatively, if we define
xt = [x1t; x2t; ...; xnt],
then we can write the equations in (17) in matrix form as
xt+1 = [x1t+1; x2t+1; ...; xnt+1] = [m11, m12, ..., m1n; m21, m22, ..., m2n; ...; mn1, mn2, ..., mnn][x1t; x2t; ...; xnt] = Mxt, (18)
Regular Markov matrices have several useful properties:
a) The number r1 = 1 is always an eigenvalue of M.
b) If M is regular, all of the remaining eigenvalues of M are less than one in absolute
value.
c) Given any eigenvector v1 of M corresponding to the eigenvalue r1 = 1, the constant
k1 can be chosen so that the elements of w1 = k1 v1 all lie between zero and one.
Moreover, the elements of w1 sum to one. The elements of w1 can therefore be
interpreted as probabilities or, if the population is large, fractions of the population.
We know that the general solution to a system of linear difference equations like (18) takes
the form
xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn. (5)
Thus, so long as M is regular, (5) implies that starting from any initial x0,
lim_{t→∞} xt = k1 v1 = w1,
since r1^t = 1 for all t while ri^t → 0 for each i = 2, ..., n.
These results tell us that if the Markov matrix M is regular, then starting from any initial
distribution of the population into states,
x0 = [x10; x20; ...; xn0],
the economy will converge over time towards a steady state, in which the distribution
of the population into states is given by
w1 = [w11; w12; ...; w1n].
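These results, too, can be illustrated numerically. The Python sketch below (an illustrative addition; the transition probabilities for the two-state employment example are arbitrary choices) computes w1 from the eigenvector associated with r1 = 1 and verifies that iterating (18) from an arbitrary initial distribution converges to it:

    import numpy as np

    # A regular 2x2 Markov matrix for the employment example (columns sum to one).
    M = np.array([[0.95, 0.40],    # stay employed,    become employed
                  [0.05, 0.60]])   # become unemployed, stay unemployed

    # w1: the eigenvector for the eigenvalue r1 = 1, rescaled so its entries sum to one.
    r, V = np.linalg.eig(M)
    v1 = V[:, np.argmax(np.isclose(r, 1.0))].real
    w1 = v1 / v1.sum()
    print("steady state distribution:", w1)

    # Iterating x_{t+1} = M x_t from any initial distribution converges to w1.
    x = np.array([0.5, 0.5])
    for _ in range(100):
        x = M @ x
    print("after 100 iterations:     ", x)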