
 

 
LECTURE NOTES ON 
ECONOMIC DYNAMICS 
 
Peter N. Ireland 
Department of Economics 
Boston College 
 
irelandp@bc.edu 
http://www2.bc.edu/~irelandp/ec720.html 
 

Copyright (c) 2008 by Peter N. Ireland. Redistribution is permitted for educational and research 
purposes, so long as no changes are made. All copies must be provided free of charge and must include 
this copyright notice. 
 
Two Useful Theorems

Two theorems will prove quite useful in all of our discussions of dynamic optimization:
the Kuhn-Tucker Theorem and the Envelope Theorem. Let’s consider each of these in turn.

1 The Kuhn-Tucker Theorem


References:

Dixit, Chapters 2 and 3.


Simon-Blume, Chapter 18.

Consider a simple constrained optimization problem:

x ∈ R choice variable
F : R → R objective function, continuously differentiable
c ≥ G(x) constraint, with c ∈ R and G : R → R, also continuously differentiable.

The problem can be stated as:

max_x F (x) subject to c ≥ G(x)

Probably the easiest way to solve this problem is via the method of Lagrange multipliers.
The mathematical foundations that allow for the application of this method are given
to us by Lagrange’s Theorem or, in its most general form, the Kuhn-Tucker Theorem.

To prove this theorem, begin by defining the Lagrangian:

L(x, λ) = F (x) + λ[c − G(x)]

for any x ∈ R and λ ∈ R.

Theorem (Kuhn-Tucker) Suppose that x∗ maximizes F (x) subject to c ≥ G(x), where


F and G are both continuously differentiable, and suppose that G′(x∗) ≠ 0. Then
there exists a value λ∗ of λ such that x∗ and λ∗ satisfy the following four conditions:

L1 (x∗ , λ∗ ) = F′(x∗) − λ∗ G′(x∗) = 0, (1)

L2 (x∗ , λ∗ ) = c − G(x∗ ) ≥ 0, (2)
λ∗ ≥ 0, (3)
and
λ∗ [c − G(x∗ )] = 0. (4)

Proof Consider two possible cases, depending on whether or not the constraint is binding
at x∗ .

Case 1: Nonbinding constraint.

If c > G(x∗ ), then let λ∗ = 0. Clearly, (2)-(4) are satisfied, so it only remains to show
that (1) must hold. With λ∗ = 0, (1) holds if and only if

F′(x∗) = 0. (5)

We can show that (5) must hold using a proof by contradiction. Suppose that
instead of (5), it turns out that

F′(x∗) < 0.

Then, by the continuity of F and G, there must exist an ε > 0 such that

F (x∗ − ε) > F (x∗ ) and c > G(x∗ − ε).

But this result contradicts the assumption that x∗ maximizes F (x) subject to
c ≥ G(x). Similarly, if it turns out that

F′(x∗) > 0,

then by the continuity of F and G there must exist an ε > 0 such that

F (x∗ + ε) > F (x∗ ) and c > G(x∗ + ε).

But, again, this result contradicts the assumption that x∗ maximizes F (x) subject
to c ≥ G(x). This establishes that (5) must hold, completing the proof for case 1.

Case 2: Binding Constraint.

If c = G(x∗ ), then let λ∗ = F′(x∗)/G′(x∗). This is possible, given the assumption
that G′(x∗) ≠ 0. Clearly, (1), (2), and (4) are satisfied, so it only remains to show
that (3) must hold. With λ∗ = F′(x∗)/G′(x∗), (3) holds if and only if

F′(x∗)/G′(x∗) ≥ 0. (6)

We can show that (6) must hold using a proof by contradiction. Suppose that
instead of (6), it turns out that

F′(x∗)/G′(x∗) < 0.

One way that this can happen is if F′(x∗) > 0 and G′(x∗) < 0. But if these
conditions hold, then the continuity of F and G implies the existence of an ε > 0
such that
F (x∗ + ε) > F (x∗ ) and c = G(x∗ ) > G(x∗ + ε),
which contradicts the assumption that x∗ maximizes F (x) subject to c ≥ G(x).
If, instead, F′(x∗)/G′(x∗) < 0 because F′(x∗) < 0 and G′(x∗) > 0, then the
continuity of F and G implies the existence of an ε > 0 such that

F (x∗ − ε) > F (x∗ ) and c = G(x∗ ) > G(x∗ − ε),

which again contradicts the assumption that x∗ maximizes F (x) subject to c ≥ G(x).
This establishes that (6) must hold, completing the proof for case 2.

Notes:

a) The theorem can be extended to handle cases with more than one choice variable
and more than one constraint: see Dixit or Simon-Blume.
b) Equations (1)-(4) are necessary conditions: If x∗ is a solution to the optimization
problem, then there exists a λ∗ such that (1)-(4) must hold. But (1)-(4) are not
sufficient conditions: if x∗ and λ∗ satisfy (1)-(4), it does not follow automatically
that x∗ is a solution to the optimization problem.

Despite point (b) listed above, the Kuhn-Tucker theorem is extremely useful in practice.
Suppose that we are looking for the solution x∗ to the constrained optimization problem

max_x F (x) subject to c ≥ G(x).

The theorem tells us that if we form the Lagrangian

L(x, λ) = F (x) + λ[c − G(x)],

then x∗ and the associated λ∗ must satisfy the first-order condition (FOC) obtained
by differentiating L by x and setting the result equal to zero:

L1 (x∗ , λ∗ ) = F′(x∗) − λ∗ G′(x∗) = 0. (1)

In addition, we know that x∗ must satisfy the constraint:

c ≥ G(x∗ ). (2)

We know that the Lagrange multiplier λ∗ must be nonnegative:

λ∗ ≥ 0. (3)

And finally, we know that the complementary slackness condition
λ∗ [c − G(x∗ )] = 0, (4)
must hold: If λ∗ > 0, then the constraint must bind; if the constraint does not bind,
then λ∗ = 0.
In searching for the value of x that solves the constrained optimization problem, we only
need to consider values of x∗ that satisfy (1)-(4).
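To make this concrete, here is a minimal numerical check of conditions (1)-(4), written for a hypothetical problem chosen purely for illustration (none of the functions or values below come from the text): F (x) = −(x − 2)², G(x) = x, and c = 1, so the constraint binds and case 2 of the proof applies.

```python
# A minimal check of the Kuhn-Tucker conditions (1)-(4) for the
# hypothetical problem F(x) = -(x - 2)**2, G(x) = x, c = 1, where the
# constraint binds at x* = 1 and lambda* = F'(x*)/G'(x*) as in case 2.
F = lambda x: -(x - 2)**2
dF = lambda x: -2*(x - 2)
G = lambda x: x
dG = lambda x: 1.0
c = 1.0

x_star = 1.0                          # binding case: c = G(x*)
lam_star = dF(x_star) / dG(x_star)    # lambda* = F'(x*)/G'(x*) = 2

assert abs(dF(x_star) - lam_star*dG(x_star)) < 1e-12   # (1) first-order condition
assert c - G(x_star) >= 0                              # (2) constraint satisfied
assert lam_star >= 0                                   # (3) nonnegative multiplier
assert lam_star*(c - G(x_star)) == 0                   # (4) complementary slackness
print(x_star, lam_star)                                # 1.0 2.0
```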
Two pieces of terminology:

a) The extra assumption that G′(x∗) ≠ 0 is needed to guarantee the existence of a
multiplier λ∗ satisfying (1)-(4). This extra assumption is called the constraint
qualification, and it almost always holds in practice.
b) Note that (1) is a FOC for x, while (2) is like a FOC for λ. In most applications,
the second-order conditions (SOC) will imply that x∗ maximizes L(x, λ), while λ∗
minimizes L(x, λ). For this reason, (x∗ , λ∗ ) is typically a saddle-point of L(x, λ).

Thus, in solving the problem in this way, we are using the Lagrangian to turn a constrained
optimization problem into an unconstrained optimization problem, where we seek to
maximize L(x, λ) rather than simply F (x).
One final note:

Our general constraint, c ≥ G(x), nests as a special case the nonnegativity constraint
x ≥ 0, obtained by setting c = 0 and G(x) = −x.
So nonnegativity constraints can be introduced into the Lagrangian in the same way
as all other constraints. If we consider, for example, the extended problem
max_x F (x) subject to c ≥ G(x) and x ≥ 0,

then we can introduce a second multiplier µ, form the Lagrangian as


L(x, λ, µ) = F (x) + λ[c − G(x)] + µx,
and write the first order condition for the optimal x∗ as
L1 (x∗ , λ∗ , µ∗ ) = F′(x∗) − λ∗ G′(x∗) + µ∗ = 0. (1′)
In addition, analogs to our earlier conditions (2)-(4) must also hold for the second
constraint: x∗ ≥ 0, µ∗ ≥ 0, and µ∗ x∗ = 0.
Kuhn and Tucker’s original statement of the theorem, however, does not incorporate
nonnegativity constraints into the Lagrangian. Instead, even with the additional
nonnegativity constraint x ≥ 0, they continue to define the Lagrangian as
L(x, λ) = F (x) + λ[c − G(x)].
In this case, the first-order condition for x∗ must be modified to read
L1 (x∗ , λ∗ ) = F′(x∗) − λ∗ G′(x∗) ≤ 0, with equality if x∗ > 0. (1′′)

Of course, in (1′), µ∗ ≥ 0 in general and µ∗ = 0 if x∗ > 0. So a close inspection reveals
that these two approaches to handling nonnegativity constraints lead in the end
to the same results.
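A tiny numerical example can confirm this equivalence. The objective below is a hypothetical choice made only for this sketch: F (x) = −(x + 1)², for which the nonnegativity constraint binds at x∗ = 0 while the constraint c ≥ G(x) is assumed slack (so λ∗ = 0).

```python
# A toy check that the two treatments of the nonnegativity constraint
# agree, for the hypothetical objective F(x) = -(x + 1)**2 maximized
# over x >= 0; the other constraint is assumed slack, so lambda* = 0.
dF = lambda x: -2*(x + 1)

x_star = 0.0                 # corner solution: x >= 0 binds

# Approach 1: include a multiplier mu on x >= 0, as in condition (1'):
mu_star = -dF(x_star)        # F'(x*) + mu* = 0 implies mu* = 2
assert mu_star >= 0 and mu_star*x_star == 0

# Approach 2: Kuhn and Tucker's original condition (1''):
assert dF(x_star) <= 0       # F'(x*) <= 0, with equality only if x* > 0
print(mu_star)               # 2.0: both approaches validate x* = 0
```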

2 The Envelope Theorem


References:

Dixit, Chapter 5.
Simon-Blume, Chapter 19.

In our discussion of the Kuhn-Tucker theorem, we considered an optimization problem of
the form
max_x F (x) subject to c ≥ G(x)

Now, let’s generalize the problem by allowing the functions F and G to depend on a
parameter θ ∈ R. The problem can now be stated as

max_x F (x, θ) subject to c ≥ G(x, θ)

For this problem, define the maximum value function V : R → R as

V (θ) = max_x F (x, θ) subject to c ≥ G(x, θ)

Note that evaluating V requires a two-step procedure:

First, given θ, find the value of x∗ that solves the constrained optimization problem.
Second, substitute this value of x∗ , together with the given value of θ, into the objec-
tive function to obtain
V (θ) = F (x∗ , θ)

Now suppose that we want to investigate the properties of this function V . Suppose, in
particular, that we want to take the derivative of V with respect to its argument θ.

As the first step in evaluating V′(θ), consider solving the constrained optimization problem
for any given value of θ by setting up the Lagrangian

L(x, λ) = F (x, θ) + λ[c − G(x, θ)]

We know from the Kuhn-Tucker theorem that the solution x∗ to the optimization problem
and the associated value of the multiplier λ∗ must satisfy the complementary slackness
condition:
λ∗ [c − G(x∗ , θ)] = 0

Use this last result to rewrite the expression for V as
V (θ) = F (x∗ , θ) = F (x∗ , θ) + λ∗ [c − G(x∗ , θ)]

So suppose that we tried to calculate V′(θ) simply by differentiating both sides of this
equation with respect to θ:
V′(θ) = F2 (x∗ , θ) − λ∗ G2 (x∗ , θ).

In principle, this formula may not be correct. The reason is that x∗ and λ∗ will themselves
depend on the parameter θ, and we must take this dependence into account when
differentiating V with respect to θ.
However, the envelope theorem tells us that our formula for V′(θ) is, in fact, correct. That
is, the envelope theorem tells us that we can ignore the dependence of x∗ and λ∗ on θ
in calculating V′(θ).
To see why, for any θ, let x∗ (θ) denote the solution to the problem: max F (x, θ) subject to
c ≥ G(x, θ), and let λ∗ (θ) be the associated Lagrange multiplier.
Theorem (Envelope) Let F and G be continuously differentiable functions of x and θ.
For any given θ, let x∗ (θ) maximize F (x, θ) subject to c ≥ G(x, θ), and let λ∗ (θ) be
the value of the associated Lagrange multiplier. Suppose, further, that x∗ (θ) and λ∗ (θ)
are also continuously differentiable functions, and that the constraint qualification
G1 [x∗ (θ), θ] ≠ 0 holds for all values of θ. Then the maximum value function defined by
V (θ) = max_x F (x, θ) subject to c ≥ G(x, θ)
satisfies
V′(θ) = F2 [x∗ (θ), θ] − λ∗ (θ)G2 [x∗ (θ), θ]. (7)
Proof The Kuhn-Tucker theorem tells us that for any given value of θ, x∗ (θ) and λ∗ (θ)
must satisfy
L1 [x∗ (θ), λ∗ (θ)] = F1 [x∗ (θ), θ] − λ∗ (θ)G1 [x∗ (θ), θ] = 0, (1)
and
λ∗ (θ){c − G[x∗ (θ), θ]} = 0. (4)
In light of (4),
V (θ) = F [x∗ (θ), θ] = F [x∗ (θ), θ] + λ∗ (θ){c − G[x∗ (θ), θ]}
Differentiating both sides of this expression with respect to θ yields
V′(θ) = F1 [x∗ (θ), θ]x∗′(θ) + F2 [x∗ (θ), θ]
+ λ∗′(θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G1 [x∗ (θ), θ]x∗′(θ)
− λ∗ (θ)G2 [x∗ (θ), θ],
which shows that, in principle, we must take the dependence of x∗ and λ∗ on θ into
account when calculating V 0 (θ).

Note, however, that
V′(θ) = {F1 [x∗ (θ), θ] − λ∗ (θ)G1 [x∗ (θ), θ]}x∗′(θ)
+ F2 [x∗ (θ), θ] + λ∗′(θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G2 [x∗ (θ), θ],
which by (1) reduces to
V′(θ) = F2 [x∗ (θ), θ] + λ∗′(θ){c − G[x∗ (θ), θ]} − λ∗ (θ)G2 [x∗ (θ), θ]

Thus, it only remains to show that


λ∗′(θ){c − G[x∗ (θ), θ]} = 0 (8)

Clearly, (8) holds for any θ such that the constraint is binding.
For θ such that the constraint is not binding, (4) implies that λ∗ (θ) must equal zero.
Furthermore, by the continuity of G and x∗ , if the constraint does not bind at θ, there
exists an ε∗ > 0 such that the constraint does not bind for all θ + ε with ε∗ > |ε|. Hence,
(4) also implies that λ∗ (θ + ε) = 0 for all ε∗ > |ε|. Using the definition of the derivative,

λ∗′(θ) = lim_{ε→0} [λ∗ (θ + ε) − λ∗ (θ)]/ε = lim_{ε→0} 0/ε = 0,

it once again becomes apparent that (8) must hold.


Thus,
V′(θ) = F2 [x∗ (θ), θ] − λ∗ (θ)G2 [x∗ (θ), θ]
as claimed in the theorem.
Once again, this theorem is useful because it tells us that we can ignore the dependence of
x∗ and λ∗ on θ in calculating V′(θ).
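A quick numerical sanity check of (7) may also help. The sketch below uses a hypothetical parameterization of my own choosing, F (x, θ) = −x² + θx with G(x, θ) = x and c = 1, for which the optimum is x∗(θ) = min(θ/2, 1); it compares a finite-difference estimate of V′(θ) with the envelope formula.

```python
# A sketch checking the envelope formula (7) on a hypothetical example:
# F(x, theta) = -x**2 + theta*x, G(x) = x, c = 1.  For theta > 2 the
# constraint binds, and since G_2 = 0, (7) predicts V'(theta) = F_2 = x*.
def solve(theta):
    x = min(theta/2, 1.0)            # unconstrained argmax theta/2, capped at c = 1
    return -x**2 + theta*x, x        # V(theta), x*(theta)

theta, h = 3.0, 1e-6
numeric = (solve(theta + h)[0] - solve(theta - h)[0]) / (2*h)
_, x_star = solve(theta)
print(numeric, x_star)               # both approximately 1.0
```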
But what is the intuition for why the envelope theorem holds? To obtain some intuition,
begin by considering the simpler, unconstrained optimization problem:
max_x F (x, θ),

where x is the choice variable and θ is the parameter.


Associated with this unconstrained problem, define the maximum value function in the
same way as before:

V (θ) = max_x F (x, θ).

To evaluate V for any given value of θ, use the same two-step procedure as before. First,
find the value x∗ (θ) that solves the unconstrained maximization problem for that value
of θ. Second, substitute that value of x back into the objective function to obtain
V (θ) = F [x∗ (θ), θ].

Now differentiate both sides of this expression through by θ, carefully taking the dependence
of x∗ on θ into account:

V′(θ) = F1 [x∗ (θ), θ]x∗′(θ) + F2 [x∗ (θ), θ].

But, if x∗ (θ) is the value of x that maximizes F given θ, we know that x∗ (θ) must be a
critical value of F :
F1 [x∗ (θ), θ] = 0.

Hence, for the unconstrained problem, the envelope theorem implies that

V′(θ) = F2 [x∗ (θ), θ],

so that, again, we can ignore the dependence of x∗ on θ in differentiating the maximum
value function. And this result holds not because x∗ fails to depend on θ: to the
contrary, in fact, x∗ will typically depend on θ through the function x∗ (θ). Instead, the
result holds because, since x∗ is chosen optimally, x∗ (θ) is a critical point of F given θ.

Now return to the constrained optimization problem

max_x F (x, θ) subject to c ≥ G(x, θ)

and define the maximum value function as before:

V (θ) = max_x F (x, θ) subject to c ≥ G(x, θ).

The envelope theorem for this constrained problem tells us that we can also ignore the
dependence of x∗ on θ when differentiating V with respect to θ, but only if we start by
adding the complementary slackness condition to the maximized objective function to
first obtain
V (θ) = F [x∗ (θ), θ] + λ∗ (θ){c − G[x∗ (θ), θ]}.

In taking this first step, we are actually evaluating the entire Lagrangian at the optimum,
instead of just the objective function. We need to take this first step because for the
constrained problem, the Kuhn-Tucker condition (1) tells us that x∗ (θ) is a critical
point, not of the objective function by itself, but of the entire Lagrangian formed by
adding the product of the multiplier and the constraint to the objective function.

And what gives the envelope theorem its name? The “envelope” theorem refers to a
geometrical presentation of the same result that we’ve just worked through.

To see where that geometrical interpretation comes from, consider again the simpler, un-
constrained optimization problem:

max_x F (x, θ),

where x is the choice variable and θ is a parameter.

Following along with our previous notation, let x∗ (θ) denote the solution to this problem
for any given value of θ, so that the function x∗ (θ) tells us how the optimal choice of
x depends on the parameter θ.

Also, continue to define the maximum value function V in the same way as before:

V (θ) = max_x F (x, θ).

Now let θ1 denote a particular value of θ, and let x1 denote the optimal value of x associated
with this particular value θ1 . That is, let

x1 = x∗ (θ1 ).

After substituting this value of x1 into the function F , we can think about how F (x1 , θ)
varies as θ varies; that is, we can think about F (x1 , θ) as a function of θ, holding x1
fixed.

In the same way, let θ2 denote another particular value of θ, with θ2 > θ1 let’s say. And
following the same steps as above, let x2 denote the optimal value of x associated with
this particular value θ2 , so that
x2 = x∗ (θ2 ).
Once again, we can hold x2 fixed and consider F (x2 , θ) as a function of θ.

The geometrical presentation of the envelope theorem can be derived by thinking about the
properties of these three functions of θ: V (θ), F (x1 , θ), and F (x2 , θ).

One thing that we know about these three functions is that for θ = θ1 :

V (θ1 ) = F (x1 , θ1 ) > F (x2 , θ1 ),

where the equality and the inequality both follow from the fact that, by
definition, x1 maximizes F (x, θ1 ) by choice of x.

Another thing that we know about these three functions is that for θ = θ2 :

V (θ2 ) = F (x2 , θ2 ) > F (x1 , θ2 ),

because again, by definition, x2 maximizes F (x, θ2 ) by choice of x.

On a graph, these relationships imply that:

At θ1 , V (θ) coincides with F (x1 , θ), which lies above F (x2 , θ).
At θ2 , V (θ) coincides with F (x2 , θ), which lies above F (x1 , θ).
And we could find more and more values of V by repeating this procedure for more
and more specific values of θi , i = 1, 2, 3, ....

In other words:

[Figure: The Envelope Theorem. V (θ) traces out the upper envelope of the curves F (x1 , θ) and F (x2 , θ); it coincides with F (x1 , θ) at θ1 and with F (x2 , θ) at θ2 .]
V (θ) traces out the “upper envelope” of the collection of functions F (xi , θ), formed
by holding xi = x∗ (θi ) fixed and varying θ.
Moreover, V (θ) is tangent to each individual function F (xi , θ) at the value θi of θ for
which xi is optimal, or equivalently:

V′(θ) = F2 [x∗ (θ), θ],

which is the same analytical result that we derived earlier for the unconstrained
optimization problem.

To generalize these arguments so that they apply to the constrained optimization problem

max_x F (x, θ) subject to c ≥ G(x, θ),

simply use the fact that in most cases (where the appropriate second-order conditions
hold) the value x∗ (θ) that solves the constrained optimization problem for any given
value of θ also maximizes the Lagrangian function

L(x, λ, θ) = F (x, θ) + λ[c − G(x, θ)],

so that

V (θ) = max_x F (x, θ) subject to c ≥ G(x, θ)
      = max_x L(x, λ, θ)

Now just replace the function F with the function L in working through the arguments
from above to conclude that

V′(θ) = L3 [x∗ (θ), λ∗ (θ), θ] = F2 [x∗ (θ), θ] − λ∗ (θ)G2 [x∗ (θ), θ],

which is exactly the same result that we derived before!

3 Two Examples
3.1 Utility Maximization
A consumer has a utility function defined over consumption of two goods: U (c1 , c2 )
Prices: p1 and p2
Income: I
Budget constraint: I ≥ p1 c1 + p2 c2 = G(c1 , c2 )
The consumer’s problem is:

max_{c1 ,c2} U(c1 , c2 ) subject to I ≥ p1 c1 + p2 c2

The Kuhn-Tucker theorem tells us that if we set up the Lagrangian:
L(c1 , c2 , λ) = U (c1 , c2 ) + λ(I − p1 c1 − p2 c2 )

Then the optimal consumptions c∗1 and c∗2 and the associated multiplier λ∗ must satisfy the
FOC:
L1 (c∗1 , c∗2 , λ∗ ) = U1 (c∗1 , c∗2 ) − λ∗ p1 = 0
and
L2 (c∗1 , c∗2 , λ∗ ) = U2 (c∗1 , c∗2 ) − λ∗ p2 = 0

Move the terms with minus signs to the other side, and divide the first of these FOC by
the second to obtain
U1 (c∗1 , c∗2 )/U2 (c∗1 , c∗2 ) = p1 /p2 ,
which is just the familiar condition that says that the optimizing consumer should set
the slope of his or her indifference curve, the marginal rate of substitution, equal to
the slope of his or her budget constraint, the ratio of prices.
Now consider I as one of the model’s parameters, and let the functions c∗1 (I), c∗2 (I), and
λ∗ (I) describe how the optimal choices c∗1 and c∗2 and the associated value λ∗ of the
multiplier depend on I.
In addition, define the maximum value function as
V (I) = max_{c1 ,c2} U(c1 , c2 ) subject to I ≥ p1 c1 + p2 c2

The Kuhn-Tucker theorem tells us that


λ∗ (I)[I − p1 c∗1 (I) − p2 c∗2 (I)] = 0
and hence
V (I) = U [c∗1 (I), c∗2 (I)] = U[c∗1 (I), c∗2 (I)] + λ∗ (I)[I − p1 c∗1 (I) − p2 c∗2 (I)].

The envelope theorem tells us that we can ignore the dependence of c∗1 and c∗2 on I in
calculating
V′(I) = λ∗ (I),
which gives us an interpretation of the multiplier λ∗ as the marginal utility of income.
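The following sketch verifies this interpretation numerically for a hypothetical Cobb-Douglas specification, U(c1 , c2 ) = a ln(c1 ) + (1 − a) ln(c2 ); with a binding budget constraint, the demands are c∗1 = aI/p1 and c∗2 = (1 − a)I/p2 , and the multiplier works out to λ∗ (I) = 1/I.

```python
# A sketch checking V'(I) = lambda*(I) for the hypothetical Cobb-Douglas
# utility U(c1, c2) = a*ln(c1) + (1 - a)*ln(c2); here lambda* = 1/I.
import numpy as np

a, p1, p2 = 0.3, 2.0, 5.0

def V(I):
    c1, c2 = a*I/p1, (1 - a)*I/p2          # optimal demands
    return a*np.log(c1) + (1 - a)*np.log(c2)

I, h = 10.0, 1e-6
print((V(I + h) - V(I - h)) / (2*h))       # finite-difference V'(I)
print(1/I)                                 # lambda*(I), the marginal utility of income
```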

3.2 Cost Minimization


The Kuhn-Tucker and envelope conditions can also be used to study constrained minimiza-
tion problems.
Consider a firm that produces output y using capital k and labor l, according to the
technology described by
f (k, l) ≥ y.

r = rental rate for capital

w = wage rate

Suppose that the firm takes its output y as given, and chooses inputs k and l to minimize
costs. Then the firm solves

min_{k,l} rk + wl subject to f (k, l) ≥ y

If we set up the Lagrangian as

L(k, l, λ) = rk + wl − λ[f (k, l) − y],

where the term involving the multiplier λ is subtracted rather than added in the case of
a minimization problem, the Kuhn-Tucker conditions (1)-(4) continue to apply, exactly
as before.

Thus, according to the Kuhn-Tucker theorem, the optimal choices k∗ and l∗ and the asso-
ciated multiplier λ∗ must satisfy the FOC:

L1 (k ∗ , l∗ , λ∗ ) = r − λ∗ f1 (k∗ , l∗ ) = 0 (9)

and
L2 (k∗ , l∗ , λ∗ ) = w − λ∗ f2 (k∗ , l∗ ) = 0 (10)

Move the terms with minus signs over to the other side, and divide the first FOC by the
second to obtain
f1 (k∗ , l∗ )/f2 (k∗ , l∗ ) = r/w,
which is another familiar condition that says that the optimizing firm chooses factor
inputs so that the marginal rate of substitution between inputs in production equals
the ratio of factor prices.

Now suppose that the constraint binds, as it usually will:

y = f (k∗ , l∗ ) (11)

Then (9)-(11) represent three equations that determine the three unknowns k ∗ , l∗ , and λ∗ as
functions of the model’s parameters r, w, and y. In particular, we can think of the
functions
k∗ = k∗ (r, w, y)
and
l∗ = l∗ (r, w, y)
as demand curves for capital and labor: strictly speaking, they are conditional (on y)
factor demand functions.

Now define the minimum cost function as

C(r, w, y) = min_{k,l} rk + wl subject to f (k, l) ≥ y
           = rk∗ (r, w, y) + wl∗ (r, w, y)
           = rk∗ (r, w, y) + wl∗ (r, w, y) − λ∗ (r, w, y){f [k∗ (r, w, y), l∗ (r, w, y)] − y}

The envelope theorem tells us that in calculating the derivatives of the cost function, we
can ignore the dependence of k∗ , l∗ , and λ∗ on r, w, and y.

Hence:
C1 (r, w, y) = k∗ (r, w, y),
C2 (r, w, y) = l∗ (r, w, y),
and
C3 (r, w, y) = λ∗ (r, w, y).

The first two of these equations are statements of Shephard’s lemma; they tell us that
the derivatives of the cost function with respect to factor prices coincide with the
conditional factor demand curves. The third equation gives us an interpretation of the
multiplier λ∗ as a measure of the marginal cost of increasing output.
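As a numerical illustration, the following sketch assumes a hypothetical Cobb-Douglas technology f (k, l) = k^0.5 l^0.5, computes the minimum cost function with scipy, and checks Shephard's lemma by comparing finite-difference derivatives of C with the conditional factor demands.

```python
# A sketch checking Shephard's lemma for the hypothetical technology
# f(k, l) = k**0.5 * l**0.5, with the cost minimization done numerically.
from scipy.optimize import minimize

def cost(r, w, y):
    # minimize rk + wl subject to f(k, l) >= y
    cons = {"type": "ineq", "fun": lambda x: x[0]**0.5 * x[1]**0.5 - y}
    res = minimize(lambda x: r*x[0] + w*x[1], x0=[y, y],
                   constraints=[cons], bounds=[(1e-6, None)]*2)
    return res.fun, res.x

r, w, y, h = 2.0, 3.0, 5.0, 1e-4
C, (k_star, l_star) = cost(r, w, y)
dC_dr = (cost(r + h, w, y)[0] - cost(r - h, w, y)[0]) / (2*h)
dC_dw = (cost(r, w + h, y)[0] - cost(r, w - h, y)[0]) / (2*h)
print(dC_dr, k_star)    # Shephard's lemma: C_1(r, w, y) = k*(r, w, y)
print(dC_dw, l_star)    # Shephard's lemma: C_2(r, w, y) = l*(r, w, y)
```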

Thus, our two examples illustrate how we can apply the Kuhn-Tucker and envelope theorems
in specific economic problems.

The two examples also show how, in the context of specific economic problems, it is often
possible to attach an economic interpretation to the multiplier λ∗ .

The Maximum Principle

Here, we will explore the connections between two popular ways of solving dynamic
optimization problems, that is, problems that involve optimization over time. The first
solution method is just a straightforward application of the Kuhn-Tucker theorem; the second
solution method relies on a result known as the maximum principle.
We’ll begin by briefly noting the basic features that set dynamic optimization problems
apart from purely static ones. Then we’ll go on to consider the connections between the
Kuhn-Tucker theorem and the maximum principle in both discrete and continuous time.

Reference:

Dixit, Chapter 10.

1 Basic Elements of Dynamic Optimization Problems


Moving from the static optimization problems that we’ve considered so far to the dynamic
optimization problems that are of primary interest here involves only a few minor
changes.

a) We need to index the variables that enter into the problem by t, in order to keep track
of changes in those variables that occur over time.

b) We need to distinguish between two types of variables:

stock variables - e.g., stock of capital, assets, or wealth


flow variables - e.g., output, consumption, saving, or labor supply per unit of time

c) We need to introduce constraints that describe the evolution of stock variables over time:
e.g., larger flows of savings or investment today will lead to larger stocks of wealth or
capital tomorrow.

2 The Maximum Principle: Discrete Time


2.1 A Dynamic Optimization Problem in Discrete Time
Consider a dynamic optimization in discrete time, that is, in which time can be indexed by
t = 0, 1, ..., T .

yt = stock variable

zt = flow variable

Objective function:
Σ_{t=0}^{T} β t F (yt , zt ; t)

Following Dixit, we can allow for a wider range of possibilities by letting the functions as
well as the variables depend on the time index t.

1 ≥ β > 0 = discount factor

Constraint describing the evolution of the stock variable:

Q(yt , zt ; t) ≥ yt+1 − yt

or
yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, ..., T

Constraint applying to variables within each period:

c ≥ G(yt , zt ; t)

for all t = 0, 1, ..., T

Constraints on initial and terminal values of stock:

y0 given

yT +1 ≥ y ∗

The dynamic optimization problem can now be stated as: choose sequences {zt }_{t=0}^{T} and
{yt }_{t=1}^{T+1} to maximize the objective function subject to all of the constraints.

Notes:

a) It is important for the application of the maximum principle that the problem
be additively time separable: that is, the values of F , Q, and G at time t must
depend on the values of yt and zt only at time t.
b) Although the constraints describing the evolution of the stock variable and applying
to the variables within each period can each be written in the form of a single
equation, it must be emphasized that these constraints must hold for all t =
0, 1, ..., T . That is, each of these equations actually describes T + 1 constraints.

2.2 The Kuhn-Tucker Formulation
Let’s begin by applying the Kuhn-Tucker Theorem to solve this problem. That is, let’s set
up the Lagrangian and take first-order conditions.

Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, ..., T :

L = Σ_{t=0}^{T} β t F (yt , zt ; t) + Σ_{t=0}^{T} π t+1 [yt + Q(yt , zt ; t) − yt+1 ] + Σ_{t=0}^{T} λt [c − G(yt , zt ; t)] + φ(yT +1 − y ∗ )

The Kuhn-Tucker theorem tells us that the solution to this problem must satisfy the FOC
for the choice variables zt for t = 0, 1, ..., T and yt for t = 1, 2, ..., T + 1.

FOC for zt , t = 0, 1, ..., T :

β t Fz (yt , zt ; t) + π t+1 Qz (yt , zt ; t) − λt Gz (yt , zt ; t) = 0 (1)

for all t = 0, 1, ..., T.

FOC for yt , t = 1, 2, ..., T :

β t Fy (yt , zt ; t) + π t+1 + πt+1 Qy (yt , zt ; t) − λt Gy (yt , zt ; t) − π t = 0

or
π t+1 − π t = −[β t Fy (yt , zt ; t) + π t+1 Qy (yt , zt ; t) − λt Gy (yt , zt ; t)] (2)
for all t = 1, 2, ..., T .

FOC for yT +1 :
−π T +1 + φ = 0

Let’s assume that the problem is such that the constraint governing the evolution of the
stock variable always holds with equality, as will typically be the case in economic
applications. Then another condition describing the solution to the problem is

yt+1 − yt = Q(yt , zt ; t) (3)

for all t = 0, 1, ..., T .

Finally, let’s write down the initial condition for the stock variable and the complementary
slackness condition for the constraint on the terminal value of the stock:

y0 given (4)

φ(yT +1 − y ∗ ) = 0
or, using the FOC for yT +1 :
π T +1 (yT +1 − y ∗ ) = 0 (5)

Notes:

a) Together with the complementary slackness condition
λt [c − G(yt , zt ; t)] = 0,
which implies either
λt = 0 or c = G(yt , zt ; t),
we can think of (1)-(3) as forming a system of four equations in four unknowns
yt , zt , πt , λt . This system of equations determines the problem’s solution.
b) Equations (2) and (3), linking the values of yt and π t at adjacent points in time, are
examples of difference equations. They must be solved subject to two boundary
conditions:
The initial condition (4).
The terminal, or transversality, condition (5).
c) The analysis can also be applied to the case of an infinite time horizon, where
T = ∞. In this case, (1) must hold for all t = 0, 1, 2, ..., (2) must hold for all
t = 1, 2, 3, ..., (3) must hold for all t = 0, 1, 2, ..., and (5) becomes a condition on
the limiting behavior of πt and yt :
lim_{T→∞} π T +1 (yT +1 − y ∗ ) = 0. (6)

2.3 An Alternative Formulation


Now let’s consider the problem in a slightly different way.
Begin by defining the Hamiltonian for time t:
H(yt , π t+1 ; t) = max_{zt} β t F (yt , zt ; t) + π t+1 Q(yt , zt ; t) subject to c ≥ G(yt , zt ; t) (7)

Note that the Hamiltonian is a maximum value function.


Note also that the maximization problem on the right-hand side of (7) is a static optimiza-
tion problem, involving no dynamic elements.
By the Kuhn-Tucker theorem:
H(yt , π t+1 ; t) = max_{zt} β t F (yt , zt ; t) + π t+1 Q(yt , zt ; t) + λt [c − G(yt , zt ; t)]

And by the envelope theorem:


Hy (yt , π t+1 ; t) = β t Fy (yt , zt ; t) + π t+1 Qy (yt , zt ; t) − λt Gy (yt , zt ; t) (8)
and
Hπ (yt , π t+1 ; t) = Q(yt , zt ; t) (9)
where zt solves the optimization problem on the right-hand side of (7) and must there-
fore satisfy the FOC:
β t Fz (yt , zt ; t) + π t+1 Qz (yt , zt ; t) − λt Gz (yt , zt ; t) = 0 (10)

Now notice the following:

a) Equation (10) coincides with (1).


b) In light of (8) and (9), (2) and (3) can be written more compactly as
πt+1 − πt = −[β t Fy (yt , zt ; t) + π t+1 Qy (yt , zt ; t) − λt Gy (yt , zt ; t)] (2)
π t+1 − π t = −Hy (yt , π t+1 ; t) (11)
and
yt+1 − yt = Q(yt , zt ; t) (3)
yt+1 − yt = Hπ (yt , π t+1 ; t). (12)

This establishes the following result.


Theorem (Maximum Principle) Consider the discrete time dynamic optimization prob-
lem of choosing sequences {zt }_{t=0}^{T} and {yt }_{t=1}^{T+1} to maximize the objective function

Σ_{t=0}^{T} β t F (yt , zt ; t)

subject to the constraints


yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, ..., T ,
c ≥ G(yt , zt ; t)
for all t = 0, 1, ..., T ,
y0 given
and
yT +1 ≥ y ∗ .
Associated with this problem, define the Hamiltonian
H(yt , π t+1 ; t) = max_{zt} β t F (yt , zt ; t) + π t+1 Q(yt , zt ; t) st c ≥ G(yt , zt ; t). (7)

Then the solution to the dynamic optimization problem must satisfy

a) The first-order condition for the static optimization problem on the right-hand side
of (7):
β t Fz (yt , zt ; t) + π t+1 Qz (yt , zt ; t) − λt Gz (yt , zt ; t) = 0 (10)
for all t = 0, 1, ..., T .
b) The pair of difference equations:
π t+1 − π t = −Hy (yt , π t+1 ; t) (11)
for all t = 1, 2, ..., T and
yt+1 − yt = Hπ (yt , π t+1 ; t) (12)
for all t = 0, 1, ..., T , where the derivatives of H can be calculated using the
envelope theorem.

c) The initial condition
y0 given (4)
d) The terminal, or transversality, condition

π T +1 (yT +1 − y ∗ ) = 0 (5)

in the case where T < ∞ or

lim_{T→∞} π T +1 (yT +1 − y ∗ ) = 0 (6)

in the case where T = ∞.

According to the maximum principle, there are two ways of solving discrete time dynamic
optimization problems, both of which lead to the same answer:

a) Set up the Lagrangian for the dynamic optimization problem and take first-order
conditions for all t = 0, 1, ..., T .
b) Set up the Hamiltonian for the problem and derive the first-order and envelope con-
ditions (10)-(12) for the static optimization problem that appears in the definition
of the Hamiltonian.

3 The Maximum Principle: Continuous Time


3.1 A Dynamic Optimization Problem in Continuous Time
Like the extension from static to dynamic optimization, the extension from discrete to
continuous time requires no new substantive ideas, but does require some changes in
notation.

Accordingly, suppose now that the variable t, instead of taking on discrete values t =
0, 1, ..., T , takes on continuous values t ∈ [0, T ], where as before, T can be finite or
infinite.

It is most convenient now to regard the variables as functions of time:

y(t) = stock variable


z(t) = flow variable

The obvious analog to the objective function from before is:


∫_{0}^{T} e−ρt F (y(t), z(t); t) dt

ρ ≥ 0 = discount rate

Example:

β = 0.95
ρ = 0.05
β t for t = 1 is 0.95
e−ρt for t = 1 is 0.951, or approximately 0.95

Consider next the constraint describing the evolution of the stock variable.

In the discrete time case, the interval between time periods is just ∆t = 1.

Hence, the constraint might be written as

Q(y(t), z(t); t)∆t ≥ y(t + ∆t) − y(t)

or
Q(y(t), z(t); t) ≥ [y(t + ∆t) − y(t)]/∆t
In the limit as the interval ∆t goes to zero, this last expression simplifies to

Q(y(t), z(t); t) ≥ ẏ(t)

for all t ∈ [0, T ], where ẏ(t) denotes the derivative of y(t) with respect to t.

The constraint applying to variables at a given point in time remains the same:

c ≥ G(y(t), z(t); t)

for all t ∈ [0, T ].

Note once again that these constraints must hold for all t ∈ [0, T ]. Thus, each of the two
equations from above actually represents an entire continuum of constraints.

Finally, the initial and terminal constraints for the stock variable remain unchanged:

y(0) given

y(T ) ≥ y ∗

The dynamic optimization problem can now be stated as: choose functions z(t) for t ∈
[0, T ] and y(t) for t ∈ (0, T ] to maximize the objective function subject to all of the
constraints.

3.2 The Kuhn-Tucker Formulation
Once again, let’s begin by setting up the Lagrangian and taking first-order conditions:
L = ∫_{0}^{T} e−ρt F (y(t), z(t); t) dt + ∫_{0}^{T} π(t)[Q(y(t), z(t); t) − ẏ(t)] dt
  + ∫_{0}^{T} λ(t)[c − G(y(t), z(t); t)] dt + φ[y(T ) − y ∗ ]

Now we are faced with a problem: y(t) is a choice variable for all t ∈ [0, T ], but ẏ(t) appears
in the Lagrangian.
To solve this problem, use integration by parts:
∫_{0}^{T} (d/dt)[π(t)y(t)] dt = ∫_{0}^{T} π̇(t)y(t) dt + ∫_{0}^{T} π(t)ẏ(t) dt
π(T )y(T ) − π(0)y(0) = ∫_{0}^{T} π̇(t)y(t) dt + ∫_{0}^{T} π(t)ẏ(t) dt
− ∫_{0}^{T} π(t)ẏ(t) dt = ∫_{0}^{T} π̇(t)y(t) dt + π(0)y(0) − π(T )y(T )

Use this result to rewrite the Lagrangian as


L = ∫_{0}^{T} e−ρt F (y(t), z(t); t) dt + ∫_{0}^{T} π(t)Q(y(t), z(t); t) dt
  + ∫_{0}^{T} π̇(t)y(t) dt + π(0)y(0) − π(T )y(T )
  + ∫_{0}^{T} λ(t)[c − G(y(t), z(t); t)] dt + φ[y(T ) − y ∗ ]

Before taking first-order conditions, note that the multipliers π(t) and λ(t) are functions of t
and that the corresponding constraints appear in the form of integrals. These features
of the Lagrangian reflect the fact that the constraints must hold for all t ∈ [0, T ].
FOC for z(t), t ∈ [0, T ]:
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (13)
for all t ∈ [0, T ]
FOC for y(t), t ∈ (0, T ):
e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) + π̇(t) − λ(t)Gy (y(t), z(t); t) = 0
or
π̇(t) = −[e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) − λ(t)Gy (y(t), z(t); t)]
for all t ∈ (0, T ).

If we require all functions of t to be continuously differentiable, then this last equation will
also hold for t = 0 and t = T , so that we can write

π̇(t) = −[e−ρt Fy (y(t), z(t); t) + π(t)Qy (y(t), z(t); t) − λ(t)Gy (y(t), z(t); t)] (14)

for all t ∈ [0, T ].

FOC for y(T ):

0 = e−ρT Fy (y(T ), z(T ); T ) + π(T )Qy (y(T ), z(T ); T ) + π̇(T ) − π(T ) − λ(T )Gy (y(T ), z(T ); T ) + φ

or, using (14),


π(T ) = φ

Assume, as before, that the constraint governing ẏ(t) holds with equality:

ẏ(t) = Q(y(t), z(t); t) (15)

for all t ∈ [0, T ].

Finally, write down the initial condition

y(0) given (16)

and the complementary slackness, or transversality condition

φ[y(T ) − y ∗ ] = 0

or
π(T )[y(T ) − y ∗ ] = 0 (17)
or in the infinite-horizon case

lim_{T→∞} π(T )[y(T ) − y ∗ ] = 0. (18)

Notes:

a) Together with the complementary slackness condition

λ(t)[c − G(y(t), z(t); t)] = 0,

we can think of (13)-(15) as a system of four equations in four unknowns y(t),
z(t), π(t), and λ(t). This system of equations determines the problem’s solution.
b) Equations (14) and (15), describing the behavior of ẏ(t) and π̇(t), are examples of
differential equations. They must be solved subject to two boundary conditions:
(16) and either (17) or (18).

3.3 An Alternative Formulation
As before, define the Hamiltonian for this problem as

H(y(t), π(t); t) = max_{z(t)} e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)
st c ≥ G(y(t), z(t); t)

As before, the Hamiltonian is a maximum value function. And as before, the maximization
problem on the right-hand side is a static one.

By the Kuhn-Tucker theorem:

H(y(t), π(t); t) = max_{z(t)} e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) + λ(t)[c − G(y(t), z(t); t)]

And by the envelope theorem:

Hy (y(t), π(t); t) = e−ρt Fy (y(t), z(t); t)+π(t)Qy (y(t), z(t); t)−λ(t)Gy (y(t), z(t); t) (20)

and
Hπ (y(t), π(t); t) = Q(y(t), z(t); t) (21)
where z(t) solves the optimization problem on the right-hand side of (19) and must
therefore satisfy the FOC:

e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0. (22)

Now notice the following:

a) Equation (22) coincides with (13).


b) In light of (20) and (21), (14) and (15) can be written more compactly as

π̇(t) = −Hy (y(t), π(t); t) (23)

and
ẏ(t) = Hπ (y(t), π(t); t). (24)

This establishes the following result.

Theorem (Maximum Principle) Consider the continuous time dynamic optimization


problem of choosing continuously differentiable functions z(t) and y(t) for t ∈ [0, T ] to
maximize the objective function
∫_{0}^{T} e−ρt F (y(t), z(t); t) dt

subject to the constraints


Q(y(t), z(t); t) ≥ ẏ(t)

for all t ∈ [0, T ],
c ≥ G(y(t), z(t); t)
for all t ∈ [0, T ],
y(0) given,
and
y(T ) ≥ y ∗ .
Associated with this problem, define the Hamiltonian
H(y(t), π(t); t) = max_{z(t)} e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)
st c ≥ G(y(t), z(t); t).


Then the solution to the dynamic optimization problem must satisfy

a) The first-order condition for the static optimization problem on the right-hand side
of (19):
e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (22)
for all t ∈ [0, T ].
b) The pair of differential equations
π̇(t) = −Hy (y(t), π(t); t) (23)
and
ẏ(t) = Hπ (y(t), π(t); t) (24)
for all t ∈ [0, T ], where the derivatives of H can be calculated using the envelope
theorem.
c) The initial condition
y(0) given. (16)
d) The terminal, or transversality, condition
π(T )[y(T ) − y ∗ ] = 0 (17)
in the case where T < ∞ or
lim_{T→∞} π(T )[y(T ) − y ∗ ] = 0. (18)

in the case where T = ∞.

Once again, according to the maximum principle, there are two ways of solving continuous
time dynamic optimization problems, both of which lead to the same answer:

a) Set up the Lagrangian for the dynamic optimization problem and take first-order
conditions for all t ∈ [0, T ].
b) Set up the Hamiltonian for the problem and derive the first-order and envelope con-
ditions (22)-(24) for the static optimization problem that appears in the definition
of the Hamiltonian.

4 Two Examples
4.1 Life-Cycle Saving
Consider a consumer who is employed for T + 1 years: t = 0, 1, ..., T .
w = constant annual labor income
kt = stock of assets at the beginning of period t = 0, 1, ..., T + 1
k0 = 0
kt can be negative for t = 1, 2, ..., T , so that the consumer is allowed to borrow.
However, kT +1 must satisfy
kT +1 ≥ k∗ > 0
where k∗ denotes saving required for retirement.
r = constant interest rate
total income during period t = w + rkt
ct = consumption
Hence,
kt+1 = kt + w + rkt − ct
or equivalently,
kt + Q(kt , ct ; t) ≥ kt+1
where
Q(kt , ct ; t) = Q(kt , ct ) = w + rkt − ct
for all t = 0, 1, ..., T
Utility function:
Σ_{t=0}^{T} β t ln(ct )

The consumer’s problem: choose sequences {ct }_{t=0}^{T} and {kt }_{t=1}^{T+1} to maximize the utility
function subject to all of the constraints.
For this problem:

kt = stock variable
ct = flow variable

To solve this problem, set up the Hamiltonian:

H(kt , π t+1 ; t) = max_{ct} β t ln(ct ) + π t+1 (w + rkt − ct )

FOC for ct :
β t /ct = π t+1 (25)
Difference equations for π t and kt :
π t+1 − π t = −Hk (kt , π t+1 ; t) = −π t+1 r (26)
and
kt+1 − kt = Hπ (kt , π t+1 ; t) = w + rkt − ct (27)
Equations (25)-(27) represent a system of three equations in the three unknowns ct , π t , and
kt . They must be solved subject to the boundary conditions
k0 = 0 given (28)
and
πT +1 (kT +1 − k∗ ) = 0 (29)
We can use (25)-(29) to deduce some key properties of the solution even without solving
the system completely.
Note first that (25) implies that
π T +1 = β T /cT > 0.
Hence, it follows from (29) that
kT +1 = k ∗ .
Thus, the consumer saves just enough for retirement and no more.
Next, note that (26) implies
π t+1 − π t = −π t+1 r
(1 + r)π t+1 = π t (30)
Use (25) to obtain
π t+1 = β t /ct and π t = β t−1 /ct−1
and substitute these expressions into (30) to obtain
(1 + r)β t /ct = β t−1 /ct−1
(1 + r)β/ct = 1/ct−1
ct /ct−1 = β(1 + r) (31)
Equation (31) reveals that the optimal growth rate of consumption is constant, and is faster
for a more patient consumer, with a higher value of β, and for a consumer who faces a
higher interest rate r.
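The level of the consumption path can then be pinned down numerically. The sketch below uses hypothetical parameter values (β, r, w, T , and k∗ are arbitrary choices): consumption grows according to (31), and c0 is adjusted by bisection until the terminal condition kT +1 = k∗ from (29) holds.

```python
# A sketch of the life-cycle problem with hypothetical parameters:
# grow consumption by (31), accumulate assets by (27), and search for
# the c0 that delivers the terminal condition k_{T+1} = k*.
from scipy.optimize import brentq

beta, r, w, T, k_star = 0.96, 0.05, 1.0, 39, 2.0

def terminal_gap(c0):
    k, c = 0.0, c0                  # k_0 = 0
    for t in range(T + 1):
        k = k + w + r*k - c         # (27): k_{t+1} = k_t + w + r*k_t - c_t
        c = beta*(1 + r)*c          # (31): c_{t+1} = beta*(1 + r)*c_t
    return k - k_star               # zero when k_{T+1} = k*

c0 = brentq(terminal_gap, 1e-6, w*(T + 1))
print(c0)                           # initial consumption on the optimal path
```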

4.2 Optimal Growth
Consider an economy in which output is produced with capital according to the production
function
F (kt ) = ktα ,
where 0 < α < 1.

ct = consumption

δ = depreciation rate for capital, 0 < δ < 1

Then the evolution of the capital stock is governed by

kt+1 = ktα + (1 − δ)kt − ct

or
kt+1 − kt = ktα − δkt − ct

Our first example had a finite horizon and was cast in discrete time. So for the sake of
variety, make this second example have an infinite horizon in continuous time.

The continuous time analog to the capital accumulation constraint shown above is just

k(t)α − δk(t) − c(t) ≥ k̇(t)

or
Q(k(t), c(t); t) ≥ k̇(t),
where
Q(k(t), c(t); t) = Q(k(t), c(t)) = k(t)α − δk(t) − c(t)
for all t ∈ [0, ∞)

Initial condition:
k(0) given

Objective of a benevolent social planner, or the utility of an infinitely-lived representative
consumer:

∫_{0}^{∞} e−ρt ln(c(t)) dt,
where ρ > 0 is the discount rate.

The problem: choose continuously differentiable functions c(t) and k(t) for t ∈ [0, ∞) to
maximize utility subject to all of the constraints.

For this problem:

k(t) = stock variable


c(t) = flow variable

To solve this problem, set up the Hamiltonian:

H(k(t), π(t); t) = max_{c(t)} e−ρt ln(c(t)) + π(t)[k(t)α − δk(t) − c(t)]

FOC for c(t):


e−ρt = c(t)π(t) (32)

Differential equations for π(t) and k(t):

π̇(t) = −Hk (k(t), π(t); t) = −π(t)[αk(t)α−1 − δ] (33)

and
k̇(t) = Hπ (k(t), π(t); t) = k(t)α − δk(t) − c(t). (34)

Equations (32)-(34) form a system of three equations in the three unknowns c(t), π(t), and
k(t). How can we solve them?

Start by differentiating both sides of (32) with respect to t:

e−ρt = c(t)π(t) (32)

−ρe−ρt = ċ(t)π(t) + c(t)π̇(t)


−ρc(t)π(t) = ċ(t)π(t) + c(t)π̇(t)

Next, use (33)


π̇(t) = −π(t)[αk(t)α−1 − δ] (33)
to rewrite this last equation as

−ρc(t)π(t) = ċ(t)π(t) − c(t)π(t)[αk(t)α−1 − δ]

−ρc(t) = ċ(t) − c(t)[αk(t)α−1 − δ]


ċ(t) = c(t)[αk(t)α−1 − δ − ρ] (35)

Collect (34) and (35):


k̇(t) = k(t)α − δk(t) − c(t). (34)
ċ(t) = c(t)[αk(t)α−1 − δ − ρ] (35)
and notice that these two differential equations depend only on k(t) and c(t).

Equation (35) implies that ċ(t) = 0 when

αk(t)α−1 − δ − ρ = 0

or
k(t) = [(δ + ρ)/α]^{1/(α−1)} = k∗

And since α − 1 < 0, (35) also implies that ċ(t) < 0 when k(t) > k∗ and ċ(t) > 0 when
k(t) < k∗ .
Equation (34) implies that k̇(t) = 0 when

k(t)α − δk(t) − c(t) = 0

or
c(t) = k(t)α − δk(t).

Moreover, (34) implies that k̇(t) < 0 when

c(t) > k(t)α − δk(t)

and k̇(t) > 0 when


c(t) < k(t)α − δk(t)

We can illustrate these conditions graphically using a phase diagram, which reveals that:

The economy has a steady state at (k∗ , c∗ ).


For each possible value of k0 , there exists a unique value of c0 such that, starting from
(k0 , c0 ), the economy converges to the steady state (k∗ , c∗ ).
Starting from all other values of c0 , either k becomes negative or c approaches zero.
Trajectories that lead to negative values of k violate the nonnegativity condition for
capital, and hence cannot represent a solution.
Trajectories that lead towards zero values of c violate the transversality condition
lim_{T→∞} π(T )k(T ) = lim_{T→∞} [e−ρT /c(T )]k(T ) = 0

and hence cannot represent a solution.


Hence, the phase diagram allows us to identify the model’s unique solution.
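The phase-diagram logic also suggests a numerical "shooting" procedure: guess c(0), integrate (34)-(35) forward, and adjust c(0) until the trajectory converges to the steady state. The sketch below implements this idea with hypothetical parameter values.

```python
# A numerical sketch of the phase-diagram argument: shoot on c(0) so
# that the trajectory of (34)-(35) approaches the steady state.  The
# parameter values here are hypothetical illustrations.
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

alpha, delta, rho = 0.33, 0.10, 0.05
k_ss = ((delta + rho)/alpha)**(1/(alpha - 1))   # k*: where c'(t) = 0
k0 = 0.5*k_ss

def rhs(t, x):
    k, c = max(x[0], 1e-6), x[1]     # guard: k**alpha undefined for k < 0
    return [k**alpha - delta*k - c,                  # (34)
            c*(alpha*k**(alpha - 1) - delta - rho)]  # (35)

def hit_floor(t, x):                 # stop if capital collapses
    return x[0] - 1e-3
hit_floor.terminal = True

def miss(c0):
    sol = solve_ivp(rhs, (0, 60), [k0, c0], events=hit_floor, rtol=1e-8)
    return sol.y[0, -1] - k_ss       # > 0 if c0 too low, < 0 if too high

c0 = brentq(miss, 1e-6, k0**alpha - delta*k0)
print(c0)                            # c(0) on the saddle path for this k0
```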

5 One Final Note on the Maximum Principle


In applying the maximum principle in discrete time, we defined the Hamiltonian as

H(yt , π t+1 ; t) = max_{zt} β t F (yt , zt ; t) + π t+1 Q(yt , zt ; t) st c ≥ G(yt , zt ; t) (7)
              = max_{zt} β t F (yt , zt ; t) + π t+1 Q(yt , zt ; t) + λt [c − G(yt , zt ; t)]

and used this definition to derive the optimality conditions (10)-(12) and either (5) or
(6), depending on whether the horizon is finite or infinite.
The Hamiltonian, when defined as above, is often called the present-value Hamiltonian,
because β t F (yt , zt ; t) measures the present value at time t = 0 of the payoff F (yt , zt ; t)
received at time t > 0.

[Phase diagram: the ċ = 0 isocline (k = k∗) and the k̇ = 0 isocline (c = kα − δk) cross at the steady state (k∗, c∗); the saddle path, or stable manifold, runs through it. For each value of k0 , there is a unique value of c0 that leads the system to converge to the steady state.]
The present-value Hamiltonian stands in contrast to the current-value Hamiltonian, defined
by multiplying both sides of (7) by β −t :

β −t H(yt , π t+1 ; t) = max_{zt} F (yt , zt ; t) + β −t π t+1 Q(yt , zt ; t) + β −t λt [c − G(yt , zt ; t)]
                  = max_{zt} F (yt , zt ; t) + θt+1 Q(yt , zt ; t) + μt [c − G(yt , zt ; t)]

= H̃(yt , θt+1 ; t),

where the last line states the definition of the current-value Hamiltonian H̃(yt , θt+1 ; t)
and where
θt+1 = β −t π t+1 ⇒ π t+1 = β t θt+1
and
μt = β −t λt ⇒ λt = β t μt

Let’s consider rewriting the optimality conditions (10)-(12) and (5) in terms of the
current-value Hamiltonian H̃(yt , θt+1 ; t).
To do this, note first that by definition

H(yt , πt+1 ; t) = β t H̃(yt , θt+1 ; t) = β t H̃(yt , β −t π t+1 ; t)

Hence
Hy (yt , π t+1 ; t) = β t H̃y (yt , θt+1 ; t)
and

Hπ (yt , π t+1 ; t) = (∂/∂π t+1 )[β t H̃(yt , β −t π t+1 ; t)]
               = β t β −t H̃θ (yt , β −t π t+1 ; t)
               = H̃θ (yt , θt+1 ; t)

In light of these results, (10) can be rewritten

β t Fz (yt , zt ; t) + π t+1 Qz (yt , zt ; t) − λt Gz (yt , zt ; t) = 0 (10)

Fz (yt , zt ; t) + β −t πt+1 Qz (yt , zt ; t) − β −t λt Gz (yt , zt ; t) = 0


Fz (yt , zt ; t) + θt+1 Qz (yt , zt ; t) − μt Gz (yt , zt ; t) = 0 (10′)

(11) can be rewritten


π t+1 − π t = −Hy (yt , π t+1 ; t) (11)
β t θt+1 − β t−1 θt = −β t H̃y (yt , θt+1 ; t)
θt+1 − β −1 θt = −H̃y (yt , θt+1 ; t) (11′)

(12) can be rewritten


yt+1 − yt = Hπ (yt , πt+1 ; t) (12)
yt+1 − yt = H̃θ (yt , θt+1 ; t) (12′)

(5) can be rewritten
π T +1 (yT +1 − y ∗ ) = 0 (5)
β T θT +1 (yT +1 − y ∗ ) = 0 (5′)

(6) can be rewritten


lim_{T→∞} π T +1 (yT +1 − y ∗ ) = 0 (6)
lim_{T→∞} β T θT +1 (yT +1 − y ∗ ) = 0 (6′)

Thus, when the maximum principle in discrete time is stated in terms of the current-value
Hamiltonian instead of the present-value Hamiltonian, (10)-(12) and (5) or (6) are
replaced by (10′)-(12′) and (5′) or (6′).

We can use the same types of transformations in the case of continuous time, where the
present-value Hamiltonian is defined by

H(y(t), π(t); t) = max_{z(t)} e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)
                   st c ≥ G(y(t), z(t); t)
                = max_{z(t)} e−ρt F (y(t), z(t); t) + π(t)Q(y(t), z(t); t) + λ(t)[c − G(y(t), z(t); t)]

Define the current-value Hamiltonian by multiplying both sides of (19) by eρt :

eρt H(y(t), π(t); t) = max_{z(t)} F (y(t), z(t); t) + eρt π(t)Q(y(t), z(t); t) + eρt λ(t)[c − G(y(t), z(t); t)]
                  = max_{z(t)} F (y(t), z(t); t) + θ(t)Q(y(t), z(t); t) + μ(t)[c − G(y(t), z(t); t)]
                  = H̃(y(t), θ(t); t)

where the last line defines the current-value Hamiltonian H̃(y(t), θ(t); t) and where

θ(t) = eρt π(t) ⇒ π(t) = e−ρt θ(t)

μ(t) = eρt λ(t) ⇒ λ(t) = e−ρt μ(t)

In the case of continuous time, the optimality conditions derived from (19) are (22)-(24)
and either (17) or (18). Let’s rewrite these conditions in terms of the current-value
Hamiltonian H̃(y(t), θ(t); t).

To begin, note that

H(y(t), π(t); t) = e−ρt H̃(y(t), θ(t); t) = e−ρt H̃(y(t), eρt π(t); t)

Hence
Hy (y(t), π(t); t) = e−ρt H̃y (y(t), θ(t); t)
and

Hπ (y(t), π(t); t) = (∂/∂π(t))[e−ρt H̃(y(t), eρt π(t); t)]
                 = e−ρt eρt H̃θ (y(t), eρt π(t); t)
                 = H̃θ (y(t), θ(t); t)

and, finally,
π̇(t) = (∂/∂t)[e−ρt θ(t)] = −ρe−ρt θ(t) + e−ρt θ̇(t)
In light of these results, (22) can be rewritten

e−ρt Fz (y(t), z(t); t) + π(t)Qz (y(t), z(t); t) − λ(t)Gz (y(t), z(t); t) = 0 (22)

Fz (y(t), z(t); t) + eρt π(t)Qz (y(t), z(t); t) − eρt λ(t)Gz (y(t), z(t); t) = 0
Fz (y(t), z(t); t) + θ(t)Qz (y(t), z(t); t) − μ(t)Gz (y(t), z(t); t) = 0 (22′)

(23) can be rewritten


π̇(t) = −Hy (y(t), π(t); t) (23)
−ρe−ρt θ(t) + e−ρt θ̇(t) = −e−ρt H̃y (y(t), θ(t); t)
θ̇(t) = ρθ(t) − H̃y (y(t), θ(t); t) (23′)

(24) can be rewritten


ẏ(t) = Hπ (y(t), π(t); t) (24)
ẏ(t) = H̃θ (y(t), θ(t); t) (240 )

(17) can be rewritten


π(T )[y(T ) − y ∗ ] = 0 (17)
e−ρT θ(T )[y(T ) − y ∗ ] = 0 (17′)

(18) can be rewritten


lim_{T→∞} π(T )[y(T ) − y ∗ ] = 0 (18)
lim_{T→∞} e−ρT θ(T )[y(T ) − y ∗ ] = 0 (18′)

Thus, when the maximum principle in continuous time is stated in terms of the current-
value Hamiltonian instead of the present-value Hamiltonian, (22)-(24) and (17) or (18)
are replaced by (22′)-(24′) and (17′) or (18′).

Dynamic Programming

We have now studied two ways of solving dynamic optimization problems, one based
on the Kuhn-Tucker theorem and the other based on the maximum principle. These two
methods both lead us to the same sets of optimality conditions; they differ only in terms of
how those optimality conditions are derived.
Here, we will consider a third way of solving dynamic optimization problems: the method
of dynamic programming. We will see, once again, that dynamic programming leads us to the
same set of optimality conditions that the Kuhn-Tucker theorem does; once again, this new
method differs from the others only in terms of how the optimality conditions are derived.
While the maximum principle lends itself equally well to dynamic optimization problems
set in both discrete time and continuous time, dynamic programming is easiest to apply in
discrete time settings. On the other hand, dynamic programming, unlike the Kuhn-Tucker
theorem and the maximum principle, can be used quite easily to solve problems in which
optimal decisions must be made under conditions of uncertainty.
Thus, in our discussion of dynamic programming, we will begin by considering dynamic
programming under certainty; later, we will move on to consider stochastic dynamic pro-
gramming.

Reference:

Dixit, Chapter 11.

1 Dynamic Programming Under Certainty


1.1 A Perfect Foresight Dynamic Optimization Problem in Dis-
crete Time
No uncertainty
Discrete time, infinite horizon: t = 0, 1, 2, ...
yt = stock, or state, variable
zt = flow, or control, variable
Objective function:
Σ_{t=0}^{∞} β t F (yt , zt ; t)

1 > β > 0 = discount factor 

Constraint describing the evolution of the state variable

Q(yt , zt ; t) ≥ yt+1 − yt

or
yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, 2, ...

Constraint applying to variables within each period:

c ≥ G(yt , zt ; t)

for all t = 0, 1, 2, ...

Constraint on initial value of the state variable:

y0 given

The problem: choose sequences {zt }_{t=0}^{∞} and {yt }_{t=1}^{∞} to maximize the objective function
subject to all of the constraints.

Notes:

a) It is important for the application of dynamic programming that the problem is


additively time separable: that is, the values of F , Q, and G at time t must
depend only on the values of yt and zt at time t.
b) Once again, it must be emphasized that although the constraints describing the
evolution of the state variable and that apply to the variables within each period
can each be written in the form of a single equation, these constraints must hold
for all t = 0, 1, 2, .... Thus, each equation actually represents an infinite number
of constraints.

1.2 The Kuhn-Tucker Formulation


Let’s begin our analysis of this problem by applying the Kuhn-Tucker theorem. That is,
let’s begin by setting up the Lagrangian and taking first order conditions.

Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, 2, ...:
L = Σ_{t=0}^{∞} β t F (yt , zt ; t) + Σ_{t=0}^{∞} μ̃t+1 [yt + Q(yt , zt ; t) − yt+1 ] + Σ_{t=0}^{∞} λ̃t [c − G(yt , zt ; t)]

It will be convenient to define

μt+1 = β −(t+1) μ̃t+1 =⇒ μ̃t+1 = β t+1 μt+1

λt = β −t λ̃t =⇒ λ̃t = β t λt
and to rewrite the Lagrangian in terms of μt+1 and λt instead of μ̃t+1 and λ̃t :

L = Σ_{t=0}^{∞} β t F (yt , zt ; t) + Σ_{t=0}^{∞} β t+1 μt+1 [yt + Q(yt , zt ; t) − yt+1 ] + Σ_{t=0}^{∞} β t λt [c − G(yt , zt ; t)]

FOC for zt , t = 0, 1, 2, ...:

β t F2 (yt , zt ; t) + β t+1 μt+1 Q2 (yt , zt ; t) − β t λt G2 (yt , zt ; t) = 0

FOC for yt , t = 1, 2, 3, ...:

β t F1 (yt , zt ; t) + β t+1 μt+1 [1 + Q1 (yt , zt ; t)] − β t λt G1 (yt , zt ; t) − β t μt = 0

Now, let’s suppose that somehow we could solve for μt as a function of the state variable
yt :
μt = W (yt ; t)
μt+1 = W (yt+1 ; t + 1) = W [yt + Q(yt , zt ; t); t + 1]

Then we could rewrite the FOC as:

F2 (yt , zt ; t) + βW [yt + Q(yt , zt ; t); t + 1]Q2 (yt , zt ; t) − λt G2 (yt , zt ; t) = 0 (1)

W (yt ; t) = F1 (yt , zt ; t) + βW [yt + Q(yt , zt ; t); t + 1][1 + Q1 (yt , zt ; t)] − λt G1 (yt , zt ; t) (2)

And together with the binding constraint

yt+1 = yt + Q(yt , zt ; t) (3)

and the complementary slackness condition

λt [c − G(yt , zt ; t)] = 0 (4)

we can think of (1) and (2) as forming a system of four equations in three unknown
variables yt , zt , and λt and one unknown function W (:, t). This system of equations
determines the problem’s solution.

Note that since (3) is in the form of a difference equation, finding the problem’s solution
involves solving a difference equation.

1.3 An Alternative Formulation
Now let’s consider the same problem in a slightly different way.

For any given value of the initial state variable y0 , define the value function
v(y0 ; 0) = max_{{zt }_{t=0}^{∞}, {yt }_{t=1}^{∞}} Σ_{t=0}^{∞} β t F (yt , zt ; t)

subject to
y0 given
yt + Q(yt , zt ; t) ≥ yt+1 for all t = 0, 1, 2, ...
c ≥ G(yt , zt ; t) for all t = 0, 1, 2, ...

More generally, for any period t and any value of yt , define


v(yt ; t) = max_{{zt+j }_{j=0}^{∞}, {yt+j }_{j=1}^{∞}} Σ_{j=0}^{∞} β j F (yt+j , zt+j ; t + j)

subject to
yt given
yt+j + Q(yt+j , zt+j ; t + j) ≥ yt+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt+j , zt+j ; t + j) for all j = 0, 1, 2, ...

Note that the value function is a maximum value function.

Now consider expanding the definition of the value function by separating out the time t
components:
v(yt ; t) = max_{zt ,yt+1} [F (yt , zt ; t) + max_{{zt+j }_{j=1}^{∞}, {yt+j }_{j=2}^{∞}} Σ_{j=1}^{∞} β j F (yt+j , zt+j ; t + j)]

subject to
yt given
yt + Q(yt , zt ; t) ≥ yt+1
yt+j + Q(yt+j , zt+j ; t + j) ≥ yt+j+1 for all j = 1, 2, 3, ...
c ≥ G(yt , zt ; t)
c ≥ G(yt+j , zt+j ; t + j) for all j = 1, 2, 3, ...

Next, relabel the time indices:
v(yt ; t) = max_{zt ,yt+1} [F (yt , zt ; t) + β max_{{zt+1+j }_{j=0}^{∞}, {yt+1+j }_{j=1}^{∞}} Σ_{j=0}^{∞} β j F (yt+1+j , zt+1+j ; t + 1 + j)]

subject to
yt given
yt + Q(yt , zt ; t) ≥ yt+1
yt+j+1 + Q(yt+1+j , zt+1+j ; t + 1 + j) ≥ yt+1+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt , zt ; t)
c ≥ G(yt+1+j , zt+1+j ; t + 1 + j) for all j = 0, 1, 2, ...

Now notice that together, the components for t + 1 + j, j = 0, 1, 2, ... define v(yt+1 ; t + 1),
enabling us to simplify the statement considerably:

v(yt ; t) = max_{zt ,yt+1} F (yt , zt ; t) + βv(yt+1 ; t + 1)

subject to
yt given
yt + Q(yt , zt ; t) ≥ yt+1
c ≥ G(yt , zt ; t)

Or, even more simply:

v(yt ; t) = max_{zt} F (yt , zt ; t) + βv[yt + Q(yt , zt ; t); t + 1] (5)

subject to
yt given
c ≥ G(yt , zt ; t)

Equation (5) is called the Bellman equation for this problem, and lies at the heart of the
dynamic programming approach.

Note that the maximization on the right-hand side of (5) is a static optimization problem,
involving no dynamic elements.

By the Kuhn-Tucker theorem:

v(yt ; t) = max_{zt} F (yt , zt ; t) + βv[yt + Q(yt , zt ; t); t + 1] + λt [c − G(yt , zt ; t)]

The FOC for zt is

F2 (yt , zt ; t) + βv′[yt + Q(yt , zt ; t); t + 1]Q2 (yt , zt ; t) − λt G2 (yt , zt ; t) = 0 (6)

And by the envelope theorem:

v′(yt ; t) = F1 (yt , zt ; t) + βv′[yt + Q(yt , zt ; t); t + 1][1 + Q1 (yt , zt ; t)] − λt G1 (yt , zt ; t) (7)

Together with the binding constraint

yt+1 = yt + Q(yt , zt ; t) (3)

and complementary slackness condition

λt [c − G(yt , zt ; t)] = 0, (4)

we can think of (6) and (7) as forming a system of four equations in three unknown
variables yt , zt , and λt and one unknown function v(·; t). This system of equations
determines the problem’s solution.

Note once again that since (3) is in the form of a difference equation, finding the problem’s
solution involves solving a difference equation.

But more important, notice that (6) and (7) are equivalent to (1) and (2) with

v′(yt ; t) = W (yt ; t).

Thus, we have two ways of solving this discrete time dynamic optimization problem, both
of which lead us to the same set of optimality conditions:

a) Set up the Lagrangian for the dynamic optimization problem and take first order
conditions for zt , t = 0, 1, 2, ... and yt , t = 1, 2, 3, ....
b) Set up the Bellman equation and take the first order condition for zt and then
derive the envelope condition for yt .

One question remains: How, in practice, can we solve for the unknown value functions
v(·; t)?

To see how to answer this question, consider two examples:

Example 1: Optimal Growth - Here, it will be possible to solve for v explicitly.


Example 2: Saving Under Certainty - Here, it will not be possible to solve for v
explicitly, yet we can learn enough about the properties of v to obtain some
useful economic insights.
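Before turning to the examples, note that when no explicit solution is available, the Bellman equation (5) can also be solved numerically by iterating on a guess for the value function. The sketch below applies this idea to the special case of Example 1 below (log utility, kt+1 = ktα − ct , with α = 0.33 and β = 0.99 as arbitrary illustrative values), so that the computed slope of v in ln(kt ) can be checked against the exact coefficient F = α/(1 − αβ) derived there.

```python
# A minimal value-function-iteration sketch for the Bellman equation (5),
# specialized to Example 1 below so the answer can be checked exactly.
import numpy as np

alpha, beta = 0.33, 0.99
grid = np.linspace(0.05, 0.5, 200)     # grid of values for the state k
v = np.zeros(len(grid))                # initial guess: v = 0

for _ in range(2000):
    c = grid[:, None]**alpha - grid[None, :]          # c implied by (k, k')
    util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)
    v_new = np.max(util + beta*v[None, :], axis=1)    # Bellman operator
    diff, v = np.max(np.abs(v_new - v)), v_new
    if diff < 1e-8:
        break

slope = (v[-1] - v[0]) / (np.log(grid[-1]) - np.log(grid[0]))
print(slope, alpha/(1 - alpha*beta))   # both approximately 0.49
```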

2 Example 1: Optimal Growth
Here, we will modify the optimal growth example that we solved earlier using the maximum
principle in two ways:

a) We will switch to discrete time in order to facilitate the use of dynamic program-
ming.
b) Set the depreciation rate for capital equal to δ = 1 in order to obtain a very special
case in which an explicit solution for the value function can be found.

Production function:
F(k_t) = k_t^α,
where 0 < α < 1

kt = capital (state variable)

ct = consumption (control variable)

Evolution of the capital stock:

k_{t+1} = k_t^α − c_t

for all t = 0, 1, 2, ...

Initial condition:
k0 given

Utility or social welfare:

Σ_{t=0}^∞ β^t ln(c_t)

The social planner's problem: choose sequences {c_t}_{t=0}^∞ and {k_t}_{t=1}^∞ to maximize the utility
function subject to all of the constraints.

To solve this problem via dynamic programming, use

kt = state variable
ct = control variable

Set up the Bellman equation:

v(k_t; t) = max_{c_t} ln(c_t) + βv(k_t^α − c_t; t+1)

Now guess that the value function takes the time-invariant form

v(kt ; t) = v(kt ) = E + F ln(kt ),

where E and F are constants to be determined.

Using the guess for v, the Bellman equation becomes

E + F ln(k_t) = max_{c_t} ln(c_t) + βE + βF ln(k_t^α − c_t)    (8)

FOC for c_t:

1/c_t − βF/(k_t^α − c_t) = 0    (9)

Envelope condition for k_t:

F/k_t = αβF k_t^{α−1} / (k_t^α − c_t)    (10)
Together with the binding constraint

k_{t+1} = k_t^α − c_t,

equations (8)-(10) form a system of four equations in four unknowns: c_t, k_t, E, and F.

Equation (9) implies

k_t^α − c_t = βF c_t

or

c_t = [1/(1 + βF)] k_t^α    (11)

Substitute (11) into the envelope condition (10):

F/k_t = αβF k_t^{α−1} / (k_t^α − c_t)    (10)

F k_t^α − F [1/(1 + βF)] k_t^α = αβF k_t^α

1 − 1/(1 + βF) = αβ

Hence
1/(1 + βF) = 1 − αβ    (12)

Or, equivalently,

1 + βF = 1/(1 − αβ)

βF = 1/(1 − αβ) − 1 = αβ/(1 − αβ)

F = α/(1 − αβ)    (13)

[Figure: Numerical solutions to the optimal growth model with complete depreciation,
generated using equations (14) and (15) with α = 0.33 and β = 0.99. Example 1 starts
from k(0) = 0.01; Example 2 starts from k(0) = 1. In both examples, c(t) converges to
its steady-state value of 0.388 and k(t) converges to its steady-state value of 0.188.]
Substitute (12) into (11) to obtain
c_t = (1 − αβ) k_t^α    (14)
which shows that it is optimal to consume the fixed fraction 1 − αβ of output.

Evolution of capital:
k_{t+1} = k_t^α − c_t = k_t^α − (1 − αβ) k_t^α = αβ k_t^α    (15)
which is in the form of a difference equation for k_t.

Equations (14) and (15) show how the optimal values of c_t and k_{t+1} depend on the state
variable k_t and the parameters α and β. Given a value for k_0, these two equations can
be used to construct the optimal sequences {c_t}_{t=0}^∞ and {k_t}_{t=1}^∞, as in the sketch below.
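As a check on these formulas, here is a minimal Python sketch (an illustration added here, not part of the original derivation) that iterates (14) and (15) forward from a given k_0; the values α = 0.33, β = 0.99, and k_0 = 0.01 match the first numerical example above.

    # Iterate the optimal growth model with complete depreciation:
    #   c_t     = (1 - alpha*beta) * k_t**alpha     (14)
    #   k_{t+1} = alpha*beta * k_t**alpha           (15)
    alpha, beta = 0.33, 0.99
    k = 0.01                                  # initial capital stock k(0)
    for t in range(21):
        c = (1 - alpha * beta) * k ** alpha   # consumption from (14)
        print(t, round(c, 3), round(k, 3))
        k = alpha * beta * k ** alpha         # next period's capital from (15)
    # Both sequences approach the steady state c* = 0.388, k* = 0.188.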

For the sake of completeness, substitute (14) and (15) back into (8) to solve for E:

E + F ln(k_t) = max_{c_t} ln(c_t) + βE + βF ln(k_t^α − c_t)    (8)

E + F ln(k_t) = ln(1 − αβ) + α ln(k_t) + βE + βF ln(αβ) + αβF ln(k_t)

Since (13) implies that

F = α + αβF,

this last equality reduces to

E = ln(1 − αβ) + βE + βF ln(αβ)

which leads directly to the solution

E = [ln(1 − αβ) + (αβ/(1 − αβ)) ln(αβ)] / (1 − β)

3 Example 2: Saving Under Certainty


Here, a consumer maximizes utility over an infinite horizon, t = 0, 1, 2, ..., earning income
from labor and from investments.

At = beginning-of-period assets

At can be negative, that is, the consumer is allowed to borrow

yt = labor income (exogenous)

ct = consumption

saving = st = At + yt − ct

r = constant interest rate

Evolution of assets:
A_{t+1} = (1 + r)s_t = (1 + r)(A_t + y_t − c_t)
Note:
A_t + y_t − c_t = [1/(1 + r)] A_{t+1}

A_t = [1/(1 + r)] A_{t+1} + c_t − y_t
Similarly,
A_{t+1} = [1/(1 + r)] A_{t+2} + c_{t+1} − y_{t+1}
Combining these last two equalities yields

A_t = [1/(1 + r)]^2 A_{t+2} + [1/(1 + r)](c_{t+1} − y_{t+1}) + (c_t − y_t)

Continuing in this manner yields

A_t = [1/(1 + r)]^T A_{t+T} + Σ_{j=0}^{T−1} [1/(1 + r)]^j (c_{t+j} − y_{t+j}).

Now assume that the sequence {A_t}_{t=0}^∞ must remain bounded (while borrowing is allowed,
unlimited borrowing is ruled out), and take the limit as T → ∞ to obtain

A_t = Σ_{j=0}^∞ [1/(1 + r)]^j (c_{t+j} − y_{t+j})

or

A_t + Σ_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = Σ_{j=0}^∞ [1/(1 + r)]^j c_{t+j}.    (16)

Equation (16) takes the form of an infinite horizon budget constraint, indicating that over
the infinite horizon beginning at any period t, the consumer’s sources of funds include
assets At and the present value of current and future labor income, while the consumer’s
use of funds is summarized by the present value of current and future consumption.
The consumer's problem: choose the sequences {s_t}_{t=0}^∞ and {A_t}_{t=1}^∞ to maximize the utility
function
Σ_{t=0}^∞ β^t u(c_t) = Σ_{t=0}^∞ β^t u(A_t + y_t − s_t)
subject to the constraints
A0 given
and
(1 + r)st ≥ At+1
for all t = 0, 1, 2, ...

To solve the problem via dynamic programming, note first that

At = state variable
st = control

Set up the Bellman equation

v(A_t; t) = max_{s_t} u(A_t + y_t − s_t) + βv(A_{t+1}; t+1) subject to (1 + r)s_t ≥ A_{t+1}

or, substituting the binding constraint in,

v(A_t; t) = max_{s_t} u(A_t + y_t − s_t) + βv[(1 + r)s_t; t+1]

FOC for s_t:
−u′(A_t + y_t − s_t) + β(1 + r)v′[(1 + r)s_t; t+1] = 0

Envelope condition for A_t:

v′(A_t; t) = u′(A_t + y_t − s_t)

Use the constraints to rewrite these optimality conditions as

u′(c_t) = β(1 + r)v′(A_{t+1}; t+1)    (17)

and
v′(A_t; t) = u′(c_t)    (18)

Since (18) must hold for all t = 0, 1, 2, ..., it implies

v′(A_{t+1}; t+1) = u′(c_{t+1})

Substitute this result into (17) to obtain:

u′(c_t) = β(1 + r)u′(c_{t+1})    (19)

Now make two extra assumptions:

a) β(1 + r) = 1 or 1 + r = 1/β, so that the interest rate equals the discount rate

b) u is strictly concave

Under these two additional assumptions, (19) implies

u′(c_t) = u′(c_{t+1})

or
ct = ct+1

And since this last equation must hold for all t = 0, 1, 2, ..., it implies

ct = ct+j for all j = 0, 1, 2, ...

Now, return to (16):

A_t + Σ_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = Σ_{j=0}^∞ [1/(1 + r)]^j c_{t+j}.    (16)

Since c_{t+j} = c_t for all j and 1/(1 + r) = β,

A_t + Σ_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = c_t Σ_{j=0}^∞ β^j    (20)

FACT: Since |β| < 1,

Σ_{j=0}^∞ β^j = 1/(1 − β)

To see why this is true, multiply both sides by 1 − β:

1 = (1 − β)/(1 − β)
  = (1 − β) Σ_{j=0}^∞ β^j
  = (1 + β + β^2 + ...) − β(1 + β + β^2 + ...)
  = (1 + β + β^2 + ...) − (β + β^2 + β^3 + ...)
  = 1

Use this fact to rewrite (20):

A_t + Σ_{j=0}^∞ [1/(1 + r)]^j y_{t+j} = c_t [1/(1 − β)]

or

c_t = (1 − β) [ A_t + Σ_{j=0}^∞ [1/(1 + r)]^j y_{t+j} ]    (21)

Equation (21) indicates that it is optimal to consume the fixed fraction 1 − β of wealth at each
date t, where wealth consists of the value of current asset holdings plus the present
discounted value of current and future labor income. Thus, (21) describes a version of the
permanent income hypothesis.
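To make (21) concrete, here is a small Python sketch computing the consumption level it implies; the asset level, income path, discount factor, and truncation horizon below are illustrative assumptions, not values from the notes.

    # Permanent income formula (21): c_t = (1 - beta)*[A_t + PV of labor income],
    # using 1/(1+r) = beta. All numbers below are illustrative.
    beta = 0.95                    # discount factor, so r = 1/beta - 1
    A = 10.0                       # current assets A_t
    income = [1.0] * 500           # constant labor income, truncated at 500 periods
    wealth = A + sum(beta ** j * y for j, y in enumerate(income))
    c = (1 - beta) * wealth
    print(round(c, 4))             # approximately 1.5 = (1 - beta)*A + 1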

4 Stochastic Dynamic Programming


4.1 A Dynamic Stochastic Optimization Problem
Discrete time, infinite horizon: t = 0, 1, 2, ...

yt = state variable

zt = control variable

εt+1 = random shock, which is observed at the beginning of t + 1

Thus, when zt is chosen:

εt is known ...
... but εt+1 is still viewed as random.

The shock εt+1 may be serially correlated, but will be assumed to have the Markov property
(i.e., to be generated by a Markov process): the distribution of εt+1 depends on εt , but
not on εt−1 , εt−2 , εt−3 , ....

For example, εt+1 may follow a first-order autoregressive process:

εt+1 = ρεt + ηt+1 .

Now, the full state of the economy at the beginning of each period is described jointly by
the pair of values for yt and εt , since the value for εt is relevant for forecasting, that is,
forming expectations of, future values of εt+j , j = 1, 2, 3, ....

Objective function:

E_0 Σ_{t=0}^∞ β^t F(y_t, z_t, ε_t)

1 > β > 0 discount factor

E0 = expected value as of t = 0

Constraint describing the evolution of the state variable

yt + Q(yt , zt , εt+1 ) ≥ yt+1

for all t = 0, 1, 2, ... and for all possible realizations of εt+1

Thus, the value of yt+1 does not become known until εt+1 is observed at the beginning of
t + 1 for all t = 0, 1, 2, ...

Constraint on initial value of the state variable:

y0 given

The problem: choose contingency plans for z_t, t = 0, 1, 2, ..., and y_t, t = 1, 2, 3, ..., to
maximize the objective function subject to all of the constraints.

Notes:

a) In order to incorporate uncertainty, we have really only made two adjustments to
the problem:

First, we have added the shock εt to the objective function for period t and the
shock εt+1 to the constraint linking periods t and t + 1.
And second, we have assumed that the planner cares about the expected value
of the objective function.
b) For simplicity, the functions F and Q are now assumed to be time-invariant,
although now they depend on the shock as well as on the state and control variable.
c) For simplicity, we have also dropped the second set of constraints, c ≥ G(yt , zt ).
Adding them back is straightforward, but complicates the algebra.
d) In the presence of uncertainty, the constraint

yt + Q(yt , zt , εt+1 ) ≥ yt+1

must hold, not only for all t = 0, 1, 2, ..., but for all possible realizations of εt+1
as well. Thus, this single equation can actually represent a very large number of
constraints.
e) The Kuhn-Tucker theorem can still be used to solve problems that feature uncer-
tainty. But because problems with uncertainty can have a very large number of
constraints, the Kuhn-Tucker theorem can become very difficult to apply in prac-
tice, since one may have to introduce a very large number of Lagrange multipliers.
Dynamic programming, therefore, can be an easier and more convenient way to
solve dynamic stochastic optimization problems.

4.2 The Dynamic Programming Formulation

Once again, for any values of y_0 and ε_0, define

v(y_0, ε_0) = max_{{z_t}_{t=0}^∞, {y_t}_{t=1}^∞} E_0 Σ_{t=0}^∞ β^t F(y_t, z_t, ε_t)

subject to
y0 and ε0 given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all t = 0, 1, 2, ... and all εt+1

More generally, for any period t and any values of y_t and ε_t, define

v(y_t, ε_t) = max_{{z_{t+j}}_{j=0}^∞, {y_{t+j}}_{j=1}^∞} E_t Σ_{j=0}^∞ β^j F(y_{t+j}, z_{t+j}, ε_{t+j})

subject to
yt and εt given
yt+j + Q(yt+j , zt+j , εt+j+1 ) ≥ yt+j+1 for all j = 0, 1, 2, ... and all εt+j+1

Note once again that the value function is a maximum value function.

Now separate out the time t components:

v(y_t, ε_t) = max_{z_t, y_{t+1}} [ F(y_t, z_t, ε_t) + max_{{z_{t+j}}_{j=1}^∞, {y_{t+j}}_{j=2}^∞} E_t Σ_{j=1}^∞ β^j F(y_{t+j}, z_{t+j}, ε_{t+j}) ]

subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j + Q(yt+j , zt+j , εt+j+1 ) ≥ yt+j+1 for all j = 1, 2, 3, ... and all εt+j+1

Relabel the time indices:

v(y_t, ε_t) = max_{z_t, y_{t+1}} [ F(y_t, z_t, ε_t) + β max_{{z_{t+1+j}}_{j=0}^∞, {y_{t+1+j}}_{j=1}^∞} E_t Σ_{j=0}^∞ β^j F(y_{t+1+j}, z_{t+1+j}, ε_{t+1+j}) ]

subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j+1 + Q(yt+1+j , zt+1+j , εt+1+j+1 ) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1

FACT (Law of Iterated Expectations): For any random variable Xt+j , realized at time t+j,
j = 0, 1, 2, ...:
Et Et+1 Xt+j = Et Xt+j .

To see why this fact holds true, consider the following example:

Suppose ε_{t+1} follows the first-order autoregression:

ε_{t+1} = ρε_t + η_{t+1}, with E_t η_{t+1} = 0

Hence
ε_{t+2} = ρε_{t+1} + η_{t+2}, with E_{t+1} η_{t+2} = 0
or
ε_{t+2} = ρ^2 ε_t + ρη_{t+1} + η_{t+2}.
It follows that

E_{t+1} ε_{t+2} = E_{t+1}(ρ^2 ε_t + ρη_{t+1} + η_{t+2}) = ρ^2 ε_t + ρη_{t+1}

and therefore
E_t E_{t+1} ε_{t+2} = E_t(ρ^2 ε_t + ρη_{t+1}) = ρ^2 ε_t.
It also follows that
E_t ε_{t+2} = E_t(ρ^2 ε_t + ρη_{t+1} + η_{t+2}) = ρ^2 ε_t.

So that in this case, as in general,

E_t E_{t+1} ε_{t+2} = E_t ε_{t+2}
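A quick Monte Carlo sketch in Python (numpy assumed available; the values of ρ, ε_t, and the shock distribution are illustrative) confirms that averaging E_{t+1} ε_{t+2} over the time-t distribution of η_{t+1} gives the same answer as E_t ε_{t+2}:

    import numpy as np
    # Verify E_t E_{t+1} eps_{t+2} = E_t eps_{t+2} = rho^2 * eps_t for the AR(1).
    rng = np.random.default_rng(0)
    rho, eps_t, n = 0.8, 1.0, 1_000_000
    eta1 = rng.normal(size=n)                  # draws of eta_{t+1}, mean zero
    eta2 = rng.normal(size=n)                  # draws of eta_{t+2}, mean zero
    inner = rho**2 * eps_t + rho * eta1        # E_{t+1} eps_{t+2}, random at time t
    eps2 = inner + eta2                        # realized eps_{t+2}
    print(inner.mean(), eps2.mean(), rho**2 * eps_t)   # all approximately 0.64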

Using this fact:

v(y_t, ε_t) = max_{z_t, y_{t+1}} [ F(y_t, z_t, ε_t) + β max_{{z_{t+1+j}}_{j=0}^∞, {y_{t+1+j}}_{j=1}^∞} E_t E_{t+1} Σ_{j=0}^∞ β^j F(y_{t+1+j}, z_{t+1+j}, ε_{t+1+j}) ]

subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1
yt+j+1 + Q(yt+1+j , zt+1+j , εt+1+j+1 ) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1

Now use the definition of v(y_{t+1}, ε_{t+1}) to simplify:

v(y_t, ε_t) = max_{z_t, y_{t+1}} F(y_t, z_t, ε_t) + βE_t v(y_{t+1}, ε_{t+1})

subject to
yt and εt given
yt + Q(yt , zt , εt+1 ) ≥ yt+1 for all εt+1

Or, even more simply:

v(y_t, ε_t) = max_{z_t} F(y_t, z_t, ε_t) + βE_t v[y_t + Q(y_t, z_t, ε_{t+1}), ε_{t+1}]    (22)

Equation (22) is the Bellman equation for this stochastic problem.


Thus, in order to incorporate uncertainty into the dynamic programming framework, we
only need to make two modifications to the Bellman equation:

a) Include the shock εt as an additional argument of the value function.


b) Add the expectation term Et in front of the value function for t+1 on the right-hand
side.

Note that the maximization on the right-hand side of (22) is a static optimization problem,
involving no dynamic elements.
Note also that by substituting the constraints into the value function, we are left with
an unconstrained problem. Unlike the Kuhn-Tucker approach, which requires many
constraints and many multipliers, dynamic programming in this case has no constraints
and no multipliers.
The FOC for z_t is

F_2(y_t, z_t, ε_t) + βE_t{v_1[y_t + Q(y_t, z_t, ε_{t+1}), ε_{t+1}] Q_2(y_t, z_t, ε_{t+1})} = 0    (23)

The envelope condition for y_t is:

v_1(y_t, ε_t) = F_1(y_t, z_t, ε_t) + βE_t{v_1[y_t + Q(y_t, z_t, ε_{t+1}), ε_{t+1}][1 + Q_1(y_t, z_t, ε_{t+1})]}    (24)

Equations (23)-(24) coincide exactly with the first-order conditions for zt and yt that we
would have derived through a direct application of the Kuhn-Tucker theorem to the
original, dynamic stochastic optimization problem.
Together with the binding constraint

yt+1 = yt + Q(yt , zt , εt+1 ) (25)

we can think of (23) and (24) as forming a system of three equations in two unknown
variables yt and zt and one unknown function v. This system of equations determines
the problem’s solution, given the behavior of the exogenous shocks εt .
Note that (25) is in the form of a difference equation; once again, solving a dynamic
optimization problem involves solving a difference equation.

5 Example 3: Saving with Multiple Random Returns


This example extends example 2 by:

a) Introducing n ≥ 1 assets
b) Allowing returns on each asset to be random

As in example 2, we will not be able to solve explicitly for the value function, but we will
be able to learn enough about its properties to derive some useful economic results.
Since we are extending the example in two ways, assume for simplicity that the consumer
receives no labor income, and therefore must finance all of his or her consumption by
investing.
At = beginning-of-period financial wealth
ct = consumption
sit = savings allocated to asset i = 1, 2, ..., n
Hence,

A_t = c_t + Σ_{i=1}^n s_{it}

Rit+1 = random gross return on asset i, not known until t + 1


Hence, when sit is chosen:

Rit is known ...

... but R_{it+1} is still viewed as random.

Hence

A_{t+1} = Σ_{i=1}^n R_{it+1} s_{it}

does not become known until the beginning of t+1, even though the sit must be chosen
during t.

Utility:

E_0 Σ_{t=0}^∞ β^t u(c_t) = E_0 Σ_{t=0}^∞ β^t u(A_t − Σ_{i=1}^n s_{it})

The problem can now be stated as: choose contingency plans for s_{it} for all i = 1, 2, ..., n
and t = 0, 1, 2, ... and A_t for all t = 0, 1, 2, ... to maximize

E_0 Σ_{t=0}^∞ β^t u(A_t − Σ_{i=1}^n s_{it})

subject to
A_0 given
and

Σ_{i=1}^n R_{it+1} s_{it} ≥ A_{t+1}

for all t = 0, 1, 2, ... and all possible realizations of Rit+1 for each i = 1, 2, ..., n.

As in the general case, the returns can be serially correlated, but must have the Markov
property.

To solve this problem via dynamic programming, let

A_t = state variable
s_{it}, i = 1, 2, ..., n = control variables
R_t = [R_{1t}, R_{2t}, ..., R_{nt}] = vector of random returns

The Bellman equation is

v(A_t, R_t) = max_{{s_{it}}_{i=1}^n} u(A_t − Σ_{i=1}^n s_{it}) + βE_t v(Σ_{i=1}^n R_{it+1} s_{it}, R_{t+1})

FOC:

−u′(A_t − Σ_{i=1}^n s_{it}) + βE_t R_{it+1} v_1(Σ_{i=1}^n R_{it+1} s_{it}, R_{t+1}) = 0

for all i = 1, 2, ..., n

Envelope condition:

v_1(A_t, R_t) = u′(A_t − Σ_{i=1}^n s_{it})

Use the constraints to rewrite the FOC and envelope conditions more simply as

u′(c_t) = βE_t R_{it+1} v_1(A_{t+1}, R_{t+1})

for all i = 1, 2, ..., n and

v_1(A_t, R_t) = u′(c_t)

Since the envelope condition must hold for all t = 0, 1, 2, ..., it implies

v_1(A_{t+1}, R_{t+1}) = u′(c_{t+1})

Hence, the FOC imply that

u′(c_t) = βE_t R_{it+1} u′(c_{t+1})    (26)

must hold for all i = 1, 2, ..., n

Equation (26) generalizes (19) to the case where there is more than one asset and where
the asset returns are random. It must hold for all assets i = 1, 2, ..., n, even though
each asset may pay a different return ex-post.

In example 2, we combined (19) with some additional assumptions to derive a version of


the permanent income hypothesis. Similarly, we can use (26) to derive a version of the
famous capital asset pricing model.

For simplicity, let

m_{t+1} = βu′(c_{t+1}) / u′(c_t)
denote the consumer’s intertemporal marginal rate of substitution.

Then (26) can be written more simply as

1 = Et Rit+1 mt+1 (27)

Keeping in mind that (27) must hold for all assets, suppose that there is a risk-free asset,
with return R^f_{t+1} that is known during period t. Then R^f_{t+1} must satisfy

1 = R^f_{t+1} E_t m_{t+1}

or

E_t m_{t+1} = 1/R^f_{t+1}    (28)

FACT: For any two random variables x and y,
cov(x, y) = E[(x − μ_x)(y − μ_y)], where μ_x = E(x) and μ_y = E(y).
Hence,
cov(x, y) = E[xy − μ_x y − xμ_y + μ_x μ_y]
          = E(xy) − μ_x μ_y − μ_x μ_y + μ_x μ_y
          = E(xy) − μ_x μ_y
          = E(xy) − E(x)E(y)
Or, by rearranging,
E(xy) = E(x)E(y) + cov(x, y)

Using this fact, (27) can be rewritten as

1 = E_t R_{it+1} m_{t+1} = E_t R_{it+1} E_t m_{t+1} + cov_t(R_{it+1}, m_{t+1})
or, using (28),
R^f_{t+1} = E_t R_{it+1} + R^f_{t+1} cov_t(R_{it+1}, m_{t+1})

E_t R_{it+1} − R^f_{t+1} = −R^f_{t+1} cov_t(R_{it+1}, m_{t+1})    (29)

Equation (29) indicates that the expected return on asset i exceeds the risk-free rate only
if Rit+1 is negatively correlated with mt+1 .
Does this make sense?

Consider an asset that acts like insurance, paying a high return R_{it+1} during bad
economic times, when consumption c_{t+1} is low. Therefore, for this asset:
cov_t(R_{it+1}, c_{t+1}) < 0 ⇒ cov_t[R_{it+1}, u′(c_{t+1})] > 0
⇒ cov_t(R_{it+1}, m_{t+1}) > 0 ⇒ E_t R_{it+1} < R^f_{t+1}.
This implication seems reasonable: assets that work like insurance often have
expected returns below the risk-free return.
Consider, by contrast, that common stocks tend to pay a high return R_{it+1} during good
economic times, when consumption c_{t+1} is high. Therefore, for stocks:
cov_t(R_{it+1}, c_{t+1}) > 0 ⇒ cov_t[R_{it+1}, u′(c_{t+1})] < 0
⇒ cov_t(R_{it+1}, m_{t+1}) < 0 ⇒ E_t R_{it+1} > R^f_{t+1}.
This implication also seems to hold true: historically, stocks have had expected
returns above the risk-free return.

Recalling once more that (29) must hold for all assets, consider in particular the asset whose
return happens to coincide exactly with the representative consumer's intertemporal
marginal rate of substitution:

R^m_{t+1} = m_{t+1}.

For this asset, equation (29) implies

E_t R^m_{t+1} − R^f_{t+1} = −R^f_{t+1} cov_t(R^m_{t+1}, m_{t+1})

E_t m_{t+1} − R^f_{t+1} = −R^f_{t+1} cov_t(m_{t+1}, m_{t+1}) = −R^f_{t+1} var_t(m_{t+1})

or

−R^f_{t+1} = [E_t m_{t+1} − R^f_{t+1}] / var_t(m_{t+1})    (30)

Substitute (30) into the right-hand side of (29) to obtain

E_t R_{it+1} − R^f_{t+1} = [cov_t(R_{it+1}, m_{t+1}) / var_t(m_{t+1})] (E_t m_{t+1} − R^f_{t+1})

or

E_t R_{it+1} − R^f_{t+1} = b_{it}(E_t m_{t+1} − R^f_{t+1}),    (31)

where

b_{it} = cov_t(R_{it+1}, m_{t+1}) / var_t(m_{t+1})

is like the slope coefficient from a regression of R_{it+1} on m_{t+1}.

Equation (31) is a statement of the consumption-based capital asset pricing model, or


consumption CAPM. This model links the expected return on each asset to the risk-
free rate and the representative consumer’s intertemporal marginal rate of substitution.
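The following Python sketch (numpy assumed; the distribution of m_{t+1} and the return construction are purely hypothetical) simulates an IMRS, builds an asset return that satisfies the pricing condition (27), and checks the beta representation (31) numerically:

    import numpy as np
    # Simulate an IMRS m, construct a return R with E[R m] = 1, and check (31):
    #   E R - Rf  =  b * (E m - Rf),  with  b = cov(R, m)/var(m).
    rng = np.random.default_rng(1)
    m = 0.97 + 0.05 * rng.normal(size=1_000_000)       # hypothetical IMRS draws
    Rf = 1.0 / m.mean()                                # risk-free rate from (28)
    slope = -2.0                                       # stock-like: R falls when m rises
    const = (1.0 - slope * np.mean(m ** 2)) / m.mean() # chosen so that E[R m] = 1
    R = const + slope * m
    b = np.cov(R, m)[0, 1] / np.var(m)
    print(R.mean() - Rf, b * (m.mean() - Rf))          # the two sides of (31) nearly agree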

Eigenvalues and Eigenvectors

As we have seen, the equations characterizing the solution to a dynamic optimization


problem usually take the form of differential or difference equations. Thus, it will be useful
to consider differential and difference equations in more detail and to focus, in particular, on
ways in which these equations can be solved. In solving differential and difference equations,
some basic results from matrix algebra, having to do with eigenvalues and eigenvectors,
will often be of use. So before moving on to explicitly consider differential and difference
equations, we will review some definitions and facts about eigenvalues and eigenvectors.

Reference:

Simon and Blume, Chapter 23.

1 Definitions and Examples


Definition Let A be a square matrix. An eigenvalue of A is a number r that, when
subtracted from each of the diagonal elements of A, converts A into a singular matrix.

Since A is singular if and only if its determinant is zero, we can calculate the eigenvalues
of A by solving the characteristic equation

det(A − rI) = 0, (1)

where I is the identity matrix and det(A − rI) is the characteristic polynomial of A.

Example: Consider the general 2 × 2 matrix

A = [ a_11  a_12 ; a_21  a_22 ].

For this matrix,

det(A − rI) = det [ a_11 − r  a_12 ; a_21  a_22 − r ]
            = (a_11 − r)(a_22 − r) − a_12 a_21
            = a_11 a_22 − (a_11 + a_22)r + r^2 − a_12 a_21
            = r^2 − (a_11 + a_22)r + (a_11 a_22 − a_12 a_21),

so that the characteristic polynomial is a second order polynomial. Thus, the charac-
teristic equation (1) can be solved using the quadratic formula:

r = { a_11 + a_22 ± [(a_11 + a_22)^2 − 4(a_11 a_22 − a_12 a_21)]^{1/2} } / 2.
This example reveals that a 2 × 2 matrix has two eigenvalues. More generally, an n × n
matrix has n eigenvalues.
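In practice these roots are computed numerically rather than by hand. A minimal Python sketch (numpy assumed), using the 2 × 2 matrix from the example that follows:

    import numpy as np
    # Eigenvalues of a 2x2 matrix: quadratic formula vs. a library routine.
    A = np.array([[-1.0, 3.0],
                  [ 2.0, 0.0]])
    tr, det = A.trace(), np.linalg.det(A)
    disc = np.sqrt(tr ** 2 - 4 * det)          # discriminant of r^2 - tr*r + det
    print((tr + disc) / 2, (tr - disc) / 2)    # 2.0 and -3.0
    print(np.linalg.eigvals(A))                # the same two eigenvalues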
Recall, next, that a square matrix B is singular if and only if there exists a nonzero vector
x such that Bx = 0.
This fact tells us that if r is an eigenvalue of A, so that A − rI is singular, then there exists
a nonzero vector v such that
(A − rI)v = 0. (2)

This vector v is called an eigenvector of A corresponding to the eigenvalue r. Note that (2)
is equivalent to
Av = rv, (3)
so that an eigenvector must also satisfy (3).
Example: Consider the 2 × 2 matrix

A = [ −1  3 ; 2  0 ].

The characteristic equation is

0 = det [ −1 − r  3 ; 2  −r ] = r(1 + r) − 6 = r^2 + r − 6 = (r + 3)(r − 2),

so that the eigenvalues of A are r_1 = −3 and r_2 = 2.
The eigenvector v_1 corresponding to the eigenvalue r_1 = −3 must satisfy

[ −1 + 3  3 ; 2  3 ] [ v_11 ; v_12 ] = [ 0 ; 0 ]

2v_11 + 3v_12 = 0

v_12 = −(2/3)v_11

Evidently, any vector of the form

[ v_11 ; −(2/3)v_11 ]

is an eigenvector of A. A simple vector of this form is

[ 3 ; −2 ]

but, as this example shows, eigenvectors are not uniquely determined.

In fact, we might have seen this earlier by examining equation (2), defining an eigenvector:

(A − rI)v = 0. (2)

Clearly, if a vector v1 satisfies (2), then so does the vector αv1 for any α 6= 0.

Exercise: Show that the eigenvector v_2 corresponding to r_2 = 2 must be of the form

[ v_21 ; v_21 ],

as is

[ 1 ; 1 ].

2 Using Eigenvalues and Eigenvectors to Diagonalize a Square Matrix

Eigenvalues and eigenvectors are useful in solving differential and difference equations be-
cause they can often be used to diagonalize a square matrix.

Let A be an n × n matrix, and consider the problem of finding a nonsingular matrix P such
that
P^{−1}AP = D,    (4)
where D is a diagonal matrix.

To solve this problem, calculate the n eigenvalues of A, r_1, r_2, ..., r_n, along with the cor-
responding eigenvectors v_1, v_2, ..., v_n. Then form the matrix P using the eigenvectors
as its columns:
P = [ v_1  v_2  ...  v_n ].
And form the matrix D by placing the eigenvalues on the diagonal and zeros everywhere
else:
D = [ r_1  0  ...  0 ; 0  r_2  ...  0 ; ... ; 0  0  ...  r_n ].

P will be nonsingular if A has n linearly independent eigenvectors. In this case, equation
(4) requires that

AP = PD

A [ v_1  v_2  ...  v_n ] = [ v_1  v_2  ...  v_n ] D

[ Av_1  Av_2  ...  Av_n ] = [ r_1 v_1  r_2 v_2  ...  r_n v_n ]    (5)

Since the ri ’s are eigenvalues and the vi ’s are eigenvectors, the definition (3) tells us that
(5) must hold.

Thus, we can state the following theorem.

Theorem Let A be an n × n matrix, let r_1, r_2, ..., r_n be the eigenvalues of A, and let v_1,
v_2, ..., v_n be the corresponding eigenvectors. Form the matrix

P = [ v_1  v_2  ...  v_n ]

using the eigenvectors as columns. If A has n linearly independent eigenvectors, so
that P is nonsingular, then
P^{−1}AP = D,    (4)
where
D = [ r_1  0  ...  0 ; 0  r_2  ...  0 ; ... ; 0  0  ...  r_n ]
is the matrix with the n eigenvalues along the diagonal and zeros everywhere else. Con-
versely, if P^{−1}AP = D is diagonal, then the diagonal elements of D must be the
eigenvalues of A, and the columns of P must be the corresponding eigenvectors.

Note: An n × n matrix that does not have n linearly independent eigenvectors is called
nondiagonalizable or defective.

Let's check that (4) holds for the matrix A that we considered in one of our earlier examples:

A = [ −1  3 ; 2  0 ].

For this choice of A, we've already found the eigenvalues r_1 = −3 and r_2 = 2, as well as
the corresponding eigenvectors

v_1 = [ 3 ; −2 ]  and  v_2 = [ 1 ; 1 ].

Accordingly, form the matrix P using v_1 and v_2 as its columns:

P = [ 3  1 ; −2  1 ].

For this matrix P,

P^{−1} = (1/5) [ 1  −1 ; 2  3 ].

Now calculate

P^{−1}AP = (1/5) [ 1  −1 ; 2  3 ] [ −1  3 ; 2  0 ] [ 3  1 ; −2  1 ]
         = (1/5) [ −3  3 ; 4  6 ] [ 3  1 ; −2  1 ]
         = (1/5) [ −15  0 ; 0  10 ]
         = [ −3  0 ; 0  2 ]
         = [ r_1  0 ; 0  r_2 ]
         = D

exactly as required by (4).
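The same check can be done numerically; a short Python sketch (numpy assumed):

    import numpy as np
    # Verify P^{-1} A P = D for the worked example above.
    A = np.array([[-1.0, 3.0], [2.0, 0.0]])
    P = np.array([[ 3.0, 1.0], [-2.0, 1.0]])   # eigenvectors as columns
    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))                     # diag(-3, 2), as computed by hand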

3 Complex Eigenvalues
All of these results carry over to the case where the square matrix A has complex eigenvalues,
as the following example illustrates.
Let

A = [ 1  −2 ; 2  1 ].

For this choice of A, the characteristic polynomial is

det [ 1 − r  −2 ; 2  1 − r ] = (1 − r)^2 + 4 = 1 − 2r + r^2 + 4 = r^2 − 2r + 5.

Solve the characteristic equation

r^2 − 2r + 5 = 0

using the quadratic formula:

r = {2 ± [4 − 20]^{1/2}} / 2 = (2 ± 4i) / 2 = 1 ± 2i.

Evidently, the two eigenvalues are
r_1 = 1 + 2i
and
r_2 = 1 − 2i.
Next, let's find the eigenvector v_1 corresponding to the eigenvalue r_1. Since this eigenvector
may also be complex, it will be of the general form

v_1 = [ α_11 + β_11 i ; α_12 + β_12 i ].

Together r_1 and v_1 must satisfy

(A − r_1 I)v_1 = 0

{ [ 1  −2 ; 2  1 ] − [ 1 + 2i  0 ; 0  1 + 2i ] } [ α_11 + β_11 i ; α_12 + β_12 i ] = [ 0 ; 0 ]

[ −2i  −2 ; 2  −2i ] [ α_11 + β_11 i ; α_12 + β_12 i ] = [ 0 ; 0 ].

Hence, in particular,
−2i(α_11 + β_11 i) − 2(α_12 + β_12 i) = 0
−2iα_11 + 2β_11 − 2α_12 − 2β_12 i = 0 + 0i,
which requires that
2(β_11 − α_12) = 0 or α_12 = β_11
and
−2(α_11 + β_12)i = 0i or β_12 = −α_11.
Evidently, v_1 takes the more specific form

v_1 = [ α_11 + β_11 i ; β_11 − α_11 i ].

One particularly simple choice for v_1 is therefore

v_1 = [ 1 + i ; 1 − i ].

Now let's find the eigenvector v_2 corresponding to the eigenvalue r_2. This second eigenvector
will be of the general form

v_2 = [ α_21 + β_21 i ; α_22 + β_22 i ].

Together r_2 and v_2 must satisfy

(A − r_2 I)v_2 = 0

{ [ 1  −2 ; 2  1 ] − [ 1 − 2i  0 ; 0  1 − 2i ] } [ α_21 + β_21 i ; α_22 + β_22 i ] = [ 0 ; 0 ]

[ 2i  −2 ; 2  2i ] [ α_21 + β_21 i ; α_22 + β_22 i ] = [ 0 ; 0 ].

Hence, in particular,
2i(α_21 + β_21 i) − 2(α_22 + β_22 i) = 0
2iα_21 − 2β_21 − 2α_22 − 2β_22 i = 0 + 0i,
which requires that
−2(β_21 + α_22) = 0 or α_22 = −β_21
and
2(α_21 − β_22)i = 0i or β_22 = α_21.

Evidently, v_2 takes the more specific form

v_2 = [ α_21 + β_21 i ; −β_21 + α_21 i ].

One particularly simple choice for v_2 is therefore

v_2 = [ 1 + i ; −1 + i ].

Based on these results, form the matrix P using v_1 and v_2 as its columns:

P = [ 1 + i  1 + i ; 1 − i  −1 + i ].

And form D using r_1 and r_2 as its diagonal elements:

D = [ 1 + 2i  0 ; 0  1 − 2i ].

Now let's verify that P^{−1}AP = D, or equivalently that AP = PD:

[ 1  −2 ; 2  1 ] [ 1 + i  1 + i ; 1 − i  −1 + i ] = [ 1 + i  1 + i ; 1 − i  −1 + i ] [ 1 + 2i  0 ; 0  1 − 2i ]

[ 1 + i − 2 + 2i  1 + i + 2 − 2i ; 2 + 2i + 1 − i  2 + 2i − 1 + i ] = [ (1 + i)(1 + 2i)  (1 + i)(1 − 2i) ; (1 − i)(1 + 2i)  (−1 + i)(1 − 2i) ]

[ −1 + 3i  3 − i ; 3 + i  1 + 3i ] = [ −1 + 3i  3 − i ; 3 + i  1 + 3i ],

where the last line is exactly as required!

Differential Equations

Regardless of whether we use the Kuhn-Tucker theorem or the maximum principle to solve
a dynamic optimization problem in continuous time, we must ultimately solve a system of
differential equations. Thus, we will now consider differential equations and their solutions
in more detail. We will begin by considering the solution to a single differential equation
and then go on to consider the solution to systems of multiple differential equations.

1 Scalar Differential Equations


Reference:

Simon and Blume, Chapter 24.

1.1 Introduction
Let’s start by considering an example: a bank account with continuous compounding of
interest.

y(t) = funds in the bank account at time t


r = interest rate
The evolution of y(t) is described by

dy(t)/dt = ẏ(t) = ry(t),    (1)
since at each point in time, the interest payment ry(t) is added to the account.
One solution to (1) is
y(t) = ert .
To see this, note that
ẏ(t) = rert = ry(t).
But note also that
y(t) = kert
also satisfies (1), since
ẏ(t) = rkert = ry(t)
for any constant k.

Thus, in general, (1) has many solutions.
Why isn’t the solution determined uniquely in this example? To see why, just note
that in order to know the amount of funds in a bank account, one needs to know
more than just the rate of interest; one also needs to know the amount that was
initially deposited in the account.
Let y0 denote the amount initially deposited. Then y(t) must satisfy both (1) and
the initial condition
y(0) = y0 . (2)
This initial condition determines a specific value for k:
y(t) = kert =⇒ y(0) = k =⇒ k = y0 .

Hence, the unique function that satisfies both (1) and (2) is
y(t) = y0 ert .

Notes:

a) Obviously, knowing the value of y(t) at any date t would allow one to find the
exact value of k. But problems like finding a function y(t) that satisfies both (1)
and the initial condition (2) are so common that they have a special name: they
are called initial value problems.
b) The many solutions to (1) and the particular solution to (1) and (2) are all functions
y(t). This fact makes solving a differential equation more complicated than solving
a simple algebraic equation that has as its solution an unknown variable.

Definition An ordinary differential equation is an equation


ẏ(t) = F [y(t), t]
relating the derivative ẏ(t) of an unknown function y(t) to an expression involving y(t)
and t. If the expression does not specifically involve t, so that the equation can be
written as
ẏ(t) = F [y(t)],
then the differential equation is said to be autonomous or time-independent. Otherwise,
the differential equation is nonautonomous or time-dependent.
Notes:

a) The definition describes first order differential equations that involve only the
first derivative of the unknown function. More generally, an ith order differential
equation involves derivatives up to and including the ith derivative of y(t).
b) The definition describes ordinary differential equations that relate the value of a
function of a single variable to the value of its derivative. Differential equations
that link the value of a function of several variables to the values of its partial
derivatives are called partial differential equations.

c) Again, it is important to emphasize that solving a differential equation involves
finding an unknown function y(t).
d) Sometimes the solution to a differential equation will be a constant function of the
form y(t) = c, where c is a constant. These constant solutions are called steady
states, stationary solutions, stationary points, or equilibria. Since ẏ(t) = 0 when
y(t) = c, the steady states of a differential equation such as

ẏ(t) = F [y(t)]

can be found by finding any constants c that satisfy

0 = F (c).

e) As in our first example, the full set of solutions y(t) can often be indexed by a
parameter k, and written y(t, k). If every solution to the differential equation can
be achieved by letting k take on different values, then y(t, k) is called a general
solution of the differential equation.

1.2 Explicit Solutions


Most differential equations do not have solutions that can be written down explicitly. There
are, however, some important exceptions, two of which are described in the examples
below.
Example 1: Consider the differential equation

ẏ(t) = ay(t).

As we have seen, the general solution to this equation is

y(t) = keat ,

where different values of k correspond to different values of y(0) = y0 .


Example 2: Generalize the first example by considering the differential equation

ẏ(t) = ay(t) + b.

Here, the general solution is


y(t) = −b/a + keat ,
since
ẏ(t) = akeat
and
ay(t) + b = −b + akeat + b = akeat .

These two examples are among the few for which explicit solutions are available. Since these
examples are both ones in which y(t) enters the equation linearly, they are examples
of linear differential equations.

Fortunately, there is a general result on the existence and uniqueness of solutions to initial
value problems that is very easy to apply.
Theorem (Fundamental Theorem of Differential Equations) Consider the initial
value problem described by the differential equation
ẏ(t) = F [y(t), t]
and the initial condition
y(0) = y0 .
If F is continuous at (y0 , 0), then there exists a continuously differentiable function
y(t) defined on an interval I = (−a, a) for some a > 0 such that y(0) = y0 and
ẏ(t) = F [y(t), t] for all t ∈ I. That is, the function y(t) solves the problem on I.
Moreover, if F is continuously differentiable at (y0 , 0), then the solution y(t) is unique.
In some cases, the number a > 0 referred to in the theorem can be arbitrarily large. For
example, consider the initial value problem:
ẏ(t) = ry(t) and y(0) = y0 .
We know that the solution is
y(t) = y0 ert
and this solution clearly applies for all t ∈ (−∞, ∞).
But in other cases, a > 0 must be finite. For example, consider the initial value problem

ẏ(t) = 1/(t^2 − 1) and y(0) = 0.
A solution to this problem is given by

y(t) = (1/2) ln[(1 − t)/(1 + t)]

since

y(0) = (1/2) ln(1) = 0

and

ẏ(t) = (1/2) [(1 + t)/(1 − t)] { [(1 + t)(−1) − (1 − t)] / (1 + t)^2 }
     = (1/2) [1/(1 − t)] [−2/(1 + t)]
     = −1/[(1 − t)(1 + t)]
     = −1/(1 − t^2)
     = 1/(t^2 − 1).

But the solution

y(t) = (1/2) ln[(1 − t)/(1 + t)]

is only defined for t ∈ (−1, 1), so in the theorem, a must be less than or equal to 1.

Notes:

a) One implication of the uniqueness results provided by this theorem is that if we


can guess a solution to an initial value problem and verify that it works, as we did
in our first, bank account example, then we know that we’ve found the problem’s
only solution.
b) The theorem also tells us that solutions to initial value problems usually exist and
are unique, even though we may not be able to find them explicitly.

1.3 Phase Diagrams


In many cases, we can characterize the solution to a differential equation graphically even
when we cannot find an explicit solution. One example of this is provided by the
optimal growth example that we considered in our discussion of the maximum principle.
There, we were able to use a phase diagram to illustrate the features of the solution to
the system of differential equations that follow from the application of the maximum
principle, even though we did not find the solution explicitly.

Now, let’s get a better feel for how phase diagrams work by considering some simpler
examples. In the earlier, optimal growth example, we had a system of two differential
equations, so that the phase diagram had to have two dimensions. In simpler examples,
with only a single differential equation, the phase diagram has just one dimension.

Example 1: Consider the differential equation

ẏ(t) = y(t)[2 − y(t)]. (3)

This equation is nonlinear, since it involves y(t)^2.


Recall that a steady state, or stationary solution to a differential equation is a solution
that takes the form of a constant function, with y(t) = c and ẏ(t) = 0. Thus, the
stationary solutions of (3) can be found by solving the algebraic equation

0 = c(2 − c).

Obviously, there are two stationary solutions:

y(t) = 0 and y(t) = 2.

Next, consider that (3) implies that ẏ(t) > 0 whenever:


y(t) > 0 and 2 > y(t) =⇒ 2 > y(t) > 0

(3) also implies that ẏ(t) > 0 whenever
y(t) < 0 and 2 < y(t), but these two conditions cannot hold simultaneously.
Thus, we know that (3) requires
ẏ(t) = 0 whenever y(t) = 0 or y(t) = 2
ẏ(t) > 0 whenever 2 > y(t) > 0
ẏ(t) < 0 whenever y(t) > 2 or 0 > y(t)
These results can be illustrated graphically using a one-dimensional phase diagram.
The phase diagram reveals that:
If y(0) = y0 < 0, then y(t) decreases forever.
If y(0) = y0 = 0, then y(t) remains at 0 forever.
If y(0) = y0 > 0, then y(t) converges to 2.
In this example, the stationary solution y(t) = 2 is asymptotically stable, since for all
values of y0 close to 2, the solution converges to the steady state.
On the other hand, the stationary solution y(t) = 0 is unstable, since y(t) moves away
from y_0 ≠ 0, even when y_0 is arbitrarily close to 0.
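These stability claims are easy to confirm numerically. A crude Euler-step sketch in Python (the step size, horizon, and starting values are illustrative choices):

    # Euler simulation of y'(t) = y(t)*(2 - y(t)) from several starting points.
    dt = 0.001
    for y0 in (-0.1, 0.5, 3.0):
        y = y0
        for _ in range(20000):                 # roughly 20 units of time
            y += dt * y * (2.0 - y)
            if abs(y) > 100.0:                 # negative starts diverge; stop early
                break
        print(y0, y)                           # 0.5 and 3.0 both approach 2.0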

Example 2: Consider the more complicated differential equation

ẏ(t) = e^{y(t)} {y(t)[1 − y(t)^2]}.

Stationary solutions y(t) = c can be found by solving

0 = e^c [c(1 − c^2)]

or, since e^c is always positive,

0 = c(1 − c^2).

Evidently, there are three stationary solutions:


y(t) = 0, y(t) = 1, and y(t) = −1

Next, consider that ẏ(t) > 0 whenever:

y(t) > 0 and 1 > y(t)^2 =⇒ 1 > y(t) > 0
Also, ẏ(t) > 0 whenever
y(t) < 0 and y(t)^2 > 1 =⇒ y(t) < −1
Thus,
ẏ(t) = 0 whenever y(t) = 0, y(t) = 1, or y(t) = −1
ẏ(t) > 0 whenever 1 > y(t) > 0 or y(t) < −1
ẏ(t) < 0 whenever y(t) > 1 or 0 > y(t) > −1
The phase diagram reveals that y(t) = −1 and y(t) = 1 are asymptotically stable,
while y(t) = 0 is unstable.

[Figure: One-dimensional phase diagrams. Example 1: ẏ(t) = y(t)[2 − y(t)], with steady
states at y = 0 and y = 2. Example 2: ẏ(t) = e^{y(t)}{y(t)[1 − y(t)^2]}, with steady states
at y = −1, 0, and 1.]
2 Systems of Differential Equations
Reference:

Simon and Blume, Chapter 25.

2.1 Introduction
Typically, the process of solving a dynamic optimization problem in continuous time involves
solving a system of differential equations. The statement of the maximum principle,
for example, involves two differential equations: one describing the evolution of the
stock variable and the other describing the behavior of a Lagrange multiplier. So we
need to move on now to consider the solution of systems of differential equations.
A general system of two differential equations can be written as

ẋ(t) = F [x(t), y(t), t] (4)


ẏ(t) = G[x(t), y(t), t]

A solution to this system is a pair of functions x(t) and y(t) that satisfy (4) for all t.
Notes:

a) The system (4) is a first order system because it involves only the first derivatives
of the unknown functions. The system is autonomous if the functions F and G
do not specifically involve t; otherwise, the system is nonautonomous.
b) A solution to (4) will typically involve two parameters, k1 and k2 , such that by
varying the values of these parameters, one can obtain every solution to (4). In
this case, the solution written as x(t, k1 , k2 ) and y(t, k1 , k2 ) is the general solution
to (4).
c) The general solution to a system of n first order differential equations will contain
n parameters k1 , k2 , ..., kn .
d) The problem of finding a particular solution to (4) that also satisfies the initial
conditions x(0) = x0 and y(0) = y0 is called an initial value problem.
e) In our optimal growth example, we had a system of two differential equations and
we were trying to find the two unknown functions, y(t) and π(t), that solved that
system of equations. There, we had only one initial condition, y(0) given. But
the terminal, or transversality condition, gave us the other boundary condition
that we need to identify the particular solution.

Three useful facts:

Fact 1) Every second order differential equation can be written as a system of two
first order equations as follows. Starting from the second order equation

ÿ(t) = F [ẏ(t), y(t), t], (5)

define the new function
v(t) = ẏ(t).
Then
v̇(t) = ÿ(t),
so that (5) is equivalent to

ẏ(t) = v(t)
v̇(t) = F [v(t), y(t), t]

Thus, it is without loss of generality that we’ve focused mainly on first order
differential equations.
Fact 2) Every nonautonomous differential equation can be written as a system of two
autonomous equations as follows. Starting from the nonautonomous equation

ẏ(t) = F [y(t), t], (6)

define the new function


v(t) = t.
Then
v̇(t) = 1,
so that (6) is equivalent to

v̇(t) = 1
ẏ(t) = F [y(t), v(t)]

Thus, it is also without loss of generality that we’ve focused mainly on autonomous
differential equations.
Fact 3) The existence and uniqueness results for scalar differential equations also hold
for systems of differential equations.

2.2 Explicit Solutions


As in the scalar case, explicit solutions are available for systems of linear differential equa-
tions.

A general system of linear equations can be written as

ẋ1 (t) = a11 x1 (t) + a12 x2 (t) + ... + a1n xn (t)


ẋ2 (t) = a21 x1 (t) + a22 x2 (t) + ... + a2n xn (t)
...
ẋn (t) = an1 x1 (t) + an2 x2 (t) + ... + ann xn (t)

or, more simply, as


ẋ(t) = Ax(t), (7)

where

x(t) = [ x_1(t) ; x_2(t) ; ... ; x_n(t) ],

ẋ(t) = [ ẋ_1(t) ; ẋ_2(t) ; ... ; ẋ_n(t) ],

and

A = [ a_11  a_12  ...  a_1n ; a_21  a_22  ...  a_2n ; ... ; a_n1  a_n2  ...  a_nn ].

In the simplest case, A is diagonal, so that (7) is just a system of n self-contained equations,
each of which takes the form
ẋi (t) = aii xi (t).

We already know that the general solution in this case is just

xi (t) = ki eaii t

for i = 1, 2, ..., n.

But even in the more general case where A is not diagonal, we can often solve the linear
system (7) almost as easily by drawing on our results having to do with eigenvalues,
eigenvectors, and diagonalizable matrices.

Begin by calculating the eigenvalues r_1, r_2, ..., r_n of A, together with the associated eigen-
vectors v_1, v_2, ..., v_n. Then form the matrix

P = [ v_1  v_2  ...  v_n ]

using the eigenvectors as columns and the matrix

D = [ r_1  0  ...  0 ; 0  r_2  ...  0 ; ... ; 0  0  ...  r_n ]

with the eigenvalues along the diagonal and zeros everywhere else. If the eigenvectors
are linearly independent, then we know from before that

P^{−1}AP = D.

Next, define a new vector of functions

z(t) = P −1 x(t),

so that
x(t) = P z(t),
ż(t) = P −1 ẋ(t),
and
ẋ(t) = P ż(t).

Now (7) can be rewritten as


ẋ(t) = Ax(t), (7)
P ż(t) = AP z(t)
ż(t) = P −1 AP z(t)
ż(t) = Dz(t). (8)

And since the matrix D is diagonal, (8) is just a system of n self-contained equations, each
of which takes the form
żi (t) = ri zi (t)
and therefore has the familiar solution

zi (t) = ki eri t .

Finally, with these solutions for the z_i(t) in hand, undo the transformation to find the
solutions for the x_i(t):

x(t) = Pz(t)
     = [ v_1  v_2  ...  v_n ] [ z_1(t) ; z_2(t) ; ... ; z_n(t) ]
     = [ v_1  v_2  ...  v_n ] [ k_1 e^{r_1 t} ; k_2 e^{r_2 t} ; ... ; k_n e^{r_n t} ]
     = k_1 e^{r_1 t} v_1 + k_2 e^{r_2 t} v_2 + ... + k_n e^{r_n t} v_n

Note that by setting k1 = k2 = ... = kn = 0, we obtain the stationary solution x(t) = 0 to


(7).

Suppose now that the eigenvalues r_1, r_2, ..., r_n are real numbers. If all of the r_i are negative,
then
lim_{t→∞} e^{r_i t} = 0

for all i = 1, 2, ..., n, which implies that for any choice of k_1, k_2, ..., k_n corresponding
to any set of initial conditions x_1(0), x_2(0), ..., x_n(0),

lim_{t→∞} x(t) = lim_{t→∞} [k_1 e^{r_1 t} v_1 + k_2 e^{r_2 t} v_2 + ... + k_n e^{r_n t} v_n] = 0.

Thus, if all of the eigenvalues of A are real and negative, then the stationary solution
x(t) = 0 of (7) is asymptotically stable.
On the other hand, even if just one of the eigenvalues r_i is positive, so that

lim_{t→∞} e^{r_i t} = ∞,

then the stationary solution x(t) = 0 will be unstable.
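The recipe above translates directly into code. A Python sketch (numpy assumed; the matrix is taken from the earlier eigenvalue example and the initial condition is arbitrary):

    import numpy as np
    # Solve x'(t) = A x(t) by diagonalization: x(t) = sum_i k_i e^{r_i t} v_i.
    A = np.array([[-1.0, 3.0], [2.0, 0.0]])
    r, P = np.linalg.eig(A)                    # eigenvalues r_i, eigenvectors in columns
    x0 = np.array([1.0, 1.0])                  # illustrative initial condition
    k = np.linalg.solve(P, x0)                 # constants from x(0) = P k
    t = 0.7
    x_t = P @ (k * np.exp(r * t))              # x(t) from the general solution
    dx_t = P @ (k * r * np.exp(r * t))         # its time derivative
    print(np.allclose(dx_t, A @ x_t))          # True: x(t) satisfies the system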

2.3 Phase Diagrams


As in the scalar case, explicit solutions are usually not available for systems of differential
equations outside of the linear class. But again, even when an explicit solution is not
available, the solution can be characterized graphically using a phase diagram.
Example 1: Consider the system
ẋ(t) = −x(t) (9a)
ẏ(t) = −y(t) (9b)

We already know that the solution to this system is just

x(t) = k1 e−t and y(t) = k2 e−t ,

so that for any choices of k1 and k2 corresponding to any initial conditions x(0) =
x0 and y(0) = y0 ,
lim x(t) = 0 and lim y(t) = 0.
t→∞ t→∞

But let’s draw the phase diagram anyway, to make sure that it illustrates these
results.
Equation (9a) implies
ẋ(t) = 0 whenever x(t) = 0
ẋ(t) > 0 whenever x(t) < 0
ẋ(t) < 0 whenever x(t) > 0
Equation (9b) implies
ẏ(t) = 0 whenever y(t) = 0
ẏ(t) > 0 whenever y(t) < 0

ẏ(t) < 0 whenever y(t) > 0
These conditions can be illustrated using a two-dimensional phase diagram, which
shows that the stationary solution with x(t) = 0 and y(t) = 0 is asymptotically
stable.

Example 2: Now consider the more complicated system

ẋ(t) = y(t) − x(t)^2    (10a)

ẏ(t) = −y(t)    (10b)

This system is nonlinear, and is difficult to solve explicitly. But we can still charac-
terize the solution using a phase diagram.
Begin by considering stationary solutions of the form x(t) = x and y(t) = y. Equations
(10a) and (10b) imply that these solutions can be found by solving the equations

0 = y − x2

0 = −y.
Clearly, the only stationary solution has x(t) = 0 and y(t) = 0.
Next, note that (10a) implies
ẋ(t) = 0 whenever y(t) = x(t)^2
ẋ(t) > 0 whenever y(t) > x(t)^2
ẋ(t) < 0 whenever y(t) < x(t)^2
Equation (10b) implies
ẏ(t) = 0 whenever y(t) = 0
ẏ(t) > 0 whenever y(t) < 0
ẏ(t) < 0 whenever y(t) > 0
The phase diagram reveals that the stationary solution with x(t) = 0 and y(t) = 0
is unstable: although there are some initial conditions, such as those labelled
as points A, B, and C, from which the system converges, there are other initial
conditions, such as those labelled as D and E, which are close to the steady state
but which do not lead to convergence.

3 A Linearized System
Before finishing with our discussion of differential equations, let’s return once again to the
system of differential equations that we derived when using the maximum principle to
solve the optimal growth example in continuous time.

[Figure: Two-dimensional phase diagrams. Example 1: ẋ(t) = −x(t), ẏ(t) = −y(t), with
loci ẋ = 0 (x = 0) and ẏ = 0 (y = 0). Example 2: ẋ(t) = y(t) − x(t)^2, ẏ(t) = −y(t), with
loci ẋ = 0 (y = x^2) and ẏ = 0 (y = 0); initial conditions A, B, and C lead to convergence,
while D and E do not.]
That system consisted of two nonlinear differential equations: one for the capital stock,

k̇(t) = k(t)^α − δk(t) − c(t),

and the other for consumption,

ċ(t) = c(t)[αk(t)^{α−1} − δ − ρ]

Earlier, we used a phase diagram to characterize the solution to this system. The diagram
showed us that for each possible value of k(0), there exists a unique value of c(0) such
that the system converges to a steady state, with

lim_{t→∞} k(t) = k* = [(δ + ρ)/α]^{1/(α−1)}

and

lim_{t→∞} c(t) = c* = k*^α − δk*

There is an alternative way of analyzing this system that relies on algebra rather than
geometry.

This alternative method involves taking a first order Taylor approximation around the
steady state (k∗ , c∗ ) to the expressions on the right hand side of each of the two equa-
tions, thereby approximating the nonlinear system by a linear system for which an
explicit solution exists.

Start by considering the differential equation for k(t):

k̇(t) = k(t)^α − δk(t) − c(t)    (11)
     ≈ (k*^α − δk* − c*) + (αk*^{α−1} − δ)[k(t) − k*] − [c(t) − c*]
     = (αk*^{α−1} − δ)[k(t) − k*] − [c(t) − c*]
     = ρ[k(t) − k*] − [c(t) − c*],

since
αk*^{α−1} − δ = α[(δ + ρ)/α] − δ = ρ.

Next, consider the differential equation for c(t):

ċ(t) = c(t)[αk(t)^{α−1} − δ − ρ]    (12)
     ≈ c*[αk*^{α−1} − δ − ρ]
       + α(α − 1)c* k*^{α−2}[k(t) − k*] + [αk*^{α−1} − δ − ρ][c(t) − c*]
     = θ[k(t) − k*],

where
θ = α(α − 1)c* k*^{α−2} < 0.

Now define the new variables
x(t) = k(t) − k∗
and
y(t) = c(t) − c∗ ,
so that x(t) is the deviation of k(t) from its steady state level and y(t) is the deviation
of c(t) from its steady state level. Note that these definitions imply that

ẋ(t) = k̇(t)

and
ẏ(t) = ċ(t).

In terms of these new variables, (11) and (12) can be rewritten as

k̇(t) = ρ[k(t) − k*] − [c(t) − c*]    (11)

ẋ(t) = ρx(t) − y(t)

and

ċ(t) = θ[k(t) − k*]    (12)

ẏ(t) = θx(t)

If we let
z(t) = [ x(t) ; y(t) ],
then these two equations can be written in matrix form as

ż(t) = [ ẋ(t) ; ẏ(t) ] = [ ρ  −1 ; θ  0 ] [ x(t) ; y(t) ] = Az(t).

We know that this system of linear differential equations has the general solution

z(t) = q1 er1 t v1 + q2 er2 t v2 ,

where r1 and r2 are the eigenvalues of A and v1 and v2 are the corresponding eigenvec-
tors.

By definition, r is an eigenvalue of A if it satisfies

0 = det(A − rI)
  = det [ ρ − r  −1 ; θ  0 − r ]
  = r^2 − ρr + θ.

The quadratic formula then implies that the eigenvalues are

r_1 = {ρ − [ρ^2 − 4θ]^{1/2}} / 2

and

r_2 = {ρ + [ρ^2 − 4θ]^{1/2}} / 2

Since ρ > 0 and θ < 0,
ρ^2 − 4θ > ρ^2
so that
r_1 = {ρ − [ρ^2 − 4θ]^{1/2}} / 2 < {ρ − (ρ^2)^{1/2}} / 2 = 0
while
r_2 = {ρ + [ρ^2 − 4θ]^{1/2}} / 2 > {ρ + (ρ^2)^{1/2}} / 2 = ρ > 0.
We now know that the general solution takes the form

z(t) = q_1 e^{r_1 t} v_1 + q_2 e^{r_2 t} v_2,

where r_1 < 0 and r_2 > 0. Thus, the requirement that the system converge to the
steady state, so that

lim_{t→∞} z(t) = lim_{t→∞} [ x(t) ; y(t) ] = lim_{t→∞} [ k(t) − k* ; c(t) − c* ] = 0,

amounts to the requirement that q_2 = 0.

And if q_2 = 0, the solution reduces to

z(t) = [ k(t) − k* ; c(t) − c* ] = q_1 e^{r_1 t} v_1.

Now consider

z(0) = [ k(0) − k* ; c(0) − c* ] = q_1 v_1.

This equation shows that the constant q_1 can be chosen to satisfy the initial condition
k(0) given. This value of q_1, in turn, determines the unique value of c(0) that puts the
system on the saddle path towards (k*, c*).

Once again, we can conclude that for each possible value of k(0), there exists a unique value
of c(0) such that the system converges to a steady state.
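A Python sketch of this calculation (numpy assumed; the parameter values α = 0.33, δ = 0.10, ρ = 0.04 and the starting capital stock are illustrative assumptions):

    import numpy as np
    # Saddle path of the linearized growth model: find the stable root and
    # the unique c(0) consistent with a given k(0).
    alpha, delta, rho = 0.33, 0.10, 0.04
    k_star = ((delta + rho) / alpha) ** (1.0 / (alpha - 1.0))
    c_star = k_star ** alpha - delta * k_star
    theta = alpha * (alpha - 1.0) * c_star * k_star ** (alpha - 2.0)
    A = np.array([[rho, -1.0], [theta, 0.0]])
    r, V = np.linalg.eig(A)                    # one negative root, one positive
    i = int(np.argmin(r))                      # index of the stable root r_1 < 0
    v1 = V[:, i]
    k0 = 0.5 * k_star                          # an assumed initial capital stock
    q1 = (k0 - k_star) / v1[0]                 # pins down q_1 from k(0)
    c0 = c_star + q1 * v1[1]                   # the unique c(0) on the saddle path
    print(r[i], c0)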

It turns out that many economic applications share the general structure of this example.

That is, a dynamic optimization problem with economic content will often give rise to
a system of n nonlinear differential equations that can be approximated by a linear
system of the form
ż(t) = Az(t),
where the n × 1 vector ∙ ¸
x(t)
z(t) =
y(t)
is made up of the n1 × 1 vector x(t) of predetermined variables, whose initial values
x(0) are given, and the n2 × 1 vector y(t) of nonpredetermined, or jump, variables,
whose initial values can adjust to place the system on the saddle path towards the
steady state, where n1 + n2 = n.

Order the eigenvalues of A so that r_1 < r_2 < ... < r_n, and write the general solution to the
linear system as

z(t) = q_1 e^{r_1 t} v_1 + q_2 e^{r_2 t} v_2 + ... + q_n e^{r_n t} v_n

If the first n_1 eigenvalues are negative and the remaining n_2 eigenvalues of A are positive,
then the requirement that the system converge to the steady state, so that

lim_{t→∞} z(t) = 0

requires that q_{n_1+1} = q_{n_1+2} = ... = q_n = 0, reducing the solution to

z(t) = [ x(t) ; y(t) ] = q_1 e^{r_1 t} v_1 + q_2 e^{r_2 t} v_2 + ... + q_{n_1} e^{r_{n_1} t} v_{n_1}

Using

z(0) = [ x(0) ; y(0) ] = q_1 v_1 + q_2 v_2 + ... + q_{n_1} v_{n_1},

the n_1 constants q_1, q_2, ..., q_{n_1} can be chosen to satisfy the n_1 initial conditions given
by x(0). These values of q_1, q_2, ..., q_{n_1} then determine the unique values in y(0) that
place the system on the saddle path towards the steady state.

Thus, in general, it is often said that an economic model has a unique solution if and only
if the number of negative eigenvalues is exactly equal to the number of predetermined
variables.

Difference Equations

Regardless of whether we use the Kuhn-Tucker theorem, the maximum principle, or


dynamic programming to solve a dynamic optimization problem in discrete time, we must
ultimately solve a system of difference equations. Thus, we will now consider difference
equations and their solutions in more detail.
As we will see, the issues and concepts involved with solving difference equations are quite
similar to those involved with solving differential equations. Thus, much of our discussion of
difference equations will parallel our earlier discussion of differential equations. In particular,
we will begin by focusing on linear difference equations, since linear difference equations, like
linear differential equations, have solutions that can be characterized most sharply. But later
in our discussion, we will consider some new tools that are especially useful in solving other
types of difference equations.

Reference:

Simon and Blume, Chapter 23.

1 Linear Difference Equations


We have seen that in continuous time, explicit solutions exist for linear differential equations.
Similarly, in discrete time, explicit solutions can be found for linear difference equations.

Let’s start by recasting our familiar bank account example in discrete time:

yt = funds in the bank account at time t


r = net interest rate

The evolution of yt is described by


yt+1 = (1 + r)yt (1)

Starting from y0 , use (1) to calculate:

y1 = (1 + r)y0

y2 = (1 + r)y1 = (1 + r)2 y0
y3 = (1 + r)y2 = (1 + r)3 y0

Hence, in general
yt = (1 + r)t y0

Another way of deriving this particular solution is to note that the general solution to (1)
takes the form
yt = k(1 + r)t
since this solution satisfies

yt+1 = k(1 + r)t+1 = (1 + r)k(1 + r)t = (1 + r)yt

A particular solution, that is, a specific value for k, can be found if we know a specific value
of yt at some date t. For example, if we have the initial condition y0 given, then the
general solution requires
y0 = k(1 + r)0 = k
so that the particular solution is found once again to be

yt = (1 + r)t y0

Equation (1) is an example of a first order difference equation, since only one past value of
yt+1 appears on the right-hand side.

Next, consider a second order difference equation, with two past values of yt+1 on the
right-hand side:
yt+1 = a1 yt + a2 yt−1 (2)

Suppose that we define the vector

z_{t+1} = [ y_{t+1} ; y_t ]

Then we could rewrite (2) as

z_{t+1} = [ y_{t+1} ; y_t ] = [ a_1  a_2 ; 1  0 ] [ y_t ; y_{t−1} ] = Az_t

Thus, we can always rewrite a second order difference equation as a system of two first
order difference equations.

More generally, an nth order difference equation

yt+1 = a1 yt + a2 yt−1 + ... + an yt−n+1

can be rewritten as a system of n first order difference equations by defining the vector

z_{t+1} = [ y_{t+1} ; y_t ; ... ; y_{t−n+2} ]

and writing

z_{t+1} = [ y_{t+1} ; y_t ; ... ; y_{t−n+2} ]
        = [ a_1  a_2  ...  ...  a_n ; 1  0  ...  ...  0 ; ... ; 0  0  ...  1  0 ] [ y_t ; y_{t−1} ; ... ; y_{t−n+1} ]
        = Az_t

Thus, as long as we are willing to consider systems of difference equations we can, without
any loss of generality, confine our attention to first order difference equations.
A general system of linear difference equations can be written as

xt+1 = Axt , (3)

where

x_{t+1} = [ x_{1t+1} ; x_{2t+1} ; ... ; x_{nt+1} ],

x_t = [ x_{1t} ; x_{2t} ; ... ; x_{nt} ],

and

A = [ a_11  a_12  ...  a_1n ; a_21  a_22  ...  a_2n ; ... ; a_n1  a_n2  ...  a_nn ]

Each equation in the system (3) takes the general form

xit+1 = ai1 x1t + ai2 x2t + ... + ain xnt

In the simplest case, A is diagonal, so that (3) reduces to a system of n self-contained


equations, each of which takes the form

xit+1 = aii xit

We already know that the general solution in this case has

x_{it} = k_i (a_ii)^t

for all i = 1, 2, ..., n, where particular values for the constants k_i, i = 1, 2, ..., n, can be
found if, for example, the initial conditions x_{10}, x_{20}, ..., x_{n0} are given.
But even in the more general case where A is not diagonal, we can solve (3) almost as easily
by drawing on our results having to do with eigenvalues, eigenvectors, and diagonaliz-
able matrices.

Begin, as before, by calculating the eigenvalues r_1, r_2, ..., r_n of the matrix A, together with
the corresponding eigenvectors v_1, v_2, ..., v_n. Then form the matrix

P = [ v_1  v_2  ...  v_n ]

using the eigenvectors as columns and the matrix

D = [ r_1  0  ...  0 ; 0  r_2  ...  0 ; ... ; 0  0  ...  r_n ]

with the eigenvalues along the diagonal and zeros everywhere else. If the eigenvectors
are linearly independent, then we know from before that

P^{−1}AP = D.

Next, define a new vector


zt = P −1 xt ,
so that
zt+1 = P −1 xt+1 ,
xt = P zt ,
and
xt+1 = P zt+1 .

Now (3) can be rewritten as


xt+1 = Axt (3)
P zt+1 = AP zt
zt+1 = P −1 AP zt
zt+1 = Dzt (4)

And since D is diagonal, (4) is a system of n self-contained equations, each of which takes
the form
zit+1 = ri zit
and therefore has the general solution

z_{it} = k_i r_i^t

for all i = 1, 2, ..., n

Finally, with these solutions for the z_{it}'s in hand, undo the transformation to solve for the
x_{it}'s:

x_t = Pz_t
    = [ v_1  v_2  ...  v_n ] [ z_{1t} ; z_{2t} ; ... ; z_{nt} ]
    = [ v_1  v_2  ...  v_n ] [ k_1 r_1^t ; k_2 r_2^t ; ... ; k_n r_n^t ]
or
x_t = k_1 r_1^t v_1 + k_2 r_2^t v_2 + ... + k_n r_n^t v_n    (5)
where particular values for the constants k_i, i = 1, 2, ..., n can be pinned down, for
example, by initial conditions x_{10}, x_{20}, ..., x_{n0} and the implied values of z_{10}, z_{20}, ..., z_{n0}.
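As in the continuous-time case, the formula is straightforward to check numerically. A Python sketch (numpy assumed; the matrix and initial condition are illustrative):

    import numpy as np
    # Solve x_{t+1} = A x_t by diagonalization and check against direct iteration.
    A = np.array([[0.5, 0.2], [0.1, 0.3]])     # an illustrative matrix
    x0 = np.array([1.0, 2.0])
    r, P = np.linalg.eig(A)
    k = np.linalg.solve(P, x0)                 # constants from x_0 = P k
    t = 15
    x_eig = P @ (k * r ** t)                   # formula (5)
    x_iter = np.linalg.matrix_power(A, t) @ x0 # iterate A t times
    print(np.allclose(x_eig, x_iter))          # True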
There is another way of solving systems of linear difference equations that also exploits the
fact that
P −1 AP = D,
where D is diagonal. Note that this equality can be restated as
A = P DP −1 .

Now consider (3) once again:


xt+1 = Axt (3)
xt+1 = P DP −1 xt

Starting from x0 , calculate


x1 = P DP −1 x0
x2 = P DP −1 x1 = P DP −1 P DP −1 x0 = P DDP −1 x0 = P D2 P −1 x0
x3 = P DP −1 x2 = P DP −1 P D2 P −1 x0 = P D3 P −1 x0

Hence, in general,
xt = P Dt P −1 x0 ,
which suggests that (3) has the general solution
xt = P Dt P −1 q, (6)
where particular values of the constants in the vector

q = [ q_1 ; q_2 ; ... ; q_n ]

can be pinned down, for example if the initial conditions

x_0 = [ x_{10} ; x_{20} ; ... ; x_{n0} ]

are given, since in that case:

x_0 = P D^0 P^{−1} q = P I P^{−1} q = q

To verify that (6) is the general solution, note that it satisfies

xt+1 = P Dt+1 P −1 q
= P DDt P −1 q
= P DIDt P −1 q
= P DP −1 P Dt P −1 q
= P DP −1 xt
= Axt

Now, in general, it is not the case that for an arbitrary square matrix B, B^t can be calculated by raising each element of B to the t-th power. However, in the special case of a diagonal matrix, this does turn out to be true.
FACT: For the diagonal matrix D:

D^t = \begin{bmatrix} r_1^t & 0 & \cdots & 0 \\ 0 & r_2^t & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_n^t \end{bmatrix}
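This fact is easy to confirm numerically; a quick sketch with made-up eigenvalues:

```python
import numpy as np
from numpy.linalg import matrix_power

r = np.array([0.6, 0.3])   # hypothetical eigenvalues
D = np.diag(r)
t = 5
# Raising D to the power t agrees with raising each diagonal entry to the power t.
assert np.allclose(matrix_power(D, t), np.diag(r**t))
```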

In light of this fact, (6) can be rewritten as

x_t = P D^t P^{-1} q,   (6)

x_t = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \begin{bmatrix} r_1^t & 0 & \cdots & 0 \\ 0 & r_2^t & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_n^t \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}

where

\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = P^{-1} q

Thus, (6) is equivalent to

x_t = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \begin{bmatrix} k_1 r_1^t \\ k_2 r_2^t \\ \vdots \\ k_n r_n^t \end{bmatrix}

or

x_t = k_1 r_1^t v_1 + k_2 r_2^t v_2 + \cdots + k_n r_n^t v_n,   (5)
which confirms that the two approaches lead us to the same general solution.
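The equivalence is also easy to confirm numerically. The sketch below reuses the hypothetical A and x_0 from the earlier example and checks that P D^t P^{-1} x_0 in (6) matches the weighted sum of eigenvectors in (5).

```python
import numpy as np
from numpy.linalg import eig, inv, matrix_power

A = np.array([[0.5, 0.2],
              [0.1, 0.4]])   # same hypothetical matrix as before
x0 = np.array([1.0, 2.0])

r, P = eig(A)
D = np.diag(r)
k = inv(P) @ x0               # with q = x0, k = P^{-1} q

for t in range(10):
    via_powers = P @ matrix_power(D, t) @ inv(P) @ x0   # solution (6)
    via_eigensum = P @ (k * r**t)                       # solution (5)
    assert np.allclose(via_powers, via_eigensum)
```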
Before moving on, note that a stationary, or steady state, solution to

x_{t+1} = A x_t   (3)

is a solution of the form x_t = x for all t = 0, 1, 2, ..., where the vector of constants x must satisfy

x = A x.

If A − I is nonsingular, this requires that

x = 0̄,

where 0̄ is an n × 1 vector of zeros:

0̄ = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}

These calculations reveal that (3) has the steady state solution

x_t = 0̄.

The general solution

x_t = k_1 r_1^t v_1 + k_2 r_2^t v_2 + \cdots + k_n r_n^t v_n,   (5)

reveals that the steady state x_t = 0̄ will be asymptotically stable if

|r_i| < 1

for all i = 1, 2, ..., n, for in this case

\lim_{t \to \infty} r_i^t = 0

for all i = 1, 2, ..., n and hence

\lim_{t \to \infty} x_t = 0̄

for any choice of k_1, k_2, ..., k_n corresponding to any set of initial conditions x_{10}, x_{20}, ..., x_{n0}.
If, on the other hand, one or more of the r_i's is such that

|r_i| ≥ 1,

then x_t = 0̄ is unstable.
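In practice, then, checking asymptotic stability of x_t = 0̄ comes down to checking whether every eigenvalue of A lies strictly inside the unit circle. A minimal sketch, again with made-up matrices:

```python
import numpy as np

def is_stable(A):
    """The steady state 0 of x_{t+1} = A x_t is asymptotically stable
    if and only if every eigenvalue of A is less than one in absolute value."""
    return np.all(np.abs(np.linalg.eigvals(A)) < 1)

print(is_stable(np.array([[0.5, 0.2], [0.1, 0.4]])))  # True: eigenvalues 0.6 and 0.3
print(is_stable(np.array([[1.1, 0.0], [0.0, 0.5]])))  # False: eigenvalue 1.1
```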

2 Lag Operators
Many economic applications, including some dynamic optimization problems, give rise to difference equations of the slightly more general form

y_t = a y_{t-1} + x_t,   (7)

where x_t is an exogenous, and possibly random, forcing variable.

In these cases, we often want to obtain solutions that express yt in terms of current, past,
or future values of xt .

Suppose, therefore, that (7) holds for negative as well as positive values of t. And suppose
that the time horizon extends into the infinite past as well as into the infinite future.
Then t = ..., −2, −1, 0, 1, 2, ...

Now consider working backward:

y_t = a y_{t-1} + x_t

and

y_{t-1} = a y_{t-2} + x_{t-1}

imply

y_t = a^2 y_{t-2} + a x_{t-1} + x_t

And since

y_{t-2} = a y_{t-3} + x_{t-2},

we can also write

y_t = a^3 y_{t-3} + a^2 x_{t-2} + a x_{t-1} + x_t

or, more generally,

y_t = a^T y_{t-T} + \sum_{j=0}^{T-1} a^j x_{t-j}   (8)

Now assume that:

a) The parameter a is less than one in absolute value: |a| < 1

b) The exogenous sequence {x_t}_{t=-\infty}^{\infty} is bounded

c) The endogenous sequence {y_t}_{t=-\infty}^{\infty} is also required to be bounded

Then we might repeat our backward substitution an infinite number of times or, equivalently, take the limit in (8) as T → ∞.

Since |a| < 1 and {y_t}_{t=-\infty}^{\infty} is bounded,

\lim_{T \to \infty} a^T y_{t-T} = 0

And since |a| < 1 and {x_t}_{t=-\infty}^{\infty} is bounded,

\lim_{T \to \infty} \sum_{j=0}^{T-1} a^j x_{t-j} = \sum_{j=0}^{\infty} a^j x_{t-j} < \infty

Thus, if |a| < 1, (7) has the solution

y_t = \sum_{j=0}^{\infty} a^j x_{t-j}   (9)

in which y_t is expressed as a weighted sum of current and past values of x_t.
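To see the backward solution at work numerically, the sketch below (with a made-up value a = 0.8 and a bounded random forcing sequence) truncates the infinite sum in (9) at a large J and checks that the resulting y_t approximately satisfies y_t = a y_{t-1} + x_t.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.8                              # hypothetical |a| < 1
J = 200                              # truncation point for the infinite sum
x = rng.uniform(-1, 1, size=1000)    # a bounded forcing sequence

def y_backward(t):
    # y_t ~= sum_{j=0}^{J} a^j x_{t-j}; the discarded tail is of order a^J, which is tiny
    return sum(a**j * x[t - j] for j in range(J + 1))

t = 600
# The solution should satisfy the difference equation y_t = a*y_{t-1} + x_t.
assert abs(y_backward(t) - (a * y_backward(t - 1) + x[t])) < 1e-6
```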

What happens when, instead, |a| > 1?

In this case, we can rewrite (7) as

y_t = a y_{t-1} + x_t,   (7)
y_{t+1} = a y_t + x_{t+1}
a y_t = y_{t+1} − x_{t+1}
y_t = (1/a) y_{t+1} − (1/a) x_{t+1}

Now work forward:

y_t = (1/a) y_{t+1} − (1/a) x_{t+1}

and

y_{t+1} = (1/a) y_{t+2} − (1/a) x_{t+2}

imply

y_t = (1/a)^2 y_{t+2} − (1/a)^2 x_{t+2} − (1/a) x_{t+1}

And since

y_{t+2} = (1/a) y_{t+3} − (1/a) x_{t+3}

we can also write

y_t = (1/a)^3 y_{t+3} − (1/a)^3 x_{t+3} − (1/a)^2 x_{t+2} − (1/a) x_{t+1}

or, more generally,

y_t = (1/a)^T y_{t+T} − \sum_{j=1}^{T} (1/a)^j x_{t+j}   (10)

Since |a| > 1 and {y_t}_{t=-\infty}^{\infty} is bounded,

\lim_{T \to \infty} (1/a)^T y_{t+T} = 0

And since |a| > 1 and {x_t}_{t=-\infty}^{\infty} is bounded,

\lim_{T \to \infty} \sum_{j=1}^{T} (1/a)^j x_{t+j} = \sum_{j=1}^{\infty} (1/a)^j x_{t+j} < \infty

Thus, if we repeat our forward substitution an infinite number of times or, equivalently, take the limit in (10) as T → ∞, we obtain the solution

y_t = −\sum_{j=1}^{\infty} (1/a)^j x_{t+j}   (11)

in which y_t is expressed as a weighted sum of future values of x_t.
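The forward solution can be checked the same way; in the sketch below the made-up value a = 1.25 gives |a| > 1, and the truncated version of (11) is verified against the original difference equation.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 1.25                             # hypothetical |a| > 1
J = 200                              # truncation point for the infinite sum
x = rng.uniform(-1, 1, size=1000)    # a bounded forcing sequence

def y_forward(t):
    # y_t ~= -sum_{j=1}^{J} (1/a)^j x_{t+j}; the discarded tail is of order (1/a)^J
    return -sum((1 / a)**j * x[t + j] for j in range(1, J + 1))

t = 400
# The solution should satisfy y_{t+1} = a*y_t + x_{t+1}, i.e. equation (7) shifted forward.
assert abs(y_forward(t + 1) - (a * y_forward(t) + x[t + 1])) < 1e-6
```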


It turns out that we can derive the solutions (9) and (11) more quickly if we use an analytic tool called the lag operator.
The lag operator, denoted by L, is defined by

L x_t = x_{t-1}

and

L y_t = y_{t-1}

Note that

L^2 x_t = L L x_t = L x_{t-1} = x_{t-2}

and, more generally,

L^j x_t = x_{t-j}

for j = 0, 1, 2, ...

In addition,

L x_{t+1} = x_t

implies

L^{-1} x_t = x_{t+1}

and, more generally,

L^{-j} x_t = x_{t+j}

for j = 0, 1, 2, ...
Thus, the lag operator works to shift the time index on a variable backward when raised
to positive powers and forward when raised to negative powers.
Using the lag operator, (7) can be written as

y_t = a y_{t-1} + x_t,   (7)
y_t = a L y_t + x_t
(1 − aL) y_t = x_t
y_t = (1 − aL)^{-1} x_t.   (12)

Now suppose that |a| < 1, and recall the following fact.

FACT: If |β| < 1, then

\sum_{j=0}^{\infty} β^j = \frac{1}{1 − β} = (1 − β)^{-1}

It turns out that we can apply this fact to (12) as well, and write

y_t = (1 − aL)^{-1} x_t.   (12)

y_t = \sum_{j=0}^{\infty} (aL)^j x_t = \sum_{j=0}^{\infty} a^j L^j x_t

or

y_t = \sum_{j=0}^{\infty} a^j x_{t-j}   (9)

which is, of course, the same solution that we obtained by repeated backward substitution.

Alternatively, if |a| > 1, we can use the lag operator to rewrite (7) as

y_t = a y_{t-1} + x_t,   (7)
y_{t+1} = a y_t + x_{t+1}
L^{-1} y_t = a y_t + L^{-1} x_t
a y_t − L^{-1} y_t = −L^{-1} x_t
y_t − (aL)^{-1} y_t = −(aL)^{-1} x_t
[1 − (aL)^{-1}] y_t = −(aL)^{-1} x_t
y_t = −(aL)^{-1} [1 − (aL)^{-1}]^{-1} x_t   (13)

Since |a| > 1, |1/a| < 1. Thus, applying the fact

(1 − β)^{-1} = \sum_{j=0}^{\infty} β^j

to (13) yields

y_t = −(aL)^{-1} \sum_{j=0}^{\infty} (aL)^{-j} x_t
= −(aL)^{-1} \sum_{j=0}^{\infty} (1/a)^j L^{-j} x_t
= −\sum_{j=0}^{\infty} (1/a)^{j+1} L^{-j-1} x_t
= −\sum_{j=1}^{\infty} (1/a)^j L^{-j} x_t

or

y_t = −\sum_{j=1}^{\infty} (1/a)^j x_{t+j}   (11)

which is, of course, the same solution that we obtained using repeated forward substitution.
Note: Consider forming a matrix A with the parameter a from (7) as its only element:

A = [a]

Then A has a as its single eigenvalue: r_1 = a.
Next, recall that for the linear system

x_{t+1} = A x_t,

the steady state x_t = 0̄ is asymptotically stable if the eigenvalues of A are all less than one in absolute value and unstable if one or more of the eigenvalues of A is greater than one in absolute value.
By analogy, the difference equation

y_t = a y_{t-1} + x_t   (7)

is said to be stable if |a| < 1 and unstable if |a| > 1.
Since, when |a| < 1, we can use backward substitution to obtain

y_t = \sum_{j=0}^{\infty} a^j x_{t-j}   (9)

and, when |a| > 1, we can use forward substitution to obtain

y_t = −\sum_{j=1}^{\infty} (1/a)^j x_{t+j}   (11)

it is often said that stable equations are solved backwards and unstable equations are
solved forwards.

3 Four Examples
3.1 Example 1: Optimal Growth
Production function:

F(k_t) = k_t^α

where 0 < α < 1

Evolution of the capital stock:

k_{t+1} = k_t^α − c_t

Utility:

\sum_{t=0}^{\infty} β^t \ln(c_t)

where 0 < β < 1

Earlier, we solved this problem via dynamic programming, by guessing that the value function takes the form

v(k_t) = E + F \ln(k_t).

After deriving the first order and envelope conditions and solving for the unknown constants E and F, we concluded that the optimal capital stock follows the difference equation

k_{t+1} = αβ k_t^α   (14)

So far, we have only discussed linear difference equations. But notice that if we take logs on both sides of (14), we obtain

\ln(k_{t+1}) = \ln(αβ) + α \ln(k_t)

\ln(k_{t+1}) = (1 − α) \frac{\ln(αβ)}{1 − α} + α \ln(k_t)

\ln(k_{t+1}) − \frac{\ln(αβ)}{1 − α} = α \left[ \ln(k_t) − \frac{\ln(αβ)}{1 − α} \right]

Consider, therefore, defining the new variable

z_t = \ln(k_t) − \frac{\ln(αβ)}{1 − α}
and rewriting the difference equation more simply as

z_{t+1} = α z_t

Now we have a linear difference equation, which we know has the general solution

z_t = q α^t

Given the initial condition k_0, we can calculate

z_0 = \ln(k_0) − \frac{\ln(αβ)}{1 − α}

and thereby determine the particular solution

z_t = α^t z_0 = α^t \left[ \ln(k_0) − \frac{\ln(αβ)}{1 − α} \right]

Now undo the transformation to find the solution for k_t:

\ln(k_t) − \frac{\ln(αβ)}{1 − α} = z_t = α^t \left[ \ln(k_0) − \frac{\ln(αβ)}{1 − α} \right]

Hence, given k_0, (14) has the solution

\ln(k_t) = \frac{\ln(αβ)}{1 − α} + α^t \left[ \ln(k_0) − \frac{\ln(αβ)}{1 − α} \right]

Since |α| < 1, this solution tells us that for any value of k_0,

\lim_{t \to \infty} \ln(k_t) = \frac{\ln(αβ)}{1 − α} = \ln[(αβ)^{1/(1−α)}]

Hence, starting from any initial capital stock k_0, the capital stock converges to a steady state level:

\lim_{t \to \infty} k_t = (αβ)^{1/(1−α)}
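A few lines of Python confirm this convergence; the parameter values α = 0.3 and β = 0.95 and the initial capital stock are made up for illustration.

```python
alpha, beta = 0.3, 0.95       # hypothetical parameter values
k = 0.05                      # an arbitrary initial capital stock k0

for t in range(200):
    k = alpha * beta * k**alpha    # k_{t+1} = alpha*beta*k_t^alpha, equation (14)

steady_state = (alpha * beta) ** (1 / (1 - alpha))
print(k, steady_state)   # after 200 periods the two essentially coincide
```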

3.2 Example 2: Stochastic Optimal Growth


Problem set 3 extended our optimal growth example by adding a random shock to productivity.

Production function:

F(k_t, z_t) = z_t k_t^α

where 0 < α < 1 and z_t is random with E_t \ln(z_t) = 0 for all t = 0, 1, 2, ...

Evolution of the capital stock:

k_{t+1} = z_t k_t^α − c_t

Expected utility:

E_0 \sum_{t=0}^{\infty} β^t \ln(c_t)

where 0 < β < 1

Problem set 3 asked you to solve this problem via dynamic programming and to show that the optimal capital stock follows the first order autoregressive process

\ln(k_{t+1}) = \ln(αβ) + α \ln(k_t) + \ln(z_t)   (15)

Rewrite this difference equation as

\ln(k_{t+1}) = (1 − α) \frac{\ln(αβ)}{1 − α} + α \ln(k_t) + \ln(z_t)

or, more simply, as

x_{t+1} = α x_t + ε_t,

where

x_t = \ln(k_t) − \frac{\ln(αβ)}{1 − α}

and

ε_t = \ln(z_t)

Since |α| < 1, we can use the lag operator to solve for x_t in terms of past values of ε_t:

x_{t+1} = α x_t + ε_t,
x_t = α x_{t-1} + ε_{t-1}
x_t = α L x_t + L ε_t
(1 − αL) x_t = L ε_t
x_t = L (1 − αL)^{-1} ε_t
x_t = L \sum_{j=0}^{\infty} (αL)^j ε_t
x_t = L \sum_{j=0}^{\infty} α^j L^j ε_t
x_t = \sum_{j=0}^{\infty} α^j L^{j+1} ε_t
x_t = \sum_{j=0}^{\infty} α^j ε_{t-j-1}

Now undo the transformations to find the solution for k_t in terms of past z_t:

\ln(k_t) − \frac{\ln(αβ)}{1 − α} = x_t = \sum_{j=0}^{\infty} α^j ε_{t-j-1}

\ln(k_t) = \frac{\ln(αβ)}{1 − α} + \sum_{j=0}^{\infty} α^j \ln(z_{t-j-1})

Use this solution to characterize the behavior of output:

y_t = z_t k_t^α

implies

\ln(y_t) = \ln(z_t) + α \ln(k_t)

or

\ln(y_t) = \ln(z_t) + \left( \frac{α}{1 − α} \right) \ln(αβ) + α \sum_{j=0}^{\infty} α^j \ln(z_{t-j-1})   (16)

Equation (16) reveals that yt will be serially correlated even if zt is serially uncorrelated.
Thus, the process of optimal capital accumulation can transform serially uncorrelated
shocks to productivity into serially correlated movements in output like those that
occur over the business cycle.
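This propagation mechanism is easy to see in a simulation. In the sketch below, the parameter values α = 0.3 and β = 0.95 and the shock variance are made up; the shocks ln(z_t) are drawn i.i.d., so they are serially uncorrelated, yet the first-order autocorrelation of ln(y_t) comes out well above zero.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.3, 0.95               # hypothetical parameter values
T = 10_000
ln_z = rng.normal(0.0, 0.1, size=T)   # i.i.d. shocks: ln(z_t) serially uncorrelated

ln_k = np.zeros(T)
for t in range(T - 1):
    # ln(k_{t+1}) = ln(alpha*beta) + alpha*ln(k_t) + ln(z_t), equation (15)
    ln_k[t + 1] = np.log(alpha * beta) + alpha * ln_k[t] + ln_z[t]

ln_y = ln_z + alpha * ln_k            # ln(y_t) = ln(z_t) + alpha*ln(k_t)
rho = np.corrcoef(ln_y[1:], ln_y[:-1])[0, 1]
print(rho)   # positive: output is serially correlated despite i.i.d. shocks
```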

3.3 Example 3: Nonlinear Difference Equations


In our first two examples, we were able to solve nonlinear difference equations by transforming them into linear difference equations.

For this third example, consider a more general nonlinear difference equation of the form

x_{t+1} = g(x_t)

and suppose that the function g does not allow us to rewrite this equation as a linear difference equation.

In this more general case, we might not be able to find an explicit solution.

As in continuous time, however, we might still be able to characterize the solution graphically.

In case one, illustrated below, the graph of g(x) intersects the 45-degree line at x = 0 and
x = x∗ , revealing that there are two steady states.

But starting from x_0, which can be arbitrarily close to zero, the graph reveals that

\lim_{t \to \infty} x_t = x^*,

implying that x_t = x^* is asymptotically stable, while x_t = 0 is unstable.

In case two, once again, there are two steady states: xt = 0 and xt = x∗ .

But in this second case, xt = 0 is asymptotically stable, while xt = x∗ is unstable.

[Figure omitted: Case 1 — the graph of x_{t+1} = g(x_t) together with the 45-degree line (x_{t+1} = x_t), with the iterates x_0, x_1, x_2, ... climbing toward x*. Caption: "Case 1: x_t = x* is asymptotically stable."]

[Figure omitted: Case 2 — the graph of x_{t+1} = g(x_t) together with the 45-degree line, with the iterates x_0, x_1, x_2, ... falling toward zero. Caption: "Case 2: x_t = 0 is asymptotically stable."]
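Although no closed-form solution may be available, iterating x_{t+1} = g(x_t) numerically is straightforward. The sketch below uses two made-up functions with the qualitative shapes in the figures: g(x) = √x for case one (both steady states are 0 and x* = 1, with x* stable) and g(x) = x² for case two (again steady states 0 and 1, but now 0 is stable).

```python
def iterate(g, x0, T=50):
    """Iterate x_{t+1} = g(x_t) forward from x0 for T periods."""
    x = x0
    for _ in range(T):
        x = g(x)
    return x

# Case 1: g(x) = sqrt(x) crosses the 45-degree line at 0 and at x* = 1.
print(iterate(lambda x: x**0.5, 0.01))   # approaches 1: x* = 1 is stable
# Case 2: g(x) = x**2 also crosses at 0 and at x* = 1.
print(iterate(lambda x: x**2, 0.99))     # approaches 0: 0 is stable, x* = 1 unstable
```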


3.4 Example 4: Markov Processes
For our last example, consider an economy in which, during each period t, each individual agent experiences a random shock that places him or her into one of n distinct categories, or states.

Example: During any period, an individual worker might be employed (in state 1) or
unemployed (in state 2).

For i = 1, 2, ..., n, let

x_{it} = probability that a representative agent will be in state i during period t

If the size of the population is large, then

x_{it} = fraction of the population in state i during period t

For all t = 0, 1, 2, ..., it must be that

\sum_{i=1}^{n} x_{it} = 1

Next, for i = 1, 2, ..., n and j = 1, 2, ..., n, let:

m_{ij} = probability that an agent in state j during period t will be in state i during period t + 1

Again, if the size of the population is large,

m_{ij} = fraction of the agents in state j during period t who will move to state i during period t + 1

Once again, for all t = 0, 1, 2, ... and j = 1, 2, ..., n, it must be that

\sum_{i=1}^{n} m_{ij} = 1

The random, or stochastic, process that allocates individual agents to individual states in this example is called a Markov process or a Markov chain.

The defining characteristic of a Markov process is that only the immediate past matters:
the probability that an agent will be in state i during period t + 1 depends only on the
state j that the agent is in during period t.

The probabilities m_{ij} are called transition probabilities. Since, in this example, the m_{ij} do not depend on time, the Markov process is stationary.

Suppose we collect all of the transition probabilities into a matrix:

M = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{bmatrix}

Then M is called a Markov matrix, and it has the special property that the entries in each of its columns sum to one.

In general, since the m_{ij} are probabilities, they must satisfy

m_{ij} ≥ 0

for all i = 1, 2, ..., n and j = 1, 2, ..., n. If the stronger condition

m_{ij} > 0

holds for all i = 1, 2, ..., n and j = 1, 2, ..., n, then the Markov matrix is said to be regular.

Note that if we know all of the fractions x_{jt}, j = 1, 2, ..., n, we can calculate the fractions x_{it+1} using

x_{it+1} = \sum_{j=1}^{n} m_{ij} x_{jt}   (17)

for all i = 1, 2, ..., n.

Alternatively, if we define

x_t = \begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix},
then we can write the equations in (17) in matrix form as:

x_{t+1} = \begin{bmatrix} x_{1t+1} \\ x_{2t+1} \\ \vdots \\ x_{nt+1} \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix} = M x_t,   (18)

which is just a system of linear difference equations.

FACTS: Let M be a regular Markov matrix. Then

a) M has one eigenvalue that is equal to one: r_1 = 1

b) Every other eigenvalue of M is less than one in absolute value: |r_i| < 1 for i = 2, 3, ..., n

c) For any constant k_1, we can find an eigenvector v_1 of M corresponding to the eigenvalue r_1 = 1, such that the elements of w_1 = k_1 v_1 all lie between zero and one. Moreover, the elements of w_1 sum to one. The elements of w_1 can therefore be interpreted as probabilities or, if the population is large, fractions of the population.

We know that the general solution to a system of linear difference equations like (18) takes the form

x_t = k_1 r_1^t v_1 + k_2 r_2^t v_2 + \cdots + k_n r_n^t v_n,   (5)

We also know from our facts that if M is a regular Markov matrix,

\lim_{t \to \infty} r_i^t = 0 for i = 2, 3, ..., n

Thus, so long as M is regular, (5) implies that starting from any initial x_0,

\lim_{t \to \infty} x_t = k_1 v_1 = w_1,

where the elements of w_1 can be interpreted as fractions of the population.

These results tell us that if the Markov matrix M is regular, then starting from any initial distribution of the population into states,

x_0 = \begin{bmatrix} x_{10} \\ x_{20} \\ \vdots \\ x_{n0} \end{bmatrix},

the economy will converge over time towards a steady state, in which the distribution of the population into states is given by

w_1 = \begin{bmatrix} w_{11} \\ w_{12} \\ \vdots \\ w_{1n} \end{bmatrix}.
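As a concrete illustration, take the two-state employment example with made-up transition probabilities: an employed worker stays employed with probability 0.9, and an unemployed worker finds a job with probability 0.6. The sketch below recovers w_1 from the eigenvector of M associated with r_1 = 1 and confirms that iterating (18) converges to the same distribution.

```python
import numpy as np

# Hypothetical regular Markov matrix: columns sum to one.
# Column j gives the probabilities of moving to each state from state j.
M = np.array([[0.9, 0.6],    # employed next period
              [0.1, 0.4]])   # unemployed next period

r, V = np.linalg.eig(M)
v1 = V[:, np.argmax(np.isclose(r, 1.0))]  # eigenvector for the unit eigenvalue
w1 = v1 / v1.sum()                        # rescale so the entries sum to one

x = np.array([0.5, 0.5])                  # an arbitrary initial distribution
for _ in range(100):
    x = M @ x                             # x_{t+1} = M x_t, equation (18)

print(w1)  # [0.857..., 0.142...]: the steady-state distribution
print(x)   # the iterates converge to the same distribution
```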

