# MUST-HAVE MATH TOOLS

FOR

William Neilson

Department of Economics
University of Tennessee – Knoxville

September 2009

web.utk.edu/~wneilson/mathbook.pdf

Acknowledgments

Valentina Kozlova, Kelly Padden, and John Tilstra provided valuable
proofreading assistance on the first version of this book, and I am grateful.
Other mistakes were found by the students in my class. Of course, if they
missed anything it is still my fault. Valentina and Bruno Wichmann have
both suggested additions to the book, including the sections on stability of
dynamic systems and order statistics.

The cover picture was provided by my son, Henry, who also proofread
parts of the book. I have always liked this picture, and I thank him for
letting me use it.
CONTENTS
1 Econ and math 1
1.1 Some important graphs . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Math, micro, and metrics . . . . . . . . . . . . . . . . . . . . . 4
I Optimization (Multivariate calculus) 6
2 Single variable optimization 7
2.1 A graphical approach . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Uses of derivatives . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Maximum or minimum? . . . . . . . . . . . . . . . . . . . . . 16
2.5 Logarithms and the exponential function . . . . . . . . . . . . 17
2.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Optimization with several variables 21
3.1 A more complicated pro…t function . . . . . . . . . . . . . . . 21
3.2 Vectors and Euclidean space . . . . . . . . . . . . . . . . . . . 22
3.3 Partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Multidimensional optimization . . . . . . . . . . . . . . . . . . 26
i
ii
3.5 Comparative statics analysis . . . . . . . . . . . . . . . . . . . 29
3.5.1 An alternative approach (that I don’t like) . . . . . . . 31
3.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4 Constrained optimization 36
4.1 A graphical approach . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Lagrangians . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 A 2-dimensional example . . . . . . . . . . . . . . . . . . . . . 40
4.4 Interpreting the Lagrange multiplier . . . . . . . . . . . . . . . 42
4.5 A useful example - Cobb-Douglas . . . . . . . . . . . . . . . . 43
4.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Inequality constraints 52
5.1 Lame example - capacity constraints . . . . . . . . . . . . . . 53
5.1.1 A binding constraint . . . . . . . . . . . . . . . . . . . 54
5.1.2 A nonbinding constraint . . . . . . . . . . . . . . . . . 55
5.2 A new approach . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Multiple inequality constraints . . . . . . . . . . . . . . . . . . 59
5.4 A linear programming example . . . . . . . . . . . . . . . . . 62
5.5 Kuhn-Tucker conditions . . . . . . . . . . . . . . . . . . . . . 64
5.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
II Solving systems of equations (Linear algebra) 71
6 Matrices 72
6.1 Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2 Uses of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.5 Inverses of matrices . . . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7 Systems of equations 86
7.1 Identifying the number of solutions . . . . . . . . . . . . . . . 87
7.1.1 The inverse approach . . . . . . . . . . . . . . . . . . . 87
7.1.2 Row-echelon decomposition . . . . . . . . . . . . . . . 87
7.1.3 Graphing in (x,y) space . . . . . . . . . . . . . . . . . 89
iii
7.1.4 Graphing in column space . . . . . . . . . . . . . . . . 89
7.2 Summary of results . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
8 Using linear algebra in economics 95
8.1 IS-LM analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.2 Econometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.2.1 Least squares analysis . . . . . . . . . . . . . . . . . . 98
8.2.2 A lame example . . . . . . . . . . . . . . . . . . . . . . 99
8.2.3 Graphing in column space . . . . . . . . . . . . . . . . 100
8.2.4 Interpreting some matrices . . . . . . . . . . . . . . . 101
8.3 Stability of dynamic systems . . . . . . . . . . . . . . . . . . . 102
8.3.1 Stability with a single variable . . . . . . . . . . . . . . 102
8.3.2 Stability with two variables . . . . . . . . . . . . . . . 104
8.3.3 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . 105
8.3.4 Back to the dynamic system . . . . . . . . . . . . . . . 108
8.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9 Second-order conditions 114
9.1 Taylor approximations for R ÷R . . . . . . . . . . . . . . . . 114
9.2 Second order conditions for R ÷R . . . . . . . . . . . . . . . 116
9.3 Taylor approximations for R
n
÷R . . . . . . . . . . . . . . . 116
9.4 Second order conditions for R
n
÷R . . . . . . . . . . . . . . 118
9.5 Negative semide…nite matrices . . . . . . . . . . . . . . . . . . 118
9.5.1 Application to second-order conditions . . . . . . . . . 119
9.5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 120
9.6 Concave and convex functions . . . . . . . . . . . . . . . . . . 120
9.7 Quasiconcave and quasiconvex functions . . . . . . . . . . . . 124
9.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
III Econometrics (Probability and statistics) 130
10 Probability 131
10.1 Some de…nitions . . . . . . . . . . . . . . . . . . . . . . . . . . 131
10.2 De…ning probability abstractly . . . . . . . . . . . . . . . . . . 132
10.3 De…ning probabilities concretely . . . . . . . . . . . . . . . . . 134
10.4 Conditional probability . . . . . . . . . . . . . . . . . . . . . . 136
iv
10.5 Bayes’ rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
10.6 Monty Hall problem . . . . . . . . . . . . . . . . . . . . . . . 139
10.7 Statistical independence . . . . . . . . . . . . . . . . . . . . . 140
10.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
11 Random variables 143
11.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . 143
11.2 Distribution functions . . . . . . . . . . . . . . . . . . . . . . 144
11.3 Density functions . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.4 Useful distributions . . . . . . . . . . . . . . . . . . . . . . . . 145
11.4.1 Binomial (or Bernoulli) distribution . . . . . . . . . . . 145
11.4.2 Uniform distribution . . . . . . . . . . . . . . . . . . . 147
11.4.3 Normal (or Gaussian) distribution . . . . . . . . . . . . 148
11.4.4 Exponential distribution . . . . . . . . . . . . . . . . . 149
11.4.5 Lognormal distribution . . . . . . . . . . . . . . . . . . 151
11.4.6 Logistic distribution . . . . . . . . . . . . . . . . . . . 151
12 Integration 153
12.1 Interpreting integrals . . . . . . . . . . . . . . . . . . . . . . . 155
12.2 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . 156
12.2.1 Application: Choice between lotteries . . . . . . . . . 157
12.3 Di¤erentiating integrals . . . . . . . . . . . . . . . . . . . . . . 159
12.3.1 Application: Second-price auctions . . . . . . . . . . . 161
12.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
13 Moments 164
13.1 Mathematical expectation . . . . . . . . . . . . . . . . . . . . 164
13.2 The mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
13.2.1 Uniform distribution . . . . . . . . . . . . . . . . . . . 165
13.2.2 Normal distribution . . . . . . . . . . . . . . . . . . . . 165
13.3 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
13.3.1 Uniform distribution . . . . . . . . . . . . . . . . . . . 167
13.3.2 Normal distribution . . . . . . . . . . . . . . . . . . . . 168
13.4 Application: Order statistics . . . . . . . . . . . . . . . . . . . 168
13.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
v
14 Multivariate distributions 175
14.1 Bivariate distributions . . . . . . . . . . . . . . . . . . . . . . 175
14.2 Marginal and conditional densities . . . . . . . . . . . . . . . . 176
14.3 Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
14.4 Conditional expectations . . . . . . . . . . . . . . . . . . . . . 181
14.4.1 Using conditional expectations - calculating the bene…t
of search . . . . . . . . . . . . . . . . . . . . . . . . . . 181
14.4.2 The Law of Iterated Expectations . . . . . . . . . . . . 184
14.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
15 Statistics 187
15.1 Some de…nitions . . . . . . . . . . . . . . . . . . . . . . . . . . 187
15.2 Sample mean . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
15.3 Sample variance . . . . . . . . . . . . . . . . . . . . . . . . . . 189
15.4 Convergence of random variables . . . . . . . . . . . . . . . . 192
15.4.1 Law of Large Numbers . . . . . . . . . . . . . . . . . . 192
15.4.2 Central Limit Theorem . . . . . . . . . . . . . . . . . . 193
15.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
16 Sampling distributions 194
16.1 Chi-square distribution . . . . . . . . . . . . . . . . . . . . . . 194
16.2 Sampling from the normal distribution . . . . . . . . . . . . . 196
16.3 t and F distributions . . . . . . . . . . . . . . . . . . . . . . . 198
16.4 Sampling from the binomial distribution . . . . . . . . . . . . 200
17 Hypothesis testing 201
17.1 Structure of hypothesis tests . . . . . . . . . . . . . . . . . . . 202
17.2 One-tailed and two-tailed tests . . . . . . . . . . . . . . . . . . 205
17.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
17.3.1 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . 207
17.3.2 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . 208
17.3.3 Example 3 . . . . . . . . . . . . . . . . . . . . . . . . . 208
17.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
18 Solutions to end-of-chapter problems 211
18.1 Solutions for Chapter 15 . . . . . . . . . . . . . . . . . . . . . 265
Index 267
CHAPTER
1
Econ and math
Every academic discipline has its own standards by which it judges the merits
of what researchers claim to be true. In the physical sciences this typically
requires experimental veri…cation. In history it requires links to the original
sources. In sociology one can often get by with anecdotal evidence, that
is, with giving examples. In economics there are two primary ways one
can justify an assertion, either using empirical evidence (econometrics or
experimental work) or mathematical arguments.
Both of these techniques require some math, and one purpose of this
course is to provide you with the mathematical tools needed to make and
understand economic arguments. A second goal, though, is to teach you to
speak mathematics as a second language, that is, to make you comfortable
courses we tend to use equations. But equations often have graphical coun-
terparts and vice versa. Part of getting comfortable about using math to
do economics is knowing how to go from graphs to the underlying equations,
and part is going from equations to the appropriate graphs.
1
CHAPTER 1. ECON AND MATH 2
Figure 1.1: A constrained choice problem
1.1 Some important graphs
One of the fundamental graphs is shown in Figure 1.1. The axes and curves
are not labeled, but that just ampli…es its importance. If the axes are
commodities, the line is a budget line, and the curve is an indi¤erence curve,
the graph depicts the fundamental consumer choice problem. If the axes are
inputs, the curve is an isoquant, and the line is an iso-cost line, the graph
illustrates the …rm’s cost-minimization problem.
Figure 1.1 raises several issues. How do we write the equations for the
line and the curve? The line and curve seem to be tangent. How do we
characterize tangency? At an even more basic level, how do we …nd slopes
of curves? How do we write conditions for the curve to be curved the way
it is? And how do we do all of this with equations instead of a picture?
Figure 1.2 depicts a di¤erent situation. If the upward-sloping line is a
supply curve and the downward-sloping one is a demand curve, the graph
shows how the market price is determined. If the upward-sloping line is
marginal cost and the downward-sloping line is marginal bene…t, the …gure
shows how an individual or …rm chooses an amount of some activity. The
questions for Figure 1.2 are: How do we …nd the point where the two lines
intersect? How do we …nd the change from one intersection point to another?
And how do we know that two curves will intersect in the …rst place?
Figure 1.3 is completely di¤erent. It shows a collection of points with a
CHAPTER 1. ECON AND MATH 3
Figure 1.2: Solving simultaneous equations
line …tting through them. How do we …t the best line through these points?
This is the key to doing empirical work. For example, if the horizontal axis
measures the quantity of a good and the vertical axis measures its price, the
points could be observations of a demand curve. How do we …nd the demand
curve that best …ts the data?
These three graphs are fundamental to economics. There are more as
well. All of them, though, require that we restrict attention to two di-
mensions. For the …rst graph that means consumer choice with only two
commodities, but we might want to talk about more. For the second graph it
means supply and demand for one commodity, but we might want to consider
several markets simultaneously. The third graph allows quantity demanded
to depend on price, but not on income, prices of other goods, or any other
factors. So, an important question, and a primary reason for using equations
instead of graphs, is how do we handle more than two dimensions?
Math does more for us than just allow us to expand the number of di-
mensions. It provides rigor; that is, it allows us to make sure that our
statements are true. All of our assertions will be logical conclusions from
our initial assumptions, and so we know that our arguments are correct and
we can then devote attention to the quality of the assumptions underlying
them.
CHAPTER 1. ECON AND MATH 4
Figure 1.3: Fitting a line to data points
1.2 Math, micro, and metrics
The theory of microeconomics is based on two primary concepts: optimiza-
tion and equilibrium. Finding how much a …rm produces to maximize pro…t
is an example of an optimization problem, as is …nding what a consumer
purchases to maximize utility. Optimization problems usually require …nd-
ing maxima or minima, and calculus is the mathematical tool used to do
this. The …rst section of the book is devoted to the theory of optimization,
and it begins with basic calculus. It moves beyond basic calculus in two
ways, though. First, economic problems often have agents simultaneously
choosing the values of more than one variable. For example, consumers
choose commodity bundles, not the amount of a single commodity. To an-
alyze problems with several choice variables, we need multivariate calculus.
Second, as illustrated in Figure 1.1, the problem is not just a simple maxi-
mization problem. Instead, consumers maximize utility subject to a budget
constraint. We must …gure out how to perform constrained optimization.
Finding the market-clearing price is an equilibrium problem. An equilib-
rium is simply a state in which there is no pressure for anything to change,
and the market-clearing price is the one at which suppliers have no incentive
to raise or lower their prices and consumers have no incentive to raise or
lower their o¤ers. Solutions to games are also based on the concept of equi-
librium. Graphically, equilibrium analysis requires …nding the intersection
of two curves, as in Figure 1.2. Mathematically, it involves the solution of
CHAPTER 1. ECON AND MATH 5
several equations in several unknowns. The branch of mathematics used
for this is linear (or matrix) algebra, and so we must learn to manipulate
matrices and use them to solve systems of equations.
Economic exercises often involve comparative statics analysis, which in-
volves …nding how the optimum or equilibrium changes when one of the un-
derlying parameters changes. For example, how does a consumer’s optimal
bundle change when the underlying commodity prices change? How does
a …rm’s optimal output change when an input or an output price changes?
How does the market-clearing price change when an input price changes? All
of these questions are answered using comparative statics analysis. Mathe-
matically, comparative statics analysis involves multivariable calculus, often
in combination with matrix algebra. This makes it sound hard. It isn’t
really. But getting you to the point where you can perform comparative
statics analysis means going through these two parts of mathematics.
Comparative statics analysis is also at the heart of empirical work, that
is, econometrics. A typical empirical project involves estimating an equa-
tion that relates a dependent variable to a set of independent variables. The
estimated equation then tells how the dependent variable changes, on av-
erage, when one of the independent variables changes. So, for example,
if one estimates a demand equation in which quantity demanded is the de-
pendent variable and the good’s price, some substitute good prices, some
complement good prices, and income are independent variables, the result-
ing equation tells how much quantity demanded changes when income rises,
for example. But this is a comparative statics question. A good empirical
project uses some math to derive the comparative statics results …rst, and
then uses data to estimate the comparative statics results second. Conse-
quently, econometrics and comparative statics analysis go hand-in-hand.
Econometrics itself is the task of …tting the best line to a set of data
points, as in Figure 1.3. There is some math behind that task. Much of it
is linear algebra, because matrices turn out to provide an easy way to present
the relevant equations. A little bit of the math is calculus, because "best"
implies "optimal," and we use calculus to …nd optima. Econometrics also
requires a knowledge of probability and statistics, which is the third branch
of mathematics we will study.

PART I

OPTIMIZATION

(multivariate calculus)

CHAPTER
2
Single variable optimization
One feature that separates economics from the other social sciences is the
premise that individual actors, whether they are consumers, …rms, workers,
or government agencies, act rationally to make themselves as well o¤ as
possible. In other words, in economics everybody maximizes something.
So, doing mathematical economics requires an ability to …nd maxima and
minima of functions. This chapter takes a …rst step using the simplest
possible case, the one in which the agent must choose the value of only a
single variable. In later chapters we explore optimization problems in which
the agent chooses the values of several variables simultaneously.
Remember that one purpose of this course is to introduce you to the
mathematical tools and techniques needed to do economics at the graduate
level, and that the other is to teach you to frame economic questions, and
their answers, mathematically. In light of the second goal, we will begin with
a graphical analysis of optimization and then …nd the math that underlies
the graph.
Many of you have already taken calculus, and this chapter concerns single-
variable, di¤erential calculus. One di¤erence between teaching calculus in
7
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 8
π
*
q
*
\$
q
π(q)
Figure 2.1: A pro…t function with a maximum
an economics course and teaching it in a math course is that economists
almost never use trigonometric functions. The economy has cycles, but
none regular enough to model using sines and cosines. So, we will skip
trigonometric functions. We will, however, need logarithms and exponential
functions, and they are introduced in this chapter.
2.1 A graphical approach
Consider the case of a competitive …rm choosing how much output to pro-
duce. When the …rm produces and sells ¡ units it earns revenue 1(¡) and
incurs costs of ((¡). The pro…t function is
:(¡) = 1(¡) ÷((¡).
The …rst term on the right-hand side is the …rm’s revenue, and the second
term is its cost. Pro…t, as always, is revenue minus cost.
More importantly for this chapter, Figure 2.1 shows the …rm’s pro…t func-
tion. The maximum level of pro…t is :
+
, which is achieved when output is
¡
+
. Graphically this is very easy. The question is, how do we do it with
Two features of Figure 2.1 stand out. First, at the maximum the slope
of the pro…t function is zero. Increasing ¡ beyond ¡
+
reduces pro…t, and
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 9
π(q)
\$
q
Figure 2.2: A pro…t function with a minimum
decreasing ¡ below ¡
+
also reduces pro…t. Second, the pro…t function rises
up to ¡
+
and then falls. To see why this is important, compare it to Figure
2.2, where the pro…t function has a minimum. In Figure 2.2 the pro…t
function falls to the minimum then rises, while in Figure 2.1 it rises to the
maximum then falls. To make sure we have a maximum, we have to make
sure that the pro…t function is rising then falling.
This leaves us with several tasks. (1) We must …nd the slope of the pro…t
function. (2) We must …nd ¡
+
by …nding where the slope is zero. (3) We
must make sure that pro…t really is maximized at ¡
+
, and not minimized.
(4) We must relate our …ndings back to economics.
2.2 Derivatives
The derivative of a function provides its slope at a point. It can be denoted
in two ways: 1
t
(r) or d1(r)dr. The derivative of the function 1 at r is
de…ned as
d1(r)
dr
= lim
I÷0
1(r +/) ÷1(r)
/
. (2.1)
The idea is as follows, with the help of Figure 2.3. Suppose we start at r
and consider a change to r + /. Then 1 changes from 1(r) to 1(r + /).
The ratio of the change in 1 to the change in r is a measure of the slope:
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 10
x
f(x)
f(x+h)
x+h
f(x)
x
f(x)
Figure 2.3: Approximating the slope of a function
[1(r+/) ÷1(r)][(r+/) ÷r]. Make the change in r smaller and smaller to
get a more precise measure of the slope, and, in the limit, you end up with
the derivative.
Finding the derivative comes from applying the formula in equation (2.1).
And it helps to have a few simple rules in hand. We present these rules as
a series of theorems.
Theorem 1 Suppose 1(r) = c. Then 1
t
(r) = 0.
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
c ÷c
/
= 0.
Graphically, a constant function, that is, one that yields the same value for
every possible r, is just a horizontal line, and horizontal lines have slopes of
zero. The theorem says that the derivative of a constant function is zero.
Theorem 2 Suppose 1(r) = r. Then 1
t
(r) = 1.
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 11
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
(r +/) ÷r
/
= lim
I÷0
/
/
= 1.
Graphically, the function 1(r) = r is just a 45-degree line, and the slope of
the 45-degree line is one. The theorem con…rms that the derivative of this
function is one.
Theorem 3 Suppose 1(r) = cn(r). Then 1
t
(r) = cn
t
(r).
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
cn(r +/) ÷cn(r)
/
= c lim
I÷0
n(r +/) ÷n(r)
/
= cn
t
(r).
This theorem provides a useful rule. When you multiply a function by a
scalar (or constant), you also multiply the derivative by the same scalar.
Graphically, multiplying by a scalar rotates the curve.
Theorem 4 Suppose 1(r) = n(r) +·(r). Then 1
t
(r) = n
t
(r) +·
t
(r).
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
[n(r +/) +·(r +/)] ÷[n(r) +·(r)]
/
= lim
I÷0
¸
n(r +/) ÷n(r)
/
+
·(r +/) ÷n(r)
/

= n
t
(r) +·
t
(r).
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 12
This rule says that the derivative of a sum is the sum of the derivatives.
The next theorem is the product rule, which tells how to take the
derivative of the product of two functions.
Theorem 5 Suppose 1(r) = n(r)·(r). Then 1
t
(r) = n
t
(r)·(r)+n(r)·
t
(r).
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
[n(r +/)·(r +/)] ÷[n(r)·(r)]
/
= lim
I÷0
¸
[n(r +/) ÷n(r)]·(r)
/
+
n(r +/)[·(r +/) ÷·(r)]
/

where the move fromline 2 to line 3 entails adding then subtracting lim
I÷0
n(r+
/)·(r)/. Remembering that the limit of a product is the product of the
limits, the above expression reduces to
1
t
(r) = lim
I÷0
[n(r +/) ÷n(r)]
/
·(r) + lim
I÷0
n(r +/)
[·(r +/) ÷·(r)]
/
= n
t
(r)·(r) +n(r)·
t
(r).
We need a rule for functions of the form 1(r) = 1n(r), and it is provided
in the next theorem.
Theorem 6 Suppose 1(r) = 1n(r). Then 1
t
(r) = ÷n
t
(r)[n(r)]
2
.
Proof.
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
1
&(a+I)
÷
1
&(a)
/
= lim
I÷0
n(r) ÷n(r +/)
/[n(r +/)n(r)]
= lim
I÷0
÷[n(r +/) ÷n(r)]
/
lim
I÷0
1
n(r +/)n(r)
= ÷n
t
(r)
1
[n(r)]
2
.
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 13
Our …nal rule concerns composite functions, that is, functions of functions.
This rule is called the chain rule.
Theorem 7 Suppose 1(r) = n(·(r)). Then 1
t
(r) = n
t
(·(r)) ·
t
(r).
Proof. First suppose that there is some sequence /
1
. /
2
. ... with lim
j÷o
/
j
=
0 and ·(r +/
j
) ÷·(r) = 0 for all i. Then
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
n(·(r +/)) ÷n(·(r))
/
= lim
I÷0
¸
n(·(r +/)) ÷n(·(r))
·(r +/) ÷·(r)

·(r +/) ÷·(r)
/

= lim
I÷0
n(·(r) +/)) ÷n(·(r))
/
lim
I÷0
·(r +/) ÷·(r)
/
= n
t
(·(r)) ·
t
(r).
Now suppose that there is no sequence as de…ned above. Then there exists
a sequence /
1
. /
2
. ... with lim
j÷o
/
j
= 0 and ·(r + /
j
) ÷ ·(r) = 0 for all i.
Let / = ·(r) for all r,and
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
n(·(r +/)) ÷n(·(r))
/
= lim
I÷0
n(/) ÷n(/)
/
= 0.
But n
t
(·(r)) ·
t
(r) = 0 since ·
t
(r) = 0, and we are done.
d
dr
c[1(r)]
a
= c:[1(r)]
a÷1
1
t
(r). (2.2)
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 14
This holds even if : is negative, and even if : is not an integer. So, for
example, the derivative of r
a
is :r
a÷1
, and the derivative of (2r + 1)
5
is
10(2r + 1)
4
. The derivative of (4r
2
÷1)
÷.4
is ÷.4(4r
2
÷1)
÷1.4
(8r).
Combining the rules also gives us the familiar division rule:
d
dr
¸
n(r)
·(r)

=
n
t
(r)·(r) ÷·
t
(r)n(r)
[·(r)]
2
. (2.3)
To get it, rewrite n(r)·(r) as n(r) [·(r)]
÷1
. We can then use the product
rule and expression (2.2) to get
d
dr

n(r)·
÷1
(r)

= n
t
(r)·
÷1
(r) + (÷1)n(r)·
÷2
(r)·
t
(r)
=
n
t
(r)
·(r)
÷
·
t
(r)n(r)
·
2
(r)
.
Multiplying both the numerator and denominator of the …rst term by ·(r)
yields (2.3).
Getting more ridiculously complicated, consider
1(r) =
(r
3
+ 2r)(4r ÷1)
r
3
.
To di¤erentiate this thing, split 1 into three component functions, 1
1
(r) =
r
3
+ 2r, 1
2
(r) = 4r ÷ 1, and 1
3
(r) = r
3
. Then 1(r) = 1
1
(r) 1
2
(r)1
3
(r),
and
1
t
(r) =
1
t
1
(r)1
2
(r)
1
3
(r)
+
1
1
(r)1
t
2
(r)
1
3
(r)
÷
1
1
(r)1
2
(r)1
t
3
(r)
[1
3
(r)]
2
.
We can di¤erentiate the component functions to get 1
1
(r) = 3r
2
+2, 1
t
2
(r) =
4, and 1
t
3
(r) = 3r
2
. Plugging this all into the formula above gives us
1
t
(r) =
(3r
2
+ 2)(4r ÷1)
r
3
+
4(r
3
+ 2r)
r
3
÷
3(r
3
+ 2r)(4r ÷1)r
2
r
6
.
2.3 Uses of derivatives
In economics there are three major uses of derivatives.
The …rst use comes from the economics idea of "marginal this" and "mar-
ginal that." In principles of economics courses, for example, marginal cost is
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 15
de…ned as the additional cost a …rm incurs when it produces one more unit
of output. If the cost function is ((¡), where ¡ is quantity, marginal cost is
((¡ + 1) ÷((¡). We could divide output up into smaller units, though, by
measuring in grams instead of kilograms, for example. Continually divid-
ing output into smaller and smaller units of size / leads to the de…nition of
marginal cost as
`((¡) = lim
I÷0
c(¡ +/) ÷c(¡)
/
.
Marginal cost is simply the derivative of the cost function. Similarly, mar-
ginal revenue is the derivative of the revenue function, and so on.
The second use of derivatives comes from looking at their signs (the as-
trology of derivatives). Consider the function n = 1(r). We might ask
whether an increase in r leads to an increase in n or a decrease in n. The
derivative 1
t
(r) measures the change in n when r changes, and so if 1
t
(r) _ 0
we know that n increases when r increases, and if 1
t
(r) _ 0 we know that n
decreases when r increases. So, for example, if the marginal cost function
`((¡) or, equivalently, (
t
(¡) is positive we know that an increase in output
leads to an increase in cost.
The third use of derivatives is for …nding maxima and minima of functions.
This is where we started the chapter, with a competitive …rm choosing output
to maximize pro…t. The pro…t function is :(¡) = 1(¡)÷((¡). As we saw in
Figure 2.1, pro…t is maximized when the slope of the pro…t function is zero,
or
d:

= 0.
This condition is called a …rst-order condition, often abbreviated as FOC.
Using our rules for di¤erentiation, we can rewrite the FOC as
d:

= 1
t

+
) ÷(
t

+
) = 0. (2.4)
which reduces to the familiar rule that a …rm maximizes pro…t by producing
where marginal revenue equals marginal cost.
Notice what we have done here. We have not used numbers or speci…c
functions and, aside from homework exercises, we rarely will. Using general
functions leads to expressions involving general functions, and we want to
interpret these. We know that 1
t
(¡) is marginal revenue and (
t
(¡) is mar-
ginal cost. We end up in the same place we do using graphs, which is a good
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 16
thing. The power of the mathematical approach is that it allows us to apply
the same techniques in situations where graphs will not work.
2.4 Maximum or minimum?
Figure 2.1 shows a pro…t function with a maximum, but Figure 2.2 shows
one with a minimum. Both of them generate the same …rst-order condition:
d:d¡ = 0. So what property of the function tells us that we are getting a
maximum and not a minimum?
In Figure 2.1 the slope of the curve decreases as ¡ increases, while in
Figure 2.2 the slope of the curve increases as ¡ increases. Since slopes are
just derivatives of the function, we can express these conditions mathemati-
cally by taking derivatives of derivatives, or second derivatives. The second
derivative of the function 1(r) is denoted 1
tt
(r) or d
2
1dr
2
. For the function
to have a maximum, like in Figure 2.1, the derivative should be decreasing,
which means that the second derivative should be negative. For the function
to have a minimum, like in Figure 2.2, the derivative should be increasing,
which means that the second derivative should be positive. Each of these is
called a second-order condition or SOC. The second-order condition for
a maximum is 1
tt
(r) _ 0, and the second-order condition for a minimum is
1
tt
(r) _ 0.
We can guarantee that pro…t is maximized, at least locally, if :
tt

+
) _ 0.
We can guarantee that pro…t is maximized globally if :
tt
(¡) _ 0 for all possible
values of ¡. Let’s look at the condition a little more closely. The …rst
derivative of the pro…t function is :
t
(¡) = 1
t
(¡) ÷ (
t
(¡) and the second
derivative is :
tt
(¡) = 1
tt
(¡) ÷ (
tt
(¡). The second-order condition for a
maximum is :
tt
(¡) _ 0, which holds if 1
tt
(¡) _ 0 and (
tt
(¡) _ 0. So,
we can guarantee that pro…t is maximized if the second derivative of the
revenue function is nonpositive and the second derivative of the cost function
is nonnegative. Remembering that (
t
(¡) is marginal cost, the condition
(
tt
(¡) _ 0 means that marginal cost is increasing, and this has an economic
interpretation: each additional unit of output adds more to total cost than
any unit preceding it. The condition 1
tt
(¡) _ 0 means that marginal revenue
is decreasing, which means that the …rm earns less from each additional unit
it sells.
One special case that receives considerable attention in economics is the
one in which 1(¡) = j¡, where j is the price of the good. This is the
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 17
revenue function for a price-taking …rm in a perfectly competitive industry.
Then 1
t
(¡) = j and 1
tt
(¡) = 0, and the …rst-order condition for pro…t
maximization is j ÷ (
t
(¡) = 0, which is the familiar condition that price
equals marginal cost. The second-order condition reduces to ÷(
tt
(¡) _ 0,
which says that marginal cost must be nondecreasing.
2.5 Logarithms and the exponential function
The functions ln r and c
a
turn out to play an important role in economics.
The …rst is the natural logarithm, and the second is the exponential function.
They are related:
ln c
a
= c
ln a
= r.
The number c - 2.718. Without going into why these functions are special
for economics, let me show you why they are special for math.
We know that
d
dr

r
a
:

= r
a÷1
.
We can get the function r
2
by di¤erentiating r
3
3, the function r by di¤er-
entiating r
2
2, the function r
÷2
by di¤erentiating ÷r
÷1
, the function r
÷3
by
di¤erentiating ÷r
÷2
2, and so on. But how can we get the function r
÷1
?
We cannot get it by di¤erentiating r
0
0, because that expression does not
exist. We cannot get it by di¤erentiating r
0
, because dr
0
dr = 0. So how
do we get r
÷1
as a derivative? The answer is the natural logarithm:
d
dr
ln r =
1
r
.
Logarithms have two additional useful properties:
ln rn = ln r + ln n.
and
ln(r
o
) = c ln r.
Combining these yields
ln(r
o
n
b
) = c ln r +/ ln n. (2.5)
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 18
The left-hand side of this expression is non-linear, but the right-hand side is
linear in the logarithms, which makes it easier to work with. Economists
often use the form in (??) for utility functions and production functions.
The exponential function c
a
also has an important di¤erentiation prop-
erty: it is its own derivative, that is,
d
dr
c
a
= c
a
.
This implies that the derivative of c
&(a)
= n
t
(r)c
&(a)
.
2.6 Problems
1. Compute the derivatives of the following functions:
(a) 1(r) = 12(r
3
+ 1)
2
+ 3 ln
a
2
÷5r
÷4
(b) 1(r) = 1(4r ÷2)
5
(c) 1(r) = c
÷14a
3
+2a
(d) 1(r) = (9 ln r)r
0.3
(e) 1(r) =
oa
2
÷b
ca÷o
2. Compute the derivative of the following functions:
(a) 1(r) = 12(r ÷1)
2
(b) o(r) = (ln 3r)(4r
2
)
(c) /(r) = 1(3r
2
÷2r + 1)
4
(d) 1(r) = rc
÷a
(e)
o(r) =
(2r
2
÷3)

5r
3
+ 6
8 ÷9r
3. Use the de…nition of the derivative (expression 2.1) to show that the
derivative of r
2
is 2r.
4. Use the de…nition of a derivative to prove that the derivative of 1r is
÷1r
2
.
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 19
(a) Is 1(r) = 2r
3
÷12r
2
increasing or decreasing at r = 3?
(b) Is 1(r) = ln r increasing or decreasing at r = 13?
(c) Is 1(r) = c
÷a
r
1.5
increasing or decreasing at r = 4?
(d) Is 1(r) =
4a÷1
a+2
increasing or decreasing at r = 2?
(a) Is 1(r) = (3r ÷2)(4r +r
2
) increasing or decreasing at r = ÷1?
(b) Is 1(r) = 1 ln r increasing or decreasing at r = c?
(c) Is 1(r) = 5r
2
+ 16r ÷12 increasing or decreasing at r = ÷6?
7. Optimize the following functions, and tell whether the optimum is a
local maximum or a local minimum:
(a) 1(r) = ÷4r
2
+ 10r
(b) 1(r) = 120r
0.7
÷6r
(c) 1(r) = 4r ÷3 ln r
8. Optimize the following functions, and tell whether the optimum is a
local maximum or a local minimum:
(a) 1(r) = 4r
2
÷24r + 132
(b) 1(r) = 20 ln r ÷4r
(c) 1(r) = 36r ÷(r + 1)(r + 2)
9. Consider the function 1(r) = cr
2
+/r +c.
(a) Find conditions on c, /, and c that guarantee that 1(r) has a
unique global maximum.
(b) Find conditions on c, /, and c that guarantee that 1(r) has a
unique global minimum.
CHAPTER 2. SINGLE VARIABLE OPTIMIZATION 20
10. Beth has a minion (named Henry) and bene…ts when the minion exerts
e¤ort, with minion e¤ort denoted by :. Her bene…t from : units of
minion e¤ort is given by the function /(:). The minion does not like
exerting e¤ort, and his cost of e¤ort is given by the function c(:).
(a) Suppose that Beth is her own minion and that her e¤ort cost func-
tion is also c(:). Find the equation determining how much e¤ort
she would exert, and interpret it.
(b) What are the second-order condtions for the answer in (a) to be a
maximum?
(c) Suppose that Beth pays the minion n per unit of e¤ort. Find the
equation determining how much e¤ort the minion will exert, and
interpret it.
(d) What are the second-order conditions for the answer in (c) to be
a maximum?
11. A …rm (Bilco) can use its manufacturing facility to make either widgets
or gookeys. Both require labor only. The production function for
widgets is
\ = 201
1/2
and the production function for gookeys is
G = 301.
The wage rate is \$11 per unit of time, and the prices of widgets and
gokeys are \$9 and \$3 per unit, repsectively. The manufacturing facility
can accomodate 60 workers and no more. How much of each product
should Bilco produce per unit of time? (Hint: If Bilco devotes 1 units
of labor to widget production it has 60 ÷ 1 units of labor to devote
to gookey production, and its pro…t function is :(1) = 9 201
1/2
+ 3
30(60 ÷1) ÷11 60.)
CHAPTER
3
Optimization with several variables
Almost all of the intuition behind optimization comes from looking at prob-
lems with a single choice variable. In economics, though, problems often in-
volve more than one choice variable. For example, consumers choose bundles
of commodities, so must choose amounts of several di¤erent goods simultane-
ously. Firms use many inputs and must choose their amounts simultaneously.
This chapter addresses issues that arise when there are several variables.
The previous chapter used graphs to generate intuition. We cannot do
that here because I am bad at drawing graphs with more than two dimen-
sions. Instead, our intuition will come from what we learned in the last
chapter.
3.1 A more complicated pro…t function
In the preceding chapter we looked at a pro…t function in which the …rm chose
how much output to produce. This time, instead of focusing on outputs,
let’s focus on inputs. Suppose that the …rm can use : di¤erent inputs, and
21
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 22
denote the amounts by r
1
. .... r
a
. When the …rm uses r
1
units of input 1,
r
2
units of input 2, and so on, its output is given by the production function
( = 1(r
1
. .... r
a
).
Inputs are costly, and we will assume that the …rm can purchase as much of
input i as it wants for price :
j
, and it can sell as much of its output as it
wants at the competitive price j. How much of each input should the …rm
use to maximize pro…t?
We know what to do when there is only one input (: = 1). Call the
input labor (1) and its price the wage (n). The production function is then
( = 1(1). When the …rm employs 1 units of labor it produces 1(1) units
of output and sells them for j units each, for revenue of j1(1). Its only
cost is a labor cost equal to n1 because it pays each unit of labor the wage
n. Pro…t, then, is :(1) = j1(1) ÷n1. The …rst-order condition is
:
t
(1) = j1
t
(1) ÷n = 0.
which can be interpreted as the …rm’s pro…t maximizing labor demand equat-
ing the value marginal product of labor j1
t
(1) to the wage rate. Using one
additional unit of labor costs an additional n but increases output by 1
t
(1),
which increases revenue by j1
t
(1). The …rm employs labor as long as each
additional unit generates more revenue than it costs, and stops when the
What happens if there are two inputs (: = 2), call them capital (1)
and labor (1)? The production function is then ( = 1(1. 1), and the
corresponding pro…t function is
:(1. 1) = j1(1. 1) ÷:1 ÷n1. (3.1)
How do we …nd the …rst-order condition? That is the task for this chapter.
3.2 Vectors and Euclidean space
Before we can …nd a …rst-order condition for (3.1), we …rst need some ter-
minology. A vector is an array of : numbers, written (r
1
. .... r
a
). In our
example of the input-choosing pro…t-maximizing …rm, the vector (r
1
. .... r
a
)
is an input vector. For each i between 1 and :, the quantity r
j
is the
amount of the i-th input. More generally, we call r
j
the i-th component of
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 23
the vector (r
1
. .... r
a
). The number of components in a vector is called the
dimension of the vector; the vector (r
1
. .... r
a
) is :-dimensional.
Vectors are collections of numbers. They are also numbers themselves,
and it will help you if you begin to think of them this way. The set of real
numbers is commonly denoted by R, and we depict R using a number line.
We can depict a 2-dimensional vector using a coordinate plane anchored by
two real lines. So, the vector (r
1
. r
2
) is in R
2
, which can be thought of as
R R. We call R
2
the 2-dimensional Euclidean space. When you took
plane geometry in high school, this was Euclidean geometry. When a vector
has : components, it is in R
a
, or :-dimensional Euclidean space.
In this text vectors are sometimes written out as (r
1
. .... r
a
), but some-
times that is cumbersome. We use the symbol r to denote the vector whose
components are r
1
. .... r
a
. That way we can talk about operations involving
two vectors, like r and n.
Three common operations are used with vectors. We begin with addition:
r + n = (r
1
+n
1
. r
2
+n
2
. .... r
a
+n
a
).
Multiplication is more complicated, and there are two notions. One is
scalar multiplication. If r is a vector and c is a scalar (a real number),
then
c r = (cr
1
. cr
2
. .... cr
a
).
Scalar multiplication is achieved by multiplying each component of the vec-
tor by the same number, thereby either "scaling up" or "scaling down" the
vector. Vector subtraction can be achieved through addition and using ÷1
as the scalar: r ÷ n = r + (÷1) n. The other form of multiplication is the
inner product, sometimes called the dot product. It is done using the
formula
r n = r
1
n
1
+r
2
n
2
+... +r
a
n
a
.
Vector addition takes two vectors and yields another vector, and scalar mul-
tiplication takes a vector and a scalar and yields another vector. But the
inner product takes two vectors and yields a scalar, or real number. You
might wonder why we would ever want such a thing.
Here is an example. Suppose that a …rm uses : inputs in amounts
r
1
. .... r
a
. It pays :
j
per unit of input i. What is its total production
cost? Obviously, it is :
1
r
1
+... +:
a
r
a
, which can be easily written as : r.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 24
Similarly, if a consumer purchases a commodity bundle given by the vector
r = (r
1
. .... r
a
) and pays prices given by the vector j = (j
1
. .... j
a
), her total
expenditure is j r. Often it is more convenient to leave the "dot" out of
the inner product, and just write j r. A second use comes from looking at
r r = r
2
1
+... +r
2
a
. Then

r r is the distance from the point r (remember,
it’s a number) to the origin. This is also called the norm of the vector r,
and it is written | r| = ( r r)
1
2
.
Both vector addition and the inner product are commutative, that is, they
do not depend on the order in which the two vectors occur. This will contrast
with matrices in a later chapter, where matrix multiplication is dependent
on the order in which the matrices are written.
Vector analysis also requires some de…nitions for ordering vectors. For
real numbers we have the familiar relations , _, =, _, and <. For vectors,
r = n if r
j
= n
j
for all i = 1. ... :;
r _ n if r
j
_ n
j
for all i = 1. .... :;
r n if r _ n but r = n;
and
r n if r
j
n
j
for all i = 1. .... :.
From the third one it follows that r n if r
j
_ n
j
for all i = 1. .... : and
r
j
n
j
for some i between 1 and :. The fourth condition can be read r is
strictly greater than n component-wise.
3.3 Partial derivatives
The trick to maximizing a function of several variables, like (3.1), is to max-
imize it according to each variable separately, that is, by …nding a …rst-order
condition for the choice of 1 and another one for the choice of 1. In general
both of these conditions will depend on the values of both 1 and 1, so we
will have to solve some simultaneous equations. We will get to that later.
The point is that we want to di¤erentiate (3.1) once with respect to 1 and
once with respect to 1.
Di¤erentiating a function of two or more variables with respect to only
one of them is called partial di¤erentiation. Let 1(r
1
. .... r
a
) be a general
function of : variables. The i-th partial derivative of 1 is
·1
·r
j
( r) = lim
I÷0
1(r
1
. r
2
. .... r
j÷1
. r
j
+/. r
j+1
. .... r
a
) ÷1(r
1
. .... r
a
)
/
. (3.2)
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 25
This de…nition might be a little easier to see with one more piece of notation.
The coordinate vector c
j
is the vector with components given by c
j
j
= 1
and c
j
;
= 0 when , = i. The …rst coordinate vector is c
1
= (1. 0. .... 0), the
second coordinate vector is c
2
= (0. 1. 0. .... 0), and so on through the :-th
coordinate vector c
a
= (0. .... 0. 1). So, coordinate vector c
j
has a one in the
i-th place and zeros everywhere else. Using coordinate vectors, the de…nition
of the i-th partial derivative in (3.2) can be rewritten
·1
·r
j
( r) = lim
I÷0
1( r +/ c
j
) ÷1( r)
/
.
The i-th partial derivative of the function 1 is simply the derivative one gets
by holding all of the components …xed except for the i-th component. One
takes the partial by pretending that all of the other variables are really just
constants and di¤erentiating as if it were a single-variable function. For
example, consider the function 1(r
1
. r
2
) = (5r
1
÷2)(7r
2
÷3)
2
. The partial
derivatives are 1
1
(r
1
. r
2
) = 5(7r
2
÷3)
2
and 1
2
(r
1
. r
2
) = 14(5r
1
÷2)(7r
2
÷3).
We sometimes use the notation 1
j
( r) to denote ·1( r)·r
j
. When a
function is de…ned over :-dimensional vectors it has : di¤erent partial deriv-
atives.
It is also possible to take partial derivatives of partial derivatives, much
like second derivatives in single-variable calculus. We use the notation
1
j;
( r) =
·
2
1
·r
j
·r
;
( r).
We call 1
jj
( r) the second partial of 1 with respect to r
j
, and we call 1
j;
( r)
the cross partial of 1( r) with respect to r
j
and r
;
. It is important to note
that, in most cases,
1
j;
( r) = 1
;j
( r).
that is, the order of di¤erentiation does not matter. In fact, this result is
important enough to have a name: Young’s Theorem.
Partial di¤erentiation requires a restatement of the chain rule:
Theorem 8 Consider the function 1 : R
a
÷R given by
1(n
1
(r). n
2
(r). .... n
a
(r)).
where n
1
. .... n
a
are functions of the one-dimensional variable r. Then
d1
dr
= 1
1
n
t
1
(r) +1
2
n
t
2
(r) +... +1
a
n
t
a
(r)
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 26
This rule is best explained using an example. Suppose that the function is
1(3r
2
÷2. 5 ln r). It’s derivative with respect to r is
d
dr
1(3r
2
÷2. 5 ln r) = 1
1
(3r
2
÷2. 5 ln r) (6r) +1
2
(3r
2
÷2. 5 ln r)

5
r

.
The basic rule to remember is that when variable we are di¤erentiating with
respect to appears in several places in the function, we di¤erentiate with
respect to each argument separately and then add them together. The
following lame example shows that this works. Let 1(n
1
. n
2
) = n
1
n
2
, but the
values n
1
and n
2
are both determined by the value of r, with n
1
= 2r and
n
2
= 3r
2
. Substituting we have
1(r) = (2r)(3r
2
) = 6r
3
d1
dr
= 18r
2
.
But, if we use the chain rule, we get
d1
dr
=
dn
1
dr
n
2
+
dn
2
dr
n
1
= (2)(3r
2
) + (6r)(2r) = 18r
2
.
It works.
3.4 Multidimensional optimization
Let’s return to our original problem, maximizing the pro…t function given in
expression (3.1). The …rm chooses both capital 1 and labor 1 to maximize
:(1. 1) = j1(1. 1) ÷:1 ÷n1.
Think about the …rm as solving two problems simultaneously: (i) given the
optimal amount of labor, 1
+
, the …rm wants to use the amount of capital
that maximizes :(1. 1
+
); and (ii) given the optimal amount of capital, 1
+
,
the …rm wants to employ the amount of labor that maximizes :(1
+
. 1).
Problem (i) translates into
·
·1
:(1
+
. 1
+
) = 0
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 27
and problem (ii) translates into
·
·1
:(1
+
. 1
+
) = 0.
Thus, optimization in several dimensions is just like optimization in each
single dimension separately, with the provision that all of the optimization
problems must be solved together. The two equations above are the …rst-
order conditions for the pro…t-maximization problem.
To see how this works, suppose that the production function is 1(1. 1) =
1
1/2
+ 1
1/2
, that the price of the good is j = 10, the price of capital is
: = 5, and the wage rate is n = 4. Then the pro…t function is :(1. 1) =
10(1
1/2
+1
1/2
) ÷51 ÷41. The …rst-order conditions are
·
·1
:(1. 1) = 51
÷1/2
÷5 = 0
and
·
·1
:(1. 1) = 51
÷1/2
÷4 = 0.
The …rst equation gives us 1
+
= 1 and the second gives us 1
+
= 2516.
In this example the two …rst-order conditions were independent, that is, the
FOC for 1 did not depend on 1 and the FOC for 1 did not depend on 1.
This is not always the case, as shown by the next example.
Example 1 The production function is 1(1. 1) = 1
1/4
1
1/2
, the price is 12,
the price of capital is : = 6, and the wage rate is n = 6. Find the optimal
values of 1 and 1.
Solution. The pro…t function is
:(1. 1) = 121
1/4
1
1/2
÷61 ÷61.
The …rst-order conditions are
·
·1
:(1. 1) = 31
÷3/4
1
1/2
÷6 = 0 (3.3)
and
·
·1
:(1. 1) = 61
1/4
1
÷1/2
÷6 = 0. (3.4)
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 28
To solve these, note that (3.4) can be rearranged to get
1
1/4
1
1/2
= 1
1
1/4
= 1
1/2
1 = 1
2
where the last line comes from raising both sides to the fourth power. Plug-
ging this into (3.3) yields
1
1/2
1
3/4
= 2
1
1/2
(1
2
)
3/4
= 2
1
1/2
1
3/2
= 2
1
1
= 2
1 =
1
2
.
Plugging this back into 1 = 1
2
yields 1 = 14.
This example shows the steps for solving a multi-dimensional optimization
problem.
Now let’s return to the general problem to see what the …rst-order con-
ditions tell us. The general pro…t-maximization problem is
max
a
1
.....an
j1(r
1
. .... r
a
) ÷:
1
r
1
÷... ÷:
a
r
a
or, in vector notation,
max
a
j1( r) ÷: r.
The …rst-order conditions are:
j1
1
(r
1
. .... r
a
) ÷:
1
= 0
.
.
.
j1
a
(r
1
. .... r
a
) ÷:
a
= 0.
The i-th FOC is j1
j
( r) = :
j
, which is the condition that the value marginal
product of input i equals its price. This is the same as the condition for a
single variable, and it holds for every input.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 29
3.5 Comparative statics analysis
Being able to do multivariate calculus allows us to do one of the most impor-
tant tasks in microeconomics: comparative statics analysis. The standard
comparative statics questions is, "How does the optimum change when one
of the underlying variables changes?" For example, how does the …rm’s de-
mand for labor change when the output price changes, or when the wage rate
changes?
This is an important problem, and many papers (and dissertations) have
relied on not much more than comparative statics analysis. If there is one
tool you have in your kit at the end of the course, it should be comparative
statics analysis.
To see how it works, let’s return to the pro…t maximization problem with
a single choice variable:
max
1
j1(1) ÷n1.
The FOC is
j1
t
(1) ÷n = 0. (3.5)
The comparative statics question is, how does the optimal value of 1 change
when j changes?
To answer this, let’s …rst assume that the marginal product of labor is
strictly decreasing, so that
1
tt
(1) < 0.
Note that this guarantees that the second-order condition for a maximization
is satis…ed. The trick we now take is to implicitly di¤erentiate equation
(3.5) with respect to j, treating 1 as a function of j. In other words, rewrite
the FOC so that 1 is replaced by the function 1
+
(j):
j1
t
(1
+
(j)) ÷n = 0
and di¤erentiate both sides of the expression with respect to j. We get
1
t
(1
+
(j)) +j1
tt
(1
+
(j))
d1
+
dj
= 0.
The comparative statics question is nowsimply the astrology question, "What
is the sign of d1
+
dj?" Rearranging the above equation to isolate d1
+
dj
on the left-hand side gives us
d1
+
dj
= ÷
1
t
(1
+
)
j1
tt
(1
+
)
.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 30
We know that 1
t
(1) 0 because production functions are increasing, and
we know that j1
tt
(1
+
) < 0 because we assumed strictly diminishing marginal
product of labor, i.e. 1
tt
(1) < 0. So, d1
+
dj has the form of the negative of
a ratio of a positive number to a negative number, which is positive. This
tells us that the …rm demands more labor when the output price rises, which
makes sense: when the output price rises producing output becomes more
pro…table, and so the …rm wants to expand its operation to generate more
pro…t.
We can write the comparative statics problem generally. Suppose that
the objective function, that is, the function the agent wants to optimize,
is 1(r. :), where r is the choice variable and : is a shift parameter. Assume
that the second-order condition holds strictly, so that 1
aa
(r. :) < 0 for a
maximization problem and 1
aa
(r. :) 0 for a minimization problem. These
conditions guarantee that there is no "‡at spot" in the objective function,
so that there is a unique solution to the …rst-order condition. Let r
+
denote
the optimal value of r. The comparative statics question is, "What is the
sign of dr
+
d:?" To get the answer, …rst derive the …rst-order condition:
1
a
(r. :) = 0.
Next implicitly di¤erentiate with respect to : to get
1
aa
(r. :)
dr
+
d:
+1
ac
(r. :) = 0.
Rearranging yields the comparative statics derivative
dr
+
d:
= ÷
1
ac
(r. :)
1
aa
(r. :)
. (3.6)
We want to know the sign of this derivative.
For a maximization problem we have 1
aa
< 0 by assumption and so the
negative sign cancels out the sign of the denominator. Consequently, the
sign of the comparative statics derivative dr
+
d: is the same as the sign of
the numerator, 1
ac
(r. :). For a minimization problem we have 1
aa
0, and
so the comparative statics derivative dr
+
d: has the opposite sign from the
partial derivative 1
ac
(r. :).
Example 2 A person must decide how much to work in a single 24-hour
day. She gets paid n per hour of labor, but gets utility from each hour she
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 31
does not spend at work. This utility is given by the function n(t), where t is
the amount of time spent away from work, and n has the properties n
t
(t) 0
and n
tt
(t) < 0. Does the person work more or less when her wage increases?
Solution. Let 1 denote the amount she works, so that 24 ÷ 1 is is the
amount of time she does not spend at work. Her utility from this leisure
time is therefore n(24 ÷1), and her objective is
max
1
n1 +n(24 ÷1).
The FOC is
n ÷n
t
(24 ÷1
+
) = 0.
Write 1
+
as a function of n to get 1
+
(n) and rewrite the FOC as:
n ÷n
t
(24 ÷1
+
(n)) = 0.
Di¤erentiate both sides with respect to n (this is implicit di¤erentiation) to
get
1 +n
tt
(24 ÷1
+
)
d1
+
dn
= 0.
Solving for the comparative statics derivative d1
+
dn yields
d1
+
dn
= ÷
1
n
tt
(24 ÷1
+
)
0.
She works more when the wage rate n increases.
3.5.1 An alternative approach (that I don’t like)
Many people use an alternative approach to comparative statics analysis. It
gets to the same answers, but I do not like this approach as much. We will
get to why later.
The approach begins with total di¤erential, and the total di¤erential
of the function o(r
1
. .... r
a
) is
do = o
1
dr
1
+... +o
a
dr
a
.
We want to use total di¤erentials to get comparative statics derivatives.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 32
Remember our comparative statics problem: we choose r to optimize
1(r. :). The FOC is
1
a
(r. :) = 0.
Let’s take the total di¤erential of both sides. The total di¤erential of the
right-hand side is zero, and the total di¤erential of the left-hand side is
d[1
a
(r. :)] = 1
aa
dr +1
ac
d:.
Setting the above expression equal to zero yields
1
aa
dr +1
ac
d: = 0.
The derivative we want is the comparative statics derivative drd:. We can
solve for this expression in the above equation:
1
aa
dr +1
ac
d: = 0 (3.7)
1
aa
dr = ÷1
ac
d:
dr
d:
= ÷
1
ac
1
aa
.
This is exactly the comparative statics derivative we found above in equation
(3.6). So the method works, and many students …nd it straightforward and
easier to use than implicit di¤erentiation.
Let’s stretch our techniques a little and have a problem with two shift pa-
rameters, : and :, instead of just one. The problem is to optimize 1(r. :. :),
and the FOC is
1
a
(r. :. :) = 0.
If we want to do comparative statics analysis using our (preferred) implicit
di¤erentiation approach, we would …rst write r as a function of the two shift
parameters, so that
1
a
(r(:. :). :. :) = 0.
To …nd the comparative statics derivative drd:, we implicitly di¤erentiate
with respect to : to get
1
aa
dr
d:
+1
av
= 0
dr
d:
= ÷
1
av
1
aa
.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 33
This should not be a surprise, since it is just like expression (3.6) except it
replaces : with :. Using total di¤erentials, we would …rst take the total
di¤erential of 1
a
(r. :. :) to get
1
aa
dr +1
av
d: +1
ac
d: = 0.
We want to solve for drd:, and doing so yields
1
aa
dr +1
av
d: +1
ac
d: = 0
1
aa
dr = ÷1
av
d: ÷1
ac
d:
dr
d:
= ÷
1
av
1
aa
÷
1
ac
1
aa
d:
d:
.
On the face of it, this does not look like the same answer. But, both : and :
are shift parameters, so : is not a function of :. That means that d:d: = 0.
Substituting this in yields
dr
d:
= ÷
1
av
1
aa
as expected.
So what is the di¤erence between the two approaches? In the implicit
di¤erentiation approach we recognized that : does not depend on : at the
beginning of the process, and in the total di¤erential approach we recognized
it at the end. So both work, it’s just a matter of when you want to do your
remembering.
All of that said, I still like the implicit di¤erentiation approach better. To
see why, think about what the derivative drd: means. As we constructed it
back in equation (2.1), drd: is the limit of r: as : ÷0. According
to this intuition, d: is the limit of : as it goes to zero, so d: is zero. But
we divided by it in equation (3.7), and you were taught very young that you
cannot divide by zero. So, on a purely mathematical basis, I object to the
total di¤erential approach because it entails dividing by zero, and I prefer to
think of drd: as a single entity with a long name, and not a ratio of dr and
d:. On a practical level, though, the total di¤erential approach works just
…ne. It’s just not going to show up anywhere else in this book.
3.6 Problems
1. Consider the vectors r = (4. ÷3. 6. 2) and n = (6. 1. 7. 7).
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 34
(a) Write down the vector 2 n + 3 r.
(b) Which of the following, if any, are true: r = n, r _ n, r < n, or
r < n?
(c) Find the inner product r n.
(d) Is

r r +

n n _

( r + n) ( r + n)?
2. Consider the vectors r = (5. 0. ÷6. ÷2) and n = (3. 2. 3. 2).
(a) Write down the vector 6 r ÷4 n.
(b) Find the inner product r n.
(c) Verify that

r r +

n n

( r + n) ( r + n).
3. Consider the function 1(r. n) = 4r
2
+ 3n
2
÷12rn + 18r.
(a) Find the partial derivative 1
a
(r. n).
(b) Find the partial derivative 1
&
(r. n).
(c) Find the critical point of 1.
4. Consider the function 1(r. n) = 16rn ÷4r + 2n.
(a) Find the partial derivative 1
a
(r. n).
(b) Find the partial derivative 1
&
(r. n).
(c) Find the critical point of 1.
5. Consider the function n(r. n) = 3 ln r + 2 ln n.
(a) Write the equation for the indi¤erence curve corresponding to the
utility level /.
(b) Find the slope of the indi¤erence curve at point (r. n).
6. A …rm faces inverse demand function j(¡) = 120 ÷ 4¡, where ¡ is the
…rm’s output. Its cost function is c¡.
(a) Write down the …rm’s pro…t function.
(b) Find the pro…t-maximizing level of pro…t as a function of the unit
cost c.
CHAPTER 3. OPTIMIZATION WITH SEVERAL VARIABLES 35
(c) Find the comparative statics derivative d¡dc. Is it positive or
negative?
(d) Write the maximized pro…t function as a function of c.
(e) Find the derivative showing how pro…t changes when c changes.
(f) Show that d:dc = ÷¡.
7. Find drdc from each of the following expressions.
(a)
15r
2
+ 3rc ÷5
r
c
= 20.
(b)
6r
2
c = 5c ÷5rc
2
8. Each worker at a …rm can produce 4 units per hour, each worker must
be paid \$n per hour, and the …rm’s revenue function is 1(1) = 30

1,
where 1 is the number of workers employed (fractional workers are
okay). The …rm’s pro…t function is :(1) = 30

41 ÷n1.
(a) Show that 1
+
= 900n
2
.
(b) Find d1
+
dn. What’s its sign?
(c) Find d:
+
dn. What’s its sign?
9. An isoquant is a curve showing the combinations of inputs that all lead
to the same level of output. When the production function over capital
1 and labor 1 is 1(1. 1), an isoquant corresponding to 100 units of
output is given by the equation 1(1. 1) = 100.
(a) If capital is on the vertical axis and labor on the horizontal, …nd
the slope of the isoquant.
(b) Suppose that production is increasing in both capital and labor.
Does the isoquant slope upward or downward?
CHAPTER
4
Constrained optimization
Microeconomics courses typically begin with either consumer theory or pro-
ducer theory. If they begin with consumer theory the …rst problem they face
is the consumer’s constrained optimization problem: the consumer chooses a
commodity bundle, or vector of goods, to maximize utility without spending
more than her budget. If the course begins with producer theory, the …rst
problem it poses is the …rm’s cost minimization problem: the …rm chooses a
vector of inputs to minimize the cost of producing a predetermined level of
output. Both of these problems lead to a graph of the form in Figure 1.1.
So far we have only looked at unconstrained optimization problems. But
many problems in economics have constraints. The consumer’s budget con-
straint is a classic example. Without a budget constraint, and under the
assumption that more is better, a consumer would choose an in…nite amount
of every good. This obviously does not help us describe the real world, be-
cause consumers cannot purchase unlimited quantities of every good. The
budget constraint is an extra condition that the optimum must satisfy. How
do we make sure that we get the solution that maximizes utility while still
letting the budget constraint hold?
36
CHAPTER 4. CONSTRAINED OPTIMIZATION 37
E
x
2
x
1
Figure 4.1: Consumer’s problem
4.1 A graphical approach
Let’s look more carefully at the consumer’s two-dimensional maximization
problem. The problem can be written as follows:
max
a
1
.a
2
n(r
1
. r
2
)
s.t. j
1
r
1
+j
2
r
2
= `
where j
j
is the price of good i and ` is the total amount the consumer has to
spend on consumption. The second line is the budget constraint, it says that
total expenditure on the two goods is equal to `. The abbreviation "s.t."
stands for "subject to," so that the problem for the consumer is to choose a
commodity bundle to maximize utility subject to a budget constraint.
Figure 4.1 shows the problem graphically, and it should be familiar.
What we want to do in this section is …gure out what equations we need
to characterize the solution. The optimal consumption point, 1, is a tan-
gency point, and it has two features: (i) it is where the indi¤erence curve is
tangent to the budget line, and (ii) it is on the budget line. Let’s translate
these into math.
To …nd the tangency condition we must …gure out how to …nd the slopes
of the two curves. We can do this easily using implicit di¤erentiation. Begin
with the budget line, because it’s easier. Since r
2
is on the vertical axis, we
CHAPTER 4. CONSTRAINED OPTIMIZATION 38
want to …nd a slope of the form dr
2
dr
1
. Treating r
2
as a function of r
1
and rewriting the budget constraint yields
j
1
r
1
+j
2
r
2
(r
1
) = `.
Implicit di¤erentiation gives us
j
1
+j
2
dr
2
dr
1
= 0
because the derivative of ` with respect to r
1
is zero. Rearranging yields
dr
2
dr
1
= ÷
j
1
j
2
. (4.1)
Of course, we could have gotten to the same place by rewriting the equation
for the budget line in slope-intercept form, r
2
= `j
2
÷ (j
1
j
2
)r
1
, but we
have to use implicit di¤erentiation anyway to …nd the slope of the indi¤erence
curve, and it is better to apply it …rst to the easier case.
Now let’s …nd the slope of the indi¤erence curve. The equation for an
indi¤erence curve is
n(r
1
. r
2
) = /
for some scalar /. Treat r
2
as a function of r
1
and rewrite to get
n(r
1
. r
2
(r
1
)) = /.
Now implicitly di¤erentiate with respect to r
1
to get
·n(r
1
. r
2
)
·r
1
+
·n(r
1
. r
2
)
·r
2
dr
2
dr
1
= 0.
Rearranging yields
dr
2
dr
1
= ÷
0&(a
1
.a
2
)
0a
1
0&(a
1
.a
2
)
0a
2
. (4.2)
The numerator is the marginal utility of good 1, and the denominator is
the marginal utility of good 2, so the slope of the indi¤erence curve is the
negative of the ratio of marginal utilities, which is also known as the marginal
rate of substitution.
CHAPTER 4. CONSTRAINED OPTIMIZATION 39
Condition (i), that the indi¤erence curve and budget line are tangent,
requires that the slope of the budget line in (4.1) is the same as the slope of
the indi¤erence curve in (4.2), or
÷
n
1
(r
1
. r
2
)
n
2
(r
1
. r
2
)
= ÷
j
1
j
2
. (4.3)
The other condition, condition (ii), says that the bundle (r
1
. r
2
) must lie on
the budget line, which is simply
j
1
r
1
+j
2
r
2
= `. (4.4)
Equations (4.3) and (4.4) constitute two equations in two unknowns (r
1
and r
2
), and so they completely characterize the solution to the consumer’s
optimization problem. The task now is to characterize the solution in a
more general setting with more dimensions.
4.2 Lagrangians
The way that we solve constrained optimization problems is by using a trick
developed by the 18-th century Italian-French mathematician Joseph-Louis
Lagrange. (There is also a 1972 ZZ Top song called La Grange, so don’t
get confused.) Suppose that our objective is to solve an :-dimensional
constrained utility maximization problem:
max
a
1
.....an
n(r
1
. .... r
a
)
s.t. j
1
r
1
+... +j
a
r
a
= `.
Our …rst step is to set up the Lagrangian
L(r
1
. .... r
a
. \) = n(r
1
. .... r
a
) +\(` ÷j
1
r
1
÷... ÷j
a
r
a
).
This requires some interpretation. First of all, the variable \ is called the
Lagrange multiplier (and the Greek letter is lambda). Second, let’s think
1
r
1
÷... ÷j
a
r
a
. It has to be zero according to the
budget constraint, but suppose it was positive. What would it mean? ` is
income, and j
1
r
1
+... +j
a
r
a
is expenditure on consumption. Income minus
expenditure is simply unspent income. But unspent income is measured in
dollars, and utility is measured in utility units (or utils), so we cannot simply
CHAPTER 4. CONSTRAINED OPTIMIZATION 40
add these together. The Lagrange multiplier converts the dollars into utils,
and is therefore measured in utils/dollar. The expression \(`÷j
1
r
1
÷... ÷
j
a
r
a
) can be interpreted as the utility of unspent income.
The Lagrangian, then, is the utility of consumption plus the utility of
unspent income. The budget constraint, though, guarantees that there is
no unspent income, and so the second term in the Lagrangian is necessarily
zero. We still want it there, though, because it is important for …nding the
right set of …rst-order conditions.
Note that the Lagrangian has not only the r
j
’s as arguments, but also
the Lagrange multiplier \. The …rst-order conditions arise from taking :+1
partial derivatives of L, one for each of the r
j
’s and one for \:
·L
·r
1
=
·n
·r
1
÷\j
1
= 0 (4.5a)
.
.
.
·L
·r
a
=
·n
·r
a
÷\j
a
= 0 (4.5b)
·L
·\
= ` ÷j
1
r
1
÷... ÷j
a
r
a
= 0 (4.5c)
Notice that the last FOC is simply the budget constraint. So, optimiza-
tion using the Lagrangian guarantees that the budget constraint is satis…ed.
Also, optimization using the Lagrangian turns the :-dimensional constrained
optimization problem into an (:+1)-dimensional unconstrained optimization
problem. These two features give the Lagrangian approach its appeal.
4.3 A 2-dimensional example
The utility function is n(r
1
. r
2
) = r
0.5
1
r
0.5
2
, the prices are j
1
= 10 and j
2
= 20,
and the budget is ` = 120. The consumer’s problem is then
max
a
1
.a
2
r
1/2
1
r
1/2
2
s.t. 10r
1
+ 20r
2
= 120.
What are the utility maximizing levels of r
1
and r
2
?
To answer this, we begin by setting up the Lagrangian
L(r
1
. r
2
. \) = r
1/2
1
r
1/2
2
+\(120 ÷10r
1
÷20r
2
).
CHAPTER 4. CONSTRAINED OPTIMIZATION 41
The …rst-order conditions are
·L
·r
1
= 0.5r
÷1/2
1
r
1/2
2
÷10\ = 0 (4.6a)
·L
·r
2
= 0.5r
1/2
1
r
÷1/2
2
÷20\ = 0 (4.6b)
·L
·\
= 120 ÷10r
1
÷20r
2
= 0 (4.6c)
Of course, equation (4.6c) is simply the budget constraint.
We have three equations in three unknowns (r
1
, r
2
, and \). To solve
them, …rst rearrange (4.6a) and (4.6b) to get
\ =
1
20

r
2
r
1

1/2
and
\ =
1
40

r
1
r
2

1/2
.
Set these equal to each other to get
1
20

r
2
r
1

1/2
=
1
40

r
1
r
2

1/2
2

r
2
r
1

1/2
=

r
1
r
2

1/2
2r
2
= r
1
where the last line comes from cross-multiplying. Substitute r
1
= 2r
2
into
(4.6c) to get
120 ÷10(2r
2
) ÷20r
2
= 0
40r
2
= 120
r
2
= 3.
Because r
1
= 2r
2
, we have
r
1
= 6.
CHAPTER 4. CONSTRAINED OPTIMIZATION 42
Finally, we know from the rearrangement of (4.6a) that
\ =
1
20

r
2
r
1

1/2
=
1
20

3
6

1/2
=
1
20

2
.
4.4 Interpreting the Lagrange multiplier
Remember that we said that the second term in the Lagrangian is the utility
value of unspent income, which, of course, is zero because there is no unspent
income. This term is \(` ÷ j
1
r
1
÷ j
2
r
2
). So, the Lagrange multiplier \
should be the marginal utility of (unspent) income, because it is the slope of
the utility-of-unspent-income function. Let’s see if this is true.
To do so, let’s generalize the problem so that income is ` instead of
120. All of the steps are the same as above, so we still have r
1
= 2r
2
.
Substituting into the budget constraint gives us
` ÷10(2r
2
) ÷20r
2
= 0
r
2
=
`
40
r
1
=
`
20
\ =
1
20

2
.
Plugging these numbers back into the utility function gives us
n(r
1
. r
2
) =

`
20

0.5

`
40

0.5
=
`
20

2
.
Di¤erentiating this expression with respect to income ` yields
dn
d`
=
1
20

2
= \.
and the Lagrange multiplier really does measure the marginal utility of in-
come.
CHAPTER 4. CONSTRAINED OPTIMIZATION 43
In general, the Lagrange multiplier measures the marginal value of relaxing
the constraint, where the units used to measure value are determined by the
objective function. In our case the objective function is a utility function, so
the marginal value is marginal utility. The constraint is relaxed by allowing
income to be higher, so the Lagrange multiplier measures the marginal utility
of income.
j
be
the amount of input i employed by the …rm, let :
j
be its price, let 1(r
1
. .... r
a
)
be the production function, and let ¡ be the desired level of output. The
…rm’s problem would be
min
a
1
.....an
:
1
r
1
+... +:
a
r
a
s.t. 1(r
1
. .... r
a
) = ¡
The Lagrangian is then
L(r
1
. .... r
a
. \) = :
1
r
1
+... +:
a
r
a
+\(¡ ÷1(r
1
. .... r
a
)).
Since the …rm is minimizing cost, reducing cost from the optimum would
require reducing the output requirement ¡. So, relaxing the constraint is
lowering ¡. The interpretation of \ is the marginal cost of output, which
was referred to simply as marginal cost way back in Chapter 2. So, using
the Lagrangian to solve the …rm’s cost minimization problem gives you the
4.5 A useful example - Cobb-Douglas
Economists often rely on the Cobb-Douglas class of functions which take the
form
1(r
1
. .... r
a
) = r
o
1
1
r
o
2
2
r
on
a
where all of the c
j
’s are positive. The functional form arose out of a 1928
collaboration between economist Paul Douglas and mathematician Charles
Cobb, and was designed to …t Douglas’s production data.
To see its usefulness, consider a consumer choice problem with a Cobb-
Douglas utility function n( r) = r
o
1
1
r
o
2
2
r
on
a
:
max
a
1
.....an
n( r)
CHAPTER 4. CONSTRAINED OPTIMIZATION 44
s.t. j
1
r
1
+... +j
a
r
a
= `.
Form the Lagrangian
L(r
1
. .... r
a
. \) = r
o
1
1
r
on
a
+\(` ÷j
1
r
1
÷... ÷j
a
r
a
).
The …rst-order conditions take the form
·L
·r
j
=
c
j
r
j
r
o
1
1
r
on
a
÷\j
j
= 0
for i = 1. .... : and
·L
·\
= ` ÷j
1
r
1
÷... ÷j
a
r
a
= 0.
which is just the budget constraint. The expression for ·L·r
j
can be
rearranged to become
c
j
r
j
r
o
1
1
r
on
a
÷\j
j
=
c
j
r
j
n( r) ÷\j
j
= 0.
This yields that
j
j
=
c
j
n( r)
\r
j
(4.7)
for i = 1. .... :. Substitute these into the budget constraint:
` ÷j
1
r
1
÷... ÷j
a
r
a
= 0
` ÷
c
1
n( r)
\r
1
r
1
÷... ÷
c
a
n( r)
\r
a
r
a
= 0
` ÷
n( r)
\
(c
1
+... +c
a
) = 0.
Now solve this for the Lagrange multiplier \:
\ = n( r)
c
1
+... +c
a
`
.
Finally, plug this back into (4.7) to get
j
j
=
c
j
n( r)
r
j

1
\
=
c
j
n( r)
r
j

`
n( r)(c
1
+... +c
a
)
=
c
j
c
1
+... +c
a

`
r
j
.
CHAPTER 4. CONSTRAINED OPTIMIZATION 45
Finally, solve this for r
j
to get the demand function for good i:
r
j
=
c
j
c
1
+... +c
a

`
j
j
. (4.8)
That was a lot of steps, but rearranging (4.8) yields an intuitive and easily
memorizable expression. In fact, most graduate students in economics have
memorized it by the end of their …rst semester because it turns out to be so
handy. Rearrange (4.8) to
j
j
r
j
`
=
c
j
c
1
+... +c
a
.
The numerator of the left-hand side is the amount spent on good i. The
denominator of the left-hand side is the total amount spent. The left-hand
side, then, is the share of income spent on good i. The equation says that
the share of spending is determined entirely by the exponents of the Cobb-
Douglas utility function. In especially convenient cases the exponents sum to
one, in which case the spending share for good i is just equal to the exponent
on good i.
The demand function in (4.8) lends itself to some particularly easy com-
parative statics analysis. The obvious comparative statics derivative for a
demand function is with respect to its own price:
dr
j
dj
j
= ÷
c
j
c
1
+... +c
a

`
j
2
j
_ 0
and so demand is downward-sloping, as it should be. Another comparative
statics derivative is with respect to income:
dr
j
d`
=
c
j
c
1
+... +c
a

1
j
j
_ 0.
All goods are normal goods when the utility function takes the Cobb-Douglas
form. Finally, one often looks for the e¤ects of changes in the prices of other
goods. We can do this by taking the comparative statics derivative of r
j
with respect to price j
;
, where , = i.
dr
j
dj
;
= 0.
This result holds because the other prices appear nowhere in the demand
function (4.8), which is another feature that makes Cobb-Douglas special.
CHAPTER 4. CONSTRAINED OPTIMIZATION 46
We can also use Cobb-Douglas functions in a production setting. Con-
sider the …rm’s cost-minimization problem when the production function is
Cobb-Douglas, so that 1( r) = r
o
1
1
r
on
a
. This time, though, we are going
to assume that c
1
+... +c
a
= 1. The problem is
min
a
1
.....an
j
1
r
1
+... +j
a
r
a
s.t. r
o
1
1
r
on
a
= ¡.
Set up the Lagrangian
L(r
1
. .... r
a
. \) = j
1
r
1
+... +j
a
r
a
+\(¡ ÷r
o
1
1
r
on
a
).
The …rst-order conditions take the form
·L
·r
j
= j
j
÷c
j
\
1( r)
r
j
= 0
for i = 1. .... : and
·L
·\
= ¡ ÷r
o
1
1
r
on
a
= 0.
which is just the production constraint. Rearranging the expression for
·L·r
j
yields
r
j
= c
j
\
¡
j
j
. (4.9)
because the production constraint tells us that 1( r) = ¡. Plugging this into
the production constraint give us

c
1
\
¡
j
1

o
1

c
a
\
¡
j
a

on
= ¡

c
1
j
1

o
1

c
a
j
a

on
\
o
1
+...+on
¡
o
1
+...+on
= ¡.
But c
1
+... +c
a
= 1, so the above expression reduces further to

c
1
j
1

o
1

c
a
j
a

on
\¡ = ¡

c
1
j
1

o
1

c
a
j
a

on
\ = 1
CHAPTER 4. CONSTRAINED OPTIMIZATION 47
\ =

j
1
c
1

o
1

j
a
c
a

on
. (4.10)
We can substitute this back into (4.9) to get
r
j
= c
j
\
¡
j
j
=
c
j
j
j

j
1
c
1

o
1

j
a
c
a

on
¡.
This is the input demand function, and it depends on the amount of output
being produced (¡), the input prices (j
1
. .... j
a
), and the exponents of the
Cobb-Douglas production function.
This doesn’t look particularly useful or intuitive. It can be, though.
Plug it back into the original objective function j
1
r
1
+ ... + j
a
r
a
to get the
cost function
((¡) = j
1
r
1
+... +j
a
r
a
= j
1
c
1
j
1

j
1
c
1

o
1

j
a
c
a

on
¡ +... +j
a
c
a
j
a

j
1
c
1

o
1

j
a
c
a

on
¡
= c
1

j
1
c
1

o
1

j
a
c
a

on
¡ +... +c
a

j
1
c
1

o
1

j
a
c
a

on
¡
= (c
1
+... +c
a
)

j
1
c
1

o
1

j
a
c
a

on
¡
=

j
1
c
1

o
1

j
a
c
a

on
¡.
where the last equality holds because c
1
+... +c
a
= 1.
This one is pretty easy to remember. And it has a cool comparative
statics result:
d((¡)

=

j
1
c
1

o
1

j
a
c
a

on
. (4.11)
Why is this cool? There are three reasons. First, d((¡)d¡ is marginal
cost, and ¡ appears nowhere on the right-hand side. This means that the
Cobb-Douglas production function gives us constant marginal cost. Second,
compare the marginal cost function to the original production function:
1( r) = r
o
1
1
r
on
a
`((¡) =

j
1
c
1

o
1

j
a
c
a

on
.
CHAPTER 4. CONSTRAINED OPTIMIZATION 48
You can get the marginal cost function by replacing the r
j
’s in the production
function with the corresponding j
j
c
j
. And third, remember how, at the
end of the the last section on interpreting the Lagrange multiplier, we said
that in a cost-minimization problem the Lagrange multiplier is just marginal
cost? Compare equations (4.10) and (4.11). They are the same. I told you
so.
4.6 Problems
1. Use the Lagrange multiplier method to solve the following problem:
max
a.&
12r
2
n
4
s.t. 2r + 4n = 120
[Hint: You should be able to check your answer against the general
version of the problem in Section 4.5.]
2. Solve the following problem:
max
o.b
3 ln c + 2 ln /
s.t. 12c + 14/ = 400
3. Solve the following problem:
min
a.&
16r +n
r
1/4
n
3/4
= 1
4. Solve the following problem:
max
a.&
3rn + 4r
s.t. 4r + 12n = 80
5. Solve the following problem:
min
a.&
5r + 2n
s.t. 3r + 2rn = 80
CHAPTER 4. CONSTRAINED OPTIMIZATION 49
6. This is a lame but instructive problem. A farmer has 10 acres of land
and uses it to grow corn. Pro…t from growing an acre of corn is given
by :(r) = 400r + 2r
2
, where r is the number of acres of corn planted.
So, the farmer’s problem is
max
a
400r + 2r
2
s.t. r = 10
(a) Find the …rst derivative of the pro…t function. Does its sign make
sense?
(b) Find the second derivative of the pro…t function. Does its sign
make sense?
(c) Set up the Lagrangian and use it to …nd the optimal value of r.
(Hint: It had better be 10.)
(d) Interpret the Lagrange multiplier.
(e) Find the marginal value of an acre of land without using the La-
grange multiplier.
(f) The second derivative of the pro…t function is positive. Does that
mean that pro…t is minimized when r = 10?
7. Another lame but instructive problem: A …rm has the capacity to use
4 workers at a time. Each worker can produce 4 units per hour, each
worker must be paid \$10 per hour, and the …rm’s revenue function is
1(1) = 30

1, where 1 is the number of workers employed (fractional
workers are okay). The …rm’s pro…t function is :(1) = 30

41÷101.
It must solve the problem
max
1
30

41 ÷101
s.t. 1 = 4
(a) Find the …rst derivative of the pro…t function. Does its sign make
sense?
(b) Find the second derivative of the pro…t function. Does its sign
make sense?
(c) Set up the Lagrangian and use it to …nd the optimal value of 1.
[Hint: It had better be 4.]
CHAPTER 4. CONSTRAINED OPTIMIZATION 50
(d) Interpret the Lagrange multiplier.
(e) Find the marginal pro…t from a worker without using the Lagrange
multiplier.
(f) The second derivative of the pro…t function is negative. Does that
mean pro…t is maximized when 1 = 4?
8. Here is the obligatory comparative statics problem. A consumer
chooses r and n to
max
a.&
r
c
n
1÷c
s.t. j
a
r +j
&
n = `
where j
a
0 is the price of good r, j
&
0 is the price of good n,
` 0 is income, and 0 < c < 1.
(a) Show that r
+
= c`j
a
and n
+
= (1 ÷c)`j
&
.
(b) Find ·r
+
·` and ·n
+
·`. Can you sign them?
(c) Find ·r
+
·j
a
and ·n
+
·j
a
. Can you sign them?
9. This is the same as problem 2.11 but done using Lagrange multipliers.
A …rm (Bilco) can use its manufacturing facility to make either widgets
or gookeys. Both require labor only. The production function for
widgets is
\ = 20n
1/2
.
where n denotes labor devoted to widget production, and the produc-
tion function for gookeys is
G = 30o,
where o denotes labor devoted to gookey production. The wage rate
is \$11 per unit of time, and the prices of widgets and gokeys are \$9 and
\$3 per unit, repsectively. The manufacturing facility can accomodate
60 workers and no more.
(a) Use a Lagrangian to determine how much of each product Bilco
should produce per unit of time.
(b) Interpret the Lagrange multiplier.
CHAPTER 4. CONSTRAINED OPTIMIZATION 51
10. A farmer has a …xed amount 1 of fencing material that she can use to
enclose a property. She does not yet have the property, but it will be a
rectangle with length 1 and width \. Furthermore, state law dictates
that every property must have a side smaller than o in length, and in
this case o < 14. [This last condition makes the constraint binding,
and other than that you need not worry about it.] By convention, \
is always the short side, so the state law dictates that \ _ o. The
farmer wants to use the fencing to enclose the largest possible area, and
she also wants to obey the law.
(a) Write down the farmer’s constrained maximization problem. [Hint:
There should be two constraints.]
(b) Write down the Lagrangian with two mulitpliers, one for each con-
straint, and solve the farmer’s problem. [Hint: The solution will
be a function of 1 and o.] Please use j as the second multiplier.
(c) Which has a greater impact on the area the farmer can enclose, a
marginal increase in o or a marginal increase in 1? Justify your
CHAPTER
5
Inequality constraints
The previous chapter treated all constraints as equality constraints. Some-
times this is the right thing to do. For example, the …rm’s cost-minimization
problem is to …nd the least-cost combination of inputs to produce a …xed
amount of output, ¡. The constraint, then, is that output must be ¡, or,
letting 1(r
1
. .... r
a
) be the production function when the inputs are r
1
. .... r
a
,
the constraint is
1(r
1
. .... r
a
) = ¡.
Other times equality constraints are not the right thing to do. The con-
sumer choice problem, for example, has the consumer choosing a commodity
bundle to maximize utility, subject to the constraint that she does not spend
more than her income. If the prices are j
1
. .... j
a
, the goods are r
1
. .... r
a
,
and income is `, then the budget constraint is
j
1
r
1
+... +j
a
r
a
_ `.
It may be the case that the consumer spends her entire income, in which
case the constraint would hold with equality. If she gets utility from saving,
52
CHAPTER 5. INEQUALITY CONSTRAINTS 53
though, she may not want to spend her entire income, in which case the
budget constraint would not hold with equality.
Firms have capacity constraints. When they build manufacturing fa-
cilities, the size of the facility places a constraint on the maximum output
the …rm can produce. A capacity-constrained …rm’s problem, then, is to
maximize pro…t subject to the constraint that output not exceed capacity,
or ¡ _ ¡. In the real world …rms often have excess capacity, which means
that the capacity constraint does not hold with equality.
Finally, economics often has implicit nonnegativity constraints. Firms
cannot produce negative amounts by transforming outputs back into inputs.
After all, it is di¢cult to turn a cake back into ‡our, baking powder, butter,
salt, sugar, and unbroken eggs. Often we want to assume that we can-
not consume negative amounts. As economists we must deal with these
nonnegativity constraints.
The goal for this chapter is to …gure out how to deal with inequality
constraints. The best way to do this is through a series of exceptionally
lame examples. What makes the examples lame is that the solutions are so
transparent that it is hardly worth going through the math. The beauty of
lame examples, though, is that this transparency allows you to see exactly
what is going on.
5.1 Lame example - capacity constraints
Let’s begin with a simple unconstrained pro…t maximization problem. The
…rm chooses an amount to produce r, the market price is …xed at 80, and
the cost function is 4r
2
. The problem is
max
a
80r ÷4r
2
.
The …rst-order condition is
80 ÷8r = 0,
so the optimum is
r = 10.
The second-order condition is
÷8 < 0
which obviously holds, so the optimum is actually a maximum. The problem
is illustrated in Figure 5.1.
CHAPTER 5. INEQUALITY CONSTRAINTS 54
12 8 10
\$
x
π(x)
Figure 5.1: A lame example using capacity constraints
5.1.1 A binding constraint
Now let’s put in a capacity constraint: r _ 8. This will obviously restrict
the …rm’s output because it would like to produce 10 units but can only
produce 8. (See why it’s a lame example?) The constraint will hold with
equality, in which case we say that the constraint is binding. Let’s look at
the math.
max
a
80r ÷4r
2
s.t. r _ 8
Form the Lagrangian
L(r. \) = 80r ÷4r
2
+\(8 ÷r).
The …rst-order conditions are
·L
·r
= 80 ÷8r ÷\ = 0
·L
·\
= 8 ÷r = 0.
The second equation tells us that r = 8, and the …rst equation tells us that
\ = 80 ÷64 = 16.
CHAPTER 5. INEQUALITY CONSTRAINTS 55
So far we’ve done nothing new. The important step here is to think
about \. Remember that the interpretation of the Lagrange multiplier is
that it is the marginal value of relaxing the constraint. In this case value
is pro…t, so it is the marginal pro…t from relaxing the constraint. We can
compute this directly:
:(r) = 80r ÷4r
2
:
t
(r) = 80 ÷8r
:
t
(8) = 80 ÷64 = 16
Plugging the constrained value (r = 8) into the marginal pro…t function :
t
(r)
tells us that when output is 8, an increase in output leads to an increase in
pro…t by 16. And this is exactly the Lagrange multiplier.
5.1.2 A nonbinding constraint
Now let’s change the capacity constraint to r _ 12 and solve the problem
intuitively. First, we know that pro…t reaches its unconstrained maximum
when r = 10. The constraint does not rule this level of production out,
so the constrained optimum is also r = 10. Because of this the capacity
constraint is nonbinding, that is, it does not hold with equality. Nonbinding
constraints are sometimes called slack.
Let’s think about the Lagrange multiplier. We know that it is the mar-
ginal value of relaxing the constraint. How would pro…t change if we relaxed
the constraint from r _ 12 to, say, r _ 13? The unconstrained maximum
is still feasible, so the …rm would still produce 10 units and still generate
exactly the same amount of pro…t. So, the marginal value of relaxing the
constraint must be zero, and we have \ = 0.
Now that we know the answers, let’s go back and look at the problem.
max
a
80r ÷4r
2
s.t. r _ 12
This problem generates the Lagrangian
L(r. \) = 80r ÷4r
2
+\(12 ÷r).
Since we already know the answers, let’s plug them in. In particular, we
know that \ = 0, so the Lagrangian becomes
L(r. \) = 80r ÷4r
2
CHAPTER 5. INEQUALITY CONSTRAINTS 56
which is just the original unconstrained pro…t function.
We arrived at our answers (r = 10, \ = 0) intuitively. How can we get
them mechanically? After all, the purpose of the math is to make sure we
get the answers right without relying solely on our intuition.
One thing for sure is that we will need a new approach. To see why,
suppose we analyze our 12-unit constraint problem in the usual way. Di¤er-
entiating the Lagrangian yields
·L
·r
= 80 ÷8r ÷\ = 0
·L
·\
= 12 ÷r = 0
The second equation obviously implies that r = 12, in which case the …rst
equation tells us that \ = 80 ÷ 8r = 80 ÷ 96 = ÷16. If we solve the
problem using our old approach we …nd that (1) the constraint is binding,
which is wrong, and (2) the Lagrange multiplier is negative, which means
that relaxing the constraint makes pro…t even lower. You can see this in
Figure 5.1. When output is 12 the pro…t function is downward-sloping.
Since the Lagrange multiplier is marginal pro…t, we get a negative Lagrange
multiplier when we are past the pro…t-maximizing level of output.
5.2 A new approach
The key to the new approach is thinking about how the Lagrangian works.
Suppose that the problem is
max
a
1
.....an
1(r
1
. .... r
a
)
s.t. o(r
1
. .... r
a
) _ `
The Lagrangian is
L(r
1
. .... r
a
. \) = 1(r
1
. .... r
a
) +\[` ÷o(r
1
. .... r
a
)]. (5.1)
When the constraint is binding, the term ` ÷o(r
1
. .... r
a
) is zero, in which
case L(r
1
. .... r
a
. \) = 1(r
1
. .... r
a
). When the constraint is nonbinding the
Lagrange multiplier is zero, in which case L(r
1
. .... r
a
. \) = 1(r
1
. .... r
a
) once
again. So we need a condition that says
\[` ÷o(r
1
. .... r
a
)] = 0.
CHAPTER 5. INEQUALITY CONSTRAINTS 57
Note that
·[\[` ÷o(r
1
. .... r
a
)]]
·\
= ` ÷o(r
1
. .... r
a
)
and in the old approach we set this equal to zero. We can no longer do this
when the constraint is nonbinding, but notice that
\
·[\[` ÷o(r
1
. .... r
a
)]]
·\
= \(` ÷o(r
1
. .... r
a
)).
This is exactly what we need to be equal to zero.
We also need to restrict the Lagrange multiplier to be nonnegative. Re-
member from the lame example when the capacity constraint was binding at
r = 12 we got a negative Lagrange multiplier, and that was the wrong answer.
In fact, looking at expression (5.1) we can make L really large by making both
\ and (` ÷o(r
1
. .... r
a
)) really negative. But when (` ÷o(r
1
. .... r
a
)) < 0
we have violated the constraint, so that is not allowed.
The condition that
\
·L
·\
= \[(` ÷o(r
1
. .... r
a
)] = 0
is known as a complementary slackness condition. It says that one
of two constraints must bind. One constraint is \ _ 0, and it binds if
\ = 0, in which case the complementary slackness condition holds. The
other constraint is o(r
1
. .... r
a
) _ `, and it binds if o(r
1
. .... r
a
) = ` ,
in which case the complementary slackness condition also holds. If one
of the constraints is slack, the other one has to bind. The beauty of the
complementary slackness condition is that it forces one of two constraints to
bind using a single equation.
The …rst-order conditions for the inequality-constrained problem are
·L
·r
1
= 0
.
.
.
·L
·r
a
= 0
\
·L
·\
= 0
\ _ 0
CHAPTER 5. INEQUALITY CONSTRAINTS 58
The …rst set of conditions (·L·r
j
= 0) are the same as in the standard
case. The last two conditions are the ones that are di¤erent. The second-
last condition (\·L·\ = 0) guarantees that either the constraint binds,
in which case ·L·\ = 0, or the constraint does not bind, in which case
\ = 0. The last condition says that the Lagrange multiplier cannot be
negative, which means that relaxing the constraint cannot reduce the value
of the objective function.
We have a set of …rst-order conditions, but this does not tell us how to
solve them. To do this, let’s go back to our lamest example:
max
a
80r ÷4r
2
s.t. r _ 12
which generates the Lagrangian
L(r. \) = 80r ÷4r
2
+\(12 ÷r).
The …rst-order conditions are
·L
·r
= 80 ÷8r ÷\ = 0
\
·L
·\
= \(12 ÷r) = 0
\ _ 0
Now what?
The methodology for solving this system is tedious, but it works. The
second equation (\(12 ÷r) = 0) is true if either (1) \ = 0, (2) 12 ÷r = 0, or
(3) both. So what we have to do is …nd the solution when \ = 0 and …nd
the solution when 12 ÷r = 0. Let’s see what happens.
Case 1: \ = 0. If \ = 0 then the second and third conditions obviously
hold. Plugging \ = 0 into the …rst equation yields 80 ÷ 8r ÷ 0 = 0, or
r = 10.
Case 2: 12 ÷ r = 0. Then r = 12, and plugging this into the …rst
equation yields \ = 80 ÷ 96 = ÷16, which violates the last condition. So
case 2 cannot be the answer.
We are left with only one solution, and it is the correct one: r = 10,
\ = 0.
The general methodology for multiple constraints is as follows: Construct
a Lagrange multiplier for each constraint. Each Lagrange multiplier can be
CHAPTER 5. INEQUALITY CONSTRAINTS 59
either zero, in which case that constraint is nonbinding, or it can be positive,
in which case its constraint is binding. Then try all possible combinations of
zero/positive multipliers. Most of them will lead to violations. If only one
does not lead to a violation, that is the answer. If several combinations do
not lead to violations, then you must choose which one is the best. You can
do this by plugging the values you …nd into the objective function. If you
want to maximize the objective function, you choose the case that generates
the highest value of the objective function.
5.3 Multiple inequality constraints
Let’s look at another lame example:
max
a.&
r
1
3
n
2
3
s.t. r +n _ 60
r +n _ 120
Why is this a lame example? Because we know that the second constraint
must be nonbinding. After all if a number is smaller than 60, it must also
be strictly smaller than 120. The solution to this problem will be the same
as the solution to the problem
max
a.&
r
1
3
n
2
3
s.t. r +n _ 60
This looks like a utility maximization problem, as can be seen in Figure 5.2.
The consumer’s utility function is n(r. n) = r
1/3
n
2/3
, the prices of the two
goods are j
a
= j
&
= 1, and the consumer has 60 to spend. The utility
function is Cobb-Douglas, and from what we learned in Section 4.5 we know
that r = 20 and n = 40.
We want to solve it mechanically, though, to learn the steps. Assign a
separate Lagrange multiplier to each constraint to get the Lagrangian
L(r. n. \
1
. \
2
) = r
1
3
n
2
3
+\
1
(60 ÷r ÷n) +\
2
(120 ÷r ÷n).
CHAPTER 5. INEQUALITY CONSTRAINTS 60
40
20
60
60 120
120
E
y
x
Figure 5.2: A lame example with two budget constraints
The …rst-order conditions are
·L
·r
=
1
3
n
2/3
r
2/3
÷\
1
÷\
2
= 0
·L
·n
=
2
3
r
1/3
n
1/3
÷\
1
÷\
2
= 0
\
1
·L
·\
1
= \
1
(60 ÷r ÷n) = 0
\
2
·L
·\
2
= \
2
(120 ÷r ÷n) = 0
\
1
_ 0
\
2
_ 0
This time we have four possible cases: (1) \
1
= \
2
= 0, so that neither
constraint binds. (2) \
1
0, \
2
= 0, so that only the …rst constraint binds.
(3) \
1
= 0, \
2
0, so that only the second constraint binds. (4) \
1
0,
\
2
0, so that both constraints bind. As before, we need to look at all four
cases.
Case 1: \
1
= \
2
= 0. In this case the …rst equation in the …rst-order
conditions reduces to
1
3
(nr)
2/3
= 0, which implies that n = 0. The second
equation reduces to
2
3
(rn)
1/3
= 0, but this cannot be true if n = 0 because
we are not allowed to divide by zero. So Case 1 cannot be the answer. There
CHAPTER 5. INEQUALITY CONSTRAINTS 61
is an easier way to see this, though. If neither constraint binds, the problem
becomes
max
a.&
r
1
3
n
2
3
The objective function is increasing in both arguments, and since there are
no constraints we want both r and n to be as large as possible. So r ÷·
and n ÷·. But this obviously violates the constraints.
Case 2: \
1
0, \
2
= 0. The …rst-order conditions reduce to
1
3
n
2/3
r
2/3
÷\
1
= 0
2
3
r
1/3
n
1/3
÷\
1
= 0
60 ÷r ÷n = 0
and the solution to this system is r = 20, n = 40, and \
1
=
1
3
2
2/3
0. The
remaining constraint, r + n _ 120, is satis…ed because r + n = 60 < 120.
Case 2 works, and it corresponds to the case shown in Figure 5.2.
Case 3: \
1
= 0, \
2
0. Now the …rst-order conditions reduce to
1
3
n
2/3
r
2/3
÷\
2
= 0
2
3
r
1/3
n
1/3
÷\
2
= 0
120 ÷r ÷n = 0
The solution to this system is r = 40, n = 80, and \
2
=
1
3
2
2/3
0. The
remaining constraint, r +n _ 60, is violated because r +n = 120, so Case 3
does not work. There is an easier way to see this, though, just by looking
at the constraints and exploiting the lameness of the example. If the second
constraint binds, r + n = 120. But then the …rst constraint, r + n _ 60
cannot possibly hold.
Case 4: \
1
0, \
2
0. The …rst of these conditions implies that the
…rst constraint binds, so that r+n = 60. The second condition implies that
r + n = 120, so that we are on the outer budget constraint in Figure 5.2.
But then the inner budget constraint is violated, so Case 4 does not work.
Only Case 2 works, so we know the solution: r
+
= 20, n
+
= 40, \
+
1
=
1
3
2
2/3
, and \
+
2
= 0.
CHAPTER 5. INEQUALITY CONSTRAINTS 62
5.4 A linear programming example
A linear programming problem is one in which the objective function and all
of the constraints are linear, such as in the following example:
max
a.&
2r +n
s.t. 3r + 4n _ 60
r _ 0
n _ 0
This problem has three constraints, so we must use the multiple constraint
methodology from the preceding section. It is useful in what follows to refer
to the …rst constraint, 3r + 4n _ 60, as a budget constraint. The other two
constraints are nonnegativity constraints.
The Lagrangian is
L(r. n. \
1
. \
2
. \
3
) = 2r +n +\
1
(60 ÷3r ÷4n) +\
2
(r ÷0) +\
3
(n ÷0).
Notice that we wrote all of the constraint terms, that is (60 ÷3r ÷4n) and
(r ÷0) and (n ÷0) so that they are nonnegative. We have been doing this
throughout this book.
The …rst-order conditions are
·L
·r
= 2 ÷3\
1
+\
2
= 0
·L
·n
= 1 ÷4\
1
+\
3
= 0
\
1
·L
·\
1
= \
1
(60 ÷3r ÷4n) = 0
\
2
·L
·\
2
= \
2
r = 0
\
3
·L
·\
3
= \
3
n = 0
\
1
_ 0, \
2
_ 0, \
3
_ 0
Since there are three constraints, there are 2
3
= 8 possible cases. We are
going to narrow some of them down intuitively before going on. The …rst
CHAPTER 5. INEQUALITY CONSTRAINTS 63
constraint is like a budget constraint, and the objective function is increas-
ing in both of its arguments. The other two constraints are nonnegativity
constraints, saying that the consumer cannot consume negative amounts of
the goods. Since there is only one budget-type constraint, it has to bind,
which means that \
1
0. The only question is whether one of the other
two constraints binds.
A binding budget constraint means that we cannot have both r = 0 and
n = 0, because if we did then we would have 3r + 4n = 0 < 60, and the
budget constraint would not bind. We are now left with three possibilities:
(1) \
1
0 so the budget constraint binds, \
2
0, and \
3
= 0, so that r = 0
but n 0. (2) \
1
0, \
2
= 0, and \
3
0, so that r 0 and n = 0. (3)
\
1
0, \
2
= 0, and \
3
= 0, so that both r and n are positive. We will
consider these one at a time.
Case 1: \
1
0, \
2
0, \
3
= 0. Since \
2
0 the constraint r _ 0 must
bind, so r = 0. For the budget constraint to hold we must have n = 15.
This yields a value for the objective function of
2r +n = 2 0 + 15 = 15.
Case 2: \
1
0, \
2
= 0, \
3
0. This time we have n = 0, and the
budget constraint implies that r = 20. The objective function then takes
the value
2r +n = 2 20 + 0 = 40.
Case 3: \
1
0, \
2
= 0, \
3
= 0. In this case both r and n are positive.
The …rst two equations in the …rst-order conditions become
2 ÷3\
1
= 0
1 ÷4\
1
= 0
The …rst of these reduces to \
1
= 23, and the second reduces to \
1
= 14.
These cannot both hold, so Case 3 does not work.
We got solutions in Case 1 and Case 2, but not in Case 3. So which is
the answer? The one that generates a larger value for the objective function.
In Case 1 the maximized objective function took a value of 15 and in Case 2
it took a value of 40, which is obviously higher. So Case 2 is the solution.
To see what we just did, look at Figure 5.3. The three constraints identify
a triangular feasible set. Case 1 is the corner solution where the budget line
meets the n-axis, and Case 2 is the corner solution where the budget line
CHAPTER 5. INEQUALITY CONSTRAINTS 64
20
15
y
x
indifference
curve
Figure 5.3: Linear programming problem
meets the r-axis. Case 3 is an "interior" solution that is on the budget line
but not on either axis. The objective was to …nd the point in the feasible set
that maximized the function 1(r. n) = 2r+n. We did this by comparing the
values we got at the two corner solutions, and we chose the corner solution
that gave us the larger value.
The methodology for solving linear programming problems involves …nd-
ing all of the corners and choosing the corner that yields the largest value of
the objective function. A typical problem has more than two dimensions,
so it involves …nding r
1
. .... r
a
, and it has more than one budget constraint.
This generates lots and lots of corners, and the real issue in the study of linear
programming is …nding an algorithm that e¢ciently checks the corners.
5.5 Kuhn-Tucker conditions
Economists sometimes structure the …rst-order conditions for inequality-
constrained optimization problems di¤erently than the way we have done
it so far. The alternative formulation was developed by Harold Kuhn and
A.W. Tucker, and it is known as the Kuhn-Tucker formulation. The …rst-
order conditions we will derive are known as the Kuhn-Tucker conditions.
Begin with the very general maximization problem, letting x be the vector
CHAPTER 5. INEQUALITY CONSTRAINTS 65
x = (r
1
. .... r
a
):
max
x
1(x)
s.t. o
1
(x) _ /
1
. .... o
I
(x) _ /
I
r
1
_ 0. .... r
a
_ 0.
There are / "budget-type" constraints and : non-negativity constraints.
To solve this, form the Lagrangian
L(x. \
1
. .... \
I
. ·
1
. .... ·
I
) = 1(x) +
I
¸
j=1
\
j
[/
j
÷o
j
(x)] +
a
¸
;=1
·
;
r
;
.
We get the following …rst-order conditions:
·L
·r
1
=
·1
·r
1
÷\
1
·o
1
·r
1
÷... ÷\
I
·o
I
·r
1

1
= 0
.
.
.
·L
·r
a
=
·1
·r
a
÷\
1
·o
1
·r
a
÷... ÷\
I
·o
I
·r
a

a
= 0
\
1
·L
·\
1
= \
1
[/
1
÷o
1
(x)] = 0
.
.
.
\
I
·L
·\
I
= \
I
[/
I
÷o
I
(x)] = 0
·
1
·L
··
1
= ·
1
r
1
= 0
.
.
.
·
a
·L
··
a
= ·
a
r
a
= 0
\
1
. .... \
I
. ·
1
. .... ·
a
_ 0
There are 2: +/ conditions plus the : +/ nonnegativity constraints for the
CHAPTER 5. INEQUALITY CONSTRAINTS 66
multipliers. It is useful to have some shorthand to shrink this system down:
·L
·r
j
= 0 for i = 1. .... : (5.2)
\
;
·L
·\
;
= 0 for , = 1. .... /
·
j
r
j
= 0 for i = 1. .... :
\
;
_ 0 for , = 1. .... /
·
j
_ 0 for i = 1. .... :
1(x. \
1
. .... \
I
) = 1(x) +
I
¸
j=1
\
j
[/
j
÷o
j
(x)].
This Lagrangian, known as the Kuhn-Tucker Lagrangian, only has / multipli-
ers for the / budget-type constraints, and no multipliers for the nonnegativity
constraints. The two Lagrangians are related, with
L(x. \
1
. .... \
I
. ·
1
. .... ·
I
) = 1(x. \
1
. .... \
I
) +
a
¸
;=1
·
;
r
;
.
We can rewrite the system of …rst-order conditions (5.2) as
·L
·r
j
=
·1
·r
j

j
= 0 for i = 1. .... :
\
;
·L
·\
;
= \
;
·1
·\
;
= 0 for , = 1. .... /
·
j
r
j
= 0 for i = 1. .... :
\
;
_ 0 for , = 1. .... /
·
j
_ 0 for i = 1. .... :
Pay close attention to the …rst and third equations. If ·
j
= 0 then the …rst
equation yields
·
j
= 0 ==
·1
·r
j
= 0 ==r
j
·1
·r
j
= 0.
On the other hand, if ·
j
0 then the i-th inequality constraint, r
j
_ 0, is
binding which means that
·
j
0 ==r
j
= 0 ==r
j
·1
·r
j
= 0.
CHAPTER 5. INEQUALITY CONSTRAINTS 67
Either way we have
r
j
·1
·r
j
= 0 for i = 1. .... :.
The Kuhn-Tucker conditions use this information. The new set of
…rst-order conditions is
r
j
·1
·r
j
= 0 for i = 1. .... : (5.3)
\
;
·1
·\
;
= 0 for , = 1. .... /
\
;
_ 0 for , = 1. .... /
r
j
_ 0 for i = 1. .... :
This is a system of : + / equations in : + / unknowns plus : + / non-
negativity constraints. Thus, it simpli…es the original set of conditions
by removing : equations and : unknowns. It is also a very symmetric-
looking set of conditions. Remember that the Kuhn-Tucker Lagrangian is
1(r
1
. .... r
a
. \
1
. .... \
I
). Instead of distinguishing between r’s and \’s, let
them all be .’s, in which case the Kuhn-Tucker Lagrangian is 1(.
1
. .... .
a+I
).
Then the Kuhn-Tucker conditions reduce to .
;
·1·.
;
= 0 and .
;
_ 0 for
, = 1. .... :+/. This is fairly easy to remember, which is an advantage. The
key to Kuhn-Tucker conditions, though, is remembering that they are just a
particular reformulation of the standard inequality-constrained optimization
problem with multiple constraints.
5.6 Problems
1. Consider the following problem:
max
a.&
r
2
n
s.t. 2r + 3n _ 24
4r +n _ 20
The Lagrangian can be written
L(r. n. \
1
. \
2
) = r
2
n +\
1
(24 ÷2r ÷3n) +\
2
(20 ÷4r ÷n)
CHAPTER 5. INEQUALITY CONSTRAINTS 68
(a) Solve the alternative problem
max
a.&
r
2
n
s.t. 2r + 3n = 24
Do the resulting values of r and n satisfy 4r +n _ 20?
(b) Solve the alternative problem
max
a.&
r
2
n
s.t. 4r +n = 20
Do the resulting values of r and n satisfy 2r + 3n _ 24?
(c) Based on your answers to (a) and (b), which of the two constraints
bind? What do these imply about the values of \
1
and \
2
?
(d) Solve the original problem.
(e) Draw a graph showing what is going on in this problem.
2. Consider the following problem:
max
a.&
r
2
n
s.t. 2r + 3n _ 24
4r +n _ 36
The Lagrangian can be written
L(r. n. \
1
. \
2
) = r
2
n +\
1
(24 ÷2r ÷3n) +\
2
(36 ÷4r ÷n)
(a) Solve the alternative problem
max
a.&
r
2
n
s.t. 2r + 3n = 24
Do the resulting values of r and n satisfy 4r +n _ 36?
(b) Solve the alternative problem
max
a.&
r
2
n
s.t. 4r +n = 36
Do the resulting values of r and n satisfy 2r + 3n _ 24?
CHAPTER 5. INEQUALITY CONSTRAINTS 69
(c) Based on your answers to (a) and (b), which of the two constraints
bind? What do these imply about the values of \
1
and \
2
?
(d) Solve the original problem.
(e) Draw a graph showing what is going on in this problem.
3. Consider the following problem:
max
a.&
4rn ÷3r
2
s.t. r + 4n _ 36
5r + 2n _ 45
The Lagrangian can be written
L(r. n. \
1
. \
2
) = 4rn ÷3r
2
+\
1
(36 ÷r ÷4n) +\
2
(45 ÷5r ÷2n)
(a) Solve the alternative problem
max
a.&
4rn ÷3r
2
s.t. r + 4n = 36
Do the resulting values of r and n satisfy 5r + 2n _ 45?
(b) Solve the alternative problem
max
a.&
4rn ÷3r
2
s.t. 5r + 2n = 45
Do the resulting values of r and n satisfy r + 4n _ 36?
(c) Find the solution to the original problem, including the values of
\
1
and \
2
.
4. Consider the following problem:
max
a.&
3rn ÷8r
s.t. r + 4n _ 24
5r + 2n _ 30
The Lagrangian can be written
L(r. n. \
1
. \
2
) = 3rn ÷8r +\
1
(24 ÷r ÷4n) +\
2
(30 ÷5r ÷2n)
CHAPTER 5. INEQUALITY CONSTRAINTS 70
(a) Solve the alternative problem
max
a.&
3rn ÷8r
s.t. r + 4n = 24
Do the resulting values of r and n satisfy 5r + 2n _ 30?
(b) Solve the alternative problem
max
a.&
3rn ÷8r
s.t. 5r + 2n = 30
Do the resulting values of r and n satisfy r + 4n _ 24?
(c) Find the solution to the original problem, including the values of
\
1
and \
2
.
5. Consider the following problem:
max
a.&
r
2
n
s.t. 4r + 2n _ 42
r _ 0
n _ 0
(a) Write down the Kuhn-Tucker Lagrangian for this problem.
(b) Write down the Kuhn-Tucker conditions.
(c) Solve the problem.
6. Consider the following problem:
max
a.&
rn + 40r + 60n
s.t. r +n _ 12
r. n _ 0
(a) Write down the Kuhn-Tucker Lagrangian for this problem.
(b) Write down the Kuhn-Tucker conditions.
(c) Solve the problem.

PART II

SOLVING SYSTEMS OF
EQUATIONS

(linear algebra)

CHAPTER
6
Matrices
Matrices are 2-dimensional arrays of numbers, and they are useful for many
things. They also behave di¤erently that ordinary real numbers. This
chapter tells how to work with matrices and what they are for.
6.1 Matrix algebra
A matrix is a rectangular array of numbers, such as the one below:
¹ =

¸
6 ÷5
2 3
÷1 4
¸

.
Matrices are typically denoted by capital letters. They have dimensions
corresponding to the number of rows and number of columns. The matrix ¹
above has 3 rows and 2 columns, so it is a 3 2 matrix. Matrix dimensions
are always written as (# rows) (# columns).
An element of a matrix is one of the entries. The element in row i and
72
CHAPTER 6. MATRICES 73
column , is denoted c
j;
, and so in general a matrix looks like
¹ =

¸
¸
¸
¸
c
11
c
12
c
1I
c
21
c
22
c
2I
.
.
.
.
.
.
.
.
.
.
.
.
c
a1
c
a2
c
aI
¸

.
The matrix ¹ above is an : / matrix. A matrix in which the number
of rows equals the number of columns is called a square matrix. In such a
matrix, elements of the form c
jj
are called diagonal elements because they
land on the diagonal of the square matrix.
An :-dimensional vector can be thought of as an :1 matrix. Therefore,
in matrix notation vectors are written vertically:
r =

¸
¸
r
1
.
.
.
r
a
¸

.
When we write a vector as a column matrix we typically leave o¤ the accent
and write it simply as r.
Matrix addition is done element by element:

¸
¸
c
11
c
1I
.
.
.
.
.
.
.
.
.
c
a1
c
aI
¸

+

¸
¸
/
11
/
1I
.
.
.
.
.
.
.
.
.
/
a1
/
aI
¸

=

¸
¸
c
11
+/
11
c
1I
+/
1I
.
.
.
.
.
.
.
.
.
c
a1
+/
a1
c
aI
+/
aI
¸

.
Before one can add matrices, though, it is important to make sure that the
dimensions of the two matrices are identical. In the above example, both
matrices are : /.
Just like with vectors, it is possible to multiply a matrix by a scalar. This
is done element by element:
t¹ = t

¸
¸
c
11
c
1I
.
.
.
.
.
.
.
.
.
c
a1
c
aI
¸

=

¸
¸
tc
11
tc
1I
.
.
.
.
.
.
.
.
.
tc
a1
tc
aI
¸

.
The big deal in matrix algebra is matrix multiplication. To multiply
matrices ¹ and 1, several things are important. First, the order matters, as
CHAPTER 6. MATRICES 74
you will see. Second, the number of columns in the …rst matrix must equal
the number of rows in the second. So, one must multiply an : / matrix
on the left by a / : matrix on the right. The result will be an : :
matrix, with the /’s canceling out. The formula for multiplying matrices is
as follows. Let ( = ¹1, with ¹ an : / matrix and 1 a / : matrix.
Then
c
j;
=
I
¸
c=1
c
jc
/
c;
.
This is easier to see when we write the matrices ¹ and 1 side-by-side:
( = ¹1 =

¸
¸
¸
¸
c
11
c
12
c
1I
c
21
c
22
c
2I
.
.
.
.
.
.
.
.
.
.
.
.
c
a1
c
a2
c
aI
¸

¸
¸
¸
¸
/
11
/
12
/
1n
/
21
/
22
/
2n
.
.
.
.
.
.
.
.
.
.
.
.
/
I1
/
I2
/
In
¸

.
Element c
11
is
c
11
= c
11
/
11
+c
12
/
21
+... +c
1c
/
c1
+... +c
1I
/
I1
.
So, element c
11
is found by multiplying each member of row 1 in matrix ¹
by the corresponding member of column 1 in matrix 1 and then summing.
Element c
j;
is found by multiplying each member of row i in matrix ¹ by
the corresponding member of column , in matrix 1 and then summing. For
there to be the right number of elements for this to work, the number of
columns in ¹ must equal the number of rows in 1.
As an example, multiply the two matrices below:
¹ =

6 ÷1
4 3

, 1 =

3 4
÷2 4

.
Then
¹1 =

6 3 + (÷1)(÷2) 6 4 + (÷1)4
4 3 + 3(÷2) 4 4 + 3 4

=

20 20
6 28

.
However,
1¹ =

3 6 + 4 4 3(÷1) + 4 3
(÷2)6 + 4 4 (÷2)(÷1) + 4 3

=

34 9
4 14

.
CHAPTER 6. MATRICES 75
Obviously, ¹1 = 1¹, and matrix multiplication is not commutative. Be-
cause of this, we use the terminology that we left-multiply by 1 when we
want 1¹ and right-multiply by 1 when we want ¹1.
The square matrix
1 =

¸
¸
¸
¸
1 0 0
0 1 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 1
¸

is special, and is called the identity matrix. To see why it is special,
consider any :/ matrix ¹, and let 1 be the :-dimensional identity matrix.
Letting 1 = 1¹, we get /
j;
= 0 c
1;
+ 0 c
2;
+... + 1 c
j;
+... + 0 c
a;
= c
j;
.
So, 1¹ = ¹. The same thing happens when we right-multiply ¹ by a /-
dimensional identity matrix. Then ¹1 = ¹. So, multiplying a matrix by
the identity matrix is the same as multiplying an ordinary number by 1.
The transpose of the matrix
¹ =

¸
¸
¸
¸
c
11
c
12
c
1I
c
21
c
22
c
2I
.
.
.
.
.
.
.
.
.
.
.
.
c
a1
c
a2
c
aI
¸

.
is the matrix ¹
T
given by
¹
T
=

¸
¸
¸
¸
c
11
c
21
c
a1
c
12
c
22
c
a2
.
.
.
.
.
.
.
.
.
.
.
.
c
1I
c
2I
c
aI
¸

.
The transpose is generated by switching the rows and columns of the original
matrix. Because of this, the transpose of an : / matrix is a / : matrix.
Note that
(¹1)
T
= 1
T
¹
T
so that the transpose of the product of two matrices is the product of the
transposes of the two matrices, but you have to switch the order of the ma-
trices. To check this, consider the following example employing the same
two matrices we used above.
¹ =

6 ÷1
4 3

, 1 =

3 4
÷2 4

.
CHAPTER 6. MATRICES 76
¹1 =

20 20
6 28

, (¹1)
T
=

20 6
20 28

.
¹
T
=

6 4
÷1 3

, 1
T
=

3 ÷2
4 4

.
1
T
¹
T
=

3 ÷2
4 4

6 4
÷1 3

=

20 6
20 28

= (¹1)
T
.
as desired, but
¹
T
1
T
=

6 4
÷1 3

3 ÷2
4 4

=

34 4
9 14

= (1¹)
T
.
6.2 Uses of matrices
Suppose that you have a system of : equations in : unknowns, such as this
one:
c
11
r
1
+... +c
1a
r
a
= /
1
c
21
r
1
+... +c
2a
r
a
= /
2
.
.
.
c
a1
r
1
+... +c
aa
r
a
= /
a
(6.1)
We can write this system easily using matrix notation. Letting
¹ =

¸
¸
c
11
c
1a
.
.
.
.
.
.
.
.
.
c
a1
c
aa
¸

, r =

¸
¸
r
1
.
.
.
r
a
¸

, and / =

¸
¸
/
1
.
.
.
/
a
¸

,
we can rewrite the system of equations (6.1) as
¹r = /. (6.2)
The primary use of matrices is to solve systems of equations. As you have
seen in the optimization examples, systems of equations regularly arise in
economics.
Equation (6.2) raises two questions. First, when does a solution exist,
that is, when can we …nd a vector r such that ¹r = /? Second, how do we
…nd the solution when it exists? The answers to both questions depend on
the inverse matrix ¹
÷1
, which is a matrix having the property that
¹
÷1
¹ = ¹¹
÷1
= 1,
CHAPTER 6. MATRICES 77
that is, the matrix that you multiply ¹ by to get the identity matrix. Re-
membering that the identity matrix plays the role of the number 1 in matrix
multiplication, and that for ordinary numbers the inverse of n is the num-
ber n
÷1
such that n
÷1
n = 1, this formula is exactly what we are looking
for. If ¹ has an inverse (and that is a big if), then (6.2) can be solved by
left-multiplying both sides of the equation by the inverse matrix ¹
÷1
:
¹
÷1
¹r = ¹
÷1
/
r = ¹
÷1
/
For ordinary real numbers, every number except 0 has a multiplicative
inverse. Many more matrices than just one fail to have an inverse, though,
so we must devote considerable attention to whether or not a matrix has an
inverse.
A second use of matrices is for deriving second-order conditions for multi-
variable optimization problems. Recall that in a single-variable optimization
problem with objective function 1, the second-order condition is determined
by the sign of the second derivative 1
tt
. When 1 is a function of several
variables, however, we can write the vector of …rst partial derivatives

¸
¸
0;
0a
1
.
.
.
0;
0an
¸

and the matrix of second partials

¸
¸
¸
¸
¸
0
2
;
0a
2
1
0
2
;
0a
1
0a
2

0
2
;
0a
1
0an
0
2
;
0a
2
0a
1
0
2
;
0a
2
2

0
2
;
0a
2
0an
.
.
.
.
.
.
.
.
.
.
.
.
0
2
;
0an0a
1
0
2
;
0an0a
2

0
2
;
0a
2
n
¸

.
The relevant second-order conditions will come from conditions on the matrix
of second partials, and we will do this in Chapter 9.
6.3 Determinants
Determinants of matrices are useful for determining (hence the name) whether
a matrix has an inverse and also for solving equations such as (6.2). The de-
CHAPTER 6. MATRICES 78
terminant of a square matrix ¹ (and the matrix must be square) is denoted
[¹[. De…ning it depends on the size of the matrix.
11
). The determinant [¹[ is simply c
11
.
It can be either positive or negative, so don’t confuse the determinant with
the absolute value, even though they both use the same symbol.
Now look at a 2 2 matrix
¹ =

c
11
c
12
c
21
c
22

.
Here the determinant is de…ned as
[¹[ =

c
11
c
12
c
21
c
22

= c
11
c
22
÷c
21
c
12
.
For a 3 3 matrix we go through some more steps. Begin with the 3 3
matrix
¹ =

¸
c
11
c
12
c
13
c
21
c
22
c
23
c
31
c
32
c
33
¸

.
We can get a submatrix of ¹ by deleting a row and column. For example, if
we delete the second row and the …rst column we are left with the submatrix
¹
21
=

c
12
c
13
c
32
c
33

.
In general, submatrix ¹
j;
is obtained from ¹ by deleting row i and column
,. Note that there is one submatrix for each element, and you can get that
submatrix by eliminating the element’s row and column from the original
matrix. Every element also has something called a cofactor which is based
on the element’s submatrix. Speci…cally, the cofactor of c
j;
is the number
c
j;
given by
c
j;
= (÷1)
j+;

j;
[,
that is, it is the determinant of the submatrix ¹
j;
multiplied by ÷1 if i + ,
is odd and multiplied by 1 if i +, is even.
Using these de…nitions we can …nally get the determinant of a 33 matrix,
or any other square matrix for that matter. There are two ways to do it.
The most common is to choose a column ,. Then
[¹[ = c
1;
c
1;
+c
2;
c
2;
+... +c
a;
c
a;
.
CHAPTER 6. MATRICES 79
Before we see what this means for a 3 3 matrix let’s check that it works
for a 2 2 matrix. Choosing column , = 1 gives us

c
11
c
12
c
21
c
22

= c
11
c
11
+c
12
c
12
= c
11
c
22
+c
12
(÷c
21
).
where c
11
= c
22
because 1 + 1 is even and c
21
= ÷c
12
because 1 + 2 is odd.
We get exactly the same thing if we choose the second column, , = 2:

c
11
c
12
c
21
c
22

= c
12
c
12
+c
22
c
22
= c
12
(÷c
21
) +c
22
c
11
.
Finally, let’s look at a 3 3 matrix, choosing , = 1.

c
11
c
12
c
13
c
21
c
22
c
23
c
31
c
32
c
33

= c
11
c
11
+c
21
c
21
+c
31
c
31
= c
11
(c
22
c
33
÷c
32
c
23
)
÷c
21
(c
12
c
33
÷c
32
c
13
)
+c
31
(c
12
c
23
÷c
22
c
13
).
We can also …nd determinants by choosing a row. If we choose row i,
then the determinant is given by
[¹[ = c
j1
c
j1
+c
j2
c
j2
+... +c
ja
c
ja
.
The freedomto choose any rowor column allows one to use zeros strategically.
For example, when evaluating the determinant

6 8 ÷1
2 0 0
÷9 4 7

it would be best to choose the second row because it has two zeros, and the
determinant is simply c
21
c
21
= 2(÷60) = ÷120.
6.4 Cramer’s rule
The process of using determinants to solve the system of equations given by
¹r = / is known as Cramer’s rule. Begin with the matrix
¹ =

¸
¸
c
11
c
1a
.
.
.
.
.
.
.
.
.
c
a1
c
aa
¸

.
CHAPTER 6. MATRICES 80
Construct the matrix 1
j
from ¹ by replacing the i-th column of ¹ with the
column vector /, so that
1
1
=

¸
¸
¸
¸
/
1
c
12
c
1a
/
2
c
22
c
2a
.
.
.
.
.
.
.
.
.
.
.
.
/
a
c
a2
c
aa
¸

and
1
j
=

¸
¸
¸
¸
c
11
c
1(j÷1)
/
1
c
1(j+1)
c
1a
c
21
c
2(j÷1)
/
2
c
2(j+1)
c
2a
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
c
a1
c
a(j÷1)
/
a
c
a(j+1)
c
aa
¸

.
According to Cramer’s rule, the solution to ¹r = / is the column vector r
where
r
j
=
[1
j
[
[¹[
.
Let’s make sure this works using a simple example. The system of
equations is
4r
1
+ 3r
2
= 18
5r
1
÷3r
2
= 9
Adding the two equations together yields
9r
1
= 27
r
1
= 3
r
2
= 2
Now let’s do it using Cramer’s rule. We have
¹ =

4 3
5 ÷3

and / =

18
9

.
Generate the matrices
1
1
=

18 3
9 ÷3

and 1
2
=

4 18
5 9

.
CHAPTER 6. MATRICES 81
Now compute determinants to get
[¹[ = ÷27, [1
1
[ = ÷81, and [1
2
[ = ÷54.
Applying Cramer’s rule we get
r =

[1
1
[
[¹[
[1
2
[
[¹[

=

÷81
÷27
÷54
÷27

=

3
2

.
6.5 Inverses of matrices
One important implication of Cramer’s rule links the determinant of ¹ to
the existence of an inverse. To see why, recall that the solution, if it exists,
to the system ¹r = / is r = ¹
÷1
/. Also, we know from Cramer’s rule that
r
j
= [1
j
[ [¹[. For this number to exist, it must be the case that [¹[ = 0.
This is su¢ciently important that it has a name: the matrix ¹ is singular
if [¹[ = 0 and it is nonsingular if [¹[ = 0. Singular matrices do not have
inverses, but nonsingular matrices do.
With some clever manipulation we can use Cramer’s rule to invert the
matrix ¹. To see how, recall that the inverse is de…ned so that
¹¹
÷1
= 1.
We want to …nd ¹
÷1
, and it helps to de…ne
¹
÷1
=

¸
¸
r
11
r
1a
.
.
.
.
.
.
.
.
.
r
a1
r
aa
¸

and
r
j
=

¸
¸
r
1j
.
.
.
r
aj
¸

.
Remember that the coordinate vector c
j
has zeros everywhere except for the
i-th element, which is 1, and so
c
1
=

¸
¸
¸
¸
1
0
.
.
.
0
¸

CHAPTER 6. MATRICES 82
and c
j
is the column vector with c
j
j
= 1 and c
j
;
= 0 when , = i. Note that c
j
is the i-th column of the identity matrix. Then i-th column of the inverse
matrix ¹
÷1
can be found by applying Cramer’s rule to the system
¹r
j
= c
j
. (6.3)
Construct the matrix 1
j
;
from ¹ by replacing the ,-th column of ¹ with the
column vector c
j
. Then
1
j
;
=

¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
c
11
c
1(;÷1)
0 c
1(;+1)
c
1a
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
c
(j÷1)1
c
(j÷1)(;÷1)
0 c
(j÷1)(;+1)
c
(j÷1)a
c
j1
c
j(;÷1)
1 c
j(;+1)
c
ja
c
(j+1)1
c
(j+1)(;÷1)
0 c
(j+1)(;+1)
c
(j+1)a
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
c
a1
c
a(;÷1)
0 c
a(;+1)
c
aa
¸

The solution to (6.3) is
r
;j
=

1
j
;

[¹[
.
Once again we can only get an inverse if [¹[ = 0.
We can simplify this further. Note that only one element of the ,-th
column of 1
j
;
is non-zero. So, we can take the determinant of 1
j
;
by using
the ,-th column and getting

1
j
;

= (÷1)
j+;

j;
[ = c
j;
,
which is a cofactor of the matrix ¹. So, we get the formula for the inverse:
r
;j
=
(÷1)
j+;

j;
[
[¹[
.
Note that the subscript on r is ,i but the subscript on ¹ is i,.
Let’s check to see if this works for a 2 2 matrix. Use the matrix
¹ =

c /
c d

.
We get
1
1
1
=

1 /
0 d

, 1
2
1
=

0 /
1 d

, 1
1
2
=

c 1
c 0

, and 1
2
2
=

c 0
c 1

.
CHAPTER 6. MATRICES 83
The determinants are

1
1
1

= d,

1
2
1

= ÷/,

1
1
2

= ÷c, and

1
2
2

= c.
We also know that [¹[ = cd ÷/c. Thus,
¹
÷1
=
1
cd ÷/c

d ÷/
÷c c

.
We can check this easily:
¹
÷1
¹ =
1
cd ÷/c

d ÷/
÷c c

c /
c d

=
1
cd ÷/c

cd ÷/c ÷/c +/c
÷cc +cc ÷c/ +cd

=

1 0
0 1

.
6.6 Problems
1. Perform the following computations:
(a) 4

6 ÷4 2
3 3 9

÷5

1 0 6
2 3 ÷5

(b)

2 1
÷1 4

3 ÷2 ÷1
4 4 1

(c)

¸
2 1 0 0
3 2 1 0
1 1 0 7
¸

¸
¸
¸
2 2 1
3 ÷3 0
7 2 ÷1
÷4 ÷5 0
¸

(d)

2 1 1

¸
3 1 ÷1
2 1 1
1 2 2
¸

¸
5
÷1
÷1
¸

2. Perform the following computations:
(a) 6

¸
3 ÷3
÷4 5
1 2
¸

÷
1
2

¸
8 10
÷6 ÷4
÷6 2
¸

CHAPTER 6. MATRICES 84
(b)

¸
5 1 3
4 0 6
÷1 2 ÷1
¸

¸
0 ÷4
1 5
2 5
¸

(c)

6 ÷1 3 0
1 5 2 1

¸
¸
¸
5 ÷1 4
0 ÷2 4
1 ÷3 4
0 6 4
¸

(d)

10 5 1

¸
2 0 1
1 3 1
5 0 2
¸

¸
5
÷1
÷3
¸

3. Find the determinants of the following matrices:
(a)

2 1
÷4 5

(b)

¸
3 1 0
÷2 7 ÷2
2 0 6
¸

4. Find the determinants of the following matrices:
(a)

3 6
÷4 ÷1

(b)

¸
2 0 ÷1
1 3 0
0 6 ÷1
¸

5. Use Cramer’s rule to solve the following system of equations:
6r ÷2n ÷3. = 1
2r + 4n +. = ÷2
3r ÷. = 8
6. Use Cramer’s rule to solve the following system of equations:
5r ÷2n +. = 9
3r ÷n = 9
3n + 2. = 15
CHAPTER 6. MATRICES 85
7. Invert the following matrices:
(a)

2 3
÷2 1

(b)

¸
1 1 2
0 1 1
1 1 0
¸

8. Invert the following matrices:
(a)

÷4 1
2 ÷4

(b)

¸
5 1 ÷1
0 2 1
0 1 3
¸

CHAPTER
7
Systems of equations
Think about the general system of equations
¹r = / (7.1)
where ¹ is an : : matrix and r and / are : 1 vectors. This expands to
the system of equations
c
11
r
1
+c
12
r
2
+... +c
1a
r
a
= /
1
c
21
r
1
+c
22
r
2
+... +c
2a
r
a
= /
2
.
.
.
c
a1
r
1
+c
a2
r
2
+... +c
aa
r
a
= /
a
The task for this chapter is to determine (1) whether the systemhas a solution
(r
1
. .... r
a
), and (2) whether that solution is unique.
We will use three examples to motivate our results. They use : = 2 to
allow graphical analysis.
86
CHAPTER 7. SYSTEMS OF EQUATIONS 87
Example 3
2r +n = 6
r ÷n = ÷3
The solution to this one is (r. n) = (1. 4).
Example 4
r ÷2n = 1
4n ÷2r = ÷2
This one has an in…nite number of solutions de…ned by (r. n) = (r.
a÷1
2
).
Example 5
r ÷n = 4
2n ÷2r = 5
This one has no solution.
7.1 Identifying the number of solutions
7.1.1 The inverse approach
If ¹ has an inverse ¹
÷1
, then left-multiplying both sides of (7.1) by ¹
÷1
yields r = ¹
÷1
/. So, what we really want to know is, when does an inverse
exist? We already know that an inverse exists if the determinant is nonzero,
that is, if [¹[ = 0.
7.1.2 Row-echelon decomposition
Write the augmented matrix
1 = (¹[/) =

¸
¸
c
11
c
1a
.
.
.
.
.
.
.
.
.
c
a1
c
aa

/
1
.
.
.
/
a
¸

which is :(:+1). The goal is to transformthe matrix 1 through operations
that consist of multiplying one row by a scalar and adding it to another row,
and ending up with a matrix in which all the elements below the diagonal
are 0. This is the row-echelon form of the matrix.
CHAPTER 7. SYSTEMS OF EQUATIONS 88
Example 6 (3 continued) Form the augmented matrix
1 =

2 1
1 ÷1

6
÷3

Multiply the …rst row by ÷
1
2
and add it to the second row to get
1 =

2 1
1 ÷1 ÷1 ÷
1
2

6
÷3 ÷3

=

2 1
0 ÷
3
2

6
÷6

Example 7 (4 continued) Form the augmented matrix
1 =

1 ÷2
÷2 4

1
÷2

Multiply the …rst row by 2 and add it to the second row to get
1 =

1 ÷2
÷2 + 2 4 ÷4

1
÷2 + 2

=

1 ÷2
0 0

1
0

Example 8 (5 continued) Form the augmented matrix
1 =

1 ÷1
÷2 2

4
5

Multiply the …rst row by 2 and add it to the second row to get
1 =

1 ÷1
÷2 + 2 2 ÷2

4
5 + 8

=

1 ÷1
0 0

4
13

Example 3 has a unique solution, Example 4 has an in…nite number of
them, and Example 5 has no solution. These results correspond to properties
of the row-echelon matrix 1. If the row-echelon form of the augmented
matrix has only nonzero diagonal elements, there is a unique solution. If it
has some rows that are zero, there are in…nitely many solutions. If there are
rows with zeros everywhere except in the last column, there is no solution.
De…nition 1 The rank of a matrix is the number of nonzero rows in its
row-echelon form.
Proposition 9 The : : square matrix ¹ has an inverse if and only if its
rank is :.
CHAPTER 7. SYSTEMS OF EQUATIONS 89
x – y = 4
2y – 2x = 5
x – 2y = 1
4y – 2x = –2
2x + y = 6
x – y = –3
y
x
y
x
y
x
Figure 7.1: Graphing in (r. n) space: One solution when the lines inter-
sect (left graph), in…nite number of solutions when the lines coincide (center
graph), and no solutions when the lines are parallel (right graph)
7.1.3 Graphing in (x,y) space
This is pretty simple and is shown in Figure 7.1. The equations are lines. In
example 3 the equations constitute two di¤erent lines that cross at a single
point. In example 4 the lines coincide. In example 5 the lines are parallel.
We get a unique solution in example 3, an in…nite number of them in example
4, and no solution in example 5.
What happens if we move to three dimensions? An equation reduces the
dimension by one, so each equation identi…es a plane. Two planes intersect
in a line. The intersection of a line and a plane is a point. So, we get a
unique solution if the planes intersect in a single point, an in…nite number of
solutions if the three planes intersect in a line or in a plane, and no solution
if two or more of the planes are parallel.
7.1.4 Graphing in column space
This approach is completely di¤erent but really useful. Each column of ¹ is
an : 1 vector. So is /. The question becomes, is / in the column space,
that is, the space spanned by the columns of ¹?
A linear combination of vectors c and

/ is given by r c + n

/, where r
and n are scalars.
The span of the vectors r and n is the set of linear combinations of the
CHAPTER 7. SYSTEMS OF EQUATIONS 90
b = (4,5)
(–1,2)
(1,–2)
(–2,4)
b = (1,–2)
b = (6,–3)
(1,-1)
(2,1)
y
x
y
x
y
x
Figure 7.2: Graphing in the column space: When the vector / is in the
column space (left graph and center graph) a solution exists, but when the
vector / is not in the column space there is no solution (right graph)
two vectors.
Example 9 (3 continued) The column vectors are (2. 1) and (1. ÷1). These
span the entire plane. For any vector (.
1
. .
2
), it is possible to …nd numbers
r and n that solve r(2. 1) + n(1. ÷1) = (.
1
. .
2
). Written in matrix notation
this is

2 1
1 ÷1

r
n

=

.
1
.
2

.
We already know that the matrix has an inverse, so we can solve this.
Look at left graph in Figure 7.2. The column space is spanned by the
two vectors (2. 1) and (1. ÷1). The vector / = (6. ÷3) lies in the span of the
column vectors. This leads to our rule for a solution: If / is in the span
of the columns of ¹, there is a solution.
Example 10 (4 continued) In the center graph in Figure 7.2, the column
vectors are (1. ÷2) and (÷2. 4). They are on the same line, so they only
span that line. In this case the vector / = (1. ÷2) is also on that line, so
there is a solution.
Example 11 (5 continued) In the right panel of Figure 7.2, the column
vectors are (1. ÷2) and (÷1. 2), which span a single line. This time, though,
the vector / = (4. 5) is not on that line, and there is no solution.
CHAPTER 7. SYSTEMS OF EQUATIONS 91
Two vectors c and

/ are linearly dependent if there exists a scalar : = 0
such that c = :

/.
Two vectors c and

/ are linearly independent if there is no scalar : = 0
such that c = :

/. Equivalently, c and

/ are linearly independent if there do
not exist scalars :
1
and :
2
, not both equal to zero, such that
:
1
c +:
2

/ =

0. (7.2)
One more way of writing this is that c and

/ are linearly independent if the
only solution to the above equation has :
1
= :
2
= 0.
This last one has some impact when we write it in matrix form. Suppose
that the two vectors are 2-dimensional, and construct the matrix ¹ by using
the vectors c and

/ as its columns:
¹ =

c
1
/
1
c
2
/
2

.
Now we can write (7.2) as
¹

:
1
:
2

=

0
0

.
If ¹ has an inverse, there is a unique solution to this equation, given by

:
1
:
2

= ¹
÷1

0
0

=

0
0

.
So, invertibility of ¹ and the linear independence of its columns are inextri-
Proposition 10 The square matrix ¹ has an inverse if its columns are mu-
tually linearly independent.
7.2 Summary of results
The system of : linear equations in : unknowns given by
¹r = /
has a unique solution if:
CHAPTER 7. SYSTEMS OF EQUATIONS 92
1. ¹
÷1
exists.
2. [¹[ = 0.
3. The row-echelon form of ¹ has no rows with all zeros.
4. ¹ has rank :.
5. The columns of ¹ span :-dimensional space.
6. The columns of ¹ are mutually linearly independent.
The system of : linear equations in : unknowns given by ¹r = / has an
in…nite number of solutions if:
1. The row-echelon form of the augmented matrix (¹[/) has rows with all
zeros.
2. The vector / is contained in the span of a subset of the columns of ¹.
The system of : linear equations in : unknowns given by ¹r = / has no
solution if:
1. The row-echelon form of the augmented matrix (¹[/) has at least one
row with all zeros except in the last column.
2. The vector / is not contained in the span of the columns of ¹.
7.3 Problems
1. Determine whether or not the following systems of equations have a
unique solution, an in…nite number of solutions, or no solution.
(a)
3r + 6n = 4
2r ÷5. = 8
r ÷n ÷. = ÷10
CHAPTER 7. SYSTEMS OF EQUATIONS 93
(b)
4r ÷n + 8. = 160
17r ÷8n + 10. = 200
÷3r + 2n + 2. = 40
(c)
2r ÷3n = 6
3r + 5. = 15
2r + 6n + 10. = 18
(d)
4r ÷n + 8. = 30
3r + 2. = 20
5r +n ÷2. = 40
(e)
6r ÷n ÷. = 3
5r + 2n ÷2. = 10
n ÷2. = 4
2. Find the values of c for which the following matrices do not have an
inverse.
(a)

6 ÷1
2 c

(b)

¸
5 c 0
4 2 1
÷1 3 1
¸

(c)

5 3
÷3 c

CHAPTER 7. SYSTEMS OF EQUATIONS 94
(d)

¸
÷1 3 1
0 5 c
6 2 1
¸

CHAPTER
8
Using linear algebra in economics
8.1 IS-LM analysis
Consider the following model of a closed macroeconomy:
) = ( +1 +G
( = c((1 ÷t)) )
1 = i(1)
` = 1 :(). 1)
with
0 < c
t
<
1
1 ÷t
i
t
< 0
:
Y
0. :
1
< 0
Here ) is GDP, which you can think of as production, income, or spending.
The variables (, 1, and G are the spending components of GDP, with (
95
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 96
standing for consumption, 1 for investment (which is spending by businesses
on new capital and by consumers on new housing, not what you do in the
stock market), and G for government spending. The …rst equation says
that total spending ) is equal to the sum of the components of spending,
( + 1 + G. There are no imports or exports, so this is a closed-economy
model.
The amount of consumption depends on consumers’ after-tax income, and
when ) is income and t is the tax rate, after-tax (or disposable) income is
(1 ÷ t)) . So the second equation says that consumption is a function c()
of disposable income, and c
t
0 means that it is an increasing function.
Investment is typically spending on large items, and it is often …nanced
through borrowing. Because of this, investment depends on the interest rate
1, and when the interest rate increases borrowing becomes more expensive
and the amount of investment falls. Consequently the investment function
i(1) is decreasing.
` is money supply, and the right-hand side of the fourth equation is
money demand. 1 is the price level, and when things become more expensive
it takes more money to purchase the same amount of stu¤. When income )
increases people want to buy more stu¤, and they need more money to do it
with, so money demand increases when income increases. Also, since money
is cash and checking account balances, which tend not to earn interest, and so
when interest rates rise people tend to move some of their assets into interest-
bearing accounts. This means that money demand falls when interest rates
rise.
The four equations provide a model of the economy, known as the IS-LM
model. The …rst three equations describe the IS curve from macro courses
and the fourth equation describes the LM curve. The model is useful for
describing a closed economy in the short run, that is, before the price level
At this point you should be wondering why we care. The answer is that
we want to do some comparative statics analysis. The variables G, t, and `
are exogenous policy variables. 1 is predetermined and assumed constant for
the problem. Everything else is endogenous, so everything else is a function
of G, t, and `. We are primarily interested in the variables ) and 1, and
we want to see how they change when the policy variables change.
Let’s look for the comparative statics derivatives d)dG and d1dG.
To …nd them, …rst simplify the system of four equations to a system of two
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 97
equations:
) = c((1 ÷t)) ) +i(1) +G
` = 1 :(). 1)
Implicitly di¤erentiate with respect to G to get
d)
dG
= (1 ÷t)c
t
d)
dG
+i
t
d1
dG
+ 1
0 = 1 :
Y
d)
dG
+1 :
1
d1
dG
Rearrange as
d)
dG
÷(1 ÷t)c
t
d)
dG
÷i
t
d1
dG
= 1
:
Y
d)
dG
+:
1
d1
dG
= 0
We can write this in matrix form

1 ÷(1 ÷t)c
t
÷i
t
:
Y
:
1

oY
oG
o1
oG

=

1
0

Now use Cramer’s rule to solve for d)dG and d1dG:
d)
dG
=

1 ÷i
t
0 :
1

1 ÷(1 ÷t)c
t
÷i
t
:
Y
:
1

=
:
1
[1 ÷(1 ÷t)c
t
]:
1
+:
Y
i
t
.
Both the numerator and denominator are negative, so d)dG 0. GDP
rises when government spending increases. Now for interest rates:
d1
dG
=

1 ÷(1 ÷t)c
t
1
:
Y
0

1 ÷(1 ÷t)c
t
÷i
t
:
Y
:
1

=
÷:
Y
[1 ÷(1 ÷t)c
t
]:
1
+:
Y
i
t
.
The numerator is negative and so is the denominator. Thus, d1dG 0.
An increase in government spending increases both GDP and interest rates
in the short run.
It is also possible to …nd the comparative statics derivatives d)dt, d1dt,
d)d`, and d1d`. You should …gure them out yourselves.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 98
8.2 Econometrics
We want to look at the estimation equation
n
(a1)
= A
(aI)

(I1)
+ c.
(a1)
(8.1)
The matrix n contains the data on our dependent variable, and the matrix A
contains the data on the independent, or explanatory, variables. Each row
is an observation, and each column is an explanatory variable. From the
equation we see that there are : observations and / explanatory variables.
The matrix is a vector of / coe¢cients, one for each of the / explanatory
variables. The estimates will not be perfect, and so the matrix c contains
error terms, one for each of the : observations. The fundamental problem in
econometrics is to use data to estimate the coe¢cients in in order to make
the errors c small. The notion of "small," and the one that is consistent
with the standard practice in econometrics, is to make the sum of the squared
errors as small as possible.
8.2.1 Least squares analysis
We want to minimize the sum of the squared errors. Rewrite (8.1) as
c = n ÷A
and note that
c
T
c =
a
¸
j=1
c
2
j
.
Then
c
T
c = (n ÷A)
T
(n ÷A)
= n
T
n ÷
T
A
T
n ÷n
T
A +
T
A
T
A
We want to minimize this expression with respect to the parameter vector .
But notice that there is a and a
T
in the expression. Let’s treat these as
two separate variables to get two FOCs:
÷A
T
n +A
T
A = 0
÷n
T
A +
T
A
T
A = 0
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 99
These are two copies of the same equation, because the …rst is the transpose
of the second. So let’s use the …rst one because it has instead of
T
.
Solving for yields
A
T
A = A
T
n
^
= (A
T
A)
÷1
A
T
n
We call
^
the OLS estimator. Note that it is determined entirely by the data,
that is, by the independent variable matrix A and the dependent variable
vector n.
8.2.2 A lame example
Consider a regression with two observations and one independent variable,
with the data given in the table below.
Observation Dependent Independent
number variable n variable A
1 4 3
2 8 4
There is no constant. Our two observations lead to the two equations
4 = 3 +c
1
8 = 4 +c
2
We want to …nd the value of that minimizes c
2
1
+c
2
2
.
Since c
1
= 4 ÷3 and c
2
= 8 ÷4, we get
c
2
1
+c
2
2
= (4 ÷3)
2
+ (8 ÷4)
2
= 16 ÷24 + 9
2
+ 64 ÷64 + 16
2
= 80 ÷88 + 25
2
.
Minimize this with respect to . The FOC is
÷88 + 50 = 0
=
88
50
=
44
25
.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 100
e

y = (4,8)
X =(3,4)
x
2
x
1
Figure 8.1: Graphing the lame example in column space
Now let’s do it with matrices. The two equations can be written

4
8

=

3
4

() +

c
1
c
2

.
The OLS estimator is
^
= (A
T
A)
÷1
A
T
n
=

3 4

3
4

÷1

3 4

4
8

= (25)
÷1
(44) =
44
25
.
8.2.3 Graphing in column space
We want to graph the previous example in column space. The example is
lame for precisely this reason – so I can graph it in two dimensions.
The key here is to think about what we are doing when we …nd the value
of to minimize c
2
1
+ c
2
2
. A is a point, shown by the vector A = (3. 4)
in Figure 8.1, and A is the equation of the line through that point. n is
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 101
another point, shown by the vector n = (4. 8) in the …gure. c is the vector
connecting some point A
^
on the line A to the point n. c
2
1
+ c
2
2
is the
square of the length of c, so we want to minimize the length of c, and we
do that by …nding the point on the line A that is closest to the point n.
Graphically, the closest point is the one that causes the vector c to be at a
right angle to the vector A.
Two vectors c and

/ are orthogonal if c

/ = 0. This means we can
…nd our coe¢cients by having the vector c be orthogonal to the vector A.
Remembering that c = n ÷A, we get
A (n ÷A
^
) = 0.
Now notice that if we write the two vectors c and

/ as column matrices ¹
and 1, we have
c

/ = ¹
T
1.
Thus we can rewrite the above expression as
A
T
(n ÷A
^
) = 0
A
T
n ÷A
T
A
^
= 0
A
T
A
^
= A
T
n
^
= (A
T
A)
÷1
A
T
n.
which is exactly the answer we got before.
8.2.4 Interpreting some matrices
We have the estimated parameters given by
^
= (A
T
A)
÷1
A
T
n.
This tells us that the predicted values of n are
^ n = A
^
= A(A
T
A)
÷1
A
T
n.
The matrix
A(A
T
A)
÷1
A
T
is a projection matrix, and it projects the vector n onto the column space
of A.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 102
The residuals vector can be written
c = n ÷A
^

= (1 ÷A(A
T
A)
÷1
A
T
)n.
The two matrices A(A
T
A)
÷1
A
T
and (1 ÷A(A
T
A)
÷1
A
T
) have the spe-
cial property that they are idempotent, that is, they satisfy the property
that
¹¹ = ¹.
Geometrically it is clear why this happens. Suppose we apply the projection
matrix A(A
T
A)
÷1
A
T
to the vector n. That projects n onto the column
space of A, so that A(A
T
A)
÷1
A
T
n lies in the column space of A. If
we apply the same projection matrix a second time, it doesn’t do anything
because n is already in the column space of A. Similarly, the matrix (1 ÷
A(A
T
A)
÷1
A
T
) projects n onto the space that is orthogonal to the column
space of A. Doing it a second time does nothing, because it is projecting
into the same space a second time.
8.3 Stability of dynamic systems
In macroeconomics and time series econometrics a common theme is the
stability of the economy. In this section I show how stability works and
relate stability to matrices.
8.3.1 Stability with a single variable
A dynamic system takes the form of
n
t+1
= cn
t
.
The variable t denotes the period number, so period t + 1 is the period that
immediately follows period t. The variable we are interested in is n, and, in
particular, we would like to know how n varies over time. The initial period
is period 0, and the initial value of n
t
is n
0
, which for the sake of argument
we will assume is positive. The process is not very interesting if c = 0,
because then n
1
= n
2
= n
3
= ... = 0, and it’s also not very interesting if
c = 1, because then n
1
= n
2
= ... = n
0
. We want some movement in n
t
, so
let’s assume that c = 0 and c = 1.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 103
The single-variable system is pretty straightforward (some might even say
lame), and it follows the following process:
n
1
= cn
0
n
2
= cn
1
= c
2
n
0
.
.
.
n
t
= c
t
n
0
.
.
.
The process n
0
. n
1
. ... is if it eventually converges to a …nite value, or, in
mathematical terms, if there exists a value n such that
lim
t÷o
n
t
= n.
If the process is not stable then it explodes, diverging to either +· or ÷·.
Whether or not the process is stable is determined by the magnitude of
the parameter c. To see how, look at
lim
t÷o
n
t
= lim
t÷o
c
t
n
0
= n
0
lim
t÷o
c
t
.
If c 1 then c
t
÷ ·, and the process cannot be stationary unless n
0
just
happens to be zero, which is unlikely. Similarly, if c < ÷1 the process also
diverges, this time cycling between positive and negative values depending
on whether t is positive or negative, respectively (because n
0
is positive).
On the other hand, if 0 < c < 1, the value c
t
< 1 and thus n
t
= c
t
n
0
< n
0
.
What’s more, as t ÷·, the quantity c
t
÷0, and therefore n
t
÷0. Thus,
the process is stable when c ÷ [0. 1). It also turns out to be stable when
÷1 < c < 0. The reasoning is the same. When c ÷ (÷1. 0) we have
c
t
÷ (÷1. 1) and lim
t÷o
c
t
= 0.
This reasoning gives us two stability conditions:
[c[ < 1
lim
t÷o
n
t
= 0.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 104
8.3.2 Stability with two variables
All of that was pretty simple. Let’s look at a dynamic system with two
variables:
n
t+1
= cn
t
+/.
t
.
t+1
= cn
t
+d.
t
Now both variables depend on the past values of both variables. When is
this system stable?
We can write the system in matrix form:

n
t+1
.
t+1

=

c /
c d

n
t
.
t

. (8.2)
or, in shorthand notation,
n
t+1
= ¹ n
t
.
But this gives us a really complicated system. Sure, we know that

n
t
.
t

=

c /
c d

t

n
0
.
0

.
but the matrix

c /
c d

t
is really complicated. For example,

c /
c d

3
=

c
3
+ 2/cc +/cd / (c
2
+cd +d
2
+/c)
c (c
2
+cd +d
2
+/c) d
3
+ 2/cd +c/c

and

c /
c d

10
is too big to …t on the page.
Things would be easy if the matrix was diagonal, that is, if / = c = 0.
Then we would have two separate single-variable dynamic processes
n
t+1
= cn
t
.
t+1
= d.
t
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 105
and we already know the stability conditions for these. But the matrix is
not diagonal. So let’s mess with things to get a system with a diagonal
matrix. It will take a while, so just remember the goal when we get there:
we want to generate a system that looks like
r
t+1
=

\
1
0
0 \
2

r
t
(8.3)
because then we can treat the stability of the elements of the vector r sepa-
rately.
8.3.3 Eigenvalues and eigenvectors
Begin by remembering that 1 is the identity matrix. An eigenvalue of the
square matrix ¹ is a value \ such that
det(¹ ÷\1) =

c ÷\ /
c d ÷\

= 0.
Taking the determinant yields

c ÷\ /
c d ÷\

= cd ÷c\ ÷d\ +\
2
÷/c.
So, the eigenvalues are the solutions to the quadratic equation
\
2
÷(c +d)\ + (cd ÷/c) = 0.
In general quadratic equations have two solutions, call them \
1
and \
2
.
For example, suppose that the matrix ¹ is
¹ =

3 2
6 ÷1

.
The eigenvalues satisfy the equation
\
2
÷(3 ÷1)\ + (÷3 ÷12) = 0
\
2
÷2\ ÷15 = 0
\ = 5. ÷3.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 106
These are the eigenvalues of ¹. Look at the two matrices we get from the
formula ¹ ÷\1:

3 ÷5 2
6 ÷1 ÷5

=

÷2 2
6 ÷6

3 ÷(÷3) 2
6 ÷1 ÷(÷3)

=

6 2
6 2

and both of these matrices are singular, which is what you get when you
make the determinant equal to zero.
An eigenvector of ¹ is a vector · such that
(¹ ÷\1)· = 0
where \ is an eigenvalue of ¹. For our example, the two eigenvectors are
the solutions to

÷2 2
6 ÷6

·
11
·
21

=

0
0

and

6 2
6 2

·
12
·
22

=

0
0

.
where I made the second subscript denote the number of the eigenvector
and the …rst subscript denote the element of that eigenvector. These two
equations have simple solutions. The …rst equation holds when

·
11
·
21

=

1
1

because

÷2 2
6 ÷6

1
1

=

0
0

.
and the second equation holds when

·
12
·
22

=

1
÷3

because

6 2
6 2

1
÷3

=

0
0

.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 107
The relationship between eigenvalues and eigenvectors is useful for our
task. Recall that if \ is an eigenvalue and · is an eigenvector then
(¹ ÷\1)· = 0
¹· ÷\1· = 0
¹· = \·.
In particular, we have
¹

·
11
·
21

= \
1

·
11
·
21

and
¹

·
12
·
22

= \
2

·
12
·
22

.
Construct the matrix \ so that the two eigenvectors are columns:
\ =

·
11
·
12
·
21
·
22

.
Then we have
¹\ =

¹

·
11
·
21

¹

·
12
·
22

=

\
1

·
11
·
21

\
2

·
12
·
22

=

·
11
·
12
·
21
·
22

\
1
0
0 \
2

= \

\
1
0
0 \
2

.
We are almost there. If \ has an inverse, \
÷1
, we can left-multiply both
sides by \
÷1
to get
\
÷1
¹\ =

\
1
0
0 \
2

. (8.4)
This is the diagonal matrix we were looking for to make our dynamic system
easy.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 108
8.3.4 Back to the dynamic system
Go back to the dynamic system
n
t+1
= ¹ n
t
.
Create a di¤erent vector r according to
r = \
÷1
n,
where \ is the matrix of eigenvectors we constructed above. This implies
that
n = \ r. (8.5)
Then
r
t+1
= \
÷1
n
t+1
.
Since n
t+1
= ¹ n
t
, we get
r
t+1
= \
÷1
(¹ n
t
)
=

\
÷1
¹

n
t
= (\
÷1
¹) (\ r
t
)
=

\
÷1
¹\

r
t
.
We …gured out the formula for \
÷1
¹\ in equation (8.4), which gives us
r
t+1
=

\
1
0
0 \
2

r
t
.
But this is exactly the diagonal system we wanted in equation (8.3). So we
Let’s look back at what we have. We began with a matrix ¹. We
found its two eigenvalues \
1
and \
2
, and we found the two corresponding
eigenvectors and combined them to form the matrix \ . All of this comes
from the matrix ¹, so we haven’t added anything that wasn’t in the problem.
But our original problem was about the vector n
t
, and our new problem is
t
.
Using the intuition we gained from the section with a single-variable dy-
namic system, we say that the process n
0
. n
1
. ... is stable if
lim
t÷o
n
t
= 0.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 109
The dynamic process n
t+1
= ¹ n
t
was hard to work with, and so it was di¢cult
to determine whether or not it was stable. But the dynamic process

n
t+1
r
t+1

=

\
1
0
0 \
2

n
t
r
t

is easy to work with, because multiplying it out yields the two simple, single-
variable equations
n
t+1
= \
1
n
t
r
t+1
= \
2
r
t
.
And we know the stability conditions for single-variable equations. The …rst
one is stable if [\
1
[ < 1, and the second one is stable if [\
2
[ < 1. So, if these
two equations hold, we have
lim
t÷o
r
t
= lim
t÷o

n
t
r
t

=

0
0

.
Finally, remember that we constructed r
t
according to (see equation (8.5))
n
t
= \ r
t
.
and so
lim
t÷o
n
t
= lim
t÷o
\ r
t
= \ lim
t÷o
r
t
= 0
and the original system is also stable.
That was a lot of steps, but it all boils down to something simple, and it
works for more than two dimensions. The dynamic system
n
t+1
= ¹ n
t
is stable if all of the eigenvalues of ¹ have magnitude smaller than one.
8.4 Problems
1. Consider the following IS-LM model:
) = ( +1 +G
( = c((1 ÷t)) )
1 = i(1)
` = 1 :(). 1)
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 110
with
c
t
0
i
t
< 0
:
Y
0. :
1
< 0
The variables G, t, and ` are exogenous policy variables. 1 is prede-
termined and assumed constant for the problem.
(a) Assume that (1 ÷ t)c
t
< 1, so that a \$1 increase in GDP leads to
less than a dollar increase in spending. Compute and interpret
d)dt and d1dt.
(b) Compute and interpret d)d` and d1d`.
(c) Compute and interpret d)d1 and d1d1.
2. Consider a di¤erent IS-LMmodel, this time for an open economy, where
A is net exports and 1 is total tax revenue (as opposed to t which was
the marginal tax rate).
) = ( +1 +G+A
( = c() ÷1)
1 = i(1)
A = r(). 1)
` = 1 :(). 1)
with
c
t
0
i
t
< 0
r
Y
< 0. r
1
< 0
:
Y
0. :
1
< 0
The variables G, 1, and ` are exogenous policy variables. 1 is
predetermined and assumed constant for the problem.
(a) For this problem assume that c
t
+r
Y
< 1, so that a \$1 increase in
GDP leads to less than a dollar increase in spending. Compute
and interpret d)dG and d1dG.
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 111
(b) Compute and interpret d)d1 and d1d1.
(c) Compute and interpret d)d` and d1d`.
3. The following is a model of the long-run economy:
) = ( +1 +G+A
( = c((1 ÷t)) )
1 = i(1)
A = r(). 1)
` = 1 :(). 1)
) =

)
with
c
t
0
i
t
< 0
r
Y
< 0. r
1
< 0
:
Y
0. :
1
< 0
The variables G, t, and ` are exogenous policy variables, and

) is
also exogenous but not a policy variable. It is interpreted as poten-
tial GDP, or full-employment GDP. The variables ). (. 1. A. 1 are all
endogenous.
(a) Compute and …nd the signs of d)dG, d1dG, and d1dG.
(b) Compute and …nd the signs of d)d`, d1d`, and d1d`.
4. Consider the following system of equations:
¡
1
= 1(j. 1)
¡
S
= o(j. n)
¡
1
= ¡
S
The …rst equation says that the quantity demanded in the market de-
pends on the price of the good j and household income 1. Consistent
with this being a normal good, we have 1
j
< 0 and 1
1
0. The sec-
ond equation says that the quantity supplied in the market depends on
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 112
the price of the good and the wage rate of the work force, n. We have
o
j
0 and o
&
< 0. The third equation says that markets must clear,
so that quantity demanded equals quantity supplied. Three variables
are endogenous: ¡
1
, ¡
S
, and j. Two variables are exogenous: 1 and
n.
(a) Show that the market price increases when income increases.
(b) Show that the market price increases when the wage rate increases.
5. Find the coe¢cients for a regression based on the following data table:
Observation number r
1
r
2
n
1 1 9 6
2 1 4 2
3 1 3 5
6. Consider a regression based on the following data table:
Observation number r
1
r
2
n
1 2 8 1
2 6 24 0
3 -4 -16 -1
(a) Show that the matrix A
T
A is not invertible.
(b) Explain intuitively, using the idea of a column space, why there is
no unique coe¢cient vector for this regression.
7. Find the coe¢cients for a regression based on the following data table:
Observation number r
1
r
2
n
1 5 4 10
2 2 1 2
3 3 6 7
8. Suppose that you are faced with the following data table:
Observation number r
1
r
2
n
1 1 2 10
2 1 3 0
3 1 5 25
4 1 4 15
CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 113
3
, to
the regression. r
3
is given by
r
3
=

¸
¸
¸
24
36
60
48
¸

Explain why this would be a bad idea.
9. Find the eigenvalues and eigenvectors of the following matrices:
(a)

5 1
4 2

(b)

10 ÷1
12 3

(c) ( =

3 ÷6
÷2 4

(d) 1 =

3 4
0 2

(e) 1 =

10 4
÷3 ÷6

10. Consider the dynamic system
n
t+1
= ¹ n
t
.
(a) If ¹ =

13 0
0 ÷15

, is the dynamic system stable?
(b) If ¹ =

43 ÷14
13 14

, is the dynamic system stable?
(c) If ¹ =

14 1
0 ÷23

, is the dynamic system stable?
(d) If ¹ =

15 1
2 89

, is the dynamic system stable?
CHAPTER
9
Second-order conditions
9.1 Taylor approximations for R ÷R
Consider a di¤erentiable function 1 : R ÷ R. Since 1 is di¤erentiable at
point r
0
it has a linear approximation at that point. Note that this is a
linear approximation to the function, as opposed to a linear approximation
at the point. The linear approximation can be written
1(r) = 1(r
0
) +1
t
(r
0
)(r ÷r
0
).
When r = r
0
we get the original point, so 1(r
0
) = 1(r
0
). When r is not
equal to r
0
we multiply the di¤erence (r÷r
0
) by the slope at point r
0
, then
0
).
All of this is shown in Figure 9.1. The linear approximation 1(r) is
a straight line that is tangent to the function 1(r) at 1(r
0
). To get the
approximation at point r, take the horizontal distance (r÷r
0
) and multiply
it by the slope of the tangent line, which is just 1
t
(r
0
). This gives us the
quantity 1
t
(r
0
)(r÷r
0
), which must be added to 1(r
0
) to get the right linear
approximation.
114
CHAPTER 9. SECOND-ORDER CONDITIONS 115
f’(x
0
)(x – x
0
)
x – x
0
L(x)
x
0
f(x
0
)
L(x)
x
f(x)
x
f(x)
Figure 9.1: A …rst-order Taylor approximation
The Taylor approximation builds on this intuition. Suppose that 1 is
: times di¤erentiable. Then an :-th order approximation is
1(r) - 1(r
0
) +
1
t
(r
0
)
1!
(r ÷r
0
) +
1
tt
(r
0
)
2!
(r ÷r
0
)
2
+... +
1
(a)
(r
0
)
:!
(r ÷r
0
)
a
.
We care mostly about second-degree approximations, or
1(r) - 1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
.
The key to understanding the numbers in the denominators of the dif-
ferent terms is noticing that the …rst derivative of the linear approxima-
tion matches the …rst derivative of the function, the second derivative of
the second-order approximation equals the second derivative of the original
function, and so on. So, for example, we can write the third-degree approx-
imation of 1(r) at r = r
0
as
o(r) = 1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
+
1
ttt
(r
0
)
6
(r ÷r
0
)
3
.
CHAPTER 9. SECOND-ORDER CONDITIONS 116
Di¤erentiate with respect to r to get
o
t
(r) = 1
t
(r
0
) +1
tt
(r
0
)(r ÷r
0
) +
1
ttt
(r
0
)
2
(r ÷r
0
)
2
o
tt
(r) = 1
tt
(r
0
) +1
ttt
(r
0
)(r ÷r
0
)
o
ttt
(r) = 1
ttt
(r
0
)
9.2 Second order conditions for R ÷R
Suppose that 1(r) is maximized when r = r
0
. Take a second-order Taylor
approximation:
1(r) - 1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
.
Since 1 is maximized when r = r
0
, the …rst-order condition has 1
t
(r
0
) = 0.
Thus the second term in the Taylor approximation disappears. We are left
with
1(r) - 1(r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
.
If 1 is maximized, it must mean that any departure from r
0
decrease in 1. In other words,
1(r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
_ 1(r
0
)
for all r. Simplifying gives us
1
tt
(r
0
)
2
(r ÷r
0
)
2
_ 0
and, since (r ÷ r
0
)
2
_ 0, it must be the case that 1
tt
(r
0
) _ 0. This is how
we can get the second order condition from the Taylor approximation.
9.3 Taylor approximations for R
:
÷R
This time we are only going to look for a second-degree approximation. We
need some notation:
\1( r) =

¸
¸
1
1
( r)
.
.
.
1
n
( r)
¸

CHAPTER 9. SECOND-ORDER CONDITIONS 117
and is called the gradient of 1 at r. Also
H( r) =

¸
¸
1
11
( r) 1
1n
( r)
.
.
.
.
.
.
.
.
.
1
n1
( r) 1
nn
( r)
¸

is called the Hessian. It is the matrix of second derivatives.
A second-order Taylor approximation for a function of : variables can
be written
1( r) - 1( r
0
) +
n
¸
j=1
1
j
( r
0
)(r
j
÷r
0
j
) +
1
2
n
¸
j=1
n
¸
;=1
1
j;
( r
0
)(r
j
÷r
0
j
)(r
;
÷r
0
;
)
Let’s write this in matrix notation. The …rst term is simply 1( r
0
). The
second term is
( r ÷ r
0
) \1( r
0
) = ( r ÷ r
0
)
T
\1( r
0
).
The third term is
1
2
( r ÷ r
0
)
T
H( r
0
)( r ÷ r
0
).
Let’s check to make sure this last one works. First check the dimensions,
which are (1 :)(::)(:1) = (1 1), which is what we want. Then
break the problem down. ( r ÷ r
0
)
T
H( r
0
) is a (1 :) matrix with element
, given by
n
¸
j=1
1
j;
( r
0
)(r
j
÷r
0
j
).
To get ( r ÷ r
0
)
T
H( r
0
)( r ÷ r
0
) we multiply each element of ( r ÷ r
0
)
T
H( r
0
)
by the corresponding element of ( r ÷ r
0
) and sum, to get
n
¸
j=1
n
¸
;=1
1
j;
( r
0
)(r
j
÷r
0
j
)(r
;
÷r
0
;
).
So, the second degree Taylor approximation of the function 1 : R
n
÷R
is given by
1( r) - 1( r
0
) + ( r ÷ r
0
)
T
\1( r
0
) +
1
2
( r ÷ r
0
)
T
H( r
0
)( r ÷ r
0
)
CHAPTER 9. SECOND-ORDER CONDITIONS 118
9.4 Second order conditions for R
:
÷R
Suppose that 1 : R
n
÷ R is maximized when r = r
0
. Then the …rst-order
condition is
\1( r
0
) =

0
and the second term in the Taylor approximation drops out. For 1 to be
maximized when r = r
0
it must be the case that
1( r
0
) +
1
2
( r ÷ r
0
)
T
H( r
0
)( r ÷ r
0
) _ 1( r
0
)
or
( r ÷ r
0
)
T
H( r
0
)( r ÷ r
0
) _ 0.
9.5 Negative semide…nite matrices
The matrix ¹ is negative semide…nite if, for every column matrix r, we
have
r
T
¹r _ 0.
Obviously, the second order condition for a maximum is that H( r
0
) is
negative semide…nite. In the form it is written in, though, it is a di¢cult
thing to check.
Form a submatrix ¹
j
from the square matrix ¹ by keeping the square
matrix formed by the …rst i rows and …rst i columns. (Note that this is
di¤erent from the submatrix we used to …nd determinants in Section 6.3.)
The determinant of ¹
j
is called the i-th leading principal minor of ¹.
Theorem 11 Let ¹ be a symmetric : : matrix. Then ¹ is negative
semide…nite if and only if its : leading principle minors alternate in sign so
that
(÷1)
j

j
[ _ 0
for i = 1. .... :.
There are other corresponding notions:
« ¹ is negative de…nite if r
T
¹r < 0 for all nonzero vectors r, and this
occurs if and only if its : leading principle minors alternate in sign so
that
(÷1)
j

j
[ 0
for i = 1. .... :.
CHAPTER 9. SECOND-ORDER CONDITIONS 119
« ¹ is positive de…nite if r
T
¹r 0 for all nonzero vectors r, and this
occurs if and only if its : leading principle minors are positive, so that

j
[ 0
for i = 1. .... :.
« ¹ is positive semide…nite if r
T
¹r _ 0 for all vectors r, and this
occurs if and only if its : leading principle minors are nonnegative, so
that

j
[ _ 0
for i = 1. .... :.
« ¹ is inde…nite if none of the other conditions holds.
9.5.1 Application to second-order conditions
Suppose that the problem is to choose r
1
and r
2
to maximize 1(r
1
. r
2
). The
FOCs are
1
1
= 0
1
2
= 0
and the SOC is
H =

1
11
1
21
1
21
1
22

is negative semide…nite. Note that 1
21
appears in both o¤-diagonal elements,
which is okay because 1
12
= 1
21
. That is, it doesn’t matter if you di¤erentiate
1 …rst with respect to r
1
and then with respect to r
2
or the other way around.
The requirements for H to be negative semide…nite are
1
11
_ 0

1
11
1
12
1
21
1
22

= 1
11
1
22
÷1
2
12
_ 0
Note that the two conditions together imply that 1
22
_ 0.
CHAPTER 9. SECOND-ORDER CONDITIONS 120
9.5.2 Examples
¹ =

÷3 3
3 ÷4

is negative de…nite because c
11
< 0 and c
11
c
22
÷
c
12
c
21
= 3 0.
¹ =

6 1
1 3

is positive de…nite because c
11
0 and c
11
c
22
÷c
12
c
21
=
17 0.
¹ =

÷5 ÷3
÷3 4

is inde…nite because c
11
< 0 and c
11
c
22
÷ c
12
c
21
=
÷29 < 0.
9.6 Concave and convex functions
All of the second-order conditions considered so far rely on the objective
function being twice di¤erentiable. In the single-variable case we require
that the second derivative is nonpositive for a maximum and nonnegative for
a minimum. In the many-variable case we require that the matrix of second
and cross partials (the Hessian) is negative semide…nite for a maximum and
positive semide…nite for a minimum. But objective functions are not always
di¤erentiable, and we would like to have some second order conditions that
work for these cases, too.
Figure 9.2 shows a function with a maximum. It also has the following
property. If you choose any two points on the curve, such as points c and
/, and draw the line segment connecting them, that line segment always lies
below the curve. When this property holds for every pair of points on the
curve, we say that the function is concave.
It is also possible to characterize a concave function mathematically.
Point c in the …gure has coordinates (r
o
. 1(r
o
)), and point / has coordinates
(r
b
. 1(r
b
)). Any value r between r
o
and r
b
can be written as
r = tr
o
+ (1 ÷t)r
b
for some value t ÷ [0. 1]. Such a point is called convex combination of r
o
and r
b
. When t = 1 we get r = 1 r
o
+ 0 r
b
= r
o
, and when t = 0 we get
r = 0 r
o
+ 1 r
b
= r
b
. When t =
1
2
we get r =
1
2
r
o
+
1
2
r
b
which is the
midpoint between r
o
and r
b
, as shown in Figure 9.2.
The points on the line segment connecting c and / in the …gure have
CHAPTER 9. SECOND-ORDER CONDITIONS 121
x
b x
a
½x
a
+ ½x
b
f(½x
a
+ ½x
b
)
½f(x
a
) + ½f(x
b
)
a
b
f(x)
x
f(x)
Figure 9.2: A concave function
coordinates
(tr
o
+ (1 ÷t)r
b
. t1(r
o
) + (1 ÷t)1(r
b
))
for t ÷ [0. 1]. Let’s choose one value of t, say t =
1
2
. The point on the line
segment is

1
2
r
o
+
1
2
r
b
.
1
2
1(r
o
) +
1
2
1(r
b
)

and it is the midpoint between points c and /. But
1
2
r
o
+
1
2
r
b
is just a value
of r, and we can evaluate 1(r) at r =
1
2
r
o
+
1
2
r
b
. According to the …gure,
1(
1
2
r
o
+
1
2
r
b
) _
1
2
1(r
o
) +
1
2
1(r
b
)
where the left-hand side is the height of the curve and the right-hand side is
the height of the line segment connecting c to /.
Concavity says that this is true for all possible values of t, not just t =
1
2
.
De…nition 2 A function 1(r) is concave if, for all r
o
and r
b
,
1(tr
o
+ (1 ÷t)r
b
) _ t1(r
o
) + (1 ÷t)1(r
b
)
for all t ÷ [0. 1].
CHAPTER 9. SECOND-ORDER CONDITIONS 122
a
b
f(x)
x
f(x)
Figure 9.3: A convex function
Concave functions tend to be upward sloping, downward sloping, or have
a maximum (that is, upward sloping and then downward sloping). Thus,
instead of assuming that a function has the right properties of its second
derivative, we can instead assume that it is concave. And notice that noth-
ing in the de…nition says anything about whether r is single-dimensional or
multidimensional. The same de…nition of concave works for both single-
variable and multi-variable optimization problems.
If a concave function is twice di¤erentiable, it has a nonpositive second
derivative.
A convex function has the opposite property: the line segment connect-
ing any two points on the curve lies above the curve, as in Figure 9.3. We
get a corresponding de…nition.
De…nition 3 A function 1(r) is convex if, for all r
o
and r
b
,
1(tr
o
+ (1 ÷t)r
b
) _ t1(r
o
) + (1 ÷t)1(r
b
)
for all t ÷ [0. 1].
Convexity is the appropriate assumption for minimization, and if a convex
function is twice di¤erentiable its second derivative is nonnegative.
As an example, consider the standard pro…t-maximization problem, where
output is ¡ and pro…t is given by
:(¡) = :(¡) ÷c(¡).
CHAPTER 9. SECOND-ORDER CONDITIONS 123
c(q)
\$
q
r(q)
Figure 9.4: Pro…t maximization with a concave revenue function and a convex
cost function
where :(¡) is the revenue function and c(¡) is the cost function. The standard
assumptions are that the revenue function is concave and the cost function
is convex. If both functions are twice di¤erentiable, the …rst-order condition
is the familiar
:
t
(¡) ÷c
t
(¡) = 0
and the second-order condition is
:
tt
(¡) ÷c
tt
(¡) _ 0.
This last expression holds if :
tt
(¡) _ 0 which occurs when :(¡) is concave,
and if c
tt
(¡) _ 0 which occurs when c(¡) is convex. Figure 9.4 shows the
standard revenue and cost functions in a pro…t maximization problem, and
in the graph the revenue function is concave and the cost function is convex.
Some functions are neither convex nor concave. More precisely, they
have some convex portions and some concave portions. Figure 9.5 provides
an example. The function is convex to the left of r
0
and concave to the right
of r
0
.
CHAPTER 9. SECOND-ORDER CONDITIONS 124
x
0
f(x)
x
f(x)
Figure 9.5: A function that is neither concave nor convex
9.7 Quasiconcave and quasiconvex functions
The important property for a function with a maximum, such as the one
shown in Figure 9.2, is that it rise and then fall. But the function depicted
in Figure 9.6 also rises then falls, and clearly has a maximum. But it is
not concave. Instead it is quasiconcave, which is the weakest second-order
condition we can use. The purpose of this section is to de…ne the terms
"quasiconcave" and "quasiconvex," which will take some work.
Before we can de…ne them, we need to de…ne a di¤erent term. A set o
is convex if, for any two points r. n ÷ o , the point cr + (1 ÷ c)n ÷ o for
all c ÷ [0. 1]. Let’s break this into pieces using Figure 9.7. In the left-hand
graph, the set o is the interior of the oval, and choose two points r and n
in o. These can be either in the interior of o or on its boundary, but the
ones depicted are in the interior. The set ¦.[. = cr + (1 ÷ c)n for some
c ÷ [0. 1]¦ is just the line segment connecting r to n, as shown in the …gure.
The set o is convex if the line segment is inside of o, no matter which r
and n we choose. Or, using di¤erent terminology, the set o is convex if any
convex combination of two points in o is also in o.
In contrast, the set o in the right-hand graph in Figure 9.7 is not convex.
Even though points r and n are inside of o, the line segment connecting
them passes outside of o. In this case the set is nonconvex (there is no
such thing as a concave set).
CHAPTER 9. SECOND-ORDER CONDITIONS 125
f(x)
x
f(x)
Figure 9.6: A quasiconcave function
f(x)
x
S
x
y
f(x)
x
S
x
y
Figure 9.7: Convex sets: The set in the left-hand graph is convex because all
line segments connecting two points in the set also lie completely within the
set. The set in the right-hand graph is not convex because the line segment
drawn does not lie completely within the set.
CHAPTER 9. SECOND-ORDER CONDITIONS 126
x
1
x
2
y
f(x)
x
f(x)
B(y)
Figure 9.8: De…ning quasiconcavity: For any value n, the set 1(n) is convex
The graphs in Figure 9.7 are for 2-dimensional sets. The de…nition of
convex, though, works in any number of dimensions. In particular, it works
for 1-dimensional sets. A 1-dimensional set is a subset of the real line, and
it is convex if it is an interval, either open, closed, or half-open/half-closed.
Now look at Figure 9.8, which has the same function as in Figure 9.6.
Choose any value n, and look for the set
1(n) = ¦r[1(r) _ n¦.
This is a better-than set, and it contains the values of r that generate a value
of 1(r) that is at least as high as n. In Figure 9.8 the set 1(n) is the closed
interval [r
1
. r
2
], which is a convex set. This gives us our de…nition of a
quasiconcave function.
De…nition 4 The function 1(r) is quasiconcave if for any value n, the set
1(n) = ¦r[1(r) _ n¦ is convex.
To see why quasiconcavity is both important and useful, ‡ip way back
to the beginning of the book to look at Figure 1.1 on page 2. That graph
depicted either a consumer choice problem, in which case the line is a bud-
get line and the curve is an indi¤erence curve, or it depicted a …rm’s cost-
minimization problem, in which case the line is an isocost line and the curve
CHAPTER 9. SECOND-ORDER CONDITIONS 127
x
1
x
2
y
f(x)
x
f(x)
W(y)
Figure 9.9: A quasiconvex function
is an isoquant. Think about it as a consumer choice problem. The area
above the indi¤erence curve is the set of points the consumer prefers to those
on the indi¤erence curve. So the set of points above the indi¤erence curve
is the better-than set. And it’s convex. So the appropriate second-order
condition for utility maximization problems is that the utility function is
quasiconcave. Similarly, an appropriate second-order condition for cost-
minimization is that the production function (the function that gives you
the isoquant) is quasiconcave. When you take microeconomics, see where
quasiconcavity shows up as an assumption.
Functions can also be quasiconvex.
De…nition 5 The function 1(r) is quasiconvex if for any value n, the set
\(n) = ¦r[1(r) _ n¦ is convex.
Quasiconvex functions are based on worse-than sets \(n). This time the
set of points generating values lower than n must be convex. To see why,
look at Figure 9.9. This time the points that generate values of 1(r) lower
than n form an interval, but the better-than set is not an interval.
Concave functions are also quasiconcave, which you can see by looking at
Figure 9.2, and convex functions are also quasiconvex, which you can see by
looking at Figure 9.3. But a quasiconcave function may not be concave, as
CHAPTER 9. SECOND-ORDER CONDITIONS 128
in Figure 9.8, and a quasiconvex function may not be convex, as in Figure
9.9.
The easiest way to remember the de…nition for quasiconcave is to draw a
concave function with a maximum. We know that it is also quasiconcave.
Choose a value of n and draw the horizontal line, like we did in Figure 9.8.
Which set is convex, the better-than set or the worse-than set? As shown
in the …gure, it’s the better-than set that is convex, so we get the right
de…nition. If you draw a convex function with a minimum and follow the
same steps you can …gure out the right de…nition for a quasiconvex function.
9.8 Problems
1(r
1
. r
2
. r
3
) = 2r
1
r
2
3
+ 3r
1
r
2
2
÷4r
2
1
and then evaluate it at the point (5. 2. 0).
2. Find the second-degree Taylor approximations of the following func-
tions at r
0
= 1:
(a) 1(r) = ÷2r
3
÷5r + 9
(b) 1(r) = 10r ÷40

r + ln r
(c) 1(r) = c
a
3. Find the second-degree Taylor approximation of the function 1(r) =
3r
3
÷4r
2
÷2r + 12 at r
0
= 0.
4. Find the second-degree Taylor approximation of the function 1(r) =
cr
2
+/r +c at r
0
= 0.
5. Tell whether the following matrices are negative de…nite, negative semi-
de…nite, positive semide…nite, positive de…nite, or inde…nite.
(a)

÷3 2
2 1

(b)

1 2
2 4

CHAPTER 9. SECOND-ORDER CONDITIONS 129
(c)

3 4
4 ÷3

(d)

¸
4 0 1
0 ÷3 ÷2
1 ÷2 1
¸

(e)

6 1
1 3

(f)

÷4 16
16 ÷4

(g)

÷2 1
1 ÷4

(h)

¸
3 2 3
2 4 0
3 0 ÷1
¸

6. State whether the second-order condition is satsi…ed for the following
problems.
(a) min
a.&
4n
2
÷rn
(b) max
a.&
7 + 8r + 6n ÷r
2
÷n
2
(c) max
a.&
5rn ÷2n
2
(d) min
a.&
6r
2
+ 3n
2
7. Is (6. 2) a convex combination of (11. 4) and (÷1. 0)? Explain.
8. Use the formula for convexity, and not the second derivative, to show
that the function 1(r) = r
2
is convex.

PART III

ECONOMETRICS

(probability and statistics)

CHAPTER
10
Probability
10.1 Some de…nitions
An experiment is an activity that involves doing something or observing
something resulting in an outcome. The performance of an experiment is
a trial. Experiments can be physical, biological, social, or anything else.
The sample space for an experiment is the set of possible outcomes of
the experiment. The sample space is denoted (the Greek letter omega) and
a typical element, or outcome, is denoted . (lower case omega). The impos-
sible event is the empty set, ?. Suppose, for example, that the experiment
consists of tossing a coin twice. The sample space is
= ¦(H. H). (H. 1). (1. H). (1. 1)¦
and each pair is an outcome. Another experiment is an exam. Scores are
out of 100, so the sample space is
= ¦0. 1. .... 100¦.
131
CHAPTER 10. PROBABILITY 132
We are going to work with subsets, and we are going to work with some
mathematical symbols pertaining to subsets. The notation is given in the
following table.
In math In English
. ÷ ¹ omega is an element of A
¹ ¨ 1 A intersection B
¹ ' 1 A union B
¹ · 1 A is a strict subset of B
¹ _ 1 A is a weak subset of B
¹
C
The complement of A
An event is a subset of the sample space. If the experiment consists of
tossing a coin twice, the event that there is at least one head can be written
¹ = ¦(H. H). (H. 1). (1. H)¦.
Note that the entire sample space is an event, and so is the impossible event
?. Sometimes we want to talk about single-element events, and write .
It is best to think of outcomes and events as occurring. For the latter,
an event ¹ occurs if there is some outcome . ÷ ¹ such that . occurs.
Our eventual goal is to assign probabilities to events. To do this we need
notation for the set of all possible events. Call it (for sigma-algebra, which
is a concept we will not get into).
Two events ¹ and 1 are mutually exclusive if there is no outcome that
is in both events, that is, ¹ ¨ 1 = ?. If ¹ and 1 are mutually exclusive
then if event ¹ occurs event 1 is impossible, and vice versa.
10.2 De…ning probability abstractly
A probability measure is a mapping 1 : ÷ [0. 1], that is, a function
mapping events into numbers between 0 and 1. The function 1 has three
key properties:
Axiom 1 1(¹) _ 0 for any event ¹ ÷ .
Axiom 2 1() = 1
CHAPTER 10. PROBABILITY 133
Axiom 3 If ¹
1
. ¹
2
. ... is a (possibly …nite) sequence of mutually exclusive
events, then
1(¹
1
' ¹
2
' ...) = 1(¹
1
) +1(¹
2
) +...
The …rst axiom states that probabilities cannot be negative. The second
one states that the probability that something happens is one. The third
axiom states that when events are mutually exclusive, the probability of the
union is simply the sum of the probabilities.
These three axioms imply some other properties.
Theorem 12 1( ?) = 0.
Proof. The events and ? are mutually exclusive since ¨ ? = ?. Since
' ? = , axiom 3 implies that
1 = 1( ' ?) = 1() +1(?) = 1 +1(?),
and it follows that 1( ?) = 0.
The next result concerns the relation _, where ¹ _ 1 means that either
¹ is contained in 1 or ¹ is equal to 1.
Theorem 13 If ¹ _ 1 then 1(¹) _ 1(1).
Proof. Suppose that ¹ and 1 are events with ¹ _ 1. De…ne ( = ¦. : . ÷
1 but . ÷ ¹¦. Then ¹ and ( are mutually exclusive with ¹'( = 1, and
axiom 3 implies that
1(1) = 1(¹ ' () = 1(¹) +1(() _ 1(¹).
For the next theorems, let ¹
C
denote the complement of the event ¹,
that is, ¹
C
= ¦. ÷ : . ÷ ¹¦.
Theorem 14 1(¹
C
) = 1 ÷1(¹).
CHAPTER 10. PROBABILITY 134
Proof. Note that ¹
C
' ¹ = and ¹
C
¨ ¹ = ?. Then 1(¹
C
' ¹) =
1(¹
C
) +1(¹) = 1, and therefore 1(¹
C
) = 1 ÷1(¹).
Note that this theorem implies that 1(¹) _ 1 for any event ¹. To see
why, …rst write 1(¹) = 1 ÷ 1(¹
C
), and by axiom 1 we have 1(¹
C
) _ 0.
The result follows.
Theorem 15 1(¹ ' 1) = 1(¹) +1(1) ÷1(¹ ¨ 1).
Proof. First note that ¹ ' 1 = ¹ ' (¹
C
¨ 1). You can see this in Figure
10.1. The points in ¹'1 are either in ¹ or they are outside of ¹ but in 1.
The events ¹ and ¹
C
¨ 1 are mutually exclusive. Axiom 3 says
1(¹ ' 1) = 1(¹) +1(¹
C
¨ 1).
Next note that 1 = (¹¨1)'(¹
C
¨1). Once again this is clear in Figure 10.1,
but it says that we can separate 1 into two parts, the part that intersects ¹
and the part that does not. These two parts are mutually exclusive, so
1(1) = 1(¹ ¨ 1) +1(¹
C
¨ 1). (10.1)
Rearranging yields
1(¹
C
¨ 1) = 1(1) ÷1(¹ ¨ 1).
Substituting this into equation (10.1) yields
1(¹ ' 1) = 1(¹) +1(¹
C
¨ 1)
= 1(¹) +1(1) ÷1(¹ ¨ 1).
10.3 De…ning probabilities concretely
The previous section told us what the abstract concept probability measure
means. Sometimes we want to know actual probabilities. How do we get
them? The answer relies on the ability to partition the sample space into
equally likely outcomes.
CHAPTER 10. PROBABILITY 135
Ω
A
B
A ∩ B
A
C
∩ B A ∩ B
C
Figure 10.1: Finding the probability of the union of two sets
Theorem 16 Suppose that every outcome in the sample space = ¦.
1
. .... .
a
¦
is equally likely. Then the probability of any event ¹ is the number of out-
comes in ¹ divided by :.
Proof. We know that 1() = 1(¦.
1
¦ ' ' ¦.
a
¦) = 1. By construction
¦.
j
¦ ¨ ¦.
;
¦ = ? when i = ,, and so the events ¦.
1
¦. .... ¦.
a
¦ are mutually
exclusive. By Axiom 3 we have
1() = 1(¦.
1
¦) +... +1(¦.
a
)¦ = 1.
Since each of the outcomes is equally likely, this implies that
1(¦.
j
¦) = 1:
for i = 1. .... :. If event ¹ contains / of the outcomes in the set ¦.
1
. .... .
a
¦,
it follows that 1(¹) = /:.
This theorem allows us to compute probabilities from experiments like
‡ipping coins, rolling dice, and so on. For example, if a die has six sides,
the probability of the outcome 5 is 16. The probability of the event ¦1. 2¦
is 16 + 16 = 13, and so on.
For an exercise, …nd the probability of getting exactly one head in four
tosses of a fair coin. Answer: There are 16 possible outcomes. Four of
them have one head. So, the probability is 14.
CHAPTER 10. PROBABILITY 136
In general events are not equally likely, so we cannot determine proba-
bilities theoretically in this manner. Instead, the probabilities of the events
are given directly.
10.4 Conditional probability
Suppose that 1(1) 0, so that the event 1 occurs with positive probability.
Then 1(¹[1) is the conditional probability that event ¹ occurs given that
event 1 has occurred. It is given by the formula
1(¹[1) =
1(¹ ¨ 1)
1(1)
.
Think about what this expression means. The numerator is the probability
that both ¹ and 1 occur. The denominator is the probability that 1 oc-
curs. Clearly ¹ ¨ 1 is a subset of 1, so the numerator is smaller than the
denominator, as required. The ratio can be interpreted as the fraction of
the time when 1 occurs that ¹ also occurs.
Consider the probability distribution given in the table below. Outcomes
are two dimensional, based on the values of r and n. The probabilities of
the di¤erent outcomes are given in the entries of the table. Note that all the
entries are nonnegative and that the sum of all the entries is one, as required
for a probability measure.
n = 1 n = 2 n = 3 n = 4
r = 1 0.02 0.01 0.02 0.10
r = 2 0.05 0.00 0.03 0.11
r = 3 0.04 0.15 0.02 0.09
r = 4 0.10 0.16 0.02 0.08
Find the conditional probability 1(r = 2[n = 4). The formula is 1(r = 2
and n = 4)1(n = 4). The probability that r = 2 and n = 4 is just the entry
in a single cell, and is 0.11. The probability that n = 4 is the sum of the
probabilities in the last column, or 0.38. So, 1(r = 2[n = 4) = 0.110.38 =
1138.
Now …nd the conditional probability that n is either 1 or 2 given that
r _ 3. The probability that n _ 2 and r _ 3 is the sum of the four cells
CHAPTER 10. PROBABILITY 137
in the lower left, or 0.45. The probability that r _ 3 is 0.66. So, the
conditional probability is 4566 = 1522.
Now look at a medical example. A patient can have condition A or not.
He takes a test which turns out positive or not. The probabilities are given
in the following table:
Test positive Test negative
Condition A 0.010 0.002
Healthy 0.001 0.987
Note that condition A is quite rare, with only 12 people in 1000 having it.
Also, a positive test is ten times more likely to come from a patient with the
condition than from a patient without the condition. We get the following
conditional probabilities:
1(¹[positive) = 1011 = 0.909.
1(healthy[negative) = 987989 = 0.998.
1(positive[¹) = 1012 = 0.833.
1(negative[healthy) = 987988 = 0.999.
10.5 Bayes’ rule
Theorem 17 (Bayes’ rule) Assume that 1(1) 0. Then
1(¹[1) =
1(1[¹)1(¹)
1(1)
.
Proof. Note that
1(¹[1) =
1(¹ ¨ 1)
1(1)
and
1(1[¹) =
1(¹ ¨ 1)
1(¹)
.
Rearranging the second one yields
1(¹ ¨ 1) = 1(1[¹)1(¹)
CHAPTER 10. PROBABILITY 138
and the theorem follows from substitution.
Let’s make sure Bayes’ rule works for the medical example given above.
We have 1(positive[¹) = 1012, 1(positive) = 111000, and 1(¹) = 121000.
Using Bayes’ rule we get
1(¹[positive) =
1(positive[¹) 1(¹)
1(positive)
=
10
12

12
1000
11
1000
=
10
11
.
The interpretation of Bayes’ rule is for responding to updated information.
We are interested in the occurrence of event ¹ after we receive some new
information. We start with the prior 1(¹). Then we …nd out that 1
holds. We should use this new information to update the probability of ¹
occurring. 1(¹[1) is called the posterior probability. Bayes’ rule tells us
how to do this. We multiply the prior 1(¹) by the likelihood
1(1[¹)
1(1)
.
If this ratio is greater than one, the posterior probability is higher than the
prior probability. If the ratio is smaller than one, the posterior probability
is lower. The ratio is greater than one if ¹ occurring makes 1 more likely.
Bayes’ rule is important in game theory, …nance, and macro.
People don’t seem to follow it. Here is a famous example (Kahneman
and Tversky, 1973 Psychological Review). Some subjects are told that a
group consists of 70 lawyers and 30 engineers. The rest of the subjects are
told that the group has 30 lawyers and 70 engineers. All subjects were then
given the following description:
Dick is a 30 year old man. He is married with no children. A
man of high ability and high motivation, he promises to be quite
successful in his …eld. He is well liked by his colleagues.
Subjects were then asked to judge the probability that Dick is an engineer.
Subjects in both groups said that it is about 0.5, ignoring the prior infor-
mation. The new information is uninformative, so 1(1[¹)1(1) = 1, and
according to Bayes’ rule the posterior should be the same as the prior.
This example has people overweighting the new information. Psycholo-
gists have also come up with studies in which subjects overweight the prior.
When subjects overweight the new information it is called representativeness,
and when they overweight the prior it is called conservatism.
CHAPTER 10. PROBABILITY 139
10.6 Monty Hall problem
At the end of the game show Let’s Make a Deal the host, Monty Hall, o¤ers
a contestant the choice among three doors, labeled ¹, 1, and (. There is a
prize behind one of the doors, and nothing behind the other two. After the
contestant chooses a door, to build suspense Monty Hall reveals one of the
doors with no prize. He then asks the contestant if she would like to stay
with her original door or switch to the other one. What should she do?
The answer is that she should take the other door. To see why, suppose
she chooses door ¹, and that Monty reveals door 1. What is the probability
that the prize is behind door ( given that door 1 was revealed? Bayes’ rule
says we use the formula
1(prize behind ([ reveal 1) =
1(reveal 1[ prize () 1(prize ()
1(reveal 1)
.
Before revealing the door, the prize was equally likely to be behind each of
the three doors, so 1(prize ¹) = 1(prize 1) = 1(prize () = 13. Next …nd
the conditional probability that Monty reveals door 1 given that the prize is
behind door (. Remember that Monty cannot reveal the door with the prize
behind it or the door chosen by the contestant. Therefore Monty must reveal
door 1 if the prize is behind door (, and the conditional probability 1(reveal
1[ prize () = 1. The remaining piece of the formula is the probability that
he reveals 1. We can write this as
1(reveal 1) = 1(reveal 1[ prize ¹) 1(¹)
+1(reveal 1[ prize 1) 1(1)
+1(reveal 1[ prize () 1(().
The middle term is zero because he cannot reveal the door with the prize
behind it. The last term is 13 for the reasons given above. If the prize is
behind ¹ he can reveal either 1 or ( and, assuming he does so randomly,
the conditional probability 1(reveal 1[ prize ¹) = 12. Consequently the
…rst term is
1
2

1
3
=
1
6
. Using all this information, the probability of revealing
door 1 is 1(reveal 1) =
1
6
+
1
3
=
1
2
. Plugging this into Bayes’ rule yields
1(prize ([ reveal 1) =
1
1
3
1
2
=
2
3
.
The probability that the prize is behind ¹ given that he revealed door 1 is
1 ÷1(prize ([ reveal 1) = 13. The contestant should switch.
CHAPTER 10. PROBABILITY 140
10.7 Statistical independence
Two events ¹ and 1 are independent if and only if 1(¹¨1) = 1(¹)1(1).
Consider the following table relating accidents to drinking and driving.
Accident No accident
Drunk driver 0.03 0.10
Sober driver 0.03 0.84
Notice that from this table that half of the accidents come from sober
drivers, but there are many more sober drivers than drunk ones. The ques-
tion is whether accidents are independent of drunkenness. Compute 1(drunk
¨ accident) = 0.03, 1(drunk) = 0.13, 1(accident) = 0.06, and …nally
1(d:n:/) 1(cccidc:t) = 0.13 0.06 = 0.0078 = 0.03.
So, accidents and drunk driving are not independent events. This is not sur-
prising, as we would expect drunkenness to be a contributing factor to acci-
dents. Note that 1(accident[drunk) = 313 = 0.23 while 1(accident[sober) =
387 = 0.03 4.
We can prove an easy theorem about independent events.
Theorem 18 If ¹ and 1 are independent then 1(¹[1) = 1(¹).
Proof. We have
1(¹[1) =
1(¹ ¨ 1)
1(1)
=
1(¹)1(1)
1(1)
= 1(¹).
According to this theorem, if accidents and drunkenness were independent
events, then 1(accident[drunk) = 1(accident); that is, the probability of
getting in an accident when drunk is the same as the overall probability of
getting in an accident.
CHAPTER 10. PROBABILITY 141
10.8 Problems
1. Answer the questions using the table below.
n = 1 n = 2 n = 3 n = 4 n = 5
r = 5 0.01 0.03 0.17 0.00 0.00
r = 20 0.03 0.05 0.04 0.20 0.12
r = 30 0.11 0.04 0.02 0.07 0.11
(a) What is the most likely outcome?
(b) What outcomes are impossible?
(c) Find the probability that r = 30.
(d) Find the probability that r ÷ ¦5. 20¦ and 2 _ n _ 4.
(e) Find the probability that n _ 2 conditional on r _ 20.
(f) Verify Bayes’ rule for 1(n = 4[r = 20).
(g) Are the events r _ 20 and n ÷ ¦1. 4¦ statistically independent?
2. Answer the questions from the table below:
/ = 1 / = 2 / = 3 / = 4
c = 1 0.02 0.02 0.21 0.02
c = 2 0.03 0.01 0.05 0.06
c = 3 0.01 0.01 0.01 0.06
c = 4 0.00 0.05 0.00 0.12
c = 5 0.12 0.06 0.00 0.14
(a) Which event is more likely, ¹ = ¦(c. /) : 3 _ c _ 4¦ or 1 =
¦(c. /) : / _ 2 and c = 5¦?
(b) List the largest impossible event.
(c) Find the probability that / = 3.
(d) Find 1(/ = 2[c = 5).
(e) Find 1(c _ 3[/ ÷ ¦1. 4¦).
(f) Are the events c ÷ ¦1. 3¦ and / ÷ ¦1. 2. 4¦ statistically indepen-
dent?
CHAPTER 10. PROBABILITY 142
3. A disease hits 1 in every 20,000 people. A diagnostic test is 95%
accurate, that is, the test is positive for 95% of people with the disease,
and negative for 95% of the people who do not have the disease. Max
just tested positive for the disease. What is the probability he has it?
4. You have data that sorts individuals into occupations and age groups.
There are three occupations: doctor, lawyer, and entrpreneur. There
are two age categories: below 40 (young) and above 40 (old). You
wanted to know the probability that an old person is an entrepreneur.
the following information:
20% of the sample are doctors and 30% are entrepreneurs
40% of the doctors are young
20% of the entrepreneurs are young
70% of the lawyers are young
Find the probability that the an old person is an entrepreneur.
CHAPTER
11
Random variables
11.1 Random variables
A random variable is a variable whose value is a real number, and that
number is determined by the outcome of an experiment. For example, the
number of heads in ten coin tosses is a random variable, and the Dow-Jones
Industrial Average is a random variable.
The standard notation for a random variable is to place a tilde over the
variable. So ~ r is a random variable.
The realization of a randomvariable is based on the outcome of an actual
experiment. For example, if I toss a coin ten times and …nd four heads, the
realization of the random variable is 4. When the random variable is denoted
~ r its realization is denoted r.
A random variable is discrete if it can take only discrete values (either
a …nite number or a countable in…nity of values). A random variable is
continuous if it can take any value in an interval.
Random variables have probability measures, and we can use random
variables to de…ne events. For example, 1(~ r = r) is the probability that the
143
CHAPTER 11. RANDOM VARIABLES 144
realization of the random variable ~ r is r, and 1(~ r ÷ [2. 3]) is the probability
that the realization of the random variable ~ r falls in the interval [2. 3]. The
event in the latter example is the event that the realization of ~ r falls in the
interval [2. 3].
11.2 Distribution functions
The distribution function for the random variable ~ r with probability mea-
sure 1 is given by
1(r) = 1(~ r _ r).
The distribution function 1(r) tells the probability that the realization of
the random variable is no greater than r. Distribution functions are almost
always denoted by capital letters.
Theorem 19 Distribution functions are nondecreasing and take values in
the interval [0. 1].
Proof. The second part of the statement is obvious. For the …rst part,
suppose r < n. Then the event ~ r _ r is contained in the event ~ r _ n, and
by Theorem 13 we have
1(r) = 1(~ r _ r) _ 1(~ r _ n) = 1(n).
11.3 Density functions
If the distribution function 1(r) is di¤erentiable, the density function is
1(r) = 1
t
(r).
If the distribution function 1(r) is discrete, the density function is 1(~ r = r)
for each possible value of r. Sometimes distribution functions are neither
di¤erentiable nor discrete. This causes headaches that we will not deal with
here.
Note that it is possible to go from a density function to a distribution
function:
1(r) =

a
÷o
1(t)dt.
CHAPTER 11. RANDOM VARIABLES 145
So, the distribution function is the accumulated value of the density func-
function is often called the cumulative density function, or c.d.f. The density
function is often called the probability density function, or p.d.f.
The support of a distribution 1(r) is the smallest closed set containing
¦r[1(r) = 0¦, that is, the set of points for which the density is positive. For a
discrete distribution this is just the set of outcomes to which the distribution
assigns positive probability. For a continuous distribution the support is the
smallest closed set containing all of the points that have positive probability.
For our purposes there is no real reason for using a closed (as opposed to
open) set, but the de…nition given here is mathematically correct.
11.4 Useful distributions
11.4.1 Binomial (or Bernoulli) distribution
The binomial distribution arises when the experiment consists of repeated
trials with the same two possible outcomes in each trial. The most obvious
example is ‡ipping a coin : times. The outcome is a series of heads and
tails, and the probability distribution governing the number of heads in the
series of : coin tosses is the binomial distribution.
To get a general formula, label one possible outcome of the trial a success
and the other a failure. These are just labels. In coin tossing, we could
count a head as a "success" and a tail as a "failure," or we could do it the
other way around. If we are checking lightbulbs to see if they work, we could
label a working lightbulb as a "success" and a nonworking bulb as a "failure,"
or we could do it the other way around. A coauthor (Harold Winter) and
I used the binomial distribution to model juror bias, and the two possible
outcomes were a juror biased toward conviction and a juror biased toward
acquittal. We obviously cared about the total bias of the group. We had
to arbitrarily label one form of bias a "success" and the other a "failure."
Suppose that the probability of a success is j in any given trial, which
means that the probability of a failure is ¡ = 1 ÷ j. Also, assume that the
trials are statistically independent, so that a success in trial t has no e¤ect
on the probability of success in period t + 1.
The question is, what is the probability of r successes in : trials?
Let’s work this out for the case of two trials. Let the vector (outcome 1,
CHAPTER 11. RANDOM VARIABLES 146
outcome 2) denote the event in which outcome 1 is realized in the …rst trial
and outcome 2 in the second trial. Using 1 as the probability measure, we
have
1(success, success) = j
2
1(success, failure) = j¡
1(failure, success) = j¡
1(failure, failure) = ¡
2
The probability of two successes in two trials is j
2
, the probability of one
success in two trials is 2j¡, and the probability of two failures is ¡
2
.
With three trials, letting : denote a success and 1 a failure, we have
1(:. :. :) = j
3
1(:. :. 1) = 1(:. 1. :) = 1(1. :. :) = j
2
¡
1(:. 1. 1) = 1(1. :. 1) = 1(1. 1. :) = j¡
2
1(1. 1. 1) = ¡
3
Thus the probability of three successes is j
3
, the probability of two successes
is 3j
2
¡, the probability of one success is 3j¡
2
, and the probability of no
successes is ¡
3
.
In general, the rule for r successes in : trials when the probability of
success in a single trial is j is
/(r. :. j) =

:
r

j
a
¡
a÷a
There are two pieces of the formula. The probability of a single, particular
con…guration of r successes and : ÷ r failures is j
a
¡
a÷a
. For example, the
probability that the …rst r trials are successes and the last : ÷ r trials are
failures is j
a
¡
a÷a
. The number of possible con…gurations with r successes
and : ÷r failures is

a
a

, which is given by

:
r

=
:!
r!(: ÷r)!
=
1 2 ... :
[1 ... r][1 ... (: ÷r)]
.
Note that the function /(r. :. j) is a density function. The binomial
distribution is a discrete distribution, so /(r. :. j) = 1(~ r = r), where ~ r is
the random variable measuring the number of successes in : trials.
CHAPTER 11. RANDOM VARIABLES 147
The binomial distribution function is
1(r. :. j) =
a
¸
j=0
/(r. :. j).
so that it is the probability of getting r or fewer successes in : trials.
Microsoft Excel, and probably similar programs, make it easy to compute
the binomial density and distribution. The formula for /(r. :. j) is
=BINOMDIST(r. :. j. 0)
and the formula for 1(r. :. j) is
=BINOMDIST(r. :. j. 1)
The last argument in the function just tells the program whether to compute
the density or the distribution.
Let’s go back to my jury example. Suppose we want to draw a pool
of 12 jurors from the population, and that 20% of them are biased toward
acquittal, with the rest biased toward conviction. The probability of drawing
2 jurors biased toward acquittal and 10 biased toward conviction is
/(2. 12. 0.2) = 0.283
The probability of getting at most two jurors biased toward acquittal is
1(2. 12. 0.2) = .558
The probability of getting a jury in which every member is biased toward
conviction (and no member is biased toward acquittal) is
/(0. 12. 0.2) = 0.069
11.4.2 Uniform distribution
The great appeal of the uniform distribution is that it is easy to work with
mathematically. It’s density function is given by
1(r) =

1
b÷o
r ÷ [c. /]
if
0 r ÷ [c. /]
CHAPTER 11. RANDOM VARIABLES 148
The corresponding distribution function is
1(r) =

0 r < c
a÷o
b÷o
if c _ r _ /
1 r /
Okay, so these look complicated. But, if r is in the support interval [c. /],
then the density function is the constant function 1(r) = 1(/ ÷c) and the
distribution function is the linear function 1(r) = (r ÷c)(/ ÷c).
Graphically, the uniform density is a horizontal line. Intuitively, it
spreads the probability evenly (or uniformly) throughout the support [c. /],
which is why it has its name. The distribution function is just a line with
slope 1(/ ÷c) in the support [c. /].
11.4.3 Normal (or Gaussian) distribution
The normal distribution is the one that gives us the familiar bell-shaped
density function, as shown in Figure 11.1. It is also central to statistical
analysis, as we will see later in the course. We begin with the standard
normal distribution. For now, its density function is
1(r) =
1

2:
c
÷a
2
/2
and its support is the entire real line: (÷·. ·). The distribution function
is
1(r) =
1

2:

a
÷o
c
÷t
2
/2
dt
and we write it as an integral because there is no simple functional form.
Later on we will …nd the mean and standard deviation for di¤erent dis-
tributions. The standard normal has mean 0 and standard deviation 1. A
more general normal distribution with mean j and standard deviation o has
density function
1(r) =
1
o

2:
c
÷(a÷j)
2
/2o
2
and distribution function
1(r) =
1
o

2:

a
÷o
c
÷(t÷j)
2
/2o
2
dt.
CHAPTER 11. RANDOM VARIABLES 149
-5 -4 -3 -2 -1 0 1 2 3 4 5
0.1
0.2
0.3
0.4
x
y
Figure 11.1: Density function for the standard normal distribution
The e¤ects of changing the mean j can be seen in Figure 11.2. The peak
of the normal distribution is at the mean, so the standard normal peaks at
r = 0, which is the thick curve in the …gure. The thin curve in the …gure
has a mean of 2.5.
Changing the standard deviation o has a di¤erent e¤ect, as shown in
Figure 11.3. The thick curve is the standard normal with o = 1, and the
thin curve has o = 2. As you can see, increasing the standard deviation
lowers the peak and spreads the density out, moving probability away from
the mean and into the tails.
11.4.4 Exponential distribution
The density function for the exponential distribution is
1(r) =
1
o
c
÷a/0
and the distribution function is
1(r) = 1 ÷c
÷a/0
.
It is de…ned for r 0. The exponential distribution is often used for the
failure rate of equipment: the probability that a piece of equipment will fail
CHAPTER 11. RANDOM VARIABLES 150
5 2.5 0 -2.5 -5
0.3
0.2
0.1
0
x
y
x
y
Figure 11.2: Changing the mean j of the normal density
5 2.5 0 -2.5 -5
0.3
0.2
0.1
0
x
y
x
y
Figure 11.3: Increasing the standard deviation o of the normal density
CHAPTER 11. RANDOM VARIABLES 151
0 1 2 3 4 5
0.0
0.1
0.2
0.3
0.4
0.5
0.6
x
y
Figure 11.4: Density function for the lognormal distribution
by time r is 1(r). Accordingly, 1(r) is the probability of a failure at time
r.
11.4.5 Lognormal distribution
The random variable ~ r has the lognormal distribution if the random vari-
able ~ n = ln ~ r has the normal distribution. The density function is
1(r) =
1

2:

c
÷(ln a)
2
/2

1
r
and it is de…ned for r _ 0. It is shown in Figure 11.4.
11.4.6 Logistic distribution
The logistic distribution function is
1(r) =
1
1 +c
÷a
.
Its density function is
1(r) =
c
÷a
(1 +c
÷a
)
2
.
CHAPTER 11. RANDOM VARIABLES 152
5 2.5 0 -2.5 -5
0.25
0.2
0.15
0.1
0.05
x
y
x
y
Figure 11.5: Density function for the logistic distribution
It is de…ned over the entire real line and gives the bell-shaped density function
shown in Figure 11.5.
CHAPTER
12
Integration
If you took a calculus class from a mathematician, you probably learned two
things about integration: (1) integration is the opposite of di¤erentiation,
and (2) integration …nds the area under a curve. Both of these are cor-
rect. Unfortunately, in economics we rarely talk about the area under a
curve. There are exceptions, of course. We sometimes think about pro…t as
the area between the price line and the marginal cost curve, and we some-
times compute consumer surplus as the area between the demand curve and
the price line. But this is not the primary reason for using integration in
economics.
Before we get into the interpretation, we should …rst deal with the me-
chanics. As already stated, integration is the opposite of di¤erentiation.
To make this explicit, suppose that the function 1(r) has derivative 1(r).
The following two statements provide the fundamental relationship between
derivatives and integrals:

b
o
1(r)dr = 1(/) ÷1(c). (12.1)
153
CHAPTER 12. INTEGRATION 154
and
1(r)dr = 1(r) +c. (12.2)
where c is a constant. The integral in (12.1) is a de…nite integral, and
its distinguishing feature is that the integral is taken over a …nite interval.
The integral in (12.2) is an inde…nite integral, and it has no endpoints.
The reason for the names is that the solution in (12.1) is unique, or de…nite,
while the solution in (12.2) is not unique. This occurs because when we
integrate the function 1(r), all we know is the slope of the function 1(r),
and we do not know anything about its height. If we choose one function
that has slope 1(r), call it 1
+
(r), and we shift it upward by one unit, its
slope is still 1(r). The role of the constant c in (12.2), then, is to account
for the indeterminacy of the height of the curve when we take an integral.
The two equations (12.1) and (12.2) are consistent with each other. To
see why, notice that

1(r)dr =

o
÷o
1(r)dr.
so an inde…nite integral is really just an integral over the entire real line
(÷·. ·). Furthermore,

b
o
1(r)dr =

b
÷o
1(r)dr ÷

o
÷o
1(r)dr
= [1(/) +c] ÷[1(c) +c]
= 1(/) ÷1(c).
Some integrals are straightforward, but others require some work. From
our point of view the most important ones follow, and they can be checked
by di¤erentiating the right-hand side.

r
a
dr =
r
a+1
: + 1
+c for : = ÷1

1
r
dr = ln r +c

c
va
dr =
c
va
:
+c

ln rdr = r ln r ÷r +c
CHAPTER 12. INTEGRATION 155
There are also two useful rules for more complicated integrals:

c1(r)dr = c

1(r)dr

[1(r) +o(r)] dr =

1(r)dr +

o(r)dr.
The …rst of these says that a constant inside of the integral can be moved
outside of the integral. The second one says that the integral of the sum
of two functions is the sum of the two integrals. Together they say that
integration is a linear operation.
12.1 Interpreting integrals
Repeat after me (okay, way after me, because I wrote this in April 2008):
They can also be used to …nd the area under a curve, but in economics the
To see why, suppose we wanted to do something strange like measure
the amount of water ‡owing in a particular river during a year. We have
not …gured out how to measure the total volume, but we can measure the
‡ow at any instant in time using our Acme Hydro‡owometer
TM
. At time
t the volume of water ‡owing past a particular point as measured by the
Hydro‡owometer
TM
is /(t).
Suppose that we break the year into : intervals of length 1: each, where
1 is the amount of time in a year. We measure the ‡owonce per time interval,
and our measurement times are t
1
. .... t
a
. We use the measured ‡ow at time
t
j
to calculate the total ‡ow for period i according to the formula
/(t
j
)
1
:
where /(t
j
) is our measure of the instantaneous ‡ow and 1: is the length
of time in the interval. The total estimated volume for the year is then
\ (:) =
tn
¸
t=t
1
/(t
j
)
1
:
. (12.3)
CHAPTER 12. INTEGRATION 156
We can make our estimate of the volume more accurate by taking more
measurements of the ‡ow. As we do this : becomes larger and 1: becomes
smaller.
Now suppose that we could measure the ‡ow at every instant of time.
Then 1: ÷ 0, and if we tried to do the summation in equation (12.3) we
would add up a whole bunch of numbers, each of which is multiplied by zero.
But the total volume is not zero, so this cannot be the right approach. It’s
not. The right approach uses integrals. The idea behind an integral is
adding an in…nite number of zeroes together to get something that is not
zero. Our correct formula for the volume would be
\ =

T
0
/(t)dt.
The expression dt takes the place of the expression 1: in the sum, and it
is the length of each measurement interval, which is zero.
We can use this approach in a variety of economic applications. One
major use is for taking expectations of continuous random variables, which
is the topic of the next chapter. Before going on, though, there are two
useful tricks involving integrals that deserve further attention.
12.2 Integration by parts
Integration by parts is a very handy trick that is often used in economics.
It is also what separates us from the lower animals. The nice thing about
integration by parts is that it is simple to reconstruct. Start with the product
rule for derivatives:
d
dr
[1(r)o(r)] = 1
t
(r)o(r) +1(r)o
t
(r).
Integrate both sides of this with respect to r and over the interval [c. /]:

b
o
d
dr
[1(r)o(r)]dr =

b
o
1
t
(r)o(r)dr +

b
o
1(r)o
t
(r)dr. (12.4)
Note that the left-hand term is just the integral (with respect to r) of a
derivative (with respect to r), and combining those two operations leaves
CHAPTER 12. INTEGRATION 157
the function unchanged:

b
o
d
dr
[1(r)o(r)]dr = 1(r)o(r)[
b
o
= 1(/)o(/) ÷1(c)o(c).
Plugging this into (12.4) yields
1(r)o(r)[
b
o
=

b
o
1
t
(r)o(r)dr +

b
o
1(r)o
t
(r)dr.
Now rearrange this to get the rule for integration by parts:

b
o
1
t
(r)o(r)dr = 1(r)o(r)[
b
o
÷

b
o
1(r)o
t
(r)dr. (12.5)
Now you know the secret – it’s just the product rule for derivatives.
12.2.1 Application: Choice between lotteries
Here is my favorite use of integration by parts. You may not understand the
economics yet, but you will someday. Suppose that an individual is choosing
between two lotteries. Lotteries are just probability distributions, and the
individual’s objective function is

b
o
n(r)1
t
(r)dr (12.6)
where c is the lowest possible payo¤ from the lottery, / is the highest possible
payo¤, n is a utility function de…ned over amounts of money, and 1
t
(r) is
the density function corresponding to the probability distribution function
1(r). (I use the derivative notation 1
t
(r) instead of the density notation
1(r) to make the use of integration by parts more transparent.)
The individual can choose between lottery 1(r) and lottery G(r), and
we would like to …nd properties of 1(r) and G(r) that guarantee that the
individual likes 1(r) better. How can we do this? We must start with what
we know about the individual. The individual likes money, and n(r) is the
utility of money. If she likes money, n(r) must be nondecreasing, so
n
t
(r) _ 0.
CHAPTER 12. INTEGRATION 158
And that’s all we know about the individual.
Now look back at expression (12.6). It is the integral of the product of
n(r) and 1
t
(r). But the only thing we know about the individual is that
n
t
(r) _ 0, and expression (12.6) does not have n
t
(r) in it. So let’s integrate
by parts.

b
o
n(r)1
t
(r)dr = n(r)1(r)[
b
o
÷

b
o
n
t
(r)1(r)dr.
To simplify this we need to know a little more about probability distribution
functions. Since c is the lowest possible payo¤ from the lottery, the distrib-
ution function must satisfy 1(c) = 0 (this comes from Theorem 19). Since /
is the highest possible payo¤ from the lottery, the distribution function must
satisfy 1(/) = 1. So, the above expression reduces to

b
o
n(r)1
t
(r)dr = n(/) ÷

b
o
n
t
(r)1(r)dr. (12.7)
We can go through the same analysis for the other lottery, G(r), and …nd

b
o
n(r)G
t
(r)dr = n(/) ÷

b
o
n
t
(r)G(r)dr. (12.8)
The individual chooses the lottery 1(r) over the lottery G(r) if

b
o
n(r)1
t
(r)dr _

b
o
n(r)G
t
(r)dr.
that is, if the lottery 1(r) generates a higher value of the objective function
than the lottery G(r) does. Or written di¤erently, she chooses 1(r) over
G(r) if

b
o
n(r)1
t
(r)dr ÷

b
o
n(r)G
t
(r)dr _ 0.
CHAPTER 12. INTEGRATION 159
Subtracting (12.8) from (12.7) yields

b
o
n(r)1
t
(r)dr ÷

b
o
n(r)G
t
(r)dr
=
¸
n(/) ÷

b
o
n
t
(r)1(r)dr

÷
¸
n(/) ÷

b
o
n
t
(r)G(r)dr

=

b
o
n
t
(r)G(r)dr ÷

b
o
n
t
(r)1(r)dr
=

b
o
n
t
(r) [G(r) ÷1(r)] dr.
The di¤erence depends on something we know about: n
t
(r), which we know
is nonnegative. The individual chooses 1(r) over G(r) if the above expres-
sion is nonnegative, that is, if

b
o
n
t
(r) [G(r) ÷1(r)] dr _ 0.
We knowthat n
t
(r) _ 0. We can guarantee that the product n
t
(r) [G(r) ÷1(r)]
is nonnegative if the other term, [G(r) ÷1(r)], is also nonnegative. So, we
are certain that she will choose 1(r) over G(r) if
G(r) ÷1(r) _ 0 for all r.
This turns out to be the answer to our question. Any individual who
likes money will prefer lottery 1(r) to lottery G(r) if G(r) ÷1(r) _ 0 for all
r. There is even a name for this condition – …rst-order stochastic dominance.
But the goal here was not to teach you about choice over lotteries. The goal
was to show the usefulness of integration by parts. So let’s look back and
see exactly what it did for us. The objective function was an integral of the
product of two terms, n(r) and 1
t
(r). We could not assume anything about
n(r), but we could assume something about n
t
(r). So we used integration by
parts to get an expression that involved the term we knew something about.
And that is its beauty.
12.3 Di¤erentiating integrals
As you have probably noticed, in economics we di¤erentiate a lot. Some-
times, though, the objective function has an integral, as with expression
CHAPTER 12. INTEGRATION 160
(12.6) above. Often we want to di¤erentiate an objective function to …nd an
optimum, and when the objective function has an integral we need to know
how to di¤erentiate it. There is a rule for doing so, called Leibniz’s rule,
named after the 17th-century German mathematician who was one of the
two independent inventors of calculus (along with Newton).
We want to …nd
d
dt

b(t)
o(t)
1(r. t)dr.
Note that we are di¤erentiating with respect to t, and we are integrating
with respect to r. Nevertheless, t shows up three times in the expression,
once in the upper limit of the integral, /(t), once in the lower limit of the
integral, c(t), and once in the integrand, 1(r. t). We need to …gure out what
to do with these three terms.
Apicture helps. Look at Figure 12.1. The integral is the area underneath
the curve 1(r. t) between the endpoints c(t) and /(t). Three things happen
when t changes. First, the function 1(r. t) shifts, and the graph shows
an upward shift, which makes the integral larger because the area under a
higher curve is larger. Second, the right endpoint /(t) changes, and the graph
shows it getting larger. This again increases the integral because now we
are integrating over a larger interval. Third, the left endpoint c(t) changes,
and again the graph shows it getting larger. This time, though, it makes
the integral smaller because moving the left endpoint rightward shrinks the
interval over which we are integrating.
Leibniz’s rule accounts for all three of these shifts. Leibniz’s rule says
d
dt

b(t)
o(t)
1(r. t)dr =

b(t)
o(t)
·1(r. t)
·t
dr +/
t
(t)1(/(t). t) ÷c
t
(t)1(c(t). t).
Each of the three terms corresponds to one of the shifts in Figure 12.1. The
…rst term accounts for the upward shift of the curve 1(r. t). The term
·1(r. t)·t tells how far upward the curve shifts at point r, and the integral

b(t)
o(t)
·1(r. t)
·t
dr
tells how much the area changes because of the upward shift in 1(r. t).
The second term accounts for the movement in the right endpoint, /(t).
Using the graph, the amount added to the integral is the area of a rectangle
CHAPTER 12. INTEGRATION 161
b(t)
x
1
a(t)
x
1
f(x,t)
x
f(x,t)
Figure 12.1: Leibniz’s rule
that has height 1(/(t). t), that is, 1(r. t) evaluated at r = /(t), and width
/
t
(t), which accounts for how far /(t) moves when t changes. Since area is
just length times width, we get /
t
(t)1(/(t). t), which is exactly the second
term.
The third term accounts for the movement in the left endpoint, c(t).
Using the graph again, the change in the integral is the area of a rectangle
that has height 1(c(t). t) and width c
t
(t). This time, though, if c(t) increases
we are reducing the size of the integral, so we must subtract the area of the
rectangle. Consequently, the third term is ÷c
t
(t)1(c(t). t).
Putting these three terms together gives us Leibniz’s rule, which looks
complicated but hopefully makes sense.
12.3.1 Application: Second-price auctions
A simple application of Leibniz’s rule comes from auction theory. A …rst-
price sealed bid auction has bidders submit bids simultaneously to an auc-
tioneer who awards the item to the highest bidder who then pays his bid.
This is a very common auction form. A second-price sealed bid auction has
bidders submit bids simultaneously to an auctioneer who awards the item to
the highest bidder, just like before, but this time the winning bidder pays
the second-highest price.
To model the second-price auction, suppose that there are : bidders and
CHAPTER 12. INTEGRATION 162
that bidder i values the item being auctioned at ·
j
, which is independent
of how much everyone else values the item. Bidders do not know their
opponents’ valuations, but they do know the probability distribution of the
opponents’ valuations. Bidder i must choose his bid /
j
.
Let 1
j
(/) be the probability that the highest other bid faced by i, that is,
the highest bid except for /
j
, is no larger than /. Then 1
j
(/) is a probability
distribution function, and its density function is 1
j
(/). Bidder i’s expected
payo¤ is
\
j
(/
j
) =

b
i
0

j
÷/)1
j
(/)d/.
Let’s interpret this function. Bidder i wins if his is the highest bid, which
occurs if the highest other bid is between 0 (the lowest possible bid) and his
own bid /
j
. If the highest other bid is above /
j
bidder i loses and gets a
payo¤ of zero. This is why the integral is taken over the interval [0. /
j
]. If
bidder i wins he pays the highest other bid /, which is distributed according
to the density function 1
j
(/). His surplus if he wins is ·
j
÷/, his value minus
how much he pays.
Bidder i chooses the bid /
j
to maximize his expected payo¤ \
j
(/
j
). Since
this is a maximization problem we should …nd the …rst-order condition:
\
t
j
(/
j
) =
d
d/
j

b
i
0

j
÷/)1
j
(/)d/ = 0.
Notice that we are di¤erentiating with respect to /
j
, which shows up only as
the upper endpoint of the integral. Using Leibniz’s rule we can evaluate this
…rst-order condition:
0 =
d
d/
j

b
i
0

j
÷/)1
j
(/)d/
=

b
i
0
·
·/
j
[(·
j
÷/)1
j
(/)] d/ +
d/
j
d/
j

j
÷/
j
)1
j
(/
j
) ÷
d0
d/
j

j
÷0)1
j
(0).
The …rst term is zero because (·
j
÷ /)1
j
(/) is not a function of /
j
, and so
the partial derivative is zero. The second term reduces to (·
j
÷ /
j
)1
j
(/
j
)
because d/
j
d/
j
is simply one. The third term is zero because the derivative
d0d/
j
= 0. This leaves us with the …rst-order condition
0 = (·
j
÷/
j
)1
j
(/
j
).
CHAPTER 12. INTEGRATION 163
Since density functions take on only nonnegative values, the …rst-order con-
dition holds when ·
j
÷ /
j
= 0, or /
j
= ·
j
. In a second-price auction the
bidder should bid his value.
This result makes sense intuitively. Let /
j
be bidder i’s bid, and let /
denote the highest other bid. Suppose …rst that bidder i bids more than his
value, so that /
j
·
j
. If the highest other bid is in between these, so that
·
j
< / < /
j
, bidder i wins the auction but pays /÷·
j
more than his valuation.
He could have avoided this by bidding his valuation, ·
j
. Now suppose that
bidder i bids less than his value, so that /
j
< ·
j
. If the highest other bid is
between these two, so that /
j
< / < ·
j
, bidder i loses the auction and gets
nothing. But if he had bid his value he would have won the auction and
paid / < ·
j
, and so he would have been better o¤. Thus, the best thing for
him to do is bid his value.
12.4 Problems
1. Suppose that 1(r) is the density function for a random variable dis-
tributed uniformly over the interval [2. 8].
(a) Compute

8
2
r1(r)dr
(b) Compute

8
2
r
2
1(r)dr
2. Compute the following derivative:
d
dt

t
2
÷t
2
tr
2
dr
3. Find the following derivative:
d
dt

4t
2
÷3t
t
2
r
3
dr
4. Let l(c. /) denote the uniformdistribution over the interval [c. /]. Find
conditions on c and / that guarantee that l(c. /) …rst-order stochasti-
cally dominates l(0. 1).
CHAPTER
13
Moments
13.1 Mathematical expectation
Let ~ r be a random variable with density function 1(r) and let n(r) be a
real-valued function. The expected value of n(~ r) is denoted 1[n(~ r)] and
it is found by the following rules. If ~ r is discrete taking on value r
j
with
probability 1(r
j
) then
1[n(~ r)] =
¸
j
n(r
j
)1(r
j
).
If ~ r is continuous the expected value of n(~ r) is given by
1[n(~ r)] =

o
÷o
n(r)1(r)dr.
Since integrals are for adding, as we learned in the last chapter, these formulas
really do make sense and go together.
164
CHAPTER 13. MOMENTS 165
The expectation operator 1[] is linear, which means that
1[cn(~ r)] = c1[n(~ r)]
1[n(~ r) +·(~ r)] = 1[n(~ r)] +1[·(~ r)]
13.2 The mean
The mean of a random variable ~ r is j = 1[~ r], that is, it is the expected
value of the function n(r) = r.
Consider the discrete distribution with outcomes (4. 10. 12. 20) and cor-
responding probabilities (0.1. 0.2. 0.3. 0.4). The mean is
1[~ r] = (4)(0.1) + (10)(0.2) + (12)(0.3) + (20)(0.4) = 14
13.2.1 Uniform distribution
The mean of the uniform distribution over the interval [c. /] is (c +/)2. If
you don’t believe me, draw it. To …gure it out from the formula, compute
1[~ r] =

b
o
r
1
/ ÷c
dr
=
1
/ ÷c

b
o
rdr
=
1
/ ÷c

1
2
r
2

b
o
=
1
/ ÷c

/
2
÷c
2
2
=
/ +c
2
.
13.2.2 Normal distribution
The mean of the general normal distribution is the parameter j. Recall that
the normal density function is
1(r) =
1
o

2:
c
÷(a÷j)
2
/2o
2
.
CHAPTER 13. MOMENTS 166
The mean is
1[~ r] =

r
o

2:
c
÷(a÷j)
2
/2o
2
dr.
Use the change-of-variables formula n =
a÷j
o
so that r = j+on, (r÷j)
2
o
2
=
n
2
, and dr = odn. Then we can rewrite
1[~ r] =

r
o

2:
c
÷(a÷j)
2
/2o
2
dr
=

o
÷o
j +on
o

2:
c
÷&
2
/2
odn
= j

o
÷o
1

2:
c
÷&
2
/2
dn +
o

2:

o
÷o
nc
÷&
2
/2
dn.
The …rst integral is the integral of the standard normal density, and like all
densities its integral is 1. The second integral can be split into two parts:

o
÷o
nc
÷&
2
/2
dn =

0
÷o
nc
÷&
2
/2
dn +

o
0
nc
÷&
2
/2
dn.
Use the change of variables n = ÷. in the …rst integral on the right-hand
side. Then n
2
= .
2
and dn = ÷d., so

0
÷o
nc
÷&
2
/2
dn = ÷

o
0
.c
÷:
2
/2
d.
Plugging this back into the expression above it yields

o
÷o
nc
÷&
2
/2
dn = ÷

o
0
.c
÷:
2
/2
d. +

o
0
nc
÷&
2
/2
dn.
But both integrals on the right-hand side are the same, so the expression is
zero. Thus, we get 1[~ r] = j.
13.3 Variance
The variance of the random variable ~ r is 1[(~ r÷j)
2
], where j = 1[~ r] is the
mean of the random variable. The variance is denoted o
2
. Note that
1[(~ r ÷j)
2
] = 1[~ r
2
÷2j~ r +j
2
]
= 1[~ r
2
] ÷2j1[~ r] +j
2
= 1[~ r
2
] ÷2j
2
+j
2
= 1[~ r
2
] ÷j
2
.
CHAPTER 13. MOMENTS 167
We can …nd the variance of the discrete random variable used in the
preceding section. The outcomes were (4. 10. 12. 20) and the corresponding
probabilities were (0.1. 0.2. 0.3. 0.4). The mean was 14. The variance is
1[(~ r ÷j)
2
] = (0.1)(4 ÷14)
2
+ (0.2)(10 ÷14)
2
+
(0.3)(12 ÷14)
2
+ (0.4)(20 ÷14)
2
= (0.1)(100) + (0.2)(16) + (0.3)(4) + (0.4)(36)
= 28.8
We can also …nd it using the alternative formula:
1[~ r
2
] ÷j
2
= (0.1)(4
2
) + (0.2)(10
2
) + (0.3)(12)
2
+ (0.4)(20
2
) ÷14
2
.
You should be able to show that
\ c:(c~ r) = c
2
\ c:(~ r).
The standard deviation of the random variable ~ r is

1[(~ r ÷j)
2
],
which means that the standard deviation is simply o. It is the square root
of the variance.
13.3.1 Uniform distribution
The variance of the uniform distribution can be found from
1[~ r
2
] =

b
o
r
2
/ ÷c
dr
=
1
/ ÷c

1
3
r
3

b
o
=
1
/ ÷c

/
3
÷c
3
3
CHAPTER 13. MOMENTS 168
and note that /
3
÷c
3
= (/ ÷c)(/
2
+c/ +c
2
). Consequently,
o
2
= 1[~ r
2
] ÷j
2
=
/
2
+c/ +c
2
3
÷
(/ +c)
2
4
=
4/
2
+ 4c/ + 4c
2
÷3/
2
÷6c/ ÷3c
2
12
=
/
2
÷2c/ +c
2
12
=
(/ ÷c)
2
12
.
13.3.2 Normal distribution
The variance of the standard normal distribution is 1. Let’s take that on
faith. The variance of the general normal distribution had better be the
parameter o
2
. To make sure, compute
1[(~ r ÷j)
2
] =

o
÷o
(r ÷j)
2
o

2:
c
÷(a÷j)
2
/2o
2
dr
Using the same change-of-variables trick as before, we get
1[(~ r ÷j)
2
] =

o
÷o
(r ÷j)
2
o

2:
c
÷(a÷j)
2
/2o
2
dr
=

o
÷o
(j +on ÷j)
2
o

2:
c
÷&
2
/2
odn
=

o
÷o
on
2

2:
c
÷&
2
/2
dn
= o
2

o
÷o
n
2

2:
c
÷&
2
/2
dn.
The integral is the variance of the standard normal, which we already said
was 1.
13.4 Application: Order statistics
Suppose that you make : independent draws from the random variable ~ r
with distribution function 1(r) and density 1(r). The value of the highest
CHAPTER 13. MOMENTS 169
of them is a random variable, the value of the second highest is a random
variable, and so on, for : random variables. The :-th order statistic is
the expected value of the :-th highest draw. So, the …rst order statistic is
the expected value of the highest of the : draws, the second order statistic
is the expected value of the second highest of the : draws, and so on.
We use order statistics in a variety of settings, but the most straightfor-
ward one is auctions. Think about a …rst-price sealed bid auction in which
: bidders submit their bids simultaneously and then the highest bidder wins
and pays her bid. The seller’s expected revenue, then, is the expected value
of the highest of the : bids, which is the …rst order statistic. Now think
about the second-price sealed-bid auction. In this auction : bidders sub-
mit their bids simultaneously, the highest bid wins, and the winner pays the
second-highest bid. The seller’s expected revenue in this auction is the ex-
pected value of the second-highest of the : bids, which is the second order
statistic.
As a side note, order statistics have also played a role in cosmology,
the study of the cosmos, and in particular they were used by Edwin Hubble.
Hubble was clearly an overachiever. In set the Illinois state high school record
for high jump. He was a Rhodes scholar. He was the …rst astronomer to use
the giant 200-inch Hale telescope at Mount Palomar. He was honored with a
41 cent postage stamp. He has the Hubble Space Telescope named after him.
Importantly for this story, though, he established that the universe extends
beyond our galaxy, the Milky Way. This was a problem because we know
that stars that are farther away are dimmer, but not all stars have the same
brightness. So, we can’t tell whether a particular star is dim because it’s far
away or because it’s just not very bright (econometricians would call this an
identi…cation problem). Astronomers before Hubble made the heroic (that
is, unreasonable) assumption that all starts were the same brightness and
worked from there. Hubble used the milder assumption that the brightest
star in every nebula (or galaxy, but they didn’t know the di¤erence at the
time) is equally bright. In other words, he assumed that the …rst order
statistic is the same for every nebula.
We want to …nd the order statistics, and, in particular, the …rst and
second order statistics. To do this we have to …nd some distributions. Think
about the …rst order statistic. It is the expected value of the highest of
the : draws, and the highest of the : draws is a random variable with a
distribution. But what is the distribution? We must construct it from the
underlying distribution 1.
CHAPTER 13. MOMENTS 170
Let G
(1)
(r) denote the distribution for the highest of the : values drawn
independently from 1(r). We want to derive G
(1)
(r). Remember that
G
(1)
(r) is the probability that the highest draw is less than or equal to r.
For the highest draw to be less than or equal to r, it must be the case that
every draw is less than or equal to r. When : = 1 the probability that the
one draw is less than or equal to r is 1(r). When : = 2 the probability
that both draws are less than or equal to r is (1(r))
2
. And so on. When
there are : draws the probability that all of them are less than or equal to r
is (1(r))
a
, and so
G
(1)
(r) = 1
a
(r).
From this we can get the density function by di¤erentiating G
(1)
with respect
to r:
o
(1)
(r) = :1
a÷1
(r)1(r).
Note the use of the chain rule.
This makes it possible to compute the …rst order statistic, since we know
the distribution and density functions for the highest of : draws. We just
take the expected value in the usual way:
:
(1)
=

r:1
a÷1
(r)1(r)dr.
Example 12 Uniform distribution over (0. 1). We have 1(r) = r on [0. 1],
and 1(r) = 1 on [0. 1]. The …rst order statistic is
:
(1)
=

r:1
a÷1
(r)1(r)dr
=

1
0
r:r
a÷1
dr
= :

1
0
r
a
dr
= :
r
a+1
: + 1

1
0
=
:
: + 1
.
This answer makes some sense. If : = 1 the …rst order statistic is just the
mean, which is 12. If : = 2 and the distribution is uniform so that the
CHAPTER 13. MOMENTS 171
draws are expected to be evenly spaced, then the highest draw should be about
23 and the lowest should be about 13. If : = 3 the highest draw should be
We also care about the second order statistic. To …nd it we follow the
same steps, beginning with identifying the distribution of the second-highest
draw. To make this exercise precise, we are looking for the probability that
the second-highest draw is no greater than some number, call it n. There
are a total of :+1 ways that we can get the second-highest draw to be below
n, and they are listed below:
Event Probability
Draw 1 is above n and the rest are below n (1 ÷1(n))1
a÷1
(n)
Draw 2 is above n and the rest are below n (1 ÷1(n))1
a÷1
(n)
.
.
.
.
.
.
Draw : is above n and the rest are below n (1 ÷1(n))1
a÷1
(n)
All the draws are below n 1
a
(n)
Let’s …gure out these probabilities one at a time. Regarding the …rst line,
the probability that draws 2 through : are below n is the probability of
getting :÷1 draws below n, which is 1
a÷1
(n). The probability that draw 1
is above n is 1 ÷1(n). Multiplying these together yields the probability of
getting draw 1 above n and the rest below. The probability of getting draw
2 above n and the rest below is the same, and so on for the …rst : rows of
the table. In the last row all of the draws are below n, in which case both
the highest and the second highest draws are below n. The probability of
all : draws being below n is just 1
a
(n), the same as when we looked at the
…rst order statistic. Summing the probabilities yields the distribution of the
second-highest draw:
G
(2)
(n) = :(1 ÷1(n))1
a÷1
(n) +1
a
(n).
Multiplying this out and simplifying yields
G
(2)
(n) = :1
a÷1
(n) ÷(: ÷1)1
a
(n).
The density function is found by di¤erentiating G
(2)
with respect to n:
o
(2)
(n) = :(: ÷1)1
a÷2
(n)1(n) ÷:(: ÷1)1
a÷1
(n)1(n).
CHAPTER 13. MOMENTS 172
It can be rearranged to get
o
(2)
(n) = :(: ÷1)(1 ÷1(n))1
a÷2
(n)1(n).
The second order statistic is the expected value of the second-highest draw,
which is
:
(2)
=

no
(2)
(n)dn
=

n:(: ÷1)(1 ÷1(n))1
a÷2
(n)1(n)dn.
Example 13 Uniform distribution over [0. 1].
:
(2)
=

n:(: ÷1)(1 ÷1(n))1
a÷2
(n)1(n)dn
=

1
0
n:(: ÷1)(1 ÷n)n
a÷2
dn
= :(: ÷1)

1
0

n
a÷1
÷n
a

dn
= :(: ÷1)
n
a
:

1
0
÷ :(: ÷1)
n
a+1
: + 1

1
0
= (: ÷1) ÷
:(: ÷1)
: + 1
=
: ÷1
: + 1
.
If there are four draws uniformly dispersed between 0 and 1, the highest draw
is expected to be at 34, the second highest at 24, and the lowest at 14.
If there are …ve draws, the highest is expected to be at 45 and the second
highest is expected to be at 35, and so on.
CHAPTER 13. MOMENTS 173
13.5 Problems
1. Suppose that the random variable ~ r takes on the following values with
the corresponding probabilities:
Value Probability
7 .10
4 .23
2 .40
-2 .15
-6 .10
-14 .02
(a) Compute the mean.
(b) Compute the variance.
2. The following table shows the probabilities for two random variable,
one with density function 1(r), and one with density function o(r).
r 1(r) o(r)
10 0.15 0.20
15 0.5 0.30
20 0.05 0.1
30 0.1 0.1
100 0.2 0.3
(a) Compute the means of the two variables.
(b) Compute the variances of the two variables.
(c) Compute the standard deviations of the two variables.
3. Consider the triangular density given by 1(r) = 2r on the interval
[0. 1].
(a) Find its distribution function 1.
(b) Verify that it is a distribution function, that is, and speci…cally for
this case, that 1 is increasing, 1(0) = 0, and 1(1) = 1.
(c) Find the mean.
CHAPTER 13. MOMENTS 174
(d) Find the variance.
4. Consider the triangular density given by 1(r) =
1
8
r on the interval
[0. 4].
(a) Find its distribution function 1.
(b) Verify that it satis…es the properties of a distribution function,
that is, 1(0) = 0, 1(4) = 1, and 1 increasing.
(c) Find the mean.
(d) Find the variance.
5. Show that if the variance of ~ r is o
2
then the variance of c~ r is c
2
o
2
,
where c is a scalar.
6. Show that if the variance of ~ r is o
2
a
and if ~ n = 3~ r÷1, then the variance
of ~ n is 9o
2
a
.
7. Suppose that the random variable ~ r takes the value 6 with probability
1
2
and takes the value n with probability
1
2
. Find the derivative do
2
dn,
where o
2
is the variance of ~ r.
8. Let G
(1)
and G
(2)
be the distribution functions for the highest and
second highest draws, respectively. Show that G
(1)
…rst-order stochas-
tically dominates G
(2)
.
CHAPTER
14
Multivariate distributions
Multivariate distributions arise when there are multiple random variables.
For example, what we normally refer to as "the weather" is comprised of
several random variables: temperature, humidity, rainfall, etc. A multivari-
ate distribution function is de…ned over a vector of random variables. A
bivariate distribution function is de…ned over two random variables. In this
chapter I restrict attention to bivariate distributions. Everything can be
extended to multivariate distributions by adding more random variables.
14.1 Bivariate distributions
Let ~ r and ~ n be two random variables. The distribution function 1(r. n) is
given by
1(r. n) = 1(~ r _ r and ~ n _ n).
It is called the joint distribution function. The function 1(r. ·) is the
probability that ~ r _ r and ~ n _ ·. The latter is sure to hold, and so
1(r. ·) is the univariate distribution function for the random variable ~ r.
175
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 176
Similarly, the function 1(·. n) is the univariate distribution function for the
random variable ~ n.
The density function depends on whether the random variables are con-
tinuous or discrete. If they are both discrete then the density is given by
1(r. n) = 1(~ r = r and ~ n = n). If they are both continuous the density is
given by
1(r. n) =
·
2
·r·n
1(r. n).
This means that the distribution function can be recovered from the density
using the formula
1(r. n) =

&
÷o

a
÷o
1(:. t)d:dt.
14.2 Marginal and conditional densities
Consider the following example with two random variables:
~ n = 1 ~ n = 2
~ r = 1 0.1 0.3
~ r = 2 0.2 0.1
~ r = 3 0.1 0.2
The random variable ~ r can take on three possible values, and the random
variable ~ n can take on two possible values. The probabilities in the table
are the values of the joint density function 1(r. n).
Now add a total row and a total column to the table:
~ n = 1 ~ n = 2 1
~ a
(r)
~ r = 1 0.1 0.3 0.4
~ r = 2 0.2 0.1 0.3
~ r = 3 0.1 0.2 0.3
1
~ &
(n) 0.4 0.6 1
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 177
The sum of the …rst column is the total probability that ~ n = 1, and the
sum of the second column is the total probability that ~ n = 2. These are
the marginal densities. For a discrete bivariate random variable (~ r. ~ n) we
de…ne the marginal density of ~ r by
1
~ a
(r) =
a
¸
j=1
1(r. n
j
)
where the possible values of ~ n are n
1
. .... n
a
. For the continuous bivariate
random variable we de…ne the marginal density of ~ r by
1
~ a
(r) =

o
÷o
1(r. n)dn.
From this we can recover the marginal distribution function of ~ r by integrat-
ing with respect to r:
1
~ a
(r) =

a
÷o
1
~ a
(t)dt =

a
÷o

o
÷o
1(t. n)dndt.
We have already discussed conditional probabilities. We would like to
have conditional densities. From the table above, it is apparent that the
conditional density of ~ r given the realization of ~ n is 1(r[n) = 1(r. n)1(n).
To see that this is true, look for the probability that ~ r = 3 given ~ n = 2.
The probability that ~ n = 2 is 1
~ &
(2) = 0.6. The probability that ~ r = 3 and
~ n = 2 is 1(3. 2) = 0.2. The conditional probability is 1(~ r = 3[~ n = 2) =
1(3. 2)1
~ &
(2) = 0.20.6 = 13. So, the rule is just what we would expect in
the discrete case.
What about the continuous case? The same formula works:
1(r[n) =
1(r. n)
1
~ &
(n)
.
So, it doesn’t matter in this case whether the random variables are discrete
or continuous for us to …gure out what to do. Both of these formulas require
conditioning on a single realization of ~ n. It is possible, though, to de…ne
the conditional density much more generally. Let ¹ be an event, and let 1
be the probability measure over events. Then we can write the conditional
density
1(r[¹) =
1(r. ¹)
1(¹)
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 178
where 1(r. ¹) denotes the probability that both r and ¹ occur, written as a
density function. For example, if ¹ = ¦r : ~ r _ r
0
¦, so that ¹ is the event
that the realization of ~ r is no greater than r
0
, we know that 1(¹) = 1(r
0
).
Therefore
1(r[~ r _ r
0
) =

;(a)
1(a
0
)
if r _ r
0
0 if r r
0
At this point we have too many functions ‡oating around. Here is a
table to help with notation and terminology.
Function Notation Formula
Density 1(r. n) or 1
~ a.~ &
(r. n)
Distribution 1(r. n) or 1
~ a.~ &
(r. n) Discrete:
¸
~ &<&
¸
~ a<a
1(r. n)
Continuous:

&
÷o

a
÷o
1(:. t)d:dt
Univariate dist. 1(r) 1(r. ·)
Marginal density 1
~ a
(r)
Discrete:
¸
&
1(r. n)
Continuous:

o
÷o
1(r. n)dn
Conditional density 1(r[n) or 1
~ a[&
(r[n) 1(r. n)1(n)
The randomvariables ~ r and ~ n are independent if 1
~ a.~ &
(r. n) = 1
~ a
(r)1
~ &
(n).
In other words, the random variables are independent if the bivariate density
is the product of the marginal densities. Independence implies that 1(r[n) =
1
~ a
(r) and 1(n[r) = 1
~ &
(n), so that the conditional densities and the marginal
densities coincide.
14.3 Expectations
Suppose we have a bivariate random variable (~ r. ~ n). Let n(r. n) be a real-
valued function, in which case n(~ r. ~ n) is a univariate random variable. Then
the expected value of n(~ r. ~ n) is
1[n(~ r. ~ n)] =
¸
&
¸
a
n(r. n)1(r. n)
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 179
in the discrete case and
1[n(~ r. ~ n)] =

o
÷o

o
÷o
n(r. n)1(r. n)drdn
in the continuous case.
It is still possible to compute the means of the random variables ~ r and ~ n
separately. We can do this using the marginal densities. So, for example,
in the table above the mean of ~ n is (0.4)(1) + (0.6)(2) = 1.6.
A particularly important case is where n(r. n) = (r ÷j
a
)(n ÷j
&
), where
j
a
is the mean of ~ r and j
&
is the mean of ~ n. The resulting expectation is
called the covariance of ~ r and ~ n, and it is denoted
o
a&
= (o·(~ r. ~ n) = 1[(~ r ÷j
a
)(~ n ÷j
&
)].
Note that o
aa
is just the variance of ~ r. Also, it is easy to show that o
a&
=
1[~ r~ n] ÷j
a
j
&
.
The quantity
j
a&
=
o
a&
o
a
o
&
is called the correlation coe¢cient between ~ r and ~ n. The following the-
orems apply to correlation coe¢cients.
Theorem 20 If ~ r and ~ n are independent then j
a&
= o
a&
= 0.
Proof. When the random variables are independent, 1(r. n) = 1
~ a
(r)1
~ &
(n).
Consequently we can write
o
a&
= 1[(~ r ÷j
a
)(~ n ÷j
&
)] = 1[~ r ÷j
a
] 1[~ n ÷j
&
)].
But each of the expectations on the right-hand side are zero, and the result
follows.
It is important to remember that the converse is not true: sometimes
two variables are not independent but still happen to have a zero covariance.
An example is given in the table below. One can compute that o
a&
= 0 but
note that 1(2. 6) = 0 while 1
~ a
(2) 1
~ &
(6) = (0.2)(0.4) = 0.
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 180
~ n = 6 ~ n = 8 ~ n = 10
~ r = 1 0.2 0 0.2
~ r = 2 0 0.2 0
~ r = 3 0.2 0 0.2
Theorem 21

j
a&

_ 1.
Proof. Consider the random variable ~ r ÷ t~ n, where t is a scalar. Because
variances cannot be negative, we have
0 _ o
2
a÷t&
= 1[~ r
2
÷2t~ r~ n +t
2
~ n
2
] ÷(j
2
a
÷2tj
a
j
&
+t
2
j
2
&
)
=

1[~ r
2
] ÷j
2
a

+t
2

1[~ n
2
] ÷j
2
&

÷2t

1[~ r~ n] ÷j
a
j
&

= o
2
a
+t
2
o
2
&
÷2to
a&
.
Since this is true for any scalar t, choose
t =
o
a&
o
2
&
.
Substituting gives us
0 _ o
2
a
+

o
a&
o
2
&

2
o
2
&
÷2
o
a&
o
2
&
o
a&
0 _ o
2
a
÷
o
2
a&
o
2
&
o
2
a&
o
2
a
o
2
&
_ 1

o
a&
o
a
o
&

_ 1.
The theorem says that the correlation coe¢cient is bounded between ÷1
and 1. If j
a&
= 1 it means that the two random variables are perfectly
correlated, and once you know the value of one of them you know the value
of the other. If j
a&
= ÷1 the random variables are perfectly negatively
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 181
correlated. This contains just as much information as perfect correlation.
If you know that ~ r has attained its highest possible value and ~ r and ~ n are
perfectly negatively correlated, then ~ n must have attained its lowest value.
Finally, if j
a&
= 0 the two variables are perfectly uncorrelated (and possibly
independent).
14.4 Conditional expectations
When there are two random variables, ~ r and ~ n, one might want to …nd the
expected value of ~ r given that ~ n has attained a particular value or set of
values. This would be the conditional mean. We can use the above table
for an example. What is the expected value of ~ r given that ~ n = 8? ~ r can
only take one value when ~ n = 8, and that value is 2. So, the conditional
mean of ~ r given that ~ n = 8 is 2. The conditional mean of ~ r given that
~ n = 10 is also 2, but for di¤erent reasons this time.
To make this as general as possible, let n(r) be a function of r but not of
n. I will only consider the continuous case here; the discrete case is similar.
The conditional expectation of n(~ r) given that ~ n = n is given by
1[n(~ r)[~ n = n] =

o
÷o
n(r)1(r[n)dr =
1
1
~ &
(n)

o
÷o
n(r)1(r. n)dr.
Note that this expectation is a function of n but not a function of r. The
reason is that r is integrated out on the right-hand side, but n is still there.
14.4.1 Using conditional expectations - calculating the
bene…t of search
Consider the following search process. A consumer, Max, wants to buy a
particular digital camera. He goes to a store and looks at the price. At
that point he has three choices: (i) but the camera at that store, (ii) go to
another store to check its price, or (iii) go back to a previous store and buy the
camera there. Stores draw their prices independently from the distribution
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 182
1(j) given by
j = 200 with probability 0.2
j = 190 with probability 0.3
j = 180 with probability 0.4
j = 170 with probability 0.1
We want to answer the following question: If the lowest price so far is ¡,
what is the expected bene…t from checking one more store?
Let’s begin by answering this in the most straightforward way possible.
Suppose that ¡ = 200, so that the lowest price found so far is the worst
possible price. If Max searches one more time there is a 10% chance of
…nding a price of \$170 and saving \$30, a 40% chance of …nding a price of
\$180 and saving \$20, a 30% chance of …ncing a price of \$190 and saving
only \$10, and a 20% chance of …nding another store that charges the highest
possible price of \$200, in which case the savings are zero. The expected
saving is (.1)(30) + (.4)(20) + (.3)(10) + (.2)(0) = 14. When ¡ = 200, the
expected bene…t of search is \$14.
Now suppose that ¡ = 190, so that the best price found so far is \$190.
Max has a 10% chance of …nding a price of \$170 and saving \$20, a 40%
chance of …nding a price of \$180 and saving \$10, a 30% chance of …nding
the same price and saving nothing, and a 20% chance of …nding a higher
price of \$200, in which case he also saves nothing. The expected saving is
(.1)(20) + (.4)(10) + (.3)(0) + (.2)(0) = 6. When the best price found so far
is ¡ = 190, the expected bene…t of search is \$6.
Finally, suppose that ¡ = 180. Now there is only one way to improve,
which comes by …nding a store that charges a price of \$170, leading to a \$10
saving. The probabiliyt of …nding such a store is 10%, and the expected
saving from search is \$1.
So now we know the answers, and let’s use these answers to …gure out
a general formula, speci…cally one involving conditional expectations. Note
that when Max …nds a price of j and the best price so far is ¡, his bene…t is
¡ ÷j if the new price j is lower than the old price ¡. Otherwise the bene…t
is zero because he would be better o¤ buying the item at a store he’s already
found. This "if" statement lends itself to a conditional expectation. In
particular, the "if" statement pertains to the conditional expectation 1[¡ ÷
~ j[~ j < ¡], where the expectation is taken over the random variable j. This
epxression tells us what the average bene…t is provided that the bene…t is
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 183
nonnegative. The actual expected bene…t is
Pr¦~ j < ¡¦1[¡ ÷ ~ j[~ j < ¡].
which is the probability that the bene…t is positive times the expected bene…t
conditional on the bene…t being positive.
Let’s make sure this works using the above example. In particular, let’s
look at ¡ = 190. The conditional expectation is
1[190 ÷ ~ j[~ j < 190] = (.4)(190 ÷180) + (.1)(190 ÷170)
= 6.
which is exactly what we found before.
The conditional expectation lets us work with more complicated distri-
butions. Suppose that prices are drawn independently from the uniform
distribution over the interval [150. 200]. Let the corresponding distribution
function be 1(j) and the density function be 1(j). The expected bene…t
from searching at another store when the lowest price so far is ¡ is
Pr¦~ j < ¡¦1[¡ ÷ ~ j[~ j < ¡] = 1(¡)

o
150
[¡ ÷j]
1(j)
1(¡)
dj
=

o
150
[¡ ÷j]1(j)dj.
To see why this works, look at the top line. The probability that ~ j < ¡
is simply 1(¡), because that is the de…nition of the distribution function.
That gives us the …rst term on the right-hand side. For the second term,
note that we are taking the expectation of ¡ ÷j, so that term is in brackets.
To …nd the conditional expectation, we multiply by the conditional density
which is the density of the random variable j divided by the probability that
the conditioning event (~ j < ¡) occurs. We take the integral over the interval
[150. ¡] because outside of this interval the value of the bene…t is zero. When
we multiply the two terms on the right-hand side of the top line together, we
…nd that the 1(¡) term cancels out, leaving us with the very simple bottom
line. Using it we can …nd the net bene…t of searching at one more store
when the best price so far is \$182.99:

o
150
[¡ ÷j]1(j)dj =

182.99
150
[182.99 ÷j]
1
50
dj = 10.883.
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 184
14.4.2 The Law of Iterated Expectations
There is an important result concerning conditional expectations. It is called
the Law of Iterated Expectations, and it goes like this.
1
&
[1
a
[n(~ r)[~ n = n]] = 1
a
[n(~ r)].
It’s a complicated statement, so let’s look at what it means. The inside of
the left-hand side is the conditional expectation of n(~ r) given that ~ n takes
some value n. As we have already learned, this is a function of n but not a
function of r. Let’s call it ·(n), and ·(~ n) is a random variable. So now let’s
take the expectation of ·(~ n). The Law of Iterated Expectations says that
1[·(~ n)] = 1
a
[n(~ r)].
Another way of looking at it is taking the expectation of a conditional
expectation. Doing that removes the conditional part.
The best thing to do here is to look at an example to see what’s going
on. Let’s use one of our previous examples:
~ n = 1 ~ n = 2 1
~ a
(r)
~ r = 1 0.1 0.3 0.4
~ r = 2 0.2 0.1 0.3
~ r = 3 0.1 0.2 0.3
1
~ &
(n) 0.4 0.6 1
Begin by …nding the conditional expectations 1[~ r[~ n = 1] and 1[~ r[~ n = 2].
We get 1[~ r[~ n = 1] = 2 and 1[~ r[~ n = 2] = 116. Now take the expectation
over n to get
1
&
[1
a
[n(~ r)[~ n = n]] = 1
~ &
(1) 1[~ r[~ n = 1] +1
~ &
(2) 1[~ r[~ n = 2]
= (0.4)(2) + (0.6)(116) = 1.9.
Now …nd the unconditional expectation of ~ r. It is
1
a
[~ r] = 1
~ a
(1) 1 +1
~ a
(2) 2 +1
~ a
(3) 3
= (0.4)(1) + (0.3)(2) + (0.3)(3)
= 1.9.
It works.
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 185
Now let’s look at it generally using the continuous case. Begin with
1
a
[n(~ r)] =

o
÷o
n(r)1
~ a
(r)dr
=

o
÷o
n(r)

o
÷o
1(r. n)dn

dr
=

o
÷o

o
÷o
n(r)1(r. n)dndr.
Note that 1(r. n) = 1(r[n)1
~ &
(n), so we can rewrite the above expression
1
a
[n(~ r)] =

o
÷o

o
÷o
n(r)1(r[n)1
~ &
(n)dndr
=

o
÷o
¸
o
÷o
n(r)1(r[n)dr

1
~ &
(n)dn
=

o
÷o
1
a
[n(~ r)[~ n = n]1
~ &
(n)dn
= 1
&
[1
a
[n(~ r)[~ n = n]].
14.5 Problems
1. There are two random variables, ~ r and ~ n, with joint density 1(r. n)
given by the following table.
1(r. n) ~ n = 10 ~ n = 20 ~ n = 30
~ r = 1 .04 0 .20
~ r = 2 .07 0 .18
~ r = 3 .02 .11 .07
~ r = 4 .01 .12 .18
(a) Construct a table showing the distribution function 1(r. n).
(b) Find the univariate distributions 1
~ a
(r) and 1
~ &
(n).
(c) Find the marginal densities 1
~ a
(r) and 1
~ &
(n).
(d) Find the conditional density 1(r[~ n = 20).
(e) Find the mean of ~ n.
(f) Find the mean of ~ r conditional on ~ n = 20.
CHAPTER 14. MULTIVARIATE DISTRIBUTIONS 186
(g) Are ~ r and ~ n independent?
(h) Verify that the Law of Iterated Expectations works.
2. There are two random variables, ~ r and ~ n, with joint density given by
the following table:
1(r. n) ~ n = 3 ~ n = 8 ~ n = 10
~ r = 1 0.03 0.02 0.20
~ r = 2 0.02 0.12 0.05
~ r = 3 0.05 0.01 0.21
~ r = 4 0.07 0.11 0.11
(a) Construct a table showing the distribution function 1(r. n).
(b) Find the univariate distributions 1
~ a
(r) and 1
~ &
(n).
(c) Find the marginal densities 1
~ a
(r) and 1
~ &
(n).
(d) Find the conditional density 1(n[~ r = 3).
(e) Find the means of ~ r and ~ n.
(f) Find the mean of ~ r conditional on ~ n = 3.
(g) Are ~ r and ~ n independent?
(h) Find \ c:(~ r) and \ c:(~ n).
(i) Find (o·(~ r. ~ n).
(j) Find the correlation coe¢cient between ~ r and ~ n.
(k) Verify the Lawof Iterated Expectations for …nding 1
a
[~ r] =.1
&
[1
a
[~ r[n]].
3. Let 1(r) be the uniform distribution over the interval [c. /], and sup-
pose that c ÷ (c. /). Show that 1(r[r _ c) is the uniform distribution
over [c. c].
4. Consider the table of probabilities below:
1(r. n) ~ n = 10 ~ n = 20
~ r = ÷1 0.1 c
~ r = +1 0.3 /
What values must c and / take for ~ r and ~ n to be independent?
CHAPTER
15
Statistics
15.1 Some de…nitions
The set of all of the elements about which some information is desired is called
the population. Examples might be the height of all people in Knoxville,
or the ACT scores of all students in the state, or the opinions about Congress
of all people in the US. Di¤erent members of the population have di¤erent
values for the variable, so we can treat the population variable as a random
variable ~ r. So far everything we have done in probability theory is about the
population random variable. In particular, its mean is j and its variance is
o
2
.
A random sample from a population random variable ~ r is a set of in-
dependent, identically distributed (IID) random variables ~ r
1
. ~ r
2
. .... ~ r
a
, each
of which has the same distribution as the parent random variable ~ r.
The reason for random sampling is that sometimes it is too costly to mea-
sure all of the elements of a population. Instead, we want to infer properties
of the entire population from the random sample. This is statistics.
Let r
1
. .... r
a
be the outcomes of the random sample. A statistic is a
187
CHAPTER 15. STATISTICS 188
function of the outcomes of the random sample which does not contain any
unknown parameters. Examples include the sample mean and the sample
variance.
15.2 Sample mean
The sample mean is given by
r =
r
1
+... +r
a
:
.
Note that we use di¤erent notation for the sample mean ( r) and the popu-
lation mean (j).
The expected value of the sample mean can be found as follows:
1[ r] =
1
:
a
¸
j=1
1[~ r
j
]
=
1
:
a
¸
j=1
j
=
1
:
(:j) = j.
So, the expected value of the sample mean is the population mean.
The variance of the sample mean can also be found. To do this, though,
let’s …gure out the variance of the sum of two independent random variables
~ r and ~ n.
Theorem 22 Suppose that ~ r and ~ n are independent. Then \ c:(~ r + ~ n) =
\ c:(~ r) +\ c:(~ n).
Proof. Note that
(r ÷j
a
+n ÷j
&
)
2
= (r ÷j
a
)
2
+ (n ÷j
&
)
2
+ 2(r ÷j
a
)(n ÷j
&
).
Take the expectations of both sides to get
\ c:(~ r + ~ n) = 1[(~ r ÷j
a
+ ~ n ÷j
&
)
2
]
= 1[(~ r ÷j
a
)
2
] +1[(~ n ÷j
&
)
2
] + 21[(~ r ÷j
a
)(~ n ÷j
&
)]
= \ c:(~ r) +\ c:(~ n) + 2(o·(~ r. ~ n).
CHAPTER 15. STATISTICS 189
But, as shown in Theorem 20, since ~ r and ~ n are independent, (o·(~ r. ~ n) = 0,
and the result follows.
We can use this theorem to …nd the variance of the sample mean. Since a
random sample is a set of IID random variables, the theorem applies. Also,
recall that \ c:(c~ r
j
) = c
2
\ c:(~ r
j
). So,
\ c:( r) = \ c:

1
:
a
¸
j=1
~ r
j

=
1
:
2
a
¸
j=1
\ c:(~ r
j
)
=
:o
2
:
2
=
o
2
:
.
This is a really useful result. It says that the variance of the sample mean
around the population mean shrinks as the sample size becomes larger. So,
bigger samples imply better …ts, which we all knew already but we didn’t
know why.
15.3 Sample variance
We are going to use two di¤erent pieces of notation here. One is
:
2
=
1
:
a
¸
j=1
(r
j
÷ r)
2
and the other is
:
2
=
1
: ÷1
a
¸
j=1
(r
j
÷ r)
2
Both of these can be interpreted as the estimates of variance, and many
scienti…c calculators compute both of them. What you should remember is
to use :
2
as the computed variance when the random sample coincides with
the entire population, and to use :
2
when the random sample is a subset of
the population. In other words, you will almost always use :
2
, and we refer
CHAPTER 15. STATISTICS 190
to :
2
as the sample variance. But, :
2
is useful for what we are about to
do.
We want to …nd the expected value of the sample variance :
2
. It is easier
to …nd the expected value of :
2
and note that
:
2
=
:
: ÷1
:
2
.
We get
1[:
2
] =
1
:
1
¸
a
¸
j=1
(~ r
j
÷ r)
2
¸
=
1
:
1
¸
a
¸
j=1
~ r
j
2
¸
÷1[ r
2
]
which follows from a previous manipulation of variance: \ c:(~ r) = 1[(~ r ÷
j)
2
] = 1[~ r
2
] ÷j
2
. Rearranging that formula and applying it to the sample
mean tells us that
1[ r
2
] = 1[ r]
2
+\ c:( r).
so
1[:
2
] =

1
:
a
¸
j=1
1[~ r
2
j
]

÷1[ r]
2
÷\ c:( r).
But we already know some of these values. We know that 1[ r] = j and
\ c:( r) = o
2
:. Finally, note that since ~ r
j
has mean j and variance o
2
, we
have
1[~ r
2
j
] = j
2
+o
2
.
Plugging this all in yields
1[:
2
] =

1
:
a
¸
j=1
(j
2
+o
2
)

÷j
2
÷
o
2
:
= (j
2
+o
2
) ÷j
2
÷
o
2
:
=
: ÷1
:
o
2
CHAPTER 15. STATISTICS 191
Now we can get the mean of the sample variance :
2
:
1[:
2
] = 1
¸
:
: ÷1
:
2

=
:
: ÷1
1[:
2
]
=
:
: ÷1

: ÷1
:
o
2
= o
2
.
The reason for using :
2
as the sample variance instead of :
2
is that :
2
has
the right expected value, that is, the expected value of the sample variance
is equal to the population variance.
We have found that 1[ r] = j and 1[:
2
] = o
2
. Both r and :
2
are statis-
tics, because they depend only on the observed values of the random sample
and they have no unknown parameters. They are also unbiased because
their expected values are equal to the population parameters. Unbiasedness
is an important and valuable property. Since we use random samples to
learn about the characteristics of the entire population, we want statistics
that match, in expectation, the parameters of the population distribution.
We want the sample mean to match the population mean in expectation, and
the sample variance to match the population variance in expectation.
Is there any intuition behind dividing by : ÷ 1 in the sample variance
instead of dividing by :? Here is how I think about it. The random
sample has : observations in it. We only need one observation to compute
a sample mean. It may not be a very good or precise estimate, but it is still
an estimate. Since we can use the …rst observation to compute a sample
mean, we can use all of the data to compute all of the data to compute
a sample mean. This may seem cryptic and obvious, but now think about
what we need in order to compute a sample variance. Before we can compute
the sample variance, we need to compute the sample mean, and we need at
least one observation to do this. That leaves us with : ÷1 observations to
compute the sample variance. The terminology used in statistics is degrees
of freedom. With : observations we have : degrees of freedom when we
compute the sample mean, but we only have : ÷1 degrees of freedom when
we compute the sample variance because one degree of freedom was used to
compute the sample mean. In both calculations (sample mean and sample
variance) we divide by the number of degrees of freedom, : for the sample
mean r and : ÷1 for the sample variance :
2
.
CHAPTER 15. STATISTICS 192
15.4 Convergence of random variables
In this section we look at what happens when the sample size becomes in…-
nitely large. The results are often referred to as asymptotic properties.
We have two main results, both concerning the sample mean. One is called
the Law of Large Numbers, and it says that as the sample size grows without
bound, the sample mean converges to the population mean. The second is
the Central Limit Theorem, and it says that the distribution of the sample
mean converges to a normal distribution, regardless of whether the popula-
tion is normally distributed or not.
15.4.1 Law of Large Numbers
Let r
a
be the sample mean from a sample of size :. The basic law of large
numbers is
r
a
÷j when : ÷·.
The only remaining issue is what that convergence arrow means.
The Weak Law of Large Numbers states that for any 0
lim
a÷o
1 ([ r
a
÷j[ < ) = 1.
To understand this, take any small positive number . What is the proba-
bility that the sample mean r
a
is within of the population mean? As the
sample size grows, the sample mean should get closer and closer to the pop-
ulation mean. And, if the sample mean truly converges to the population
mean, the probability that the sample mean is within of the population
mean should get closer and closer to 1. The Weak Law says that this is true
no matter how small is.
This type of convergence is called convergence in probability, and it
is written
r
a
1
÷j when : ÷·.
The Strong Law of Large Numbers states that
1

lim
a÷o
r
a
= j

= 1.
This one is a bit harder to understand. It says that the sample mean is almost
sure to converge to the population mean. In fact, this type of convergence
is called almost sure convergence and it is written
r
a
o.c.
÷ j when : ÷·.
CHAPTER 15. STATISTICS 193
15.4.2 Central Limit Theorem
This is the most important theorem in asymptotic theory, and it is the reason
why the normal distribution is so important to statistics.
Let `(j. o
2
) denote a normal distribution with mean j and variance o
2
.
Let (.) be the distribution function (or cdf) of the normal distribution
with mean 0 and variance 1 (or the standard normal `(0. 1)). To state it,
compute the standardized mean:
2
a
=
r
a
÷1[ r
a
]

\ c:( r
a
)
.
We know some of these values: 1[ r
a
] = j and \ c:( r
a
) = o
2
:. Thus we
get
2
a
=
r
a
÷j
o

:
.
The Central Limit Theorem states that if \ c:(~ r
j
) = o
2
< ·, that is,
if the population random variable has …nite variance, then
lim
a÷o
1(2
a
_ .) = (.).
In words, the distribution of the standardized sample mean converges to the
standard normal distribution. This kind of convergence is called conver-
gence in distribution.
15.5 Problems
1. You collect the following sample of size : = 12:
10. 4. ÷1. 3. ÷2. 8. 6. 8. 6. 1. ÷5. 10
Find the sample mean and sample variance.
CHAPTER
16
Sampling distributions
Remember that statistics, like the mean and the variance of a random vari-
able, are themselves random variables. So, they have probability distribu-
tions. We know from the Central Limit Theorem that the distribution of the
sample mean converges to the normal distribution as the sample size grows
without bound. The purpose of this chapter is to …nd the distributions for
the mean and other statistics when the sample size is …nite.
16.1 Chi-square distribution
The chi-square distribution turns out to be fundamental for doing statistics
because it is closely related to the normal distribution. Chi-square random
variables can only have nonnegative values. It turns out to be the distribu-
tion you get when you square a normally distributed random variable.
The density function for the chi-square distribution with : degrees of
freedom is
1(r) =
(r2)
n
2
÷1
c
÷
x
2
2(:2)
194
CHAPTER 16. SAMPLING DISTRIBUTIONS 195
0 1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
1.2
x
y
Figure 16.1: Density for the chi-square distribution. The thick line has 1
degree of freedom, the thin line has 3, and the dashed line has 5.
where (c) is the Gamma function de…ned as (c) =

o
0
n
o÷1
c
÷&
dn for c 0.
We use the notation ~ n ~ .
2
a
to denote a chi-square random variable with :
degrees of freedom. The density for the chi-square distribution with di¤erent
degrees of freedom is shown in Figure 16.1. The thick line is the density
with 1 degree of freedom, the thin line has 3 degrees of freedom, and the
dashed line has 5. Changing the degrees of freedom radically changes the
shape of the density function.
One strange thing about the chi-square distribution is its mean and vari-
ance. The mean of a .
2
a
random variable is :, the number of degrees of
freedom, and the variance is 2:.
The relationship between the standard normal and the chi-square distri-
bution is given by the following theorem.
Theorem 23 If ~ r has the standard normal distribution, then the random
variable ~ r
2
has the chi-square distribution with 1 degree of freedom.
CHAPTER 16. SAMPLING DISTRIBUTIONS 196
Proof. The distribution function for the variable ~ n = ~ r
2
is
1
~ &
(n) = 1(~ n _ n)
= 1(~ r
2
_ n)
= 1(

n _ r _

n)
= 21(0 _ r _

n)
where the last equality follows from the fact that the standard normal dis-
tribution is symmetric around its mean of zero. From here we can compute
1
~ &
(n) = 2

&
0
1
~ a
(r)dr
where 1
~ a
(r) is the standard normal density. Using Leibniz’s’ rule, di¤eren-
tiate this with respect to n to get the density of ~ n:
1
~ &
(n) = 21
~ a
(

n)
1
2

n
where the last term is the derivative of

n with respect to n. Plug in the
formula for the normal density to get
1
~ &
(n) =
1

2:
c
÷&/2
n
÷1/2
=
(n2)
÷1/2
c
÷&/2
2

:
.
This looks exactly like the formula for the density of .
2
1
except for the de-
nominator. But, (12) =

:, and the formula is complete.
Both the normal distribution and the chi-square distribution have the
property that they are additive. That is, if ~ r and ~ n are independent normally
distributed random variables, then ~ . = ~ r+ ~ n is also normally distributed. If
~ r and ~ n are independent chi-square random variables with :
a
and :
&
degrees
of freedom, respectively, then ~ . = ~ r + ~ n is has a chi-square distribution with
:
a
+:
&
degrees of freedom.
16.2 Sampling from the normal distribution
We use the notation ~ r ~ `(j. o
2
) to denote a random variable that is distrib-
uted normally with mean j and variance o
2
. Similarly, we use the notation
CHAPTER 16. SAMPLING DISTRIBUTIONS 197
~ r ~ .
2
a
when ~ r has the chi-square distribution with : degrees of freedom.
The next theorem describes the distribution of the sample statistics of the
standard normal distribution.
Theorem 24 Let r
1
. .... r
a
be an IID random sample from the standard nor-
mal distribution `(0. 1). Then
(a) r ~ `(0.
1
a
).
(b)
¸
(r
j
÷ r)
2
~ .
2
a÷1
.
(c) The random variables r and
¸
(r
j
÷ r)
2
are statistically independent.
The above theorem relates to random sampling from a standard normal
distribution. If r
1
. .... r
a
are an IID random sample from the general normal
distribution `(j. o
2
), then
r ~ `(j.
o
2
:
).
Also,
¸
(r
j
÷ r)
2
o
2
~ .
2
a÷1
.
To …gure out the distribution of the sample variance, …rst note that since the
sample variance is
:
2
=
¸
(r
j
÷ r)
2
: ÷1
and 1[:
2
] = o
2
, we get that
(: ÷1):
2
o
2
~ .
2
a÷1
.
This looks more useful than it really is. In order to get it you need to know
the population variance, o
2
, in which case you don’t really need the sample
variance, :
2
. The next section discusses the sample distribution of :
2
when
o
2
is unknown.
The properties to take away from this section are that the sample mean
of a normal distribution has a normal distribution, and the sample variance
of a normal distribution has a chi-square distribution (after multiplying by
(: ÷1)o
2
).
CHAPTER 16. SAMPLING DISTRIBUTIONS 198
16.3 t and F distributions
In econometrics the two most frequently encountered distributions are the
t distribution and the F distribution. To brie‡y say where they arise, in
econometrics one often runs a linear regression of the form
n
t
= /
0
+/
1
r
1.t
+... +/
I
r
I.t
+ ~
t
where t is the observation number, r
j.t
is an observation of the i-th explana-
tory variable, and ~
t
~ `(0. o
2
). One then estimates the coe¢cients /
0
. .... /
I
in ways we have described earlier in this book. One uses a t distribution to
test whether the individual coe¢cients /
0
. .... /
I
are equal to zero. If the test
rejects the hypothesis, that explanatory variable has a statistically signi…-
cant impact on the dependent variable. The F test is used to test if linear
combinations of the coe¢cients are equal to zero.
the normal density but with fatter tails, as in Figure 16.2. To get a t
distribution, let ~ r have a standard normal distribution and let ~ n have a chi-
square distribution with : degrees of freedom. Assume that ~ r and ~ n are
independent. Construct the random variable
~
t according to the formula
t =
r
(n:)
1/2
.
Then
~
t has a t distribution with : degrees of freedom. In shorthand,
~
t
a
~
`(0. 1)

.
2
a
:
.
Now look back at Theorem 24. If we sample from a standard normal,
the sample mean has an `(0. 1) distribution and the sum of the squared
deviations has a .
2
a
distribution. So, we get a t distribution when we use
the sample mean in the numerator and something like the sample variance
in the denominator. The trick is to …gure out exactly what we need.
If ~ r ~ `(j. o
2
), then we know that r ~ `(j. o
2
:), in which case
r ÷j
o

:
~ `(0. 1).
We also know that (
¸
(r
j
÷ r)
2
) o
2
~ .
2
a÷1
. Putting this all together yields
~
t =
a÷j
o/

a

P
(a
i
÷ a)
2
o
2

(: ÷1)
=
r ÷j

:
2
:
. (16.1)
CHAPTER 16. SAMPLING DISTRIBUTIONS 199
-5 -4 -3 -2 -1 0 1 2 3 4 5
0.1
0.2
0.3
0.4
x
y
Figure 16.2: Density for the t distribution. The thick curve has 1 degree of
freedom and the thin curve has 5. The dashed curve is the standard normal
density.
The random variable
~
t has a t distribution with : ÷ 1 degrees of freedom.
Also notice that it does not depend on o
2
. It does depend on j, but we
will discuss the meaning of this more in the next chapter. The statistic
computed in expression (16.1) is commonly referred to as a t-statistic.
Like the t distribution, the F distribution is a ratio of two other dis-
tributions. In this case it is the ratio of two chi-square distributions. The
formula is
1
n.a
~
.
2
n
:
.
2
a
:
.
and the density function is shown in Figure 16.3. Because chi-square distri-
butions assign positive probability only to non-negative outcomes, F distri-
butions also assign positive probability only to non-negative outcomes.
The F distribution and the t distribution are related. If
~
t
a
has the t
distribution with : degrees of freedom, then
(
~
t
a
)
2
~ 1
1.a
.
Applying this to expression (16.1) tells us that the following sample statistic
CHAPTER 16. SAMPLING DISTRIBUTIONS 200
0 1 2 3 4 5
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
x
y
Figure 16.3: Density function for the F distribution. The thick curve is
1
2.20
, the dashed curve is 1
5.20
, and the thin curve is 1
10.20
.
has an F distribution:
~
1 =
( r ÷j)
2
:
2
:
~ 1
1.a÷1
.
16.4 Sampling fromthe binomial distribution
Recall that the binomial distribution is used for computing the probability
of getting no more than : successes from : independent trials when the
probability of success in any trial is j.
A series of random trials is going to result in a sequence of successes or
failures, and we can use the random variable ~ r to capture this. Let r
j
= 1
if there was a success in trial i and r
j
= 0 if there was a failure in trial i.
Then
¸
a
j=1
r
j
is the number of successes in : trials, and r = (
¸
r
j
) : is the
average number of successes. Notice that r is also the sample frequency,
that is, the fraction of successes in the sample. The sample frequency has
the following properties:
1[ r] = j
\ c:( r) =
j(1 ÷j)
:
.
CHAPTER
17
Hypothesis testing
The tool of statistical analysis has two primary uses. One is to describe
data, and we do this using such things as the sample mean and the sample
variance. The other is to test hypotheses. Suppose that you have, for
example, a sample of UT graduates and a sample of Vanderbilt graduates,
both from the class of 2002. You may want to know whether or not the
average UT grad makes more than the national average income, which was
about \$35,700 in 2006. You also want to know if the two classes have the
same income. You would perform hypothesis tests to either support or reject
the hypotheses that UT grads have higher average earnings than the national
average and that both UT grads and Vanderbilt grads have the same average
income.
In general, hypothesis testing involves the value of some parameter o that
is determined by the data. There are two types of tests. One is to determine
if the realized value of o is in some set
0
. The other is to compute two
di¤erent values of o from two di¤erent samples and determine if they are the
same or if one is larger than the other.
201
CHAPTER 17. HYPOTHESIS TESTING 202
17.1 Structure of hypothesis tests
The …rst part of testing a hypothesis is forming one. In general a hypothesis
takes the form of o ÷
0
, where
0
is a nonempty set of values for o. It could
be a single value, or it could be a range of values. The statement o ÷
0
is
called the null hypothesis. The alternative hypothesis is that o ÷
0
.
We typically write these as
H
0
(Null hypothesis): o ÷
0
vs.
H
1
(Alternative hypothesis): o ÷
0
.
The form of the alternative hypothesis is determined completely by the
form of the null hypothesis. So, if H
0
is o = o
0
, H
1
is o = o
0
. If H
0
is
o _ o
0
, H
1
is o o
0
. And so on.
Another issue in hypothesis testing is that hypotheses can be rejected
but they cannot be accepted. So, you can establish that something is false,
but not that something is true. Because of this, empirical economists often
make the null hypothesis something they would like to be false. If they can
reject the null hypothesis, that lends support to the alternative hypothesis.
For example, if one thinks that the variance of returns to the Dow-Jones
Industrial Average is smaller than the variance of returns to the S&P 500
index, one would form the null hypothesis that the variance is at least as great
for the Dow and then try to reject it. When running linear regressions, one
tests the null hypothesis that the coe¢cients are zero.
One uses a statistical test to either reject or support the null hypothesis.
The nature of the test is as follows. First we compute a test statistic for o.
Let’s call it 1. For example, if o is the mean of the population distribution,
1 would be the sample mean. As we know, the value of 1 is governed by
a random process. The statistical test identi…es a range of values ¹ for the
random variable 1 such that if 1 ÷ ¹ the null hypothesis is "accepted" and if
1 ÷ ¹ the null hypothesis is rejected. The set ¹ is called the critical region,
and it is important to note that ¹ and
0
are two completely di¤erent things.
For example, a common null hypothesis is H
0
: o = 0. In that case
0
= ¦0¦.
But, we do not reject the null hypothesis if 1 is anything but zero, because
then we would reject the hypothesis with probability 1. Instead, we reject
the hypothesis if 1 is su¢ciently far from zero, or, in our new terminology,
if 1 is outside of the critical region ¹.
CHAPTER 17. HYPOTHESIS TESTING 203
Statistical tests can have errors because of the inherent randomness. It
might be the case that o ÷
0
, so that the null hypothesis is really true,
but 1 ÷ ¹ so we reject the null hypothesis. Or, it might be the case that
o ÷
0
so that the null hypothesis is really false, but 1 ÷ ¹ and we "accept"
it anyway. The possible outcomes of the test are given in the table below.
Value of test statistic 1
1 ÷ ¹ 1 ÷ ¹
True value o ÷
0
Correctly "accept" null Incorrectly reject null
of Type I error
parameter o o ÷
0
Incorrectly "accept" null Correctly reject null
Type II error
A type I error occurs when one rejects a true null hypothesis. A type
II error occurs when a false null hypothesis is not rejected. A problem
arises because reducing the probability of a type I error generally increases
the probability of a type II error. After all, reducing the probability of a
type I error means rejecting the hypothesis less often, whether it is true or
not.
Let 1(.[o) be the distribution of the test statistic . conditional on the
value of the parameter o. The entire previous chapter was about these
distributions. If the null hypothesis is really true, the probability that the
null is "accepted" is 1(¹[o ÷
0
). This is called the con…dence level. This
probability of a type I error is 1 ÷ 1(¹[o ÷
0
). This probability is called
the signi…cance level. The standard is to use a 5% signi…cance level, but
10% and 1% signi…cance levels are also reported. The 5% signi…cance level
corresponds to a 95% con…dence level. I usually interpret the con…dence
level as the level of certainty with which the null hypothesis is false when it
is rejected. So, if I reject the null hypothesis with a 95% con…dence level, I
am 95% sure that the null hypothesis is really false.
Here, then, is the "method of proof" entailed in statistical analysis.
Think of a statement you want to be true. Make this the alternative hy-
pothesis. The null hypothesis is therefore a statement you would like to
reject. Construct a test statistic related to the null hypothesis. Reject the
null hypothesis if you are 95% sure that it is false given the value of the test
statistic.
CHAPTER 17. HYPOTHESIS TESTING 204
For example, suppose that you want to test the null hypothesis that the
mean of a normal distribution is 0. This makes the hypothesis
H
0
: j = 0
vs.
H
1
: j = 0
You take a sample r
1
. .... r
a
and compute the sample mean r and sample
variance :
2
. Construct the test statistic
t =
r ÷j

:
2
:
(17.1)
which we know from Chapter 16 has a t distribution with : ÷ 1 degrees of
freedom. In this case j is the hypothesized mean, which is equal to zero.
Now we must construct a critical range ¹ for the test statistic t. Our critical
range will be an interval around zero so that we reject the null hypothesis if
t is too far from zero. We call this interval the 95% con…dence interval,
and it takes the form (t
1
. t
1
). Let 1
a÷1
(t) be the distribution function for
the t distribution with : ÷1 degrees of freedom. Then the endpoints of the
con…dence interval satisfy
1
a÷1
(t
1
) = 0.025
1
a÷1
(t
1
) = 0.975
The …rst line says that the probability that t is lower than t
1
is 2.5%, and the
second says that the probability that t is higher than t
1
is 2.5%. Combining
these means that the probability that the test statistic t is outside of the
interval (t
1
. t
1
) is 5%.
All of this is shown in Figure 17.1. The probability that t _ t
1
is 2.5%,
as shown by the shaded area. The probability that t _ t
1
is also 2.5%. The
probability that t is between these two points is 95%, and so the interval
(t
1
. t
1
) is the 95% con…dence interval. The hypothesis is rejected if the
value of t lies outside of the con…dence interval.
Another way to perform the same test is to compute 1
a÷1
(t), where t is
the test statistic given in (17.1) above. Reject the hypothesis if
1
a÷1
(t) < 0.025 or 1
a÷1
(t) 0.975.
If the …rst of these inequalities hold, the value of the test statistic is outside
of the con…dence interval and to the left, and if the one on the right holds
CHAPTER 17. HYPOTHESIS TESTING 205
2.5% area
0
tL
x
y
tH
2.5% area
95% confidence interval
Figure 17.1: Two-tailed hypothesis testing
the test statistic is outside of the con…dence interval and to the right. The
null hypothesis cannot be rejected if 0.025 _ 1
a÷1
(t) _ 0.975.
17.2 One-tailed and two-tailed tests
The tests described above are two-tailed tests, because rejection is based
on the test statistic lying in one of the two tails of the distribution. Two-
tailed tests are used when the null hypothesis is an equality hypothesis, so
that it is violated if the test statistic is either too high or too low.
A one-tailed test is used for inequality-based hypotheses, such as the
one below:
H
0
: j _ 0
vs.
H
1
: j < 0
In this case the null hypothesis is rejected if the test statistic is both far
from zero and negative. Large test statistics are compatible with the null
hypothesis as long as they are positive. This contrasts with the two-sided
tests where large test statistics led to rejection regardless of the sign.
CHAPTER 17. HYPOTHESIS TESTING 206
To test the null hypothesis that the mean of a normal distribution is
nonnegative, compute the test statistic given in (17.1). We know from
Chapter 16 that it has the t distribution with : ÷ 1 degrees of freedom.
Letting 1
a÷1
(t) be the cdf of the t distribution with :÷1 degrees of freedom,
we reject the null hypothesis at the 95% con…dence level (or 5% signi…cance
level) if
1
a÷1
(t) < 0.05.
There are two di¤erences between the one-tailed criterion for rejection and
the two-tailed criterion in the previous section. One di¤erence is that the
one-tailed criterion can only be satis…ed one way, with 1
a÷1
(t) small, while
the two-tailed criterion can be satis…ed two ways, with 1
a÷1
(t) either close to
zero or close to one. The second di¤erence is that the one-tailed criterion has
a cuto¤ point of 0.05, while the two-tailed criterion has a lower cuto¤ point
half as big at 0.025. The reason for this is that the two-tailed test splits
the 5% probability mass equally between the two tails, while the one-tailed
criterion puts the whole 5% in the lower tail.
The following table gives the rules for the one-tailed and two-tailed tests
with signi…cance level c and con…dence level 1 ÷ c. The test statistic is .
with distribution function G(.), and the hypotheses concern some parameter
o.
Type of test Hypothesis Reject H
0
if p-value
H
0
: o = o
0
G(.) <
c
2
Two-tailed vs. or 2[1 ÷G([.[)]
H
1
: o = o
0
G(.) 1 ÷
c
2
H
0
: o _ o
0
Upper one-tailed vs. G(.) 1 ÷c 1 ÷G(.)
H
1
: o o
0
H
0
: o _ o
0
Lower one-tailed vs. G(.) < c G(.)
H
1
: o < o
0
The p-value can be thought of as the exact signi…cance level for the test.
The null hypothesis is rejected if the p-value is smaller than c, the desired
signi…cance level.
CHAPTER 17. HYPOTHESIS TESTING 207
17.3 Examples
17.3.1 Example 1
The following sequence is a random sample from the distribution `(j. o
2
).
The task is to test hypotheses about j. The sequence is: 56, 74, 55, 66, 51,
61, 55, 48, 48, 47, 56, 57, 54, 75, 49, 51, 79, 59, 68, 72, 64, 56, 64, 62, 42.
Test the null hypothesis that j = 65. This is a two-tailed test based
on the statistic in equation (17.1). We compute r = 58.73, : = 9.59, and
: = 25. We get
t =
r ÷j

:
2
:
=
58.73 ÷65
9.595
= ÷3.27.
The next task is to …nd the p-value using 1
24
(t), the t distribution with
: ÷ 1 = 24 degrees of freedom. Excel allows you to do this, but it only
allows positive values of t. So, use the command
= TDIST( [t[ , degrees of freedom, number of tails)
= TDIST(3.27, 24, 2) = 0.00324
Thus, we can reject the null hypothesis at the 5% signi…cance level. In fact,
we are 99.7% sure that the null hypothesis is false. Maple allows for both
positive and negative values of t. Using the table above, the p-value can be
found using the formula
2 [1 ÷TDist( [t[ , degrees of freedom)]
TDist(3.27. 24) = 0.99838.
The p-value is 2(1 ÷ 0.99838) = 0.00324, which is the same answer we got
from Excel.
Now test the null hypothesis that j = 60. This time the test statistic
is t = ÷0.663 which yields a p-value of 0.514. We cannot reject this null
hypothesis.
What about the hypothesis that j _ 65? The sample mean is 58.73,
which is less than 65, so we should still do the test. (If the sample mean
had been above 65, there is no way we could reject the hypothesis.) This
is a one-tailed test based on the same test statistic which we have already
CHAPTER 17. HYPOTHESIS TESTING 208
computed, t = ÷3.27. We have to change the Excel command to reduce the
number of tails:
=TDIST(3.27, 24, 1) = 0.00162
Once again we reject the hypothesis at the 5% level. Notice, however, that
the p-value is twice what it was for the two-tailed test. This is as it should
be, as you can …gure out by looking at the above table.
17.3.2 Example 2
You draw a sample of 100 numbers drawn from a normal distribution with
mean j and variance o
2
. You compute the sample mean and sample variance,
and they are r = 10 and :
2
= 16. The null hypothesis is
H
0
: j = 9
Do the data support or reject the hypothesis?
Compute the t-statistic
t =
r ÷j
:

:
=
10 ÷9

16

100
= 2.5
This by itself does not tell us anything. We must plug it into the appropriate
t distribution. The t-statistic has 100 ÷1 = 99 degrees of freedom, and we
can …nd
TDist(2.5. 99) = 0.992 97
and we reject if this number is either less than 0.025 or greater than 0.975.
It is greater than 0.975, so we can reject the hypothesis. Another way to
see it is by computing the j-value
j = 2(1 ÷TDist(2.5. 99)) = 0.014
which is much smaller than the 0.05 required for rejection.
17.3.3 Example 3
Use the same information as example 2, but instead test the null hypothesis
H
0
: j = 10.4.
CHAPTER 17. HYPOTHESIS TESTING 209
Do the data support or reject this hypothesis?
Compute the t-statistic
t =
r ÷j
:

:
=
10 ÷10.4

16

100
= ÷1.0
We can …nd
TDist(÷1. 99) = 0.16
:and we reject if this number is either less than 0.025 or greater than 0.975.
It is not, so we cannot reject the hypothesis. Another way to see it is by
computing the j-value
j = 2(1 ÷TDist(1. 99)) = 0.32
which is much larger than the 0.05 required for rejection.
17.4 Problems
1. Consider the following random sample from a normal distribution with
mean j and standard deviation o
2
:
134. 99. 21. 38. 98. 19. 53. ÷52. 115. 30.
65. 149. 4. 55. 43. 26. 122. 47. 54. 97.
87. 34. 114. 44. 26. 98. 38. 24. 30. 86.
(a) Test the hypothesis that j = 0. Do the data support or reject the
hypothesis?
(b) Test the hypothesis that j = 30.
(c) Test the hypothesis that j _ 65.
(d) Test the hypothesis that j _ 100.
2. Answer the following questions based on this random sample generated
from a normal distribution with mean j and variance o
2
.
89. 51. 12. 17. 71
39. 47. 37. 42. 75
78. 67. 20. 9. 9
44. 71. 32. 13. 61
CHAPTER 17. HYPOTHESIS TESTING 210
(a) What are the best estimates of j and o
2
?
(b) Test the hypothesis that j = 40. Do the data support or reject
the hypothesis?
(c) Test the hypothesis that j = 60. Do the data support or reject
the hypothesis?
CHAPTER
18
Solutions to end-of-chapter problems
Solutions for Chapter 2
1. (a) 1
t
(r) = 72r
2
(r
3
+ 1) +
3
a
+
20
a
5
(b) 1
t
(r) = ÷
20
(4a÷2)
6
(c) 1
t
(r) = ÷c
2a÷14a
3
(42r
2
÷2)
(d) 1
t
(r) =
9
a
1: 3
÷
2. 7
a
1: 3
ln r
(e) 1
t
(r) =
c
(o÷ca)
2
(/ ÷cr
2
) ÷2c
a
o÷ca
2. (a) 1
t
(r) = 24r ÷24
(b) o
t
(r) =
1
4a
3
÷
1
2a
3
ln 3r = (1 ÷2 ln 3r)(4r
3
)
(c) /
t
(r) = ÷4(6r ÷2)(3r
2
÷2r + 1)
5
(d) 1
t
(r) = (1 ÷r)c
÷a
211
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 212
(e)
o
t
(r) =
9
(9r ÷8)
2

2r
2
÷3

5r
3
+ 6÷4
r
9r ÷8

5r
3
+ 6÷
15
2
r
2
9r ÷8
2r
2
÷3

5r
3
+ 6
3. Let 1(r) = r
2
. Then
1
t
(r) = lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
(r +/)
2
÷r
2
/
= lim
I÷0
r
2
+ 2r/ +/
2
÷r
2
/
= lim
I÷0
2r/ +/
2
/
= lim
I÷0
(2r +/)
= 2r.
4. Use the formula
lim
I÷0
1(r +/) ÷1(r)
/
= lim
I÷0
1
a+I
÷
1
a
/
= lim
I÷0
a
a(a+I)
÷
a+I
a(a+I)
/
= lim
I÷0
÷I
a(a+I)
/
= lim
I÷0
÷/
r/(r +/)
= lim
I÷0
÷1
r(r +/)
= ÷
1
r
2
5. (a) Compute 1
t
(3) = ÷18 < 0, so it is decreasing.
(b) Compute 1
t
(13) =
1
13
0, so it is increasing.
(c) Compute 1
t
(4) = ÷5.0c
÷4
< 0, so it is decreasing.
(d) Compute 1
t
(2) =
9
16
0, so it is increasing.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 213
6. (a) 1
t
(r) =
3
a
2
+4a
÷ (3r ÷2)
2a+4
(a
2
+4a)
2
and 1
t
(÷1) =
1
9
0. It is
increasing.
(b) 1
t
(r) = ÷
1
aln
2
a
and 1
t
(c) = ÷
1
c
< 0. It is decreasing.
(c) 1
t
(r) = 10r + 16 and 1
t
(÷6) = ÷44 < 0. It is decreasing.
7. (a) The FOC is 1
t
(r) = 10÷8r = 0 so r
+
= 54. Compute 1
tt
(54) =
÷8 < 0, so this is a maximum.
(b) The FOC is 1
t
(r) =
84
a
0:3
÷ 6 = 0 so r
+
= 14
1/0.3
. Compute
1
tt
(14
1/0.3
) = ÷2. 721 7 10
÷4
< 0, so this is a maximum.
(c) The FOC is 1
t
(r) = 4 ÷
3
a
= 0 so r
+
= 34. Compute 1
tt
(34) =
16
3
0, so this is a minimum.
8. (a) The foc is 1
t
(r) = 8r÷24 = 0, which is solved when r = 3. Also,
1
tt
(r) = 8 0, so it is a minimum.
(b) The foc is 1
t
(r) = 20r ÷ 4 = 0, which is solved when r = 5.
Also, 1
tt
(r) = ÷20r
2
< 0, so it is a maximum.
(c) The foc is
1
t
(r) =
r + 1
(r + 2)
2
÷
1
r + 2
+ 6 = 0.
which has two solutions: r = ÷
13
6
and r = ÷
11
6
. The second
derivative is 1
tt
(r) =
2
(a+2)
2
÷2
a+1
(a+2)
3
, and 1
tt

13
6
) = ÷432 while
1
tt

11
6
) = 432. The function has a local maximum when r = ÷
13
6
and a local minimum when r = ÷
11
6
.
9. (a) Compute 1
tt
(r) = 2c. Need c < 0.
(b) Need c 0.
10. (a) The problem is
max
n
/(:) ÷c(:)
The FOC is
/
t
(:) = c
t
(:)
Interpretation is that marginal bene…t equals marginal cost.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 214
(b) We need /
tt
(:) ÷c
tt
(:) _ 0. A better way is to assume /
tt
(:) _ 0
and c
tt
(:) _ 0. This is diminishing marginal bene…t and increas-
ing marginal cost.
(c) The problem is
max
n
n:÷c(:)
The FOC is
n = c
t
(:)
Interpretation is that marginal e¤ort cost equals the wage.
(d) c
tt
(:) _ 0, or increasing marginal e¤ort cost.
11. The …rst-order condition is
:
t
(1) =
90

1
÷90 = 0.
Solving for 1 gives us 1 = 1. Bilco devotes 1 unit of labor to widget
production and the other 59 to gookey production. It produces \ =
20(1)
1/2
= 20 widgets and G = 30(59) = 1770 gookeys.
Solutions for Chapter 3
1. (a) (26. ÷3. 33. 25)
(b) r _ n and r < n and r < n
(c) 77
(d) Yes. r r = 65, n n = 135, and ( r + n) ( r + n) = 354. We have

r r +

n n =

65 +

135 = 19.681
and

( r + n) ( r + n) =

354 = 18. 815.
2. (a) (18. ÷8. ÷48. ÷20)
(b) ÷7
(c)

r r =

65 = 8. 062 3,

n n =

26 = 5. 099, and

( r + n) ( r + n) =

77 = 8. 775 0, which is smaller than

65 +

26 = 13. 161.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 215
3. (a) 1
a
(r. n) = 8r ÷12n + 18
(b) 1
&
(r. n) = 6n ÷12r
(c)

9
8
.
9
4

4. (a) 1
a
(r. n) = 16n ÷4.
(b) 1
&
(r. n) = 16r ÷2n
2
.
(c) The two foc’s are 16n ÷4 = 0 and 16r ÷2n
2
= 0. The …rst one
implies that n =
1
4
. Plugging this into the second expressions
yields
16r ÷
2
n
2
= 0
16r =
2
(
1
4
)
2
= 32
r = 2
The critical point is (r. n) = (2.
1
4
).
5. (a) 3 ln r + 2 ln n = /
(b) Implicitly di¤erentiate 3 ln r + 2 ln n(r) = / with respect to r to
get
3
r
+
2
n
dn
dr
= 0
dn
dr
= ÷
3r
2n
= ÷
3n
2r
6. (a) :(¡) = j(¡)¡ ÷c¡ = 120¡ ÷4¡
2
÷c¡
(b) The FOC is
120 ÷8¡
+
÷c = 0
¡
+
= 15 ÷
c
8
(c) Using the answer to (b), we have d¡
+
dc = ÷18 < 0
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 216
(d) Plug ¡
+
= 15 ÷
c
8
into :(¡) to get
:(¡) = 120

15 ÷
c
8

÷4

15 ÷
c
8

2
÷c

15 ÷
c
8

= 1800 ÷15c ÷900 + 15c ÷
c
2
16
÷15c +
c
2
8
= 900 ÷15c +
c
2
16
(e) Di¤erentiating yields
:
t
(c) = ÷15 +
c
8
(f) Compare the answers to (b) and (e). Note that ÷¡ is also the
partial derivative of :(¡) = j(¡)¡ ÷c¡ with respect to c, which is
why this works.
7. (a) Implicitly di¤erentiate to get
30r
dr
dc
+ 3c
dr
dc
+ 3r ÷
5
c

dr
dc
+
5r
c
2
= 0.
Solving for drdc yields

30r + 3c ÷
5
c

dr
dc
= ÷

3r +
5r
c
2

dr
dc
= ÷
3r +
5a
o
2
30r + 3c ÷
5
o
= ÷
r
c
3c
2
+ 5
3c
2
+ 30rc ÷5
(b) Implicitly di¤erentiate to get
12rc
dr
dc
+ 6r
2
= 5 ÷5c
2
dr
dc
÷10rc.
Solving for drdc yields

12rc + 5c
2

dr
dc
= 5 ÷10rc ÷6r
2
dr
dc
=
5 ÷10rc ÷6r
2
12rc + 5c
2
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 217
8. (a) The foc is
30

1
÷n = 0
and solving it for 1 gets
30 = n

1

1 =
30
n
1
+
=
900
n
2
(b) Since
1
+
=
900
n
2
we have
d1
+
dn
= ÷
1800
n
3
< 0
The …rm uses fewer workers when the wage rises.
(c) Plugging 1
+
into the pro…t function yields
:
+
= 30

4
900
n
2
÷n
900
n
2
=
900
n
and from there we …nd
d:
+
dn
= ÷
900
n
2
< 0.
Pro…t falls when the wage rises. This happens for two reasons.
One is that the …rm must pay workers more, and the other is that
it uses fewer workers (see part b) and produces less output.
9. (a) Implicitly di¤erentiate to …nd d1d1:
1
1
(1. 1)
d1
d1
+1
1
(1. 1) = 0
d1
d1
= ÷
1
1
(1. 1)
1
1
(1. 1)
(b) Both 1
1
and 1
1
are positive, so d1d1 = ÷1
1
1
1
< 0 and the
isoquant slopes downward.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 218
Solutions for Chapter 4
1. Set up the Lagrangian
L(r. n. \) = 12r
2
n
4
+\(120 ÷2r ÷4n).
The foc’s are
·L
·r
= 24rn
4
÷2\ = 0
·L
·n
= 48r
2
n
3
÷4\ = 0
·L
·\
= 120 ÷2r ÷4n = 0
This is three equations in three unknowns, so now we solve for the
values of r, n, and \. There are many ways to do this, and one of
them can be found on page 39. Here is another. Solve the third
equation for r:
120 ÷2r ÷4n = 0
r = 60 ÷2n
Substitute this into the …rst two equations
24rn
4
÷2\ = 0
48r
2
n
3
÷4\ = 0
to get
24(60 ÷2n)n
4
÷2\ = 0
48(60 ÷2n)
2
n
3
÷4\ = 0
Multiply the top equation by ÷2 and add the result to the second
equation to get
48(60 ÷2n)
2
n
3
÷4\ ÷

48(60 ÷2n)n
4
÷4\

= 0
The terms with \ in them cancel out, and we are left with
48(60 ÷2n)
2
n
3
÷48(60 ÷2n)n
4
= 0
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 219
Divide both sides by 48(60 ÷2n)n
3
to get
(60 ÷2n) ÷n = 0
60 ÷3n = 0
n = 20
Substitute this back into things we know to get
r = 60 ÷2n = 20
and
\ = 12(60 ÷2n)n
4
= 12(20)(20
4
) = 38. 400. 000.
2. The Lagrangian is
L(c. /. \) = 3 ln c + 2 ln / +\(400 ÷12c ÷14/)
The FOCs are
·L
·c
=
3
c
÷12\ = 0
·L
·/
=
2
/
÷14\ = 0
·L
·\
= 400 ÷12c ÷14/ = 0
Solving the …rst two yields c = 312\ and / = 214\. Substituting
into the third equation gives us
400 ÷12

3
12\

÷14

2
14\

= 0
400 ÷
3
\
÷
2
\
= 0
400 =
5
\
\ =
5
400
=
1
80
Plugging into the earlier expressions,
c =
3
12\
=
3
1280
=
240
12
= 20
and
/ =
2
14\
=
2
1480
=
160
14
=
80
7
.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 220
3. The Lagrangian is
L(r. n. \) = 16r +n +\(1 ÷r
1/4
n
3/4
).
The FOCs are
·L
·r
= 16 ÷
1
4
\

n
r

3/4
= 16 ÷
\
4
r
1/4
n
3/4
r
= 0
·L
·n
= 1 ÷
3
4
\

r
n

1/4
= 1 ÷
3\
4
r
1/4
n
3/4
n
= 0
·L
·\
= 1 ÷r
1/4
n
3/4
= 0
From the third FOC we know that
r
1/4
n
3/4
= 1.
so the other two FOCs simplify to
\ = 64r
and
\ =
4
3
n.
Setting these equal to each other gives us
4
3
n = 64r
n = 48r.
Plugging this into the third FOC yields
r
1/4
n
3/4
= 1
r
1/4
(48r)
3/4
= 1
r =
1
48
3/4
=
3
1/4
24
.
We can then solve for
n = 48r = 2 3
1/4
and
\ =
4n
3
=
8 3
1/4
3
.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 221
4. Set up the Lagrangian
L(r. n. \) = 3rn + 4r +\(80 ÷4r ÷12n).
The foc’s are
·L
·r
= 3n + 4 ÷4\ = 0
·L
·n
= 3r ÷12\ = 0
·L
·\
= 80 ÷4r ÷12n = 0
The …rst equation reduces to n = 4(\ ÷ 1)3 and the second equation
tells us that r = 4\. Substituting these into the third equation yields
80 ÷4r ÷12n = 0
80 ÷4(4\) ÷12(4)(\ ÷1)3 = 0
96 ÷32\ = 0
\ = 3
Plugging this into the equations we already derived gives us the rest of
the solution:
r = 4\ = 12
n = 4(\ ÷1)3 = 83.
5. Set up the Lagrangian
L(r. n. \) = 5r + 2n +\(80 ÷3r ÷2rn)
The foc’s are
·L
·r
= 5 ÷3\ ÷2n\ = 0
·L
·n
= 2 ÷2r\ = 0
·L
·\
= 80 ÷3r ÷2rn = 0
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 222
5 ÷3\ ÷2n\ = 0
2 ÷2r\ = 0
80 ÷3r ÷2rn = 0
Now we solve these equations. The third one reduces to
80 ÷3r ÷2rn = 0
2rn = 80 ÷3r
n =
80 ÷3r
2r
and the second one reduces to
2 ÷2r\ = 0
\ =
1
r
.
Substitute these into the …rst one to get
5 ÷3\ ÷2n\ = 0
5 ÷3

1
r

÷2

80 ÷3r
2r

1
r

= 0
Multiplying through by r
2
yields
5r
2
÷3r ÷80 + 3r = 0
5r
2
= 80
r
2
= 16
r = ±4
Note that we only use the positive root in economics, so r = 4. Sub-
stituting into the other two equations yields
n =
80 ÷3r
2r
=
17
2
and
\ =
1
r
=
1
4
.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 223
6. (a) :
t
(r) = 400 + 4r 0. It says that increasing the size of the farm
leads to increased pro…t, which seems sensible when the farm starts o¤
small.
(b) :
tt
(r) = 4. This is questionable. But, it could arise because of
increasing returns to scale or because of …xed inputs.
(c) The Lagrangian is
L(r. \) = 400r + 2r
2
+\(10 ÷r).
The FOCs are
·L
·r
= 400 + 4r ÷\ = 0
·L
·\
= 10 ÷r = 0
The second one tells us that r = 10 and the …rst one tells us that
\ = 400 + 4r = 440.
(d) It is the marginal value of land.
(e) That would be :
t
(10) = 440. This, by the way, is why the lame
problem is useful. Note that the answers to (d) and (e) are the
same.
(f) No. remember that :
t
(r) 0, so : is increasing and more land is
better. Pro…t is maximized subject to the constraint. Obviously,
constrained optimization will require a di¤erent set of second order
conditions than unconstrained optimization does.
7. (a) :
t
(1) = 30

1 ÷ 10 which is positive when 1 < 9. We would
hope for an upward-sloping pro…t function, so this works, especially
since 1 is only equal to 4.
(b) :
tt
(1) = ÷151
3/2
which is negative. Pro…t grows at a decreasing
rate, which makes sense.
(c) The Lagrangian is
L(1. \) = 30

41 ÷101 +\(4 ÷1)
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 224
The foc’s are
·L
·1
=
30

1
÷10 ÷\ = 0
·L
·\
= 4 ÷1 = 0
The second equation can be solved to get 1 = 4. Plugging 1 = 4
into the …rst equation yields
30

1
÷10 ÷\ = 0
30

1
÷10 ÷\ = 0
5 = \
(d) The Lagrange multiplier is always the marginal value of relaxing
the constraint, where the value comes from whatever the objective
function measures. In this case the objective function is the
pro…t function, and the constraint is on the number of workers
the …rm can use at one time, so the Lagrange multiplier measures
the marginal pro…t from adding workers.
(e) This is
:
t
(4) = 30

4 ÷10 = 5.
Note that this matches the answer from (c).
(f) No. The …rst derivative of the pro…t function is positive (and equal
to 5) when 1 = 4, which means that pro…t is increasing when 1
is 4. The second derivative does not tell us whether we are at a
maximum or minimum when there is a constraint.
8. (a) The Lagrangian is
L(r. n. \) = r
c
n
1÷c
+\(` ÷j
a
r ÷j
&
n).
The FOCs are
·L
·r
= cr
c÷1
n
1÷c
÷\j
a
= 0
·L
·n
= (1 ÷c)r
c
n
÷c
÷\j
&
= 0
·L
·\
= ` ÷j
a
r ÷j
&
n = 0.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 225
Rearrange the …rst two to get
c

&
a

1÷c
j
a
= \
(1 ÷c)

a
&

c
j
&
= \.
Set them equal to each other to get
c

&
a

1÷c
j
a
=
(1 ÷c)

a
&

c
j
&

n
r

1÷c

n
r

c
=
(1 ÷c)
c
j
a
j
&
n
r
=
(1 ÷c)
c
j
a
j
&
n =
(1 ÷c)
c
j
a
j
&
r.
Now substitute this into the budget constraint to get
j
a
r +j
&
n = `
j
a
r +j
&
(1 ÷c)
c
j
a
j
&
r = `
j
a
r +
(1 ÷c)
c
j
a
r = `
j
a
r =
`
1 +
(1÷c)
c
= c`
r =
c`
j
a
.
Substituting this back into what we found for n yields
n =
(1 ÷c)
c
j
a
j
&
r
=
(1 ÷c)
c
j
a
j
&
c`
j
a
=
(1 ÷c)`
j
&
.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 226
(b) These are easy.
·r
+
·`
=
c
j
a
0
·n
+
·`
=
1 ÷c
j
&
0.
(c) Again, these are easy.
·r
+
·j
a
= ÷
c`
j
2
a
< 0
·n
+
·j
a
= 0.
The demand curve for good r is downward-sloping, and it is in-
dependent of the price of the other good.
9. (a) Denote labor devoted to widget production by n and labor devoted
to gookey production by o. The Lagrangian is
L(n. o. \) = (9)(20n
1/2
) + (3)(30o) ÷(11)(n +o) +\(60 ÷n ÷o).
The foc’s are
·L
·n
= 90n
÷
1
2
÷11 ÷\ = 0
·L
·o
= 90 ÷11 ÷\ = 0
·L
·\
= 60 ÷n ÷o = 0
The second equation says that \ = 79. Plugging this into the …rst
equation yields
90

n
÷11 ÷79 = 0
90 = 90

n
n = 1
The third equation then implies that o = 60 ÷n = 59. These are the
same as the answers to the question 4 on the …rst homework.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 227
(b) The Lagrange multiplier is the marginal value of adding workers.
10. (a) The farmer’s problem is
max
1.W
1\
s.t. 21 + 2\ = 1
\ = o
(b) The Lagrangian is
L(1. \. \. j) = 1\ +\(1 ÷21 ÷2\) +j(o ÷\).
The …rst-order conditions are
·L
·1
= \ ÷2\ = 0
·L
·\
= 1 ÷2\ ÷j = 0
·L
·\
= 1 ÷21 ÷2\ = 0
·L
·j
= o ÷\ = 0
We must solve this set of equations:
\ = o (from fourth equation)
1 = 12 ÷o (from third equation)
\ = o2 (from second equation)
j = 12 ÷2o (from …rst equation)
(c) It depends. The marginal impact on area comes directly from the
Lagrange multipliers. \ is the marginal impact of having a longer
fence while keeping the shortest side …xed, and j is the marginal
impact of lengthening the shortest side while keeping the total
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 228
fence length constant. We want to know which is greater, o2 or
12 ÷2o. We can …nd
\ _ j
o2 _ 12 ÷2o
5o2 _ 12
o _ 15.
When the shortest side is more than one-…fth of the total amount
of fencing, the farmer would rather lengthen the fence than lengthen
the shortest side. When the shortest side is smaller than a …fth
of the fence lenght, she would rather lengthen that side, keeping
the total fence length …xed.
Solutions for Chapter 5
1. (a) The solution to the alternative problem is (r. n) = (8.
8
3
). Note
that 4 8 +
8
3
= 34
2
3
20, so the second constraint does not hold.
(b) The solution to the alternative problem is (r. n) = (
10
3
.
20
3
). Note
that 2
10
3
+3
20
3
=
80
3
24, so the …rst constraint does not hold.
(c) If the solution to the alternative problem in (a) had satis…ed the
second constraint, the second constraint would have been non-
binding and its Lagrange multiplier would have been zero. This
is not what happened, though, so the second constraint must bind,
in which case \
2
0. Similarly, part (b) shows us that the …rst
constraint must also bind, and so \
1
0.
(d) Because both constraints bind, the problem becomes
max
a.&
r
2
n
s.t. 2r + 3n = 24
4r +n = 20
This is easy to solve because there is only one point that satis-
…es both constraints: (r. n) = (
18
5
.
28
5
). Now …nd the Lagrange
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 229
multipliers. The FOCs for the equality-constrained problem are
2rn ÷2\
1
÷4\
2
= 0
r
2
÷3\
1
÷\
2
= 0
24 ÷2r ÷3n = 0
20 ÷4r ÷n = 0
We already used the last two to …nd r and n. Plug those values
into the …rst two to get two equations in two unknowns:
2\
1
÷4\
2
=
1008
25
3\
1
+\
2
=
324
25
The solution to this is (\
1
. \
2
) = (
144
125
.
1188
125
).
2. (a) The solution to the alternative problem is (r. n) = (8.
8
3
). Note
that 4 8 +
8
3
= 34
2
3
< 36, so the second constraint does hold this time.
(b) The solution to the alternative problem is (r. n) = (6. 12). Note
that 2 6 + 3 12 = 48 24, so the …rst constraint does not hold.
(c) The solution to the alternative problem in (a) satis…es the second
constraint, so the second contrainst is nonbinding. Therefore
\
1
0 and \
2
= 0.
(d) Because only the …rst constraint binds, the problem becomes
max
a.&
r
2
n
s.t. 2r + 3n = 24
We know from part (a) that (r. n) = (8.
8
3
). We also know that
\
2
= 0. To …nd \
1
use the FOCs for the equality-constrained
problem:
2rn ÷2\
1
= 0
r
2
÷3\
1
= 0
24 ÷2r ÷3n = 0
Plug r = 8 into the second equation to get \
1
=
64
3
. Or, plug
r = 8 and n =
8
3
into the …rst equation to get the same thing.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 230
3. (a) Setting \
2
= 0 in the original Lagrangian we get the …rst-order
conditions
·L
·r
= 4n ÷6r ÷\
1
= 0
·L
·n
= 4r ÷4\
1
= 0
·L
·\
1
= 36 ÷r ÷4n = 0
We solve these for r, n, and \
1
and get
r =
9
2
, n =
63
8
, \
1
=
9
2
We then have 5r + 2n =
45
2
+
63
4
=
153
4
< 45 and the second constraint
is satis…ed.
(b) Setting \
1
= 0 in the original Lagrangian, the foc’s are
·L
·r
= 4n ÷6r ÷5\
2
= 0
·L
·n
= 4r ÷2\
2
= 0
·L
·\
2
= 45 ÷5r ÷2n = 0
The solution is
r =
45
13
, n =
180
13
. \
2
=
90
13
We then have r+4n =
45
13
+
720
13
=
765
13
36 and the …rst constraint
is not satis…ed.
(c) Part (a) shows that we can get a solution when the …rst constraint
binds and the second doesn’t, and part (b) shows that we cannot
get a solution when the second constraint binds but the …rst does
not. So, the answer comes from part (a), with
r =
9
2
, n =
63
8
, \
1
=
9
2
, \
2
= 0.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 231
4. (a) The foc’s are
·L
·r
= 3n ÷8 ÷\
1
= 0
·L
·n
= 3r ÷4\
1
= 0
·L
·\
1
= 24 ÷r ÷4n = 0
The solution is
r =
20
3
, n =
13
3
, \
1
= 5
and since 5r + 2n =
100
3
+
26
3
= 42 30 the second constraint is not
satis…ed.
(b) The foc’s are
·L
·r
= 3n ÷8 ÷5\
2
= 0
·L
·n
= 3r ÷2\
2
= 0
·L
·\
2
= 30 ÷5r ÷2n = 0
3n ÷8 ÷5\
2
= 0
3r ÷2\
2
= 0
30 ÷5r ÷2n = 0
The solution is
r =
37
15
, n =
53
6
, \
2
=
37
10
and since r + 4n =
189
5
24 the …rst constraint is not satis…ed.
(c) Since when one constraint binds the other fails, they must both
bind. The place where they both bind is the intersection of the
two "budget lines," or where the following system is solved:
r + 4n = 24
5r + 2n = 30
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 232
The solution is r = 4, n = 5. Now we have to …nd the values of
\
1
and \
2
. To do this, go back to the foc’s for the entire original
Lagrangian:
·L
·r
= 3n ÷8 ÷\
1
÷5\
2
= 0
·L
·n
= 3r ÷4\
1
÷2\
2
= 0
·L
·\
1
= 24 ÷r ÷4n = 0
·L
·\
2
= 30 ÷5r ÷2n = 0
Plug the values for r and n into the …rst two equations to get
15 ÷8 ÷\
1
÷5\
2
= 0
12 ÷4\
1
÷2\
2
= 0
and solve for \
1
and \
2
. The solution is \
1
=
23
9
and \
2
=
8
9
.
5. (a)
1(r. n. \) = r
2
n +\[42 ÷4r ÷2n].
(b)
r
·1
·r
= r(2rn ÷4\) = 0
n
·1
·n
= n(r
2
÷2\) = 0
\
·1
·\
= \(42 ÷4r ÷2n) = 0
r. n. \ _ 0
(c) First notice that the objective function is r
2
n, which is zero if
either r or n is zero. Consequently, neither r _ 0 nor n _ 0 can
be binding. The other, budget-like constraint is binding because
r
2
n is increasing in both arguments, and so \ 0. The Kuhn-
Tucker conditions reduce to
2rn ÷4\ = 0
r
2
÷2\ = 0
42 ÷4r ÷2n = 0
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 233
Solving yields (r. n. \) = (7. 7.
49
2
).
6. (a)
1(r. n. \) = rn + 40r + 60n +\(12 ÷r ÷n)
(b)
r
·1
·r
= r(n + 40 ÷\) = 0
n
·1
·n
= n(r + 60 ÷\) = 0
\
·1
·\
= \(12 ÷r ÷n) = 0
r. n. \ _ 0
(c) This one is complicated, because we can identify three potential
solutions: (i) r is zero and n is positive, (ii) r is positive and n is
zero, and (iii) both r and n are positive. The only thing to do is
try them one at a time.
Case (i): r = 0. Then n = 12 from the third equation, and \ = 60
from the second equation. The value of the objective function is
rn + 40r + 60n = 720.
Case (ii): n = 0. Then r = 12 from the third equation, and \ = 40
from the …rst equation. The value of the objective function is
480. This case is not as good as case (i), so it cannot be the
Case (iii): r. n 0. Divide both sides of the …rst K-T condition by
r, which is legal since r 0, divide the second by n, and divide
the third by \. We get
n + 40 ÷\ = 0
r + 60 ÷\ = 0
12 ÷r ÷n = 0
The solution to this systemof equations is r = ÷4, n = 16, \ = 56.
This is not allowed, though, because r < 0.
The …nal solution is case (i): r = 0, n = 12, \ = 60.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 234
Solutions for Chapter 6
1. (a)

19 ÷16 ÷22
2 ÷3 61

(b)

10 0 ÷1
13 18 5

(c)

¸
7 1 2
19 2 2
÷23 ÷36 1
¸

(d) 39
2. (a)

¸
14 ÷23
÷21 32
9 11
¸

(b)

¸
7 0
12 14
0 9
¸

(c)

33 ÷13 32
7 ÷11 36

(d) 84
3. (a) 14
(b) 134
4. (a) 21
(b) ÷12
5. In matrix form the system of equations is

¸
6 ÷2 ÷3
2 4 1
3 0 ÷1
¸

¸
r
n
.
¸

=

¸
1
÷2
8
¸

.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 235
Using Cramer’s rule we get
r =
det

¸
1 ÷2 ÷3
÷2 4 1
8 0 ÷1
¸

det

¸
6 ÷2 ÷3
2 4 1
3 0 ÷1
¸

=
80
2
= 40
n =
det

¸
6 1 ÷3
2 ÷2 1
3 8 ÷1
¸

det

¸
6 ÷2 ÷3
2 4 1
3 0 ÷1
¸

=
÷97
2
. =
det

¸
6 ÷2 1
2 4 ÷2
3 0 8
¸

det

¸
6 ÷2 ÷3
2 4 1
3 0 ÷1
¸

=
224
2
= 112
6. In matrix form the system of equations is

¸
5 ÷2 1
3 ÷1 0
0 3 2
¸

¸
r
n
.
¸

=

¸
9
9
15
¸

.
Using Cramer’s rule we get
r =
det

¸
9 ÷2 1
9 ÷1 0
15 3 2
¸

det

¸
5 ÷2 1
3 ÷1 0
0 3 2
¸

=
60
11
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 236
n =
det

¸
5 9 1
3 9 0
0 15 2
¸

det

¸
5 ÷2 1
3 ÷1 0
0 3 2
¸

=
81
11
. =
det

¸
5 ÷2 9
3 ÷1 9
0 3 15
¸

det

¸
5 ÷2 1
3 ÷1 0
0 3 2
¸

=
÷39
11
7. (a)
1
8

1 ÷3
2 2

(b)
1
2

¸
1 ÷2 1
÷1 2 1
1 0 ÷1
¸

8. (a)
1
14

÷4 ÷1
÷2 ÷4

(b)
1
25

¸
5 ÷4 3
0 15 ÷5
0 ÷5 10
¸

Solutions for Chapter 7
1. (a) The determinant of the matrix

¸
3 6 0
2 0 ÷5
1 ÷1 ÷1
¸

is ÷33, and so there is a unique solution.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 237
(b) The determinant of the matrix

¸
4 ÷1 8
17 ÷8 10
÷3 2 2
¸

is 0, and so there is not a unique solution. To …nd out whether
there is no solution or an in…nite number, get the augmented
matrix in row-echelon form.

¸
4 ÷1 8
17 ÷8 10
÷3 2 2

160
200
40
¸

¸
4 ÷1 8
0 ÷
15
4
÷24
0
5
4
8

160
÷480
160
¸

¸
4 ÷1 8
0 ÷
15
4
÷24
0 0 0

160
÷480
0
¸

Since the bottomrowis zeros all the way across, there are in…nitely
many solutions.
(c) The determinant of the matrix

¸
2 ÷3 0
3 0 5
2 6 10
¸

is 0, and so there is not a unique solution. To …nd out whether
there is no solution or an in…nite number, get the augmented
matrix in row-echelon form.

¸
2 ÷3 0
3 0 5
2 6 10

6
15
18
¸

Multiply the …rst row by 2 and add it to the third row:

¸
2 ÷3 0
3 0 5
6 0 10

6
15
30
¸

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 238
Multiply the second row by 2 and subtract it from the third row:

¸
2 ÷3 0
3 0 5
0 0 0

6
15
0
¸

There is a row of all zeros, so there is an in…nite number of solu-
tions.
(d) The determinant of the matrix

¸
4 ÷1 8
3 0 2
5 1 ÷2
¸

is 0, and so there is not a unique solution. To …nd out whether
there is no solution or an in…nite number, get the augmented
matrix in row-echelon form.

¸
4 ÷1 8
3 0 2
5 1 ÷2

30
20
40
¸

Add the top row to the bottom row:

¸
4 ÷1 8
3 0 2
9 0 6

30
20
70
¸

Multiply the middle row by 3 and subtract it from the bottom
row:

¸
4 ÷1 8
3 0 2
0 0 0

30
20
10
¸

Since the bottom row is zeros all the way across except for the
last column, there is no solution.
(e) The determinant of the matrix

¸
6 ÷1 ÷1
5 2 ÷2
0 1 ÷2
¸

is ÷27, there is a unique solution.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 239
2. (a) There is no inverse if the determinant is zero, which leads to the
equation
6c + 2 = 0
c = ÷
1
3
.
(b) Setting the determinant equal to zero and solving for c yields
÷5c ÷5 = 0
c = ÷1
(c) There is no inverse if the determinant is zero, which leads to the
equation
5c + 9 = 0
c = ÷
9
5
.
(d) Setting the determinant equal to zero and solving for c yields
20c ÷35 = 0
c =
7
4
Solutions for Chapter 8
1. (a) Rewrite the system as
) = c((1 ÷t)) ) +i(1) +G
` = 1 :(). 1)
Implicitly di¤erentiate with respect to t to get
d)
dt
= ÷c
t
) + (1 ÷t)c
t
d)
dt
+i
t
d1
dt
0 = 1:
Y
d)
dt
+1:
1
d1
dt
Write in matrix form:

1 ÷(1 ÷t)c
t
÷i
t
:
Y
:
1

oY
ot
o1
ot

=

÷c
t
)
0

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 240
Use Cramer’s rule to get:
d)
dt
=

÷c
t
) ÷i
t
0 :
1

(1 ÷t)c
t
÷i
t
:
Y
:
1

= ÷c
t
)
:
1
(1 ÷(1 ÷t)c
t
):
1
+:
Y
i
t
which is ÷c
t
) times the derivative from the lecture. It is negative, so
an increase in the tax rate reduces GDP.
d1
dt
=

1 ÷(1 ÷t)c
t
÷c
t
)
:
Y
0

(1 ÷t)c
t
÷i
t
:
Y
:
1

= ÷c
t
)
:
Y
(1 ÷(1 ÷t)c
t
):
1
+:
Y
i
t
which is ÷c
t
) times the derivative from the lecture. It is negative, so
an increase in the tax rate reduces the interest rate.
(b) Implicitly di¤erentiate the system with respect to ` to get
d)
d`
= (1 ÷t)c
t
d)
d`
+i
t
d1
d`
1 = 1:
Y
d)
d`
+1:
1
d1
d`
Write in matrix form:

(1 ÷t)c
t
÷i
t
:
Y
:
1

oY
oA
o1
oA

=

0
1

Use Cramer’s rule to get:
d)
d`
=

0 ÷i
t
1 :
1

(1 ÷t)c
t
÷i
t
:
Y
:
1

=
i
t
(1 ÷(1 ÷t)c
t
):
1
+:
Y
i
t
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 241
Both the numerator and denominator are negative, making the
derivative positive, and so an increase in money supply increases
GDP.
d1
dt
=

(1 ÷t)c
t
0
:
Y
1

(1 ÷t)c
t
÷i
t
:
Y
:
1

=
(1 ÷t)c
t
(1 ÷(1 ÷t)c
t
):
1
+:
Y
i
t
The numerator is positive, making the derivative negative, and so
an increase in money supply reduces the interest rate.
(c) Implicitly di¤erentiate the system with respect to 1 to get
d)
d1
= (1 ÷t)c
t
d)
d1
+i
t
d1
d1
0 = :+1:
Y
d)
d1
+1:
1
d1
d1
Write in matrix form:

1 ÷(1 ÷t)c
t
÷i
t
:
Y
:
1

oY
o1
o1
o1

=

0
÷:

The derivatives are ÷: times the derivatives from part (b), and
so an increase in the price level reduces GDP and increases the
interest rate.
2. (a) First simplify to two equations:
) = c() ÷1) +i(1) +G+r(). 1)
` = 1 :(). 1)
Implicitly di¤erentiate with respect to G to get
d)
dG
= c
t
d)
dG
+i
t
d1
dG
+ 1 +r
Y
d)
dG
+r
1
d1
dG
0 = 1 :
Y
d)
dG
+1 :
1
d1
dG
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 242
Rearrange as
d)
dG
÷c
t
d)
dG
÷i
t
d1
dG
÷r
Y
d)
dG
÷r
1
d1
dG
= 1
:
Y
d)
dG
+:
1
d1
dG
= 0
We can write this in matrix form

1 ÷c
t
÷r
Y
÷i
t
÷r
1
:
Y
:
1

oY
oG
o1
oG

=

1
0

Now use Cramer’s rule to solve for d)dG and d1dG:
d)
dG
=

1 ÷i
t
÷r
1
0 :
1

1 ÷c
t
÷r
Y
÷i
t
÷r
1
:
Y
:
1

=
:
1
(1 ÷c
t
÷r
Y
):
1
+:
Y
(i
t
+r
1
)
The numerator is negative. The denominator is negative. So, d)dG
0.
d1
dG
=

1 ÷c
t
÷r
Y
1
:
Y
0

1 ÷c
t
÷r
Y
÷i
t
÷r
1
:
Y
:
1

=
÷:
Y
(1 ÷c
t
÷r
Y
):
1
+:
Y
(i
t
+r
1
)
The numerator is negative and so is the denominator. Thus, d1dG
0. An increase in government spending increases both GDP and in-
terest rates in the short run.
(b) In matrix form we get

1 ÷c
t
÷r
Y
÷i
t
÷r
1
:
Y
:
1

oY
oG
o1
oG

=

÷c
t
0

Thus, the derivatives are ÷c
t
times those in part (a), so an increase
in tax revenue reduces both GDP and interest rates.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 243
(c)
d)
d`
=

1 ÷i
t
÷r
1
0 :
1

1 ÷c
t
÷r
Y
÷i
t
÷r
1
:
Y
:
1

=
i
t
+r
1
(1 ÷c
t
÷r
Y
):
1
+:
Y
(i
t
+r
1
)
Both the numerator and denominator are negative, making the
derivative positive, and so an increase in money supply increases
GDP.
d1
d`
=

1 ÷c
t
÷r
Y
0
:
Y
1

(1 ÷t)c
t
÷i
t
:
Y
:
1

=
1 ÷c
t
÷r
Y
(1 ÷c
t
÷r
Y
):
1
+:
Y
(i
t
+r
1
)
The numerator is positive, making the derivative negative, and so
an increase in money supply reduces the interest rate.
3. (a) Rewrite the system as
) = c((1 ÷t)) ) +i(1) +r(). 1) +G
` = 1 :(). 1)
) =

)
Implicitly di¤erentiate with respect to G to get
d)
dG
= (1 ÷t)c
t
d)
dG
+i
t
d1
dG
+r
Y
d)
dG
+r
1
d1
dG
+ 1
0 = :(). 1)
d1
dG
+1:
Y
d)
dG
+1:
1
d1
dG
d)
dG
= 0
The last line implies, obviously, that d)dG = 0. This makes sense
because ) is …xed at the exogenous level

) . Even so, let’s go through
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 244
the e¤ort of writing the equation in matrix notation:

¸
1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0
¸

¸
d)dG
d1dG
d1dG
¸

=

¸
1
0
0
¸

Use Cramer’s rule to get
d)
dG
=

1 ÷i
t
÷r
1
0
0 1:
1
:
0 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

= 0
where the result follows immediately from the row with all zeroes. The
increase in government spending has no long-run impact on GDP. As
for interest rates,
d1
dG
=

1 ÷(1 ÷t)c
t
÷r
Y
1 0
1:
Y
0 :
1 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

=
:
÷:i
t
÷:r
1
= ÷
1
i
t
+r
1
0.
Increased government spending leads to an increase in interest rates in
the long run. Finally,
d1
dG
=

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
1
1:
Y
1:
1
0
1 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

=
÷1:
1
÷:i
t
÷:r
1
0.
An increase in government spending leads to an increase in the price
level.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 245
(b) This time implicitly di¤erentiate with respect to ` to get
d)
d`
= (1 ÷t)c
t
d)
d`
+i
t
d1
d`
+r
Y
d)
d`
+r
1
d1
d`
1 = :(). 1)
d1
d`
+1:
Y
d)
d`
+1:
1
d1
d`
d)
d`
= 0
Write the equation in matrix notation:

¸
1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0
¸

¸
d)d`
d1d`
d1d`
¸

=

¸
0
1
0
¸

Use Cramer’s rule to get
d)
d`
=

0 ÷i
t
÷r
1
0
1 1:
1
:
0 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

= 0
The increase in money supply has no long-run impact on GDP.
As for interest rates,
d1
d`
=

1 ÷(1 ÷t)c
t
÷r
Y
0 0
1:
Y
1 :
1 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

=
0
÷:i
t
÷:r
1
= 0.
Increasing the money supply has no long-run impact on interest
rates, either. Finally,
d1
d`
=

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
1
1 0 0

1 ÷(1 ÷t)c
t
÷r
Y
÷i
t
÷r
1
0
1:
Y
1:
1
:
1 0 0

=
÷i
t
÷r
1
÷:i
t
÷:r
1
=
1
:
0.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 246
An increase the money supply leads to an increase in the price
level. That’s the only long-run impact of an increase in money
supply.
4. (a) Implicitly di¤erentiate the system with respect to 1:

1
d1
= 1
j
dj
d1
+1
1

S
d1
= o
j
dj
d1

1
d1
=

S
d1
Write it in matrix form:

¸
1 0 ÷1
j
0 1 ÷o
j
1 ÷1 0
¸

¸

1
d1

S
d1
djd1
¸

=

¸
1
1
0
0
¸

Solve for djd1 using Cramer’s rule:
dj
d1
=

1 0 1
1
0 1 0
1 ÷1 0

1 0 ÷1
j
0 1 ÷o
j
1 ÷1 0

=
÷1
1
1
1
÷o
1
0
where the result follows because 1
1
0, 1
j
< 0, and o
j
0.
(b) Implicitly di¤erentiate the system with respect to n:

1
dn
= 1
j
dj
dn

S
dn
= o
j
dj
dn
+o
&

1
dn
=

S
dn
Write it in matrix form:

¸
1 0 ÷1
j
0 1 ÷o
j
1 ÷1 0
¸

¸

1
dn

S
dn
djdn
¸

=

¸
0
o
&
0
¸

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 247
Solve for djdn using Cramer’s rule:
dj
dn
=

1 0 0
0 1 o
&
1 ÷1 0

1 0 ÷1
j
0 1 ÷o
j
1 ÷1 0

=
o
&
1
1
÷o
1
0.
where the result follows because o
&
< 0, 1
j
< 0, and o
j
0.
5. We can write the regression as n = A +c where
n =

¸
6
2
5
¸

and A =

¸
1 9
1 4
1 3
¸

.
The estimated coe¢cients are given by
^
= (A
T
A)
÷1
A
T
n =
1
62

146
23

.
6. (a) The matrix is
A
T
A =

2 6 ÷4
8 24 ÷16

¸
2 8
6 24
÷4 ÷16
¸

=

56 224
224 896

and its determinant is 0.
(b) The second column of r is a scalar multiple of the …rst, and so the
two vectors span the same column space. The regression projects
the n vector onto this column space, but there are in…nitely-many
ways to write the resulting projection as a combination of the two
column vectors.
7.
^
= (A
T
A)
÷1
A
T
n =
1
138

205
64

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 248
8. r
3
= 12r
2
, and so the new variable does not expand the space spanned
by the columns of the data matrix. All it does is make the solution
indeterminant, and the matrix A
T
A will not be invertible. To see
this, note that if we add the column
A =

¸
¸
¸
1 2 24
1 3 36
1 5 60
1 4 48
¸

and A
T
A =

¸
4 14 168
14 54 648
168 648 7776
¸

The determinant of A
T
A is 0. Also, the third column is 12 times the
second column.
9. (a) The eigenvalues are given by the solution to the problem

5 ÷\ 1
4 2 ÷\

= 0.
Taking the determinant yields
(5 ÷\)(2 ÷\) ÷4 = 0
6 ÷7\ +\
2
= 0
\ = 6. 1
Eigenvectors satisfy

5 ÷\ 1
4 2 ÷\

·
1
·
2

=

0
0

.
When \ = 1, this is

4 1
4 1

·
1
·
2

=

0
0

.
There are many solutions, but one of them is

·
1
·
2

=

1
÷4

.
When \ = 6, the equation is

÷1 1
4 ÷4

·
1
·
2

=

0
0

.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 249
Again there are many solutions, but one of them is

·
1
·
2

=

1
1

.
(b) Use the same steps as before. The eigenvalues are \ = 7 and
\ = 6. When \ = 7 an eigenvector is (1. 3), and when \ = 6 an
eigenvector is (1. 4).
(c) The eigenvalues are \ = 7 and \ = 0. When \ = 7 an eigenvector
is (3. ÷2) and when \ = 0 an eigenvector is (2. 1).
(d) The eigenvalues are \ = 3 and \ = 2. When \ = 3 an eigenvector
is (1. 0) and when \ = 2 an eigenvector is (÷4. 1).
(e) The eigenvalues are \ = 2 + 2

13 and \ = 2 ÷2

13. When \ =
2+2

13 an eigenvector is (÷2

13÷8. 3) and when \ = 2÷2

13
an eigenvector is (2

13 ÷8. 3).
10. (a) Yes. The eigenvalues are \ = 13 and \ = ÷15, both of which
are less than one.
(b) No. The eigenvalues are \ = 54 and \ = 13. The …rst one is
larger than 1, so the system is unstable.
(c) Yes. The eigenvalues are \ = 14 and \ = ÷23, both of which
have magnitude less than one.
(d) No. The eigenvalues are \ = 2 and \ = ÷4145. The …rst one is
larger than 1, so the system is unstable.
Solutions for Chapter 9
1.
\1(r
1
. r
2
. r
3
) =

¸
2r
2
3
+ 3r
2
2
÷8r
1
6r
1
r
2
4r
1
r
3
¸

\1(5. 2. 0) =

¸
÷28
60
0
¸

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 250
2. (a) The second-order Taylor approximation is
1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
tt
(r
0
)
2
(r ÷r
0
)
2
We have
1(1) = 2
1
t
(r) = ÷6r
2
÷5
1
t
(1) = ÷11
1
tt
(r) = ÷12r
1
tt
(1) = ÷12
and so the Taylor approximation at 1 is
2 ÷11(r ÷1) ÷
12
2
(r ÷1)
2
= ÷6r
2
+r + 7
(b) We have
1(1) = ÷30
1
t
(r) = 10 ÷
20

r
+
1
r
1
t
(1) = ÷9
1
tt
(r) =
10
r
3
2
÷
1
r
2
1
tt
(1) = 9
and the Taylor approximation at 1 is
÷30 ÷9(r ÷1) +
9
2
(r ÷1)
2
=
9
2
r
2
÷18r ÷
33
2
(c) We have
1(r) = 1
t
(r) = 1
tt
(r) = c
a
1(1) = 1
t
(1) = 1
tt
(1) = c
and the Taylor approximation is
c +c(r ÷1) +
c
2
(r ÷1)
2
=
1
2
c

r
2
+ 1

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 251
3.
1(r) - 1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
2
1
tt
(r
0
)(r ÷r
0
)
2
= 12 ÷2r ÷4r
2
4.
1(r) - 1(r
0
) +1
t
(r
0
)(r ÷r
0
) +
1
2
1
tt
(r
0
)(r ÷r
0
)
2
= c +/r +cr
2
The second-degree Taylor approximation gives you a second-order poly-
nomial, and if you begin with a second-order polynomial you get a
perfect approximation.
5. (a) Negative de…nite because c
11
< 0 and c
11
c
22
÷c
12
c
21
= ÷7.
(b) Positive semide…nite because c
11
0 but c
11
c
22
÷c
12
c
21
= 0.
(c) Inde…nite because c
11
0 but c
11
c
22
÷c
12
c
21
= ÷25.
(d) Inde…nite because
[c
11
[ 0,

4 0
0 ÷3

= ÷12, and

4 0 1
0 ÷3 ÷2
1 ÷2 1

÷25.
(e) Positive de…nite because [¹
1
[ = 6 0 and [¹
2
[ = 17 0.
(f) Inde…nite because [¹
1
[ = ÷4 < 0 but [¹
2
[ = ÷240 < 0.
(g) Negative de…nite because [¹
1
[ = ÷2 < 0 and [¹
2
[ = 7 0.
(h) Inde…nite because [¹
1
[ = 3 0, [¹
2
[ = 8 0, and [¹
3
[ = ÷44 < 0.
6. (a) Letting 1(r. n) denote the objective function, the …rst partials are
1
a
= ÷n
1
&
= 8n ÷r
and the matrix of second partials is

1
aa
1
a&
1
&a
1
&&

=

0 ÷1
÷1 8

.
This matrix is inde…nite and so the second-order conditions for a min-
imum are not satis…ed.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 252
(b) Letting 1(r. n) denote the objective function, the …rst partials are
1
a
= 8 ÷2r
1
&
= 6 ÷2n
and the matrix of second partials is

1
aa
1
a&
1
&a
1
&&

=

÷2 0
0 ÷2

which is negative de…nite. The second-order conditions are satis-
…ed.
(c) We have
\1 =

5n
5r ÷4n

and
H =

0 5
5 ÷4

This matrix is inde…nite because [H
1
[ = 0 but [H
2
[ = ÷25 < 0.
The second-order condition is not satis…ed.
(d) We have
\1 =

12r
6n

and
H =

12 0
0 6

This matrix is positive de…nite because [H
1
[ = 12 0 and [H
2
[ =
72 0. The second-order condition is satis…ed.
7. If it is a convex combination there must be some number t ÷ [0. 1] such
that

6
2

= t

11
4

+ (1 ÷t)

÷1
0

.
Writing these out as two equations gives us
6 = 11t + (÷1)(1 ÷t)
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 253
and
2 = 4t + 0(1 ÷t).
Solving the …rst one yields t = 712 and solving the second one yields
t = 12. These are not the same so it is not a convex combination.
8. Let t be a scalar between 0 and 1. Given r
o
and r
b
, we want to show
that
1(tr
o
+ (1 ÷t)r
b
) _ t1(r
o
) + (1 ÷t)1(r
b
)
Looking at the left-hand side,
1(tr
o
+ (1 ÷t)r
b
) = (tr
o
+ (1 ÷t)r
b
)
2
= (r
b
+tr
o
÷tr
b
)
2
Looking at the right-hand side,
t1(r
o
) + (1 ÷t)1(r
b
) = tr
2
o
+ (1 ÷t)r
2
b
= r
2
b
+tr
2
o
÷tr
2
b
Subtracting the left-hand side from the right-hand side gives us
r
2
b
+tr
2
o
÷tr
2
b
÷(r
b
+tr
o
÷tr
b
)
2
= t(1 ÷t)(r
o
÷r
b
)
2
which has to be nonnegative because t, 1÷t, and anything squared are
all nonnegative.
Solutions for Chapter 10
1. (a) r = 20 and n = 4.
(b) There are two of them: r = 15 and n = 4, and r = 15 and n = 5.
(c) Sum the probabilities along the row to get 0.35.
(d) 0.03 + 0.17 + 0.00 + 0.05 + 0.04 + 0.20 = 0.49.
(e)
1(n _ 2[r _ 20) =
1(n _ 2 and r _ 20)
1(r _ 20)
=
0.23
0.79
=
23
79
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 254
(f) Bayes’ rule says
1(n = 4[r = 20) =
1(r = 20[n = 4) 1(n = 4)
1(r = 20)
.
We have 1(n = 4[r = 20) = 0.200.44 = 511. Also,
1(r = 20[n = 4) 1(n = 4)
1(r = 20)
=
(0.200.27) (0.27)
0.44
=
20
44
=
5
11
.
(g) Two events are statistically independent if the probability of their
intersection equals the product of their probabilities. We have
1(r _ 20) = 0.65
1(n ÷ ¦1. 4¦) = 0.42
1(r _ 20) 1(n ÷ ¦1. 4¦) = (0.65)(0.42) = 0.273
1(r _ 20 and n ÷ ¦1. 4¦) = 0.24
They are not statistically independent.
2. (a) 1(¹) = 0.26 and 1(1) = .18, so ¹ is more likely.
(b) The numbers in parentheses are (c. /) pairs: ¦(4. 1). (4. 3). (5. 3)¦.
(c) 0.73.
(d) 1(/ = 2[c = 5) = 1(/ = 2 and c = 5)1(c = 5) = 0.060.32 =
316 = 0.1875.
(e)
1(c _ 3 and / ÷ ¦1. 4¦) = 0.45
and
1(/ ÷ ¦1. 4¦) = 0.58
so
1(c _ 3[/ ÷ ¦1. 4¦) =
0.45
0.58
= 0.77586
(f) 1(c ÷ ¦1. 3¦ and / ÷ ¦1. 2. 4¦) = 0.14, but 1(c ÷ ¦1. 3¦) = 0.36
and 1(/ ÷ ¦1. 2. 4¦) = 0.73. We have
1((c ÷ ¦1. 3¦)1(/ ÷ ¦1. 2. 4¦) = 0.36 + 0.73 = 0.2628 = 0.14.
They are not statistically independent.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 255
3. We want 1(disease [ positive), which is
1(disease [ positive) =
1(disease and positive)
1(positive)
.
Note that
1(disease and positive) = 1(positive [ disease) 1(disease)
= 0.95
1
20. 000
= 0.0000475
and
1(positive) = 1(positive [ disease) 1(disease) +1(positive [ healthy) 1(healthy)
= 0.95
1
20. 000
+ 0.05
19. 999
20. 000
= 0.0000475 + 0.0499975
= 0.050045
Now we get
1(disease [ positive) =
1(disease and positive)
1(positive)
=
0.0000475
0.050045
= 0.000949
In spite of the positive test, it is still very unlikely that Max has the
disease.
4. Use Bayes’ rule:
1(entrepreneur [ old) =
1(old [ entrpreneur)1(entrepreneur)
1(old)
Your grad assistant told you that 1(old [ entrepreneur) = 0.8 and that
1(entrepreneur) = 0.3. But she didn’t tell you 1(old), so you must
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 256
calculate it:
1(old) = 1(old [ doctor)1(doctor)
+1(old [ lawyer)1(lawyer)
+1(old [ entrpreneur)1(entrpreneur)
= (0.6)(0.2) + (0.3)(0.5) + (0.8)(0.3)
= 0.51
Plugging this into Bayes’ rule yields
1(entrepreneur [ old) =
(0.8)(0.3)
0.51
= 0.47
47% of old people are entrepreneurs.
Solutions for Chapter 12
1. (a) Plugging in 1(r) = 16 gives us

8
2
r1(r)dr =
1
6

8
2
rdr
=
1
12
r
2

8
2
=
64
12
÷
4
12
= 5
(b) Following the same strategy,

8
2
r
2
1(r)dr =
1
6

8
2
r
2
dr
=
1
18
r
3

8
2
=
512
18
÷
8
18
= 28
2. Leibniz’ rule says
d
dt

b(t)
o(t)
1(r. t)dr =

b(t)
o(t)
·1(r. t)
·t
dr +/
t
(t)1(/(t). t) ÷c
t
(t)1(c(t). t).
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 257
Here
1(r. t) = tr
2
, /(t) = t
2
, and c(t) = ÷t
2
so
·1(r. t)
·t
= r
2
, 1(/(t). t) = t (t
2
)
2
= t
5
, and 1(c(t). t) = t (÷t
2
)
2
= t
5
.
Leibniz’ rule then becomes
d
dt

t
2
÷t
2
tr
2
dr =

t
2
÷t
2
r
2
dr + (2t)(t
5
) ÷(÷2t)(t
5
)
=
r
3
3

t
2
÷t
2
+ 2t
6
+ 2t
6
=
t
6
3
÷
÷t
6
3
+ 4t
6
=
14
3
t
6
.
3. Use Leibniz’ rule:
d
dt

b(t)
o(t)
1(r. t)dr =

b(t)
o(t)
·1(r. t)
·t
dr +/
t
(t)1(/(t). t) ÷c
t
(t)1(c(t). t)
d
dt

4t
2
÷3t
t
2
r
3
dr = 2

4t
2
÷3t
tr
3
dr + (8t) t
2
(4t
2
)
3
÷(÷3) t
2
(÷3t)
3
=
2
4
tr
4

4t
2
÷3t
+ 512t
9
÷81t
5
= 128t
9
÷
81
2
t
5
+ 512t
9
÷81t
5
= 640t
9
÷
243
2
t
5
4. Let 1(r) denote the distribution function for l(c. /), and let G(r)
denote the distribution function for l(0. 1). Then
1(r) =

0 r < c
a÷o
b÷o
for c _ r < /
1 r _ /
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 258
and
G(r) =

0 r < 0
r for 0 _ r < 1
1 r _ 1
First-order stochastic dominance requires 1(r) _ G(r). The require-
ments on c and / are
c _ 0
/ _ 1
The easiest way to see this is by graphing it. But, from looking at the
equations, if 0 _ c _ 1 _ / we can write
G(r) ÷1(r) =

0 r < 0
r 0 _ r < c
r ÷
a÷o
b÷o
for c _ r < 1
1 ÷
a÷o
b÷o
1 _ r < /
0 r _ /
Note that
r ÷
r ÷c
/ ÷c
=
/r ÷cr ÷r +c
/ ÷c
=
c(1 ÷r)
/ ÷c
+
(/ ÷1)r
/ ÷c
which is positive when / _ 1 _ r _ c _ 0, and
1 ÷
r ÷c
/ ÷c
=
/ ÷c ÷r +c
/ ÷c
=
/ ÷r
/ ÷c
which is positive when / _ r. So G(r) _ 1(r) as desired.
Solutions for Chapter 13
1. (a) j = (.10)(7)+(.23)(4)+(.40)(2)+(.15)(÷2)+(.10)(÷6)+(.02)(÷14) =
1. 24
(b) o
2
= (.10)(7 ÷ 1. 24)
2
+ (.23)(4 ÷ 1. 24)
2
+ (.40)(2 ÷ 1. 24)
2
+
(.15)(÷2 ÷ 1. 24)
2
+ (.10)(÷6 ÷ 1. 24)
2
+ (.02)(÷14 ÷ 1. 24)
2
=
16. 762.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 259
2. (a) The means are
j
;
= (10)(.15) + (15)(.5) + (20)(.05) + (30)(.1) + (100)(.2) = 33
and
j
j
= (10)(.2) + (15)(.3) + (20)(.1) + (30)(.1) + (100)(.3) = 41. 5.
(b) The variances are
o
2
;
= (10 ÷33)
2
(.15) + (15 ÷33)
2
(.5) + (20 ÷33)
2
(.05)
+(30 ÷33)
2
(.1) + (100 ÷33)
2
(.2)
= 1148.5
and
o
2
j
= (10 ÷41.5)
2
(.2) + (15 ÷41.5)
2
(.3) + (20 ÷41.5)
2
(.1)
+(30 ÷41.5)
2
(.1) + (100 ÷41.5)
2
(.3)
= 1495.3
(c) The standard deviations are
o
;
=

1148.5 = 33. 890
and
o
j
=

1495.3 = 38. 669
3. (a)
1(r) =

a
0
1(t)dt
=

a
0
2tdt
= t
2

a
0
= r
2
.
(b) All those things hold.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 260
(c)
j =

1
0
r 2rdr
= 2

1
0
r
2
dr
=
2
3
r
3

1
0
=
2
3
.
(d) First …nd
1[~ r
2
] =

1
0
r
2
2rdr
= 2

1
0
r
3
dr
=
2
4
r
4

1
0
=
1
2
.
Then note that
o
2
= 1[~ r
2
] ÷j
2
=
1
2
÷
4
9
=
1
18
.
4. (a) For r ÷ [0. 4] we have
1(r) =

a
0
1
8
tdt
=
1
16
t
2

a
0
=
1
16
r
2
Outside of this interval we have 1(r) = 0 when r < 0 and 1(r) = 1
when r 1.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 261
(b) Using 1(r) = r
2
16, we get 1(0) = 0, 1(4) = 1, and 1
t
(r) =
r8 _ 0.
(c)
j =

4
0
r

1
8
r

dr =
8
3
(d)
o
2
=

4
0

r ÷
8
3

2

1
8
r

dr =
8
9
5. The mean of the random variable c~ r is cj, where j is the mean of ~ r.
The variance is
\ c:(c~ r) = 1[(c~ r ÷cj)
2
]
= 1[c
2
(~ r ÷j)
2
]
c
2
1[~ r ÷j]
2
= c
2
o
2
.
The …rst line is the de…nition of variance, the second factors out the c,
the third works because the expectations operator is a linear operator,
and the third is the de…nition of o
2
.
6. We know that
1[(r ÷j
a
)
2
] = o
2
a
We want to …nd
o
2
&
= 1[(n ÷j
&
)
2
]
Note that j
&
= 3j
a
÷ 1, and that n = 3r ÷ 1. Substituting these in
yields
o
2
&
= 1[(n ÷j
&
)
2
]
= 1[(3r ÷1 ÷(3j
a
÷1))
2
]
= 1[(3r ÷3j
a
)
2
]
= 1[9(r ÷j
a
)
2
]
= 91[(r ÷j
a
)
2
]
= 9o
2
a
.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 262
7. The mean is j = 3 +
1
2
n. The variance is therefore
o
2
=
1
2
[6 ÷(3 +
1
2
n)]
2
+
1
2
[n ÷(3 +
1
2
n)]
2
=
1
4
n
2
÷3n + 9.
The derivative with respect to n is
do
2
dn
=
1
2
n ÷3.
8. All we have to do is show that G
(2)
(r) _ G
(1)
(r) for all r. We have
G
(2)
(r) ÷G
(1)
(r) =

:1
a÷1
(r)(1 ÷1(r)) +1
a
(r)

÷[1
a
(r)]
= :1
a÷1
(r)(1 ÷1(r)) _ 0.
Solutions for Chapter 14
1. (a)
1(r. n) ~ n = 10 ~ n = 20 ~ n = 30
~ r = 1 .04 .04 .24
~ r = 2 .11 .11 .49
~ r = 3 .13 .24 .69
~ r = 4 .14 .37 1.00
(b) 1
~ a
is given by the last column of part (a), and 1
~ &
is given by the
bottom row of part (a).
(c) 1
~ a
(1) = .24, 1
~ a
(2) = .25, 1
~ a
(3) = .20, 1
~ a
(4) = .31. Similarly,
1
~ &
(10) = .14, 1
~ &
(20) = .23, 1
~ &
(30) = .63.
(d) The formula for conditional density is 1(r[~ n = 20) = 1(r. 20)1
~ &
(20),
which gives us 1(1[~ n = 20) = 0.23 = 0, 1(2[~ n = 20) = 0,
1(3[~ n = 20) = 1123, and 1(4[~ n = 20) = 1223.
(e) Using the marginal density from part (c), the mean is
j
&
= (.14)(10) + (.23)(20) + (.63)(30) = 24.9
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 263
(f) Using part (d),
j
a[&=20
= (0)(1) + (0)(2) + (1123)(3) + (1223)(4) = 8123
(g) No. For the two to be independent we need 1(r. n) = 1
~ a
(r)1
&
(n).
This does not hold. For example, we have 1(3. 20) = .11, 1
~ a
(3) =
.20, and 1
~ &
(20) = .23, which makes 1
~ a
(3)1
~ &
(20) = .046 = .11.
(h) We have
1
a
[~ r[~ n = 10] =
.04(1) +.07(2) +.02(3) +.01(4)
.14
= 2.0
1
a
[~ r[~ n = 20] =
0(1) + 0(2) +.11(3) +.12(4)
.23
= 3.52
1
a
[~ r[~ n = 30] =
.2(1) +.18(2) +.07(3) +.18(4)
.63
= 2.36
1
&
[1
a
[~ r[n] = (.14)(2.0) + (.23)(3.52) + (.63)(2.36) = 2.58
Finally, using the marginal density from part (c) yields
1
a
[~ r] = (.24)(1) + (.25)(2) + (.20)(3) + (.31)(4) = 2.58.
It works.
2. (a)
1(r. n) ~ n = 3 ~ n = 8 ~ n = 10
~ r = 1 0.03 0.05 0.25
~ r = 2 0.05 0.19 0.44
~ r = 3 0.10 0.25 0.71
~ r = 4 0.17 0.43 1.00
(b) 1
~ a
(1) = 0.25, 1
~ a
(2) = 0.44, 1
~ a
(3) = 0.71, 1
~ a
(4) = 1.00 and
1
~ &
(3) = 0.17, 1
~ &
(8) = 0.43, 1
~ &
(10) = 1.00.
(c) 1
~ a
(1) = 0.25, 1
~ a
(2) = 0.19, 1
~ a
(3) = 0.27, 1
~ a
(4) = 0.29 and 1
~ &
(3) =
0.17, 1
~ &
(8) = 0.26, 1
~ &
(10) = 0.57.
(d) 1(~ n = 3[~ r = 1) = 0.12, 1(~ n = 8[~ r = 1) = 0.08, and 1(~ n = 10[~ r =
1) = 0.80.
(e) j
a
= 2.6 and j
&
= 8.29.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 264
(f) 1[~ r[~ n = 3] = [(0.03)(1) + (0.02)(2) + (0.05)(3) + (0.07)(4)]0.17 =
2. 94.
(g) No. The following table shows the entries for 1
~ a
(r)1
~ &
(n):
1
~ a
(r)1
~ &
(n) ~ n = 3 ~ n = 8 ~ n = 10
~ r = 1 0.043 0.065 0.143
~ r = 2 0.032 0.049 0.108
~ r = 3 0.046 0.070 0.154
~ r = 4 0.049 0.075 0.165
None of the entries are the same as those in the 1(r. n) table.
(h) o
2
a
= 1.32 and o
2
&
= 6.45.
(i) (o·(~ r. ~ n) = ÷0.514
(j) o
a&
= ÷.176.
(k) We have
1
a
[~ r[~ n = 3] =
(.03)(1) + (.02)(2) + (.05)(3) + (.07)(4)
.17
= 2. 94
1
a
[~ r[~ n = 8] =
(.02)(1) + (.12)(2) + (.01)(3) + (.11)(4)
.26
= 2. 81
1
a
[~ r[~ n = 10] =
(.2)(1) + (.05)(2) + (.21)(3) + (.11)(4)
.57
= 2.40
1
&
[1
a
[~ r[n]] = (.17)(2.94) + (.26)(2.81) + (.57)(2.40) = 2.6
which is the same as the mean of ~ r found above.
3. The uniform distribution over [c. /] is
1(r) =
r ÷c
/ ÷c
when r ÷ [c. /], it is 1 for r /, and 0 for r < c. The conditional
distribution is
1(r[r _ c) =
1(r)
1(c)
=
a÷o
b÷o
c÷o
b÷o
=
r ÷c
c ÷c
for r ÷ [c. c], it is 1 for r c, and 0 for r < c. But this is just the
uniform distribution over [c. c].
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 265
4. ~ r and ~ n are independent if 1(r. n) = 1
~ a
(r)1
~ &
(n) or, equivalently, if
1(r[n) = 1(r). This answer uses the latter formulation. We can see
that 1(~ r = ÷1[~ n = 10) = 14, and for ~ r and ~ n to be independent it
must also be the case that 1(~ r = ÷1[~ n = 20) = 14. But
1(~ r = ÷1[~ n = 20) =
c
c +/
.
We also know that c + / must equal 0.6 so that the probabilities sum
to one. Thus,
c
c +/
=
c
0.6
=
1
4
c =
.6
4
= 0.15
/ = 0.6 ÷c = 0.45.
18.1 Solutions for Chapter 15
1. r = 4 and :
2
= 24.
Solutions for Chapter 17
1. (a) Compute the t-statistic
t =
r ÷j
:

:
=
60.02 ÷0
44.37

30
= 7.41
which has 29 degrees of freedom. Use the Excel formula
=TDIST(7.41, 29, 2)
to get the p-value of 0.0000000365. The data reject the hypothesis.
(b) The t-statistic is 3.706, the p-value is 0.000882, and the hypothesis
is rejected.
(c) The t-statistic is -0.614, the p-value for the one-tailed test is 0.27,
and the hypothesis is supported.
CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 266
(d) The sample mean is 60.02 which is smaller than 100, so the hy-
pothesis is supported.
2. (a) The best estimate of j is the sample mean r and the best estimate
of o
2
is the sample variance :
2
.
r = 44.2
:
2
= 653.8
(b) Compute the t-statistic
t =
r ÷j
:

:
=
44.2 ÷40
25.6

20
= 0.73
and the t-statistic has 19 degrees of freedom. From here compute
the j-value of 2(1 ÷ TDist(0.73. 19)) = 0.47 0.05 and the data
support the hypothesis.
(c) Compute the t-statistic
t =
r ÷j
:

:
=
44.2 ÷60
25.6

20
= ÷2. 76
and again the t-statistic has 19 degrees of freedom. From here
compute the j-value of 2(1 ÷TDist(2.76. 19)) = .01 25 < 0.05 and
the data reject the hypothesis.
INDEX
alternative hypothesis, 202
astrology, 15, 29
asymptotic theory, 192
almost sure convergence, 192
Central Limit Theorem, 193
convergence in distribution, 193
convergence in probability, 192
Law of Large Numbers, 192
auctions, 161
augmented matrix, 87
Bayes’ rule, 137
likelihood, 138
posterior, 138
prior, 138
Bernoulli distribution, 145
better-than set, 126
binding constraint, 54
binomial distribution, 145
Excel command, 147
sampling from, 200
capacity constraint, 54
cdf, 145
Central Limit Theorem, 193
chain rule
for functions of one variable, 13
for functions of several variables,
25
chi-square distribution, 194
relationship to normal distribution,
195
Cobb-Douglas function, 43
production function, 46
utility function, 43
cofactor, 78, 82
coin ‡ipping, 145
column space, 89, 100
comparative statics analysis, 5, 29, 31,
45, 96
267
INDEX 268
implicit di¤erentiation approach,
30
total di¤erential approach, 32
complementary slackness, 57
component of a vector, 22
concave function, 120
de…nition, 121
conditional density, 177
continuous case, 177
discrete case, 177
general formula, 177
conditional expectation, 181
conditional probability, 136, 177
con…dence level, 203
constant function, 10
continuous random variable, 143
convergence of random variables, 192
almost sure, 192
in distribution, 193
in probability, 192
convex combination, 120
convex function, 122
de…nition, 122
convex set, 124
coordinate vector, 25, 81
correlation coe¢cient, 179, 180
covariance, 179
Cramer’s rule, 79, 81, 97
critical region, 202
cumulative density function, 145
degrees of freedom, 191
density function, 144
derivative, 9
chain rule, 13
division rule, 14
partial, 24
product rule, 12
determinant, 78
di¤erentiation, implicit, 29
discrete random variable, 143
distribution function, 144
dot product, 23
dynamic system, 102, 104
in matrix form, 104
c, 17
econometrics, 98, 99
eigenvalue, 105
eigenvector, 106
Euclidean space, 23
event, 132, 143
Excel commands
binomial distribution, 147
t distribution, 207
expectation, 164, 165
conditional, 181
expectation operator, 165
expected value, 164
conditional, 181
experiment, 131
exponential distribution, 149
exponential function, 17
derivative, 18
F distribution, 199
relation to t distribution, 199
…rst-order condition
for multidimensional optimization,
28
…rst-order condition (FOC), 15
for equality-constrained optimiza-
tion, 40
for inequality-constrained optimiza-
tion, 57, 65
Kuhn-Tucker conditions, 67
INDEX 269
…rst-order stochastic dominance, 159
Hessian, 117
hypothesis
alternative, 202
null, 202
hypothesis testing, 202
con…dence level, 203
critical region, 202
errors, 203
one-tailed, 205
p-value, 206
signi…cance level, 203
two-tailed, 205
identity matrix, 75
IID (independent, identically distrib-
uted), 187
implicit di¤erentiation, 29, 31, 37, 97
independence, statistical, 140
independent random variables, 178
independent, identically distributed,
187
inequality constraint
binding, 54
nonbinding, 55
slack, 55
inner product, 23
integration by parts, 157, 158
inverse matrix, 76
2 2, 82
existence, 81
formula, 82
IS-LM analysis, 95
joint distribution function, 175
Kuhn-Tucker conditions, 67
Kuhn-Tucker Lagrangian, 66
Lagrange multiplier, 39
interpretation, 43, 48, 55
Lagrangian, 39
Kuhn-Tucker, 66
lame examples, 53, 59, 99, 103
Law of Iterated Expectations, 184
Law of Large Numbers, 192
Strong Law, 192
Weak Law, 192
least squares, 98, 100
projection matrix, 101
Leibniz’s rule, 160, 162
likelihood, 138
linear approximation, 114
linear combination, 89
linear operator, 155
linear programming, 62
linearly dependent vectors, 91
linearly independent vectors, 91, 92
logarithm, 17
derivative, 17
logistic distribution, 151
lognormal distribution, 151
lower animals, separation from, 156
Maple commands, 207
marginal density function, 177
martrix
diagonal elements, 73
matrix, 72
augmented, 87, 92
determinant, 78
dimensions, 72
Hessian, 117
idempotent, 102
INDEX 270
identity matrix, 75
inverse, 76
left-multiplication, 75
multiplication, 73
negative semide…nite, 118, 119
nonsingular, 81
positive semide…nite, 119
rank, 88
right-multiplication, 75
scalar multiplication, 73
singular, 81
square, 73
transpose, 75
mean, 165
sample mean, 188
standardized, 193
Monty Hall problem, 139
multivariate distribution, 175
expectation, 178
mutually exclusive events, 132
negative semide…nite matrix, 118, 119
nonbinding constraint, 55
nonconvex set, 124
norm, 24
normal distribution, 148
mean, 165
sampling from, 197
standard normal, 148
variance, 168
null hypothesis, 202
objective function, 30
one-tailed test, 205
order statistics, 169
…rst order statistic, 169, 170
for uniform distribution, 170, 172
second order statistic, 171, 172
orthogonal, 101
outcome (of an experiment), 131
p-value, 206
partial derivative, 24
cross partial, 25
second partial, 25
pdf, 145
population, 187
positive semide…nite matrix, 119
posterior probability, 138
prior probability, 138
probability density function, 145
probability distributions
binomial, 145
chi-square, 194
F, 199
logistic, 151
lognormal, 151
normal, 148
t, 198
uniform, 147
probability measure, 132, 143
properties, 132
quasiconcave, 126
quasiconvex, 127
random sample, 187
random variable, 143, 164
continuous, 143
discrete, 143
IID, 187
rank (of a matrix), 88, 92
realization of a random variable, 143
rigor, 3
row-echelon decomposition, 87, 88, 92
row-echelon form, 87
INDEX 271
sample, 187
sample frequency, 200
sample mean, 188
standardized, 193
variance of, 188
sample space, 131
sample variance, 189, 190
mean of, 191
scalar, 11, 23
scalar multiplication, 23
search, 181
second-order condition (SOC), 16, 116
for a function of : variables, 118,
119
second-price auction, 161, 163
signi…cance level, 203
span, 89, 92
stability conditions, 103
using eigenvalues, 109
stable process, 103
standard deviation, 167
standardized mean, 193
statistic, 187
statistical independence, 140, 178
and correlation coe¢cient, 179
and covariance, 179
statistical test, 202
submatrix, 78
support, 145
system of equations, 76, 86
Cramer’s rule, 79, 81, 97
existence and number of solutions,
91
graphing in (r. n) space, 89
graphing in column space, 89
in matrix form, 76, 97
inverse approach, 87
row-echelon decomposition approach,
88
t distribution, 198
relation to F distribution, 199
t-statistic, 199
Taylor approximation, 115
for a function of : variables, 117
test statistic, 202
told you so, 48
total di¤erential, 31
transpose, 75, 98
trial, 131
two-tailed test, 205
type I error, 203
type II error, 203
unbiased, 191
uniform distribution, 147
…rst order statistic, 170
mean, 165
second order statistic, 172
variance, 167
univariate distribution function, 175
variance, 166
vector, 22
coordinate, 25
dimension, 23
in matrix notation, 73
inequalities for ordering, 24
worse-than set, 127
Young’s Theorem, 25

Acknowledgments Valentina Kozlova, Kelly Padden, and John Tilstra provided valuable proofreading assistance on the first version of this book, and I am grateful. Other mistakes were found by the students in my class. Of course, if they missed anything it is still my fault. Valentina and Bruno Wichmann have both suggested additions to the book, including the sections on stability of dynamic systems and order statistics. The cover picture was provided by my son, Henry, who also proofread parts of the book. I have always liked this picture, and I thank him for letting me use it.

CONTENTS

1 Econ and math 1.1 Some important graphs . . . . . . . . . . . . . . . . . . . . . . 1.2 Math, micro, and metrics . . . . . . . . . . . . . . . . . . . . .

1 2 4

I

Optimization (Multivariate calculus)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

6
7 8 9 14 16 17 18 21 21 22 24 26

2 Single variable optimization 2.1 A graphical approach . . . . . . . . . . . 2.2 Derivatives . . . . . . . . . . . . . . . . . 2.3 Uses of derivatives . . . . . . . . . . . . 2.4 Maximum or minimum? . . . . . . . . . 2.5 Logarithms and the exponential function 2.6 Problems . . . . . . . . . . . . . . . . . . 3 Optimization with several variables 3.1 A more complicated pro…t function 3.2 Vectors and Euclidean space . . . . 3.3 Partial derivatives . . . . . . . . . . 3.4 Multidimensional optimization . . . i . . . . . . . . . . . .

ii 3.5 Comparative statics analysis . . . . . . . . . . . . . . . . . . . 29 3.5.1 An alternative approach (that I don’ like) . . . . . . . 31 t 3.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Constrained optimization 4.1 A graphical approach . . . . . . . . . 4.2 Lagrangians . . . . . . . . . . . . . . 4.3 A 2-dimensional example . . . . . . . 4.4 Interpreting the Lagrange multiplier . 4.5 A useful example - Cobb-Douglas . . 4.6 Problems . . . . . . . . . . . . . . . . 5 Inequality constraints 5.1 Lame example - capacity constraints 5.1.1 A binding constraint . . . . . 5.1.2 A nonbinding constraint . . . 5.2 A new approach . . . . . . . . . . . . 5.3 Multiple inequality constraints . . . . 5.4 A linear programming example . . . 5.5 Kuhn-Tucker conditions . . . . . . . 5.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 37 39 40 42 43 48 52 53 54 55 56 59 62 64 67

II

Solving systems of equations (Linear algebra)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

71
72 72 76 77 79 81 83 86 87 87 87 89

6 Matrices 6.1 Matrix algebra . . 6.2 Uses of matrices . . 6.3 Determinants . . . 6.4 Cramer’ rule . . . s 6.5 Inverses of matrices 6.6 Problems . . . . . .

7 Systems of equations 7.1 Identifying the number of solutions 7.1.1 The inverse approach . . . . 7.1.2 Row-echelon decomposition 7.1.3 Graphing in (x,y) space . .

iii 7.1.4 Graphing in column space . . . . . . . . . . . . . . . . 89 7.2 Summary of results . . . . . . . . . . . . . . . . . . . . . . . . 91 7.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 8 Using linear algebra in economics 8.1 IS-LM analysis . . . . . . . . . . . . 8.2 Econometrics . . . . . . . . . . . . . 8.2.1 Least squares analysis . . . . 8.2.2 A lame example . . . . . . . . 8.2.3 Graphing in column space . . 8.2.4 Interpreting some matrices . 8.3 Stability of dynamic systems . . . . . 8.3.1 Stability with a single variable 8.3.2 Stability with two variables . 8.3.3 Eigenvalues and eigenvectors . 8.3.4 Back to the dynamic system . 8.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 95 98 98 99 100 101 102 102 104 105 108 109

9 Second-order conditions 9.1 Taylor approximations for R ! R . . . . . . . 9.2 Second order conditions for R ! R . . . . . . 9.3 Taylor approximations for Rm ! R . . . . . . 9.4 Second order conditions for Rm ! R . . . . . 9.5 Negative semide…nite matrices . . . . . . . . . 9.5.1 Application to second-order conditions 9.5.2 Examples . . . . . . . . . . . . . . . . 9.6 Concave and convex functions . . . . . . . . . 9.7 Quasiconcave and quasiconvex functions . . . 9.8 Problems . . . . . . . . . . . . . . . . . . . . .

114 . 114 . 116 . 116 . 118 . 118 . 119 . 120 . 120 . 124 . 128

III

Econometrics (Probability and statistics)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

130
131 . 131 . 132 . 134 . 136

10 Probability 10.1 Some de…nitions . . . . . . . . . 10.2 De…ning probability abstractly . 10.3 De…ning probabilities concretely 10.4 Conditional probability . . . . .

. 157 . . . . . .4 Problems . . . .4. . . . . . . . . . . . .1 Interpreting integrals . . . 13. . . 13. . . . . . . . . . . . . .3. . . . . . .2. . . . . .4 Application: Order statistics 13. . . . . . . . . . . . . . . . . . . . . .2 Uniform distribution . . . . . . 159 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 . . . . . . . . . . . . . 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 . . . 13. . . 11. . . 163 . . .1 Uniform distribution 13. . . 12 Integration 12. . . . . . . . . . . .5 10. . . . . . . . . . .5 Lognormal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Exponential distribution . . . . . . . . . . . . . .4. .1 Mathematical expectation . 137 139 140 141 143 143 144 144 145 145 147 148 149 151 151 11 Random variables 11. . . . . . . . . . . . . . . . . . . 11. . . . . .3 Density functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12. . . . . . . . . . . .2 The mean .2 Distribution functions . . . . . .4 Useful distributions . . . . .1 Application: Choice between lotteries 12. . . . . . . . . . . .3 Di¤erentiating integrals . . . . . . . . . 161 .2 Normal distribution . . 11. . . . . . . . . . . . . .3. . . . . . . 13. .2. . . .4. . . . . . . . . . . . . . . . . . . .2. . .3 Variance . . . .3. . . . . . . . .8 Bayes’rule . . . . . . . . . .4. . . . . . . . . 156 . . . . . 11. . . . . . . . . . . 12. . . . . . . . . . . . . . . . . . . .5 Problems .6 Logistic distribution . .7 10.iv 10. . . .2 Integration by parts . . . . . . . . .1 Uniform distribution 13. . . . . . . . . . . . . . . . . . . 13 Moments 13. . 11. . . . . . .1 Application: Second-price auctions . . . . 11. . . . . . . . . . . . Monty Hall problem . 11. . . . . . . 164 164 165 165 165 166 167 168 168 173 . . . . 11. . . . . . . . . . . . . . . . . .2 Normal distribution . . . . . . . . . . . . . . . . .1 Binomial (or Bernoulli) distribution 11. . . . . . . . . . . . . . 13. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Normal (or Gaussian) distribution . . . . . 12.1 Random variables . .4. . . . . . . .4. . . . . . . Statistical independence Problems . . . . . .6 10.

. . . . . . . . . . .2 Sample mean . . . . . . . .5 Problems . . . . . . . . . . 14. . . . 17. . . . . . . . . . . . . . . . . . . . . . 187 187 188 189 192 192 193 193 194 194 196 198 200 201 202 205 207 207 208 208 209 16 Sampling distributions 16. . . . . . . . . . . . .4. . . . . . . . . . 16. . . . . .3. . .2 The Law of Iterated Expectations . . . . . . . . . . . . . . . . 15. . . . . . . . 14. . . . . . . . . . . . . . 14. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14. . . .1 Bivariate distributions .1 Using conditional expectations . . . . . . 15. .3. . . . . . . . . . . . . . . . . . . . . . . . . . . 176 . . . . . . . . . . . . . . . . . . . 15. . . . . . . . . .3 t and F distributions . . . . . 181 . . . . . . . . . . . . 15. . . . . .4 Problems . . . . .4 Convergence of random variables 15. . . . . .3 Sample variance . . . . . .1 Solutions for Chapter 15 . . . .3 Examples . . . . . . . . . . . . . . .1 Structure of hypothesis tests . . . . . . .4 Conditional expectations . . . . . . . . . . . . . . . . . .1 Chi-square distribution . . . . . .3 Example 3 . . . . . . . . 181 . . .5 Problems . .4.1 Law of Large Numbers . . . . 18 Solutions to end-of-chapter problems 211 18. . .2 Example 2 . . . . 17. . . 15 Statistics 15. . . . . . . . . . 17. . . . . . . . . . . . . . . 178 . 15. . . . . . . 14. . . . . . . 17. . . . . . . . . . . . . . . . . . . . . . 16. . .2 Central Limit Theorem . . . . 17. . .2 Sampling from the normal distribution . . . . . . . . . .4. . . . . . . . . . . . . . .v 14 Multivariate distributions 14. . . . . 185 . . . . . . . . . . . 17. . 14.3 Expectations . . . . . . . . . . . . . . . . . .2 One-tailed and two-tailed tests . . . . . . . . . . . . . . . .4. . . . . . . . 175 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16. . . . . . . . .4 Sampling from the binomial distribution 17 Hypothesis testing 17. . . . . . . . . . . . . . . . . . . . . . 175 . . . . . . . . . . . . .3. . . . . . . . . . .2 Marginal and conditional densities . . 265 Index 267 . 184 . . . . . . . . . . . . . .calculating the bene…t of search . . . . .1 Example 1 . .1 Some de…nitions . .

In undergraduate courses economic arguments are often made using graphs. Both of these techniques require some math. with giving examples. In graduate courses we tend to use equations. Part of getting comfortable about using math to do economics is knowing how to go from graphs to the underlying equations. that is. In sociology one can often get by with anecdotal evidence. In economics there are two primary ways one can justify an assertion. is to teach you to speak mathematics as a second language.CHAPTER 1 Econ and math Every academic discipline has its own standards by which it judges the merits of what researchers claim to be true. In history it requires links to the original sources. to make you comfortable talking about economics using the shorthand of mathematics. 1 . either using empirical evidence (econometrics or experimental work) or mathematical arguments. that is. A second goal. and one purpose of this course is to provide you with the mathematical tools needed to make and understand economic arguments. In the physical sciences this typically requires experimental veri…cation. though. But equations often have graphical counterparts and vice versa. and part is going from equations to the appropriate graphs.

3 is completely di¤erent. the graph depicts the fundamental consumer choice problem. and the curve is an indi¤erence curve. the …gure shows how an individual or …rm chooses an amount of some activity.2 depicts a di¤erent situation. the graph illustrates the …rm’ cost-minimization problem. If the upward-sloping line is marginal cost and the downward-sloping line is marginal bene…t. If the axes are commodities.1 Some important graphs One of the fundamental graphs is shown in Figure 1. but that just ampli…es its importance. How do we characterize tangency? At an even more basic level. the curve is an isoquant. It shows a collection of points with a . and the line is an iso-cost line.1.2 are: How do we …nd the point where the two lines intersect? How do we …nd the change from one intersection point to another? And how do we know that two curves will intersect in the …rst place? Figure 1. the graph shows how the market price is determined. The questions for Figure 1. The axes and curves are not labeled.1: A constrained choice problem 1. how do we …nd slopes of curves? How do we write conditions for the curve to be curved the way it is? And how do we do all of this with equations instead of a picture? Figure 1.1 raises several issues. s Figure 1. How do we write the equations for the line and the curve? The line and curve seem to be tangent. the line is a budget line. ECON AND MATH 2 Figure 1.CHAPTER 1. If the axes are inputs. If the upward-sloping line is a supply curve and the downward-sloping one is a demand curve.

but we might want to consider several markets simultaneously. So. it allows us to make sure that our statements are true. that is. but not on income. but we might want to talk about more. There are more as well. if the horizontal axis measures the quantity of a good and the vertical axis measures its price. though. ECON AND MATH 3 Figure 1. and a primary reason for using equations instead of graphs. It provides rigor. The third graph allows quantity demanded to depend on price. or any other factors. an important question.2: Solving simultaneous equations line …tting through them. and so we know that our arguments are correct and we can then devote attention to the quality of the assumptions underlying them. For example. For the second graph it means supply and demand for one commodity. . For the …rst graph that means consumer choice with only two commodities. prices of other goods. How do we …t the best line through these points? This is the key to doing empirical work. All of them. the points could be observations of a demand curve. is how do we handle more than two dimensions? Math does more for us than just allow us to expand the number of dimensions. All of our assertions will be logical conclusions from our initial assumptions. require that we restrict attention to two dimensions. How do we …nd the demand curve that best …ts the data? These three graphs are fundamental to economics.CHAPTER 1.

An equilibrium is simply a state in which there is no pressure for anything to change. Graphically. It moves beyond basic calculus in two ways.2. micro.2 Math. not the amount of a single commodity. as is …nding what a consumer purchases to maximize utility. and the market-clearing price is the one at which suppliers have no incentive to raise or lower their prices and consumers have no incentive to raise or lower their o¤ers. and it begins with basic calculus. equilibrium analysis requires …nding the intersection of two curves. ECON AND MATH 4 Figure 1. Instead. For example. First. consumers maximize utility subject to a budget constraint. and metrics The theory of microeconomics is based on two primary concepts: optimization and equilibrium. Finding how much a …rm produces to maximize pro…t is an example of an optimization problem.3: Fitting a line to data points 1. Optimization problems usually require …nding maxima or minima. consumers choose commodity bundles. To analyze problems with several choice variables. as illustrated in Figure 1. and calculus is the mathematical tool used to do this. Mathematically. Second. the problem is not just a simple maximization problem. economic problems often have agents simultaneously choosing the values of more than one variable. Finding the market-clearing price is an equilibrium problem. as in Figure 1. it involves the solution of .CHAPTER 1. The …rst section of the book is devoted to the theory of optimization. we need multivariate calculus. though. We must …gure out how to perform constrained optimization.1. Solutions to games are also based on the concept of equilibrium.

which is the third branch of mathematics we will study. econometrics. So. if one estimates a demand equation in which quantity demanded is the dependent variable and the good’ price.CHAPTER 1. ." and we use calculus to …nd optima. ECON AND MATH 5 several equations in several unknowns.3. because matrices turn out to provide an easy way to present the relevant equations. Much of it is linear algebra. for example. A typical empirical project involves estimating an equation that relates a dependent variable to a set of independent variables. and income are independent variables. Mathematically. and then uses data to estimate the comparative statics results second. Econometrics also requires a knowledge of probability and statistics. It isn’ t really. Consequently. The branch of mathematics used for this is linear (or matrix) algebra. But getting you to the point where you can perform comparative statics analysis means going through these two parts of mathematics. on average. There is some math behind that task. For example. But this is a comparative statics question. econometrics and comparative statics analysis go hand-in-hand. Econometrics itself is the task of …tting the best line to a set of data points. when one of the independent variables changes. Comparative statics analysis is also at the heart of empirical work. some s complement good prices. how does a consumer’ optimal s bundle change when the underlying commodity prices change? How does a …rm’ optimal output change when an input or an output price changes? s How does the market-clearing price change when an input price changes? All of these questions are answered using comparative statics analysis. The estimated equation then tells how the dependent variable changes. comparative statics analysis involves multivariable calculus. because "best" implies "optimal. often in combination with matrix algebra. and so we must learn to manipulate matrices and use them to solve systems of equations. A good empirical project uses some math to derive the comparative statics results …rst. for example. A little bit of the math is calculus. the resulting equation tells how much quantity demanded changes when income rises. as in Figure 1. that is. This makes it sound hard. which involves …nding how the optimum or equilibrium changes when one of the underlying parameters changes. Economic exercises often involve comparative statics analysis. some substitute good prices.

PART I    OPTIMIZATION            (multivariate calculus)        .

di¤erential calculus. So. the one in which the agent must choose the value of only a single variable. Many of you have already taken calculus. and that the other is to teach you to frame economic questions. or government agencies.CHAPTER 2 Single variable optimization One feature that separates economics from the other social sciences is the premise that individual actors. workers. and this chapter concerns singlevariable. mathematically. …rms. Remember that one purpose of this course is to introduce you to the mathematical tools and techniques needed to do economics at the graduate level. In light of the second goal. in economics everybody maximizes something. we will begin with a graphical analysis of optimization and then …nd the math that underlies the graph. whether they are consumers. In later chapters we explore optimization problems in which the agent chooses the values of several variables simultaneously. In other words. doing mathematical economics requires an ability to …nd maxima and minima of functions. and their answers. This chapter takes a …rst step using the simplest possible case. act rationally to make themselves as well o¤ as possible. One di¤erence between teaching calculus in 7 .

The pro…t function is (q) = R(q) C(q): The …rst term on the right-hand side is the …rm’ revenue. is revenue minus cost. as always. 2.1: A pro…t function with a maximum an economics course and teaching it in a math course is that economists almost never use trigonometric functions. and they are introduced in this chapter. The maximum level of pro…t is .1 stand out. First. Graphically this is very easy. When the …rm produces and sells q units it earns revenue R(q) and incurs costs of C(q). Figure 2. Increasing q beyond q reduces pro…t. and .1 shows the …rm’ pro…t funcs tion. how do we do it with equations instead? Two features of Figure 2. which is achieved when output is q . need logarithms and exponential functions. The economy has cycles. More importantly for this chapter. however. but none regular enough to model using sines and cosines. We will. So. Pro…t. SINGLE VARIABLE OPTIMIZATION \$ π* 8 π(q) q q* Figure 2. The question is.1 A graphical approach Consider the case of a competitive …rm choosing how much output to produce. and the second s term is its cost. at the maximum the slope of the pro…t function is zero. we will skip trigonometric functions.CHAPTER 2.

1) h!0 dx h The idea is as follows. we have to make sure that the pro…t function is rising then falling. To see why this is important. and not minimized. (3) We must make sure that pro…t really is maximized at q . the pro…t function rises up to q and then falls. (2) We must …nd q by …nding where the slope is zero. Suppose we start at x and consider a change to x + h. The derivative of the function f at x is de…ned as f (x + h) f (x) df (x) = lim : (2. (1) We must …nd the slope of the pro…t function. compare it to Figure 2.CHAPTER 2. It can be denoted in two ways: f 0 (x) or df (x)=dx. Then f changes from f (x) to f (x + h).2 Derivatives The derivative of a function provides its slope at a point.3. 2. The ratio of the change in f to the change in x is a measure of the slope: . while in Figure 2. In Figure 2. This leaves us with several tasks. (4) We must relate our …ndings back to economics.2.2 the pro…t function falls to the minimum then rises. with the help of Figure 2. Second. where the pro…t function has a minimum.1 it rises to the maximum then falls. To make sure we have a maximum.2: A pro…t function with a minimum decreasing q below q also reduces pro…t. SINGLE VARIABLE OPTIMIZATION \$ π(q) 9 q Figure 2.

you end up with the derivative. SINGLE VARIABLE OPTIMIZATION f(x) 10 f(x+h) f(x) f(x) x x x+h Figure 2. Proof. a constant function. The theorem says that the derivative of a constant function is zero. is just a horizontal line. We present these rules as a series of theorems. Theorem 2 Suppose f (x) = x. And it helps to have a few simple rules in hand. Theorem 1 Suppose f (x) = a.CHAPTER 2. Then f 0 (x) = 1. and horizontal lines have slopes of zero. one that yields the same value for every possible x.1). that is. f 0 (x) = lim f (x + h) h!0 h a a = lim h!0 h = 0: f (x) Graphically. Finding the derivative comes from applying the formula in equation (2. Then f 0 (x) = 0.3: Approximating the slope of a function [f (x + h) f (x)]=[(x + h) x]. in the limit. and. . Make the change in x smaller and smaller to get a more precise measure of the slope.

Theorem 4 Suppose f (x) = u(x) + v(x). SINGLE VARIABLE OPTIMIZATION Proof. When you multiply a function by a scalar (or constant). f 0 (x) = lim f (x + h) f (x) h!0 h [u(x + h) + v(x + h)] [u(x) + v(x)] = lim h!0 h u(x + h) u(x) v(x + h) u(x) = lim + h!0 h h 0 0 = u (x) + v (x): . Theorem 3 Suppose f (x) = au(x).CHAPTER 2. f 0 (x) = lim f (x + h) f (x) h!0 h (x + h) x = lim h!0 h h = lim h!0 h = 1: 11 Graphically. Proof. Graphically. and the slope of the 45-degree line is one. you also multiply the derivative by the same scalar. Proof. f 0 (x) = lim f (x + h) f (x) h!0 h au(x + h) au(x) = lim h!0 h u(x + h) u(x) = a lim h!0 h 0 = au (x): This theorem provides a useful rule. Then f 0 (x) = u0 (x) + v 0 (x). Then f 0 (x) = au0 (x). The theorem con…rms that the derivative of this function is one. multiplying by a scalar rotates the curve. the function f (x) = x is just a 45-degree line.

Proof. Then f 0 (x) = Proof. Then f 0 (x) = u0 (x)v(x)+u(x)v 0 (x). f 0 (x) = lim f (x + h) h!0 h 1 u(x+h) u0 (x)=[u(x)]2 . which tells how to take the derivative of the product of two functions. SINGLE VARIABLE OPTIMIZATION 12 This rule says that the derivative of a sum is the sum of the derivatives. Theorem 6 Suppose f (x) = 1=u(x). and it is provided in the next theorem. Remembering that the limit of a product is the product of the limits. The next theorem is the product rule. Theorem 5 Suppose f (x) = u(x) v(x). the above expression reduces to f 0 (x) = lim [u(x + h) u(x)] [v(x + h) v(x) + lim u(x + h) h!0 h!0 h h 0 0 = u (x)v(x) + u(x)v (x): v(x)] We need a rule for functions of the form f (x) = 1=u(x).CHAPTER 2. f (x) 1 u(x) = lim h u(x) u(x + h) = lim h!0 h[u(x + h)u(x)] [u(x + h) u(x)] 1 = lim lim h!0 h!0 u(x + h)u(x) h 1 = u0 (x) : [u(x)]2 h!0 . f 0 (x) = lim f (x + h) f (x) h [u(x + h)v(x + h)] [u(x)v(x)] = lim h!0 h [u(x + h) u(x)]v(x) u(x + h)[v(x + h) + = lim h!0 h h h!0 v(x)] where the move from line 2 to line 3 entails adding then subtracting limh!0 u(x+ h)v(x)=h.

h2 . Theorem 7 Suppose f (x) = u(v(x)). and we are done. Then there exists a sequence h1 .and f 0 (x) = lim f (x + h) f (x) h!0 h u(v(x + h)) u(v(x)) = lim h!0 h u(b) u(b) = lim h!0 h = 0: But u0 (v(x)) v 0 (x) = 0 since v 0 (x) = 0. Let b = v(x) for all x. First suppose that there is some sequence h1 . functions of functions. Proof. Then f 0 (x) = lim = = = = f (x + h) f (x) h!0 h u(v(x + h)) u(v(x)) lim h!0 h u(v(x + h)) u(v(x)) v(x + h) v(x) lim h!0 v(x + h) v(x) h u(v(x) + k)) u(v(x)) v(x + h) v(x) lim lim k!0 h!0 k h 0 0 u (v(x)) v (x): Now suppose that there is no sequence as de…ned above. SINGLE VARIABLE OPTIMIZATION 13 Our …nal rule concerns composite functions. h2 . that is. ::: with limi!1 hi = 0 and v(x + hi ) v(x) = 0 for all i. This rule is called the chain rule.2) . Combining these rules leads to the following really helpful rule: d a[f (x)]n = an[f (x)]n 1 f 0 (x): dx (2.CHAPTER 2. Then f 0 (x) = u0 (v(x)) v 0 (x). ::: with limi!1 hi = 0 and v(x + hi ) v(x) 6= 0 for all i.

for example. and even if n is not an integer. and f3 (x) = 3x2 . SINGLE VARIABLE OPTIMIZATION 14 This holds even if n is negative. Plugging this all into the formula above gives us (3x2 + 2)(4x f (x) = x3 0 1) 4(x3 + 2x) + x3 3(x3 + 2x)(4x x6 1)x2 : 2. and f3 (x) = x3 . So.3 Uses of derivatives In economics there are three major uses of derivatives. f2 (x) = 0 4. for example. Getting more ridiculously complicated. f2 (x) = 4x 1. the derivative of xn is nxn 1 . Combining the rules also gives us the familiar division rule: u0 (x)v(x) v 0 (x)u(x) d u(x) : = dx v(x) [v(x)]2 (2." In principles of economics courses. f1 (x) = x3 + 2x.2) to get d u(x)v 1 (x) dx = u0 (x)v 1 (x) + ( 1)u(x)v 2 (x)v 0 (x) u0 (x) = v(x) v 0 (x)u(x) : v 2 (x) Multiplying both the numerator and denominator of the …rst term by v(x) yields (2.3).CHAPTER 2. consider f (x) = (x3 + 2x)(4x x3 1) : To di¤erentiate this thing. split f into three component functions. The derivative of (4x2 1) :4 is :4(4x2 1) 1:4 (8x). and 0 0 0 f1 (x)f2 (x) f1 (x)f2 (x) f1 (x)f2 (x)f3 (x) 0 f (x) = + : f3 (x) f3 (x) [f3 (x)]2 0 We can di¤erentiate the component functions to get f1 (x) = 3x2 + 2.3) To get it. and the derivative of (2x + 1)5 is 10(2x + 1)4 . We can then use the product rule and expression (2. Then f (x) = f1 (x) f2 (x)=f3 (x). rewrite u(x)=v(x) as u(x) [v(x)] 1 . marginal cost is . The …rst use comes from the economics idea of "marginal this" and "marginal that.

and we want to interpret these. and so on.4) which reduces to the familiar rule that a …rm maximizes pro…t by producing where marginal revenue equals marginal cost.1. The pro…t function is (q) = R(q) C(q). Using our rules for di¤erentiation. We have not used numbers or speci…c functions and. aside from homework exercises. We could divide output up into smaller units. This is where we started the chapter. if the marginal cost function M C(q) or. (2. We end up in the same place we do using graphs. Consider the function y = f (x). and if f 0 (x) 0 we know that y decreases when x increases. C 0 (q) is positive we know that an increase in output leads to an increase in cost. for example. marginal revenue is the derivative of the revenue function. we can rewrite the FOC as d = R0 (q ) dq C 0 (q ) = 0. The third use of derivatives is for …nding maxima and minima of functions. We know that R0 (q) is marginal revenue and C 0 (q) is marginal cost. SINGLE VARIABLE OPTIMIZATION 15 de…ned as the additional cost a …rm incurs when it produces one more unit of output. though. The derivative f 0 (x) measures the change in y when x changes. with a competitive …rm choosing output to maximize pro…t. Notice what we have done here. The second use of derivatives comes from looking at their signs (the astrology of derivatives). If the cost function is C(q). So. or d = 0: dq This condition is called a …rst-order condition. and so if f 0 (x) 0 we know that y increases when x increases. Continually dividing output into smaller and smaller units of size h leads to the de…nition of marginal cost as c(q + h) c(q) : M C(q) = lim h!0 h Marginal cost is simply the derivative of the cost function. marginal cost is C(q + 1) C(q). for example. we rarely will. We might ask whether an increase in x leads to an increase in y or a decrease in y.CHAPTER 2. often abbreviated as FOC. where q is quantity. As we saw in Figure 2. by measuring in grams instead of kilograms. Using general functions leads to expressions involving general functions. pro…t is maximized when the slope of the pro…t function is zero. which is a good . Similarly. equivalently.

For the function to have a maximum. which holds if R00 (q) 0 and C 00 (q) 0. For the function to have a minimum. we can guarantee that pro…t is maximized if the second derivative of the revenue function is nonpositive and the second derivative of the cost function is nonnegative. One special case that receives considerable attention in economics is the one in which R(q) = pq. and this has an economic interpretation: each additional unit of output adds more to total cost than any unit preceding it. the derivative should be increasing.1 the slope of the curve decreases as q increases. which means that the …rm earns less from each additional unit it sells. Each of these is called a second-order condition or SOC. We can guarantee that pro…t is maximized globally if 00 (q) 0 for all possible values of q. Let’ look at the condition a little more closely. and the second-order condition for a minimum is f 00 (x) 0. SINGLE VARIABLE OPTIMIZATION 16 thing. which means that the second derivative should be positive. the condition C 00 (q) 0 means that marginal cost is increasing. like in Figure 2. but Figure 2.CHAPTER 2. The …rst s derivative of the pro…t function is 0 (q) = R0 (q) C 0 (q) and the second derivative is 00 (q) = R00 (q) C 00 (q). 2.4 Maximum or minimum? Figure 2. The second derivative of the function f (x) is denoted f 00 (x) or d2 f =dx2 . where p is the price of the good. or second derivatives. The condition R00 (q) 0 means that marginal revenue is decreasing. The second-order condition for a maximum is f 00 (x) 0.2 shows one with a minimum.1 shows a pro…t function with a maximum. Remembering that C 0 (q) is marginal cost. The power of the mathematical approach is that it allows us to apply the same techniques in situations where graphs will not work. This is the . while in Figure 2. which means that the second derivative should be negative. We can guarantee that pro…t is maximized. Both of them generate the same …rst-order condition: d =dq = 0. like in Figure 2. if 00 (q ) 0.2. So what property of the function tells us that we are getting a maximum and not a minimum? In Figure 2.1.2 the slope of the curve increases as q increases. Since slopes are just derivatives of the function. So. at least locally. The second-order condition for a maximum is 00 (q) 0. we can express these conditions mathematically by taking derivatives of derivatives. the derivative should be decreasing.

SINGLE VARIABLE OPTIMIZATION 17 revenue function for a price-taking …rm in a perfectly competitive industry. dx x Logarithms have two additional useful properties: ln xy = ln x + ln y: and ln(xa ) = a ln x: Combining these yields ln(xa y b ) = a ln x + b ln y: (2.CHAPTER 2. because that expression does not exist.5) . 2. We know that d xn = xn 1 . So how do we get x 1 as a derivative? The answer is the natural logarithm: 1 d ln x = . But how can we get the function x 1 ? We cannot get it by di¤erentiating x0 =0. The number e 2:718. We cannot get it by di¤erentiating x0 . the function x by di¤erentiating x2 =2. dx n We can get the function x2 by di¤erentiating x3 =3. let me show you why they are special for math. The second-order condition reduces to C 00 (q) 0. Without going into why these functions are special for economics. Then R0 (q) = p and R00 (q) = 0. They are related: ln ex = eln x = x. which is the familiar condition that price equals marginal cost. The …rst is the natural logarithm. which says that marginal cost must be nondecreasing. and the …rst-order condition for pro…t maximization is p C 0 (q) = 0. because dx0 =dx = 0. and the second is the exponential function. and so on. the function x 2 by di¤erentiating x 1 .5 Logarithms and the exponential function The functions ln x and ex turn out to play an important role in economics. the function x 3 by di¤erentiating x 2 =2.

.1) to show that the derivative of x2 is 2x. SINGLE VARIABLE OPTIMIZATION 18 The left-hand side of this expression is non-linear.6 Problems (a) f (x) = 12(x3 + 1)2 + 3 ln x 2 (b) f (x) = 1=(4x (c) f (x) = e (e) f (x) = 2)5 14x3 +2x 1. that is. The exponential function ex also has an important di¤erentiation property: it is its own derivative. Compute the derivatives of the following functions: 5x 4 (d) f (x) = (9 ln x)=x0:3 ax2 b cx d 2. but the right-hand side is linear in the logarithms. 4. 2. Compute the derivative of the following functions: (a) f (x) = 12(x (c) h(x) = 1=(3x2 (d) f (x) = xe (e) g(x) = (2x2 x 1)2 2x + 1)4 (b) g(x) = (ln 3x)=(4x2 ) p 3) 5x3 + 6 8 9x 3.CHAPTER 2. Use the de…nition of a derivative to prove that the derivative of 1=x is 1=x2 . Economists often use the form in (??) for utility functions and production functions. d x e = ex : dx This implies that the derivative of eu(x) = u0 (x)eu(x) . Use the de…nition of the derivative (expression 2. which makes it easier to work with.

Answer the following: (a) Is f (x) = 2x3 12x2 increasing or decreasing at x = 3? 19 (b) Is f (x) = ln x increasing or decreasing at x = 13? (c) Is f (x) = e x x1:5 increasing or decreasing at x = 4? (d) Is f (x) = 4x 1 x+2 increasing or decreasing at x = 2? 6.CHAPTER 2. Optimize the following functions. . b. (a) Find conditions on a. Consider the function f (x) = ax2 + bx + c. Answer the following: (a) Is f (x) = (3x 2)=(4x + x2 ) increasing or decreasing at x = 12 increasing or decreasing at x = 6? 1? (b) Is f (x) = 1= ln x increasing or decreasing at x = e? (c) Is f (x) = 5x2 + 16x 7. b. and c that guarantee that f (x) has a unique global minimum. Optimize the following functions. and c that guarantee that f (x) has a unique global maximum. (b) Find conditions on a. and tell whether the optimum is a local maximum or a local minimum: (a) f (x) = 4x2 + 10x 6x 3 ln x (b) f (x) = 120x0:7 (c) f (x) = 4x 8. and tell whether the optimum is a local maximum or a local minimum: (a) f (x) = 4x2 (c) f (x) = 36x 24x + 132 4x (x + 1)=(x + 2) (b) f (x) = 20 ln x 9. SINGLE VARIABLE OPTIMIZATION 5.

The wage rate is \$11 per unit of time. (a) Suppose that Beth is her own minion and that her e¤ort cost function is also c(m). with minion e¤ort denoted by m. Both require labor only. The minion does not like exerting e¤ort. and his cost of e¤ort is given by the function c(m). The production function for widgets is W = 20L1=2 and the production function for gookeys is G = 30L. and interpret it. (d) What are the second-order conditions for the answer in (c) to be a maximum? 11. and its pro…t function is (L) = 9 20L1=2 + 3 30(60 L) 11 60. Find the equation determining how much e¤ort the minion will exert. repsectively. and interpret it.) . SINGLE VARIABLE OPTIMIZATION 20 10. A …rm (Bilco) can use its manufacturing facility to make either widgets or gookeys. Find the equation determining how much e¤ort she would exert. (b) What are the second-order condtions for the answer in (a) to be a maximum? (c) Suppose that Beth pays the minion w per unit of e¤ort.CHAPTER 2. Her bene…t from m units of minion e¤ort is given by the function b(m). and the prices of widgets and gokeys are \$9 and \$3 per unit. How much of each product should Bilco produce per unit of time? (Hint: If Bilco devotes L units of labor to widget production it has 60 L units of labor to devote to gookey production. The manufacturing facility can accomodate 60 workers and no more. Beth has a minion (named Henry) and bene…ts when the minion exerts e¤ort.

This time. so must choose amounts of several di¤erent goods simultaneously. consumers choose bundles of commodities. Firms use many inputs and must choose their amounts simultaneously. For example. This chapter addresses issues that arise when there are several variables. The previous chapter used graphs to generate intuition.1 A more complicated pro…t function In the preceding chapter we looked at a pro…t function in which the …rm chose how much output to produce. We cannot do that here because I am bad at drawing graphs with more than two dimensions. In economics. our intuition will come from what we learned in the last chapter. Instead. problems often involve more than one choice variable. Suppose that the …rm can use n di¤erent inputs. and s 21 . let’ focus on inputs. 3. though. instead of focusing on outputs.CHAPTER 3 Optimization with several variables Almost all of the intuition behind optimization comes from looking at problems with a single choice variable.

written (x1 . the quantity xi is the amount of the i-th input. xn . What happens if there are two inputs (n = 2). More generally.CHAPTER 3. Call the input labor (L) and its price the wage (w). the vector (x1 . :::. we call xi the i-th component of . When the …rm employs L units of labor it produces F (L) units of output and sells them for p units each. call them capital (K) and labor (L)? The production function is then Q = F (K. xn ). OPTIMIZATION WITH SEVERAL VARIABLES 22 denote the amounts by x1 .1). :::. :::. and the corresponding pro…t function is (K. L) rK wL: (3. 3. L). For each i between 1 and n. x2 units of input 2. The …rst-order condition is 0 (L) = pF 0 (L) w = 0. is (L) = pF (L) wL. xn ) is an input vector. which increases revenue by pF 0 (L). and it can sell as much of its output as it wants at the competitive price p. and we will assume that the …rm can purchase as much of input i as it wants for price ri .1) How do we …nd the …rst-order condition? That is the task for this chapter. :::. Its only cost is a labor cost equal to wL because it pays each unit of labor the wage w. When the …rm uses x1 units of input 1. The production function is then Q = F (L). its output is given by the production function Q = F (x1 . Using one additional unit of labor costs an additional w but increases output by F 0 (L). How much of each input should the …rm use to maximize pro…t? We know what to do when there is only one input (n = 1).2 Vectors and Euclidean space Before we can …nd a …rst-order condition for (3. xn ): Inputs are costly. and stops when the added revenue and the added cost exactly o¤set each other. Pro…t. and so on. which can be interpreted as the …rm’ pro…t maximizing labor demand equats ing the value marginal product of labor pF 0 (L) to the wage rate. for revenue of pF (L). The …rm employs labor as long as each additional unit generates more revenue than it costs. L) = pF (K. In our example of the input-choosing pro…t-maximizing …rm. A vector is an array of n numbers. we …rst need some terminology. then.

Vectors are collections of numbers. We begin with addition: x + y = (x1 + y1 . the vector (x1 . Multiplication is more complicated. In this text vectors are sometimes written out as (x1 . or real number. :::. :::. When you took plane geometry in high school. this was Euclidean geometry. It is done using the formula x y = x1 y1 + x2 y2 + ::: + xn yn . Scalar multiplication is achieved by multiplying each component of the vector by the same number. You might wonder why we would ever want such a thing. but sometimes that is cumbersome. The set of real numbers is commonly denoted by R. like x and y. :::. x2 + y2 . sometimes called the dot product. or n-dimensional Euclidean space. xn + yn ): Adding vectors is done component-by-component.CHAPTER 3. We call R2 the 2-dimensional Euclidean space. and scalar multiplication takes a vector and a scalar and yields another vector. which can be thought of as R R. ax2 . xn . the vector (x1 . When a vector has n components. They are also numbers themselves. It pays ri per unit of input i. it is r1 x1 + ::: + rn xn . We can depict a 2-dimensional vector using a coordinate plane anchored by two real lines. But the inner product takes two vectors and yields a scalar. xn ) is n-dimensional. it is in Rn . Suppose that a …rm uses n inputs in amounts x1 . xn ). OPTIMIZATION WITH SEVERAL VARIABLES 23 the vector (x1 . then ax = (ax1 . So. Three common operations are used with vectors. The other form of multiplication is the inner product. The number of components in a vector is called the dimension of the vector. One is scalar multiplication. and it will help you if you begin to think of them this way. :::. xn . That way we can talk about operations involving two vectors. If x is a vector and a is a scalar (a real number). :::. x2 ) is in R2 . and we depict R using a number line. xn ). thereby either "scaling up" or "scaling down" the vector. Here is an example. and there are two notions. . axn ). What is its total production cost? Obviously. which can be easily written as r x. We use the symbol x to denote the vector whose components are x1 . :::. :::. Vector addition takes two vectors and yields another vector. Vector subtraction can be achieved through addition and using 1 as the scalar: x y = x + ( 1)y.

The point is that we want to di¤erentiate (3. like (3. xn ) f (x1 . :::. =. xi 1 . x = y if xi = yi for all i = 1. and just write px. For vectors. The i-th partial derivative of f is f (x1 . Both vector addition and the inner product are commutative. xn ) and pays prices given by the vector p = (p1 . if a consumer purchases a commodity bundle given by the vector x = (x1 . This is also called the norm of the vector x. This will contrast with matrices in a later chapter. x2 . :::. For real numbers we have the familiar relations >. n. In general both of these conditions will depend on the values of both K and L. xi+1 . n and xi > yi for some i between 1 and n. A second use comes from looking at p x x = x2 + ::: + x2 . x > y if x y but x 6= y. We will get to that later.1). and From the third one it follows that x > y if xi yi for all i = 1. Let f (x1 . xi + h. is to maximize it according to each variable separately. so we will have to solve some simultaneous equations. Di¤erentiating a function of two or more variables with respect to only one of them is called partial di¤erentiation. :::. x y if xi yi for all i = 1. Then x x is the distance from the point x (remember. . :::. pn ). where matrix multiplication is dependent on the order in which the matrices are written. x y if xi > yi for all i = 1. by …nding a …rst-order condition for the choice of K and another one for the choice of L. and <. The fourth condition can be read x is strictly greater than y component-wise. n. . :::. :::. they do not depend on the order in which the two vectors occur. that is.CHAPTER 3. :::. s 1 and it is written kxk = (x x) 2 . :::. xn ) @f (x) = lim : (3. :::. OPTIMIZATION WITH SEVERAL VARIABLES 24 Similarly. Often it is more convenient to leave the "dot" out of the inner product. her total expenditure is p x.2) h!0 @xi h . Vector analysis also requires some de…nitions for ordering vectors. 3. that is.3 Partial derivatives The trick to maximizing a function of several variables.1) once with respect to K and once with respect to L. n. ::. xn ) be a general function of n variables. 1 n it’ a number) to the origin.

x2 ) = 14(5x1 2)(7x2 3). We sometimes use the notation fi (x) to denote @f (x)=@xi . Using coordinate vectors. In fact. where u1 .CHAPTER 3. x2 ) = (5x1 2)(7x2 3)2 . un are functions of the one-dimensional variable x. The coordinate vector ei is the vector with components given by ei = 1 i and ei = 0 when j 6= i. :::.2) can be rewritten @f f (x + hei ) (x) = lim h!0 @xi h f (x) : The i-th partial derivative of the function f is simply the derivative one gets by holding all of the components …xed except for the i-th component. 0. 1). The …rst coordinate vector is e1 = (1. and so on through the n-th coordinate vector en = (0. :::. coordinate vector ei has a one in the i-th place and zeros everywhere else. So. :::. 0. :::. the order of di¤erentiation does not matter. It is important to note that. It is also possible to take partial derivatives of partial derivatives. this result is important enough to have a name: Young’ Theorem. 0. For example. the de…nition of the i-th partial derivative in (3. :::. OPTIMIZATION WITH SEVERAL VARIABLES 25 This de…nition might be a little easier to see with one more piece of notation. consider the function f (x1 . and we call fij (x) the cross partial of f (x) with respect to xi and xj . much like second derivatives in single-variable calculus. un (x)). 0). x2 ) = 5(7x2 3)2 and f2 (x1 . 1. Then df = f1 u01 (x) + f2 u02 (x) + ::: + fn u0n (x) dx . the j second coordinate vector is e2 = (0. 0). that is. When a function is de…ned over n-dimensional vectors it has n di¤erent partial derivatives. s Partial di¤erentiation requires a restatement of the chain rule: Theorem 8 Consider the function f : Rn ! R given by f (u1 (x). The partial derivatives are f1 (x1 . fij (x) = fji (x). We use the notation fij (x) = @2f (x): @xi @xj We call fii (x) the second partial of f with respect to xi . in most cases. u2 (x). One takes the partial by pretending that all of the other variables are really just constants and di¤erentiating as if it were a single-variable function.

5 ln x). y2 ) = y1 y2 . but the values y1 and y2 are both determined by the value of x. K . Let f (y1 . L ). Suppose that the function is f (3x2 2. if we use the chain rule. L) rK wL: Think about the …rm as solving two problems simultaneously: (i) given the optimal amount of labor. and (ii) given the optimal amount of capital. we di¤erentiate with respect to each argument separately and then add them together. with y1 = 2x and y2 = 3x2 .CHAPTER 3.4 Multidimensional optimization Let’ return to our original problem. we get dy1 dy2 df = y2 + y1 dx dx dx = (2)(3x2 ) + (6x)(2x) = 18x2 : It works. the …rm wants to use the amount of capital that maximizes (K. L). L . 5 ln x) 5 x : The basic rule to remember is that when variable we are di¤erentiating with respect to appears in several places in the function. The following lame example shows that this works. It’ derivative with respect to x is s d f (3x2 dx 2. L ) = 0 @K . The …rm chooses both capital K and labor L to maximize (K. Substituting we have f (x) = (2x)(3x2 ) = 6x3 df = 18x2 : dx But. L) = pF (K. Problem (i) translates into @ (K . 3. maximizing the pro…t function given in s expression (3. 5 ln x) (6x) + f2 (3x2 2. OPTIMIZATION WITH SEVERAL VARIABLES 26 This rule is best explained using an example. 5 ln x) = f1 (3x2 2. the …rm wants to employ the amount of labor that maximizes (K .1).

L) = 1=2 K + L1=2 . and the wage rate is w = 4.CHAPTER 3. L) = 3K @K and 3=4 L1=2 6=0 (3. The …rst-order conditions are @ (K. Example 1 The production function is F (K. Then the pro…t function is (K. In this example the two …rst-order conditions were independent. the FOC for K did not depend on L and the FOC for L did not depend on K. This is not always the case. To see how this works. L ) = 0: @L 27 Thus.4) . L) = 10(K 1=2 + L1=2 ) 5K 4L. L) = K 1=4 L1=2 .3) @ (K. L) = 6K 1=4 L @L 1=2 6 = 0: (3. that the price of the good is p = 10. as shown by the next example. The two equations above are the …rstorder conditions for the pro…t-maximization problem. suppose that the production function is F (K. L) = 5L 1=2 4 = 0: @L The …rst equation gives us K = 1 and the second gives us L = 25=16. the price of capital is r = 5. Solution. Find the optimal values of K and L. and the wage rate is w = 6. L) = 5K @K and 1=2 5=0 @ (K. L) = 12K 1=4 L1=2 6K 6L: The …rst-order conditions are @ (K. with the provision that all of the optimization problems must be solved together. OPTIMIZATION WITH SEVERAL VARIABLES and problem (ii) translates into @ (K . The pro…t function is (K. optimization in several dimensions is just like optimization in each single dimension separately. that is. the price of capital is r = 6. the price is 12.

:::. xn ) max pF (x) x r1 x 1 r x: ::: rn x n or.4) can be rearranged to get K 1=4 = 1 L1=2 K 1=4 = L1=2 K = L2 28 where the last line comes from raising both sides to the fourth power. This is the same as the condition for a single variable. :::. pFn (x1 .3) yields L1=2 K 3=4 L1=2 (L2 )3=4 L1=2 L3=2 1 L = 2 = 2 = 2 = 2 1 : 2 Plugging this back into K = L2 yields K = 1=4.CHAPTER 3. xn ) . note that (3. . The general pro…t-maximization problem is x1 . and it holds for every input. L = This example shows the steps for solving a multi-dimensional optimization problem. Now let’ return to the general problem to see what the …rst-order cons ditions tell us. The i-th FOC is pFi (x) = ri . which is the condition that the value marginal product of input i equals its price. The …rst-order conditions are: pF1 (x1 . in vector notation. OPTIMIZATION WITH SEVERAL VARIABLES To solve these. :::.xn max pF (x1 . Plugging this into (3. xn ) r1 = 0 rn = 0. .:::. .

The trick we now take is to implicitly di¤erentiate equation (3. To see how it works. OPTIMIZATION WITH SEVERAL VARIABLES 29 3. let’ return to the pro…t maximization problem with s a single choice variable: max pF (L) wL: L The FOC is pF 0 (L) w = 0: (3. how does the optimal value of L change when p changes? To answer this. it should be comparative statics analysis. "How does the optimum change when one of the underlying variables changes?" For example. and many papers (and dissertations) have relied on not much more than comparative statics analysis. or when the wage rate changes? This is an important problem. so that F 00 (L) < 0: Note that this guarantees that the second-order condition for a maximization is satis…ed.5) The comparative statics question is.5) with respect to p. In other words. If there is one tool you have in your kit at the end of the course. rewrite the FOC so that L is replaced by the function L (p): pF 0 (L (p)) w=0 dL = 0: dp and di¤erentiate both sides of the expression with respect to p. how does the …rm’ des mand for labor change when the output price changes. treating L as a function of p. The standard comparative statics questions is. We get F 0 (L (p)) + pF 00 (L (p)) The comparative statics question is now simply the astrology question.CHAPTER 3. "What is the sign of dL =dp?" Rearranging the above equation to isolate dL =dp on the left-hand side gives us dL = dp F 0 (L ) : pF 00 (L ) . let’ …rst assume that the marginal product of labor is s strictly decreasing.5 Comparative statics analysis Being able to do multivariate calculus allows us to do one of the most important tasks in microeconomics: comparative statics analysis.

and so the …rm wants to expand its operation to generate more pro…t. s). s) < 0 for a maximization problem and fxx (x. which makes sense: when the output price rises producing output becomes more pro…table. Example 2 A person must decide how much to work in a single 24-hour day.6) We want to know the sign of this derivative. The comparative statics question is. For a minimization problem we have fxx > 0. s) dx + fxs (x. and we know that pF 00 (L ) < 0 because we assumed strictly diminishing marginal product of labor. …rst derive the …rst-order condition: fx (x.e. fxs (x.CHAPTER 3. s) > 0 for a minimization problem. Consequently. Let x denote the optimal value of x. s) = 0: Next implicitly di¤erentiate with respect to s to get fxx (x. the sign of the comparative statics derivative dx =ds is the same as the sign of the numerator. where x is the choice variable and s is a shift parameter. Suppose that the objective function. For a maximization problem we have fxx < 0 by assumption and so the negative sign cancels out the sign of the denominator. This tells us that the …rm demands more labor when the output price rises. "What is the sign of dx =ds?" To get the answer. is f (x. s) : fxx (x. So. which is positive. F 00 (L) < 0. s). i. the function the agent wants to optimize. at so that there is a unique solution to the …rst-order condition. s). Assume that the second-order condition holds strictly. so that fxx (x. We can write the comparative statics problem generally. s) = 0: ds Rearranging yields the comparative statics derivative dx = ds fxs (x. s) (3. OPTIMIZATION WITH SEVERAL VARIABLES 30 We know that F 0 (L) > 0 because production functions are increasing. and so the comparative statics derivative dx =ds has the opposite sign from the partial derivative fxs (x. She gets paid w per hour of labor. that is. but gets utility from each hour she . These conditions guarantee that there is no "‡ spot" in the objective function. dL =dp has the form of the negative of a ratio of a positive number to a negative number.

where t is the amount of time spent away from work.5.1 An alternative approach (that I don’ like) t Many people use an alternative approach to comparative statics analysis. so that 24 L is is the amount of time she does not spend at work. :::. and her objective is maxwL + u(24 L L): The FOC is w u0 (24 L ) = 0: Write L as a function of w to get L (w) and rewrite the FOC as: w u0 (24 L (w)) = 0: Di¤erentiate both sides with respect to w (this is implicit di¤erentiation) to get dL = 0: 1 + u00 (24 L ) dw Solving for the comparative statics derivative dL =dw yields dL = dw 1 u00 (24 L) > 0: She works more when the wage rate w increases. Does the person work more or less when her wage increases? Solution. and the total di¤erential of the function g(x1 . Let L denote the amount she works. but I do not like this approach as much. xn ) is dg = g1 dx1 + ::: + gn dxn : We want to use total di¤erentials to get comparative statics derivatives. We will get to why later. It gets to the same answers. and u has the properties u0 (t) > 0 and u00 (t) < 0. . Her utility from this leisure time is therefore u(24 L). This utility is given by the function u(t). OPTIMIZATION WITH SEVERAL VARIABLES 31 does not spend at work.CHAPTER 3. 3. The approach begins with total di¤erential.

s) = 0: If we want to do comparative statics analysis using our (preferred) implicit di¤erentiation approach. r. instead of just one. we implicitly di¤erentiate with respect to r to get fxx dx + fxr = 0 dr dx fxr = : dr fxx . s and r. s). s) = 0: To …nd the comparative statics derivative dx=dr. The problem is to optimize f (x. OPTIMIZATION WITH SEVERAL VARIABLES 32 Remember our comparative statics problem: we choose x to optimize f (x. Let’ stretch our techniques a little and have a problem with two shift pas rameters. The total di¤erential of the s right-hand side is zero. s) = 0: Let’ take the total di¤erential of both sides. we would …rst write x as a function of the two shift parameters. and the FOC is fx (x.7) This is exactly the comparative statics derivative we found above in equation (3. s).CHAPTER 3. r.6). and many students …nd it straightforward and easier to use than implicit di¤erentiation. so that fx (x(r. s). We can solve for this expression in the above equation: fxx dx + fxs ds = 0 fxx dx = fxs ds fxs dx = : ds fxx (3. So the method works. The FOC is fx (x. r. s)] = fxx dx + fxs ds: Setting the above expression equal to zero yields fxx dx + fxs ds = 0: The derivative we want is the comparative statics derivative dx=ds. and the total di¤erential of the left-hand side is d[fx (x.

As we constructed it back in equation (2.6 Problems 1. So. Using total di¤erentials. I object to the total di¤erential approach because it entails dividing by zero. it’ just a matter of when you want to do your s remembering. All of that said. think about what the derivative dx=ds means. and I prefer to think of dx=ds as a single entity with a long name. 7). s) to get fxx dx + fxr dr + fxs ds = 0: We want to solve for dx=dr. 6. so s is not a function of r. on a purely mathematical basis. That means that ds=dr = 0. According to this intuition.6) except it replaces s with r. . and you were taught very young that you cannot divide by zero. Consider the vectors x = (4. we would …rst take the total di¤erential of fx (x. s 3. 7. It’ just not going to show up anywhere else in this book. the total di¤erential approach works just …ne. this does not look like the same answer. and doing so yields fxx dx + fxr dr + fxs ds = 0 fxx dx = fxr dr fxs ds dx fxr fxs ds = : dr fxx fxx dr On the face of it.7). 2) and y = (6. Substituting this in yields fxr dx = dr fxx as expected. r. since it is just like expression (3.CHAPTER 3.1). 3. so ds is zero. though. dx=ds is the limit of x= s as s ! 0. and in the total di¤erential approach we recognized it at the end. To see why. I still like the implicit di¤erentiation approach better. But we divided by it in equation (3. both s and r are shift parameters. ds is the limit of s as it goes to zero. 1. So both work. But. So what is the di¤erence between the two approaches? In the implicit di¤erentiation approach we recognized that s does not depend on r at the beginning of the process. On a practical level. and not a ratio of dx and ds. OPTIMIZATION WITH SEVERAL VARIABLES 33 This should not be a surprise.

Its cost function is cq. A …rm faces inverse demand function p(q) = 120 …rm’ output. y) = 4x2 + 3y 2 (a) Find the partial derivative fx (x. 6. y). (b) Find the inner product x y. (b) Find the slope of the indi¤erence curve at point (x. or 2. Consider the function u(x. x x y? (c) Find the inner product x y. 6. y). 5. are true: x = y. 4q. (b) Find the partial derivative fy (x. 3. Consider the function f (x. (b) Find the partial derivative fy (x. Consider the vectors x = (5. 34 y. where q is the . p p p (d) Is x x + y y (x + y) (x + y)? 4y. s (a) Write down the …rm’ pro…t function. 0. 4. y) = 3 ln x + 2 ln y. (a) Write the equation for the indi¤erence curve corresponding to the utility level k. if any. 4x + 2=y. y). Consider the function f (x. y) = 16xy (a) Find the partial derivative fx (x. 12xy + 18x. (c) Find the critical point of f .CHAPTER 3. y). (c) Find the critical point of f . y). p p p (c) Verify that x x + y y > (x + y) (x + y). OPTIMIZATION WITH SEVERAL VARIABLES (a) Write down the vector 2y + 3x: (b) Which of the following. 2) and y = (3. x < y. 2. 2). s (b) Find the pro…t-maximizing level of pro…t as a function of the unit cost c. (a) Write down the vector 6x 3.

7. (a) If capital is on the vertical axis and labor on the horizontal. An isoquant is a curve showing the combinations of inputs that all lead to the same level of output. When the production function over capital K and labor L is F (K. negative? 35 Is it positive or (d) Write the maximized pro…t function as a function of c. Each worker at a …rm can produce 4 units per hour. (e) Find the derivative showing how pro…t changes when c changes. s where L is the number of workers employed p (fractional workers are okay). What’ its sign? s 9. OPTIMIZATION WITH SEVERAL VARIABLES (c) Find the comparative statics derivative dq=dc. What’ its sign? s (c) Find d =dw. each worker must p be paid \$w per hour. (a) 15x2 + 3xa (b) 6x2 a = 5a 5xa2 5 x = 20: a 8. Does the isoquant slope upward or downward? . L) = 100. an isoquant corresponding to 100 units of output is given by the equation F (K.CHAPTER 3. (b) Suppose that production is increasing in both capital and labor. The …rm’ pro…t function is (L) = 30 4L wL. and the …rm’ revenue function is R(L) = 30 L. (f) Show that d =dc = q. …nd the slope of the isoquant. Find dx=da from each of the following expressions. s (a) Show that L = 900=w2 : (b) Find dL =dw. L).

But many problems in economics have constraints. and under the assumption that more is better. So far we have only looked at unconstrained optimization problems.CHAPTER 4 Constrained optimization Microeconomics courses typically begin with either consumer theory or producer theory. Both of these problems lead to a graph of the form in Figure 1. because consumers cannot purchase unlimited quantities of every good.1. If they begin with consumer theory the …rst problem they face is the consumer’ constrained optimization problem: the consumer chooses a s commodity bundle. to maximize utility without spending more than her budget. The budget constraint is an extra condition that the optimum must satisfy. a consumer would choose an in…nite amount of every good. The consumer’ budget cons straint is a classic example. How do we make sure that we get the solution that maximizes utility while still letting the budget constraint hold? 36 . or vector of goods. Without a budget constraint. the …rst problem it poses is the …rm’ cost minimization problem: the …rm chooses a s vector of inputs to minimize the cost of producing a predetermined level of output. If the course begins with producer theory. This obviously does not help us describe the real world.

it says that total expenditure on the two goods is equal to M . E. The abbreviation "s. The problem can be written as follows: maxu(x1 . we s . p1 x1 + p2 x2 = M where pi is the price of good i and M is the total amount the consumer has to spend on consumption. The second line is the budget constraint.1: Consumer’ problem s 4.x2 s. CONSTRAINED OPTIMIZATION x2 37 E x1 Figure 4.1 shows the problem graphically. and it should be familiar.1 A graphical approach Let’ look more carefully at the consumer’ two-dimensional maximization s s problem. Figure 4.t. To …nd the tangency condition we must …gure out how to …nd the slopes of the two curves. and (ii) it is on the budget line. We can do this easily using implicit di¤erentiation. What we want to do in this section is …gure out what equations we need to characterize the solution. because it’ easier. Since x2 is on the vertical axis. x2 ) x1 . The optimal consumption point. is a tangency point.t. Begin with the budget line.CHAPTER 4. and it has two features: (i) it is where the indi¤erence curve is tangent to the budget line." so that the problem for the consumer is to choose a commodity bundle to maximize utility subject to a budget constraint. Let’ translate s these into math." stands for "subject to.

x2 = M=p2 (p1 =p2 )x1 . x2 ) = k for some scalar k. Rearranging yields dx2 = dx1 p1 : p2 (4.x2 ) @x2 : (4. x2 ) @u(x1 . . and the denominator is the marginal utility of good 2. and it is better to apply it …rst to the easier case. Now let’ …nd the slope of the indi¤erence curve.1) Of course. we could have gotten to the same place by rewriting the equation for the budget line in slope-intercept form. The equation for an s indi¤erence curve is u(x1 . x2 ) dx2 + = 0: @x1 @x2 dx1 Rearranging yields dx2 = dx1 @u(x1 .x2 ) @x1 @u(x1 . Treat x2 as a function of x1 and rewrite to get u(x1 . Treating x2 as a function of x1 and rewriting the budget constraint yields p1 x1 + p2 x2 (x1 ) = M: Implicit di¤erentiation gives us p1 + p2 dx2 =0 dx1 because the derivative of M with respect to x1 is zero.2) The numerator is the marginal utility of good 1. but we have to use implicit di¤erentiation anyway to …nd the slope of the indi¤erence curve.CHAPTER 4. so the slope of the indi¤erence curve is the negative of the ratio of marginal utilities. CONSTRAINED OPTIMIZATION 38 want to …nd a slope of the form dx2 =dx1 . which is also known as the marginal rate of substitution. x2 (x1 )) = k: Now implicitly di¤erentiate with respect to x1 to get @u(x1 .

xn ) + (M p1 x1 ::: pn xn ): This requires some interpretation. let’ think s about the quantity M p1 x1 ::: pn xn . Our …rst step is to set up the Lagrangian L(x1 . but suppose it was positive.CHAPTER 4. requires that the slope of the budget line in (4.3) The other condition. that the indi¤erence curve and budget line are tangent. x2 ) p1 : p2 (4. The task now is to characterize the solution in a more general setting with more dimensions. 4. says that the bundle (x1 . First of all.2 Lagrangians The way that we solve constrained optimization problems is by using a trick developed by the 18-th century Italian-French mathematician Joseph-Louis Lagrange. It has to be zero according to the budget constraint. :::. so don’ t get confused.2). and p1 x1 + ::: + pn xn is expenditure on consumption. p1 x1 + ::: + pn xn = M . CONSTRAINED OPTIMIZATION 39 Condition (i). xn ) s. and utility is measured in utility units (or utils).4) constitute two equations in two unknowns (x1 and x2 ). ) = u(x1 . the variable is called the Lagrange multiplier (and the Greek letter is lambda). xn . x2 ) = u2 (x1 . What would it mean? M is income. But unspent income is measured in dollars. so we cannot simply . or u1 (x1 .) Suppose that our objective is to solve an n-dimensional constrained utility maximization problem: x1 .xn max u(x1 . which is simply p1 x1 + p2 x2 = M . :::. (There is also a 1972 ZZ Top song called La Grange. and so they completely characterize the solution to the consumer’ s optimization problem. Second. condition (ii). x2 ) must lie on the budget line. (4.t.4) Equations (4.:::.3) and (4. :::. Income minus expenditure is simply unspent income.1) is the same as the slope of the indi¤erence curve in (4.

one for each of the xi ’ and one for : s @u @L = p1 = 0 @x1 @x1 . optimization using the Lagrangian turns the n-dimensional constrained optimization problem into an (n+1)-dimensional unconstrained optimization problem.3 A 2-dimensional example The utility function is u(x1 . 1 2 and the budget is M = 120. . though. ) = x1 x2 + (120 1=2 1=2 10x1 20x2 ): .5c) Notice that the last FOC is simply the budget constraint. The budget constraint. The Lagrangian. CONSTRAINED OPTIMIZATION 40 add these together. guarantees that there is no unspent income. So.x2 1=2 1=2 s. and is therefore measured in utils/dollar. is the utility of consumption plus the utility of unspent income. @u @L = pn = 0 @xn @xn @L = M p1 x1 ::: @ (4.CHAPTER 4. Note that the Lagrangian has not only the xi ’ as arguments. These two features give the Lagrangian approach its appeal. though. The Lagrange multiplier converts the dollars into utils.t. we begin by setting up the Lagrangian L(x1 . The …rst-order conditions arise from taking n + 1 partial derivatives of L. optimization using the Lagrangian guarantees that the budget constraint is satis…ed. 4. The expression (M p1 x1 ::: pn xn ) can be interpreted as the utility of unspent income. then. 10x1 + 20x2 = 120: What are the utility maximizing levels of x1 and x2 ? To answer this. The consumer’ problem is then s maxx1 x2 x1 . but also s the Lagrange multiplier .5b) pn xn = 0 (4. Also. x2 ) = x0:5 x0:5 . and so the second term in the Lagrangian is necessarily zero. x2 . . the prices are p1 = 10 and p2 = 20.5a) (4. We still want it there. because it is important for …nding the right set of …rst-order conditions.

equation (4.CHAPTER 4.6b) (4.6c) is simply the budget constraint.6c) to get 120 10(2x2 ) 20x2 = 0 40x2 = 120 x2 = 3: Because x1 = 2x2 . CONSTRAINED OPTIMIZATION The …rst-order conditions are @L 1=2 1=2 = 0:5x1 x2 10 = 0 @x1 @L 1=2 1=2 = 0:5x1 x2 20 = 0 @x2 @L = 120 10x1 20x2 = 0 @ Of course.6a) and (4.6a) (4.6b) to get 1 = 20 and = 1 40 x1 x2 x2 x1 1=2 41 (4. Substitute x1 = 2x2 into (4. and ). We have three equations in three unknowns (x1 .6c) To solve 1=2 : Set these equal to each other to get 1 20 2 x2 x1 x2 x1 1=2 1 = 40 = x1 x2 x1 x2 1=2 1=2 1=2 2x2 = x1 where the last line comes from cross-multiplying. them. …rst rearrange (4. we have x1 = 6: . x2 .

we know from the rearrangement of (4. which.4 Interpreting the Lagrange multiplier Remember that we said that the second term in the Lagrangian is the utility value of unspent income. s To do so. . the Lagrange multiplier should be the marginal utility of (unspent) income. x2 ) = M 20 M 40 = M p : 20 2 Di¤erentiating this expression with respect to income M yields du 1 = p = . This term is (M p1 x1 p2 x2 ). of course. so we still have x1 = 2x2 . All of the steps are the same as above. let’ generalize the problem so that income is M instead of s 120. Substituting into the budget constraint gives us M 10(2x2 ) 20x2 = 0 M x2 = 40 M x1 = 20 1 p : = 20 2 0:5 0:5 Plugging these numbers back into the utility function gives us u(x1 .6a) that 1 = 20 x2 x1 1=2 42 1 3 = 20 6 1 p : = 20 2 1=2 4. Let’ see if this is true. So. dM 20 2 and the Lagrange multiplier really does measure the marginal utility of income. is zero because there is no unspent income. CONSTRAINED OPTIMIZATION Finally. because it is the slope of the utility-of-unspent-income function.CHAPTER 4.

Cobb-Douglas Economists often rely on the Cobb-Douglas class of functions which take the form f (x1 . relaxing the constraint is lowering q. :::. and let q be the desired level of output. Let xi be s the amount of input i employed by the …rm. xn ) = xa1 xa2 x an 1 2 n where all of the ai ’ are positive. xn )): Since the …rm is minimizing cost. In our case the objective function is a utility function. which was referred to simply as marginal cost way back in Chapter 2.:::. :::. s To see its usefulness. reducing cost from the optimum would require reducing the output requirement q. so the marginal value is marginal utility. The constraint is relaxed by allowing income to be higher.5 A useful example .t. where the units used to measure value are determined by the objective function. :::. let F (x1 . :::. So.xn max u(x) . xn . The functional form arose out of a 1928 s collaboration between economist Paul Douglas and mathematician Charles Cobb. using the Lagrangian to solve the …rm’ cost minimization problem gives you the s …rm’ marginal output cost function for free. let ri be its price. The …rm’ problem would be s x1 . The interpretation of is the marginal cost of output. s 4. :::. ) = r1 x1 + ::: + rn xn + (q F (x1 . F (x1 .xn min r1 x1 + ::: + rn xn s.CHAPTER 4. and was designed to …t Douglas’ production data. CONSTRAINED OPTIMIZATION 43 In general.:::. xn ) = q The Lagrangian is then L(x1 . So. Now think instead about a …rm’ cost-minimization problem. consider a consumer choice problem with a CobbDouglas utility function u(x) = xa1 xa2 x an : 1 2 n x1 . so the Lagrange multiplier measures the marginal utility of income. xn ) be the production function. the Lagrange multiplier measures the marginal value of relaxing the constraint.

7) to get pi = ai u(x) 1 xi ai u(x) M = xi u(x)(a1 + ::: + an ) ai M = : a1 + ::: + an xi . xn .CHAPTER 4. Substitute these into the budget constraint: pi = M p1 x1 ::: pn xn = 0 a1 u(x) an u(x) x1 ::: xn = 0 x1 xn u(x) (a1 + ::: + an ) = 0: M (4. CONSTRAINED OPTIMIZATION s. The expression for @L=@xi can be rearranged to become ai a 1 ai x1 x an pi = u(x) pi = 0: n xi xi This yields that ai u(x) xi for i = 1. which is just the budget constraint. :::. p1 x1 + ::: + pn xn = M: Form the Lagrangian L(x1 . n. plug this back into (4. n and @L =M @ p1 x1 ::: pi = 0 pn xn = 0. :::. :::.7) M Now solve this for the Lagrange multiplier : = u(x) a1 + ::: + an : M Finally.t. ) = xa1 1 xan + (M n p1 x1 ::: pn xn ): 44 The …rst-order conditions take the form @L ai x an = x a1 n @xi xi 1 for i = 1.

but rearranging (4. .8) That was a lot of steps. in which case the spending share for good i is just equal to the exponent on good i. most graduate students in economics have memorized it by the end of their …rst semester because it turns out to be so handy.8).8) to ai pi xi = : M a1 + ::: + an The numerator of the left-hand side is the amount spent on good i. then. CONSTRAINED OPTIMIZATION Finally.8) yields an intuitive and easily memorizable expression. The obvious comparative statics derivative for a demand function is with respect to its own price: dxi = dpi ai M a1 + ::: + an p2 i 0 and so demand is downward-sloping. We can do this by taking the comparative statics derivative of xi with respect to price pj .8) lends itself to some particularly easy comparative statics analysis. Rearrange (4. one often looks for the e¤ects of changes in the prices of other goods. where j 6= i. is the share of income spent on good i. which is another feature that makes Cobb-Douglas special. Another comparative statics derivative is with respect to income: dxi ai 1 = dM a1 + ::: + an pi 0: All goods are normal goods when the utility function takes the Cobb-Douglas form.CHAPTER 4. The left-hand side. solve this for xi to get the demand function for good i: xi = ai M : a1 + ::: + an pi 45 (4. The demand function in (4. Finally. dxi = 0: dpj This result holds because the other prices appear nowhere in the demand function (4. In fact. as it should be. The denominator of the left-hand side is the total amount spent. In especially convenient cases the exponents sum to one. The equation says that the share of spending is determined entirely by the exponents of the CobbDouglas utility function.

This time. n Set up the Lagrangian L(x1 . 1 n @ which is just the production constraint. Consider the …rm’ cost-minimization problem when the production function is s xan .9) pi because the production constraint tells us that F (x) = q.CHAPTER 4. Plugging this into the production constraint give us a1 a1 p1 a1 x a1 1 xan ): n ai F (x) =0 xi q p1 an a1 an q q pn an = q = q: an pn a1 +:::+an a1 +:::+an But a1 + ::: + an = 1.:::. Rearranging the expression for @L=@xi yields q x i = ai . The problem is x1 . :::. so that F (x) = xa1 1 n to assume that a1 + ::: + an = 1. ) = p1 x1 + ::: + pn xn + (q The …rst-order conditions take the form @L = pi @xi for i = 1. n and @L = q x a1 xan = 0. xn . (4. :::.xn min p1 x1 + ::: + pn xn s.t. though. xa1 1 xan = q. so the above expression reduces further to a1 p1 a1 p1 a1 a1 an pn an pn an q = q an = 1 . CONSTRAINED OPTIMIZATION 46 We can also use Cobb-Douglas functions in a production setting. we are going Cobb-Douglas.

CONSTRAINED OPTIMIZATION = p1 a1 a1 47 (4.CHAPTER 4.11) dq a1 an Why is this cool? There are three reasons. dC(q)=dq is marginal cost. and the exponents of the Cobb-Douglas production function. This doesn’ look particularly useful or intuitive. a1 an p1 a1 a1 a1 pn an pn an an an q q where the last equality holds because a1 + ::: + an = 1. Second. and it depends on the amount of output being produced (q).9) to get q x i = ai pi a pn ai p 1 1 = p i a1 an an q: This is the input demand function. It can be. First. :::. though. This one is pretty easy to remember. And it has a cool comparative statics result: a a p1 1 pn n dC(q) = : (4. compare the marginal cost function to the original production function: F (x) = xa1 1 p1 M C(q) = a1 x an n a1 pn an an : . This means that the Cobb-Douglas production function gives us constant marginal cost.10) pn an an : We can substitute this back into (4. and q appears nowhere on the right-hand side. the input prices (p1 . pn ). t Plug it back into the original objective function p1 x1 + ::: + pn xn to get the cost function C(q) = p1 x1 + ::: + pn xn a a pn n an a1 p 1 1 q + ::: + pn = p1 p 1 a1 an pn a1 an pn p1 p1 = a1 q + ::: + an a1 an a1 a1 an pn p1 = (a1 + ::: + an ) q a1 an a a p1 1 pn n = q.

b 3 ln a + 2 ln b s. Solve the following problem: maxa.y 1. 2x + 4y = 120 [Hint: You should be able to check your answer against the general version of the problem in Section 4.10) and (4.t. 12a + 14b = 400 3. at the end of the the last section on interpreting the Lagrange multiplier. Use the Lagrange multiplier method to solve the following problem: s.y s.6 Problems max 12x2 y 4 x.11). we said that in a cost-minimization problem the Lagrange multiplier is just marginal cost? Compare equations (4.t. Solve the following problem: minx.t. And third. remember how.y s.5. 4.] 2. 3x + 2xy = 80 .y 16x + y x1=4 y 3=4 = 1 4. They are the same.CHAPTER 4. CONSTRAINED OPTIMIZATION 48 You can get the marginal cost function by replacing the xi ’ in the production s function with the corresponding pi =ai . Solve the following problem: max 3xy + 4x x. I told you so.t. Solve the following problem: min 5x + 2y x. 4x + 12y = 80 5.

where L is the number of workers employed (fractional p workers are okay). Each worker can produce 4 units per hour. This is a lame but instructive problem. and the …rm’ revenue function is s p R(L) = 30 L. Another lame but instructive problem: A …rm has the capacity to use 4 workers at a time. the farmer’ problem is s maxx 400x + 2x2 s. Does its sign make sense? (b) Find the second derivative of the pro…t function. [Hint: It had better be 4. Does its sign make sense? (b) Find the second derivative of the pro…t function. make sense? Does its sign (c) Set up the Lagrangian and use it to …nd the optimal value of L. L = 4 (a) Find the …rst derivative of the pro…t function. make sense? Does its sign (c) Set up the Lagrangian and use it to …nd the optimal value of x.CHAPTER 4. The …rm’ pro…t function is (L) = 30 4L 10L.) (d) Interpret the Lagrange multiplier. where x is the number of acres of corn planted. x = 10 (a) Find the …rst derivative of the pro…t function. s It must solve the problem p max 30 4L 10L L s. So. Does that mean that pro…t is minimized when x = 10? 7.t.] . (Hint: It had better be 10.t. Pro…t from growing an acre of corn is given by (x) = 400x + 2x2 . each worker must be paid \$10 per hour. (f) The second derivative of the pro…t function is positive. (e) Find the marginal value of an acre of land without using the Lagrange multiplier. CONSTRAINED OPTIMIZATION 49 6. A farmer has 10 acres of land and uses it to grow corn.

(f) The second derivative of the pro…t function is negative. and 0 < < 1. 50 (e) Find the marginal pro…t from a worker without using the Lagrange multiplier. Can you sign them? 9. The manufacturing facility can accomodate 60 workers and no more. repsectively. M > 0 is income. (b) Find @x =@M and @y =@M . This is the same as problem 2. Does that mean pro…t is maximized when L = 4? 8. where w denotes labor devoted to widget production. The production function for widgets is W = 20w1=2 . (a) Use a Lagrangian to determine how much of each product Bilco should produce per unit of time. Here is the obligatory comparative statics problem. and the production function for gookeys is G = 30g. chooses x and y to maxx. where g denotes labor devoted to gookey production. A …rm (Bilco) can use its manufacturing facility to make either widgets or gookeys. Both require labor only. px x + py y = M A consumer where px > 0 is the price of good x.y x y 1 s. (a) Show that x = M=px and y = (1 )M=py .CHAPTER 4. Can you sign them? (c) Find @x =@px and @y =@px . CONSTRAINED OPTIMIZATION (d) Interpret the Lagrange multiplier.t. and the prices of widgets and gokeys are \$9 and \$3 per unit. py > 0 is the price of good y. .11 but done using Lagrange multipliers. (b) Interpret the Lagrange multiplier. The wage rate is \$11 per unit of time.

[This last condition makes the constraint binding. and other than that you need not worry about it. The farmer wants to use the fencing to enclose the largest possible area. and solve the farmer’ problem. [Hint: The solution will s be a function of F and S. so the state law dictates that W S. (c) Which has a greater impact on the area the farmer can enclose. W is always the short side. and she also wants to obey the law. Furthermore. [Hint: s There should be two constraints. (a) Write down the farmer’ constrained maximization problem. a marginal increase in S or a marginal increase in F ? Justify your answer. state law dictates that every property must have a side smaller than S in length. A farmer has a …xed amount F of fencing material that she can use to enclose a property. one for each constraint. but it will be a rectangle with length L and width W . CONSTRAINED OPTIMIZATION 51 10.] (b) Write down the Lagrangian with two mulitpliers.] By convention. She does not yet have the property.CHAPTER 4. and in this case S < F=4. .] Please use as the second multiplier.

pn . then the budget constraint is p1 x1 + ::: + pn xn M. then. and income is M . :::. the goods are x1 . :::. If the prices are p1 . xn ) = q: Other times equality constraints are not the right thing to do. q. has the consumer choosing a commodity bundle to maximize utility. For example. the …rm’ cost-minimization s problem is to …nd the least-cost combination of inputs to produce a …xed amount of output. 52 .CHAPTER 5 Inequality constraints The previous chapter treated all constraints as equality constraints. xn ) be the production function when the inputs are x1 . The consumer choice problem. letting F (x1 . or. for example. :::. If she gets utility from saving. in which case the constraint would hold with equality. is that output must be q. xn . The constraint. It may be the case that the consumer spends her entire income. xn . :::. the constraint is F (x1 . Sometimes this is the right thing to do. :::. subject to the constraint that she does not spend more than her income.

salt. The goal for this chapter is to …gure out how to deal with inequality constraints. butter. In the real world …rms often have excess capacity. and the cost function is 4x2 .capacity constraints Let’ begin with a simple unconstrained pro…t maximization problem. and unbroken eggs. is to s maximize pro…t subject to the constraint that output not exceed capacity. The s …rm chooses an amount to produce x. Finally. though. so the optimum is actually a maximum. The best way to do this is through a series of exceptionally lame examples. The beauty of lame examples. which means that the capacity constraint does not hold with equality. in which case the budget constraint would not hold with equality. or q q. 8x = 0. it is di¢ cult to turn a cake back into ‡ our. The problem is max 80x x 4x2 . is that this transparency allows you to see exactly what is going on. sugar. When they build manufacturing facilities.1. The …rst-order condition is 80 so the optimum is x = 10. she may not want to spend her entire income. Often we want to assume that we cannot consume negative amounts. . The second-order condition is 8<0 which obviously holds. baking powder. Firms cannot produce negative amounts by transforming outputs back into inputs. What makes the examples lame is that the solutions are so transparent that it is hardly worth going through the math. INEQUALITY CONSTRAINTS 53 though. the size of the facility places a constraint on the maximum output the …rm can produce. the market price is …xed at 80. A capacity-constrained …rm’ problem. 5.CHAPTER 5. The problem is illustrated in Figure 5. After all. Firms have capacity constraints. then. As economists we must deal with these nonnegativity constraints.1 Lame example . economics often has implicit nonnegativity constraints.

(See why it’ a lame example?) The constraint will hold with s equality. max 80x 4x2 x s. This will obviously restrict s the …rm’ output because it would like to produce 10 units but can only s produce 8.t. in which case we say that the constraint is binding. INEQUALITY CONSTRAINTS \$ 54 π(x) x 8 10 12 Figure 5. x Form the Lagrangian L(x.CHAPTER 5.1 A binding constraint Now let’ put in a capacity constraint: x 8. ) = 80x The …rst-order conditions are 8 4x2 + (8 x): @L = 80 8x =0 @x @L = 8 x = 0: @ The second equation tells us that x = 8. Let’ look at s the math.1: A lame example using capacity constraints 5. and the …rst equation tells us that = 80 64 = 16: .1.

Remember that the interpretation of the Lagrange multiplier is that it is the marginal value of relaxing the constraint. First. we know that pro…t reaches its unconstrained maximum when x = 10. so the …rm would still produce 10 units and still generate exactly the same amount of pro…t. Nonbinding constraints are sometimes called slack. say. let’ plug them in. so the Lagrangian becomes L(x. and we have = 0. Because of this the capacity constraint is nonbinding. the marginal value of relaxing the constraint must be zero. So. let’ go back and look at the problem. x This problem generates the Lagrangian L(x. We can compute this directly: (x) = 80x 4x2 0 (x) = 80 8x 0 (8) = 80 64 = 16 Plugging the constrained value (x = 8) into the marginal pro…t function 0 (x) tells us that when output is 8. s max 80x x 4x2 12 s. The constraint does not rule this level of production out. we Since we already know the answers. so the constrained optimum is also x = 10. 5. s know that = 0.CHAPTER 5. so it is the marginal pro…t from relaxing the constraint.2 A nonbinding constraint Now let’ change the capacity constraint to x s 12 and solve the problem intuitively. x 13? The unconstrained maximum is still feasible. an increase in output leads to an increase in pro…t by 16. ) = 80x 4x2 .1. Let’ think about the Lagrange multiplier. In this case value is pro…t. We know that it is the mars ginal value of relaxing the constraint. ) = 80x 4x2 + (12 x): In particular. that is. INEQUALITY CONSTRAINTS 55 So far we’ done nothing new. The important step here is to think ve about . it does not hold with equality. How would pro…t change if we relaxed the constraint from x 12 to. Now that we know the answers. And this is exactly the Lagrange multiplier.t.

:::. :::. Di¤erentiating the Lagrangian yields @L = 80 8x =0 @x @L = 12 x = 0 @ The second equation obviously implies that x = 12. When output is 12 the pro…t function is downward-sloping. We arrived at our answers (x = 10. :::. we get a negative Lagrange multiplier when we are past the pro…t-maximizing level of output. in which case L(x1 . xn ) The Lagrangian is L(x1 . which is wrong. in which case the …rst equation tells us that = 80 8x = 80 96 = 16. You can see this in Figure 5. xn ) once again. xn . = 0) intuitively.:::. :::. How can we get them mechanically? After all.t. the term M g(x1 . If we solve the problem using our old approach we …nd that (1) the constraint is binding. Since the Lagrange multiplier is marginal pro…t. g(x1 . INEQUALITY CONSTRAINTS 56 which is just the original unconstrained pro…t function.xn max f (x1 .1) When the constraint is binding. which means that relaxing the constraint makes pro…t even lower. xn ) M s. When the constraint is nonbinding the Lagrange multiplier is zero. ) = f (x1 . :::. xn . the purpose of the math is to make sure we get the answers right without relying solely on our intuition. ) = f (x1 . :::. xn ). :::. and (2) the Lagrange multiplier is negative. ) = f (x1 . in which case L(x1 . To see why. xn ) + [M g(x1 . suppose we analyze our 12-unit constraint problem in the usual way.1. :::.CHAPTER 5. :::. xn )]: (5. xn ) is zero. xn . :::. One thing for sure is that we will need a new approach. :::.2 A new approach The key to the new approach is thinking about how the Lagrangian works. Suppose that the problem is x1 . So we need a condition that says [M g(x1 . xn )] = 0: . 5.

The …rst-order conditions for the inequality-constrained problem are @L = 0 @x1 . :::. xn )) < 0 we have violated the constraint. xn )]] = (M @ g(x1 . and it binds if g(x1 . In fact. xn )] = 0 is known as a complementary slackness condition. :::. :::. :::. so that is not allowed. The other constraint is g(x1 . but notice that @[ [M g(x1 . If one of the constraints is slack. xn )]] = M g(x1 . :::. :::. :::. It says that one of two constraints must bind. INEQUALITY CONSTRAINTS Note that 57 g(x1 . One constraint is 0. We can no longer do this when the constraint is nonbinding. looking at expression (5. and it binds if = 0. :::. Remember from the lame example when the capacity constraint was binding at x = 12 we got a negative Lagrange multiplier. in which case the complementary slackness condition also holds. The beauty of the complementary slackness condition is that it forces one of two constraints to bind using a single equation. xn )) really negative. xn )): @[ [M This is exactly what we need to be equal to zero. xn ) M . @L = 0 @xn @L = 0 @ 0 . xn ) @ and in the old approach we set this equal to zero. and that was the wrong answer. in which case the complementary slackness condition holds. xn ) = M . .CHAPTER 5. :::. We also need to restrict the Lagrange multiplier to be nonnegative. . The condition that @L = [(M @ g(x1 . But when (M g(x1 .1) we can make L really large by making both and (M g(x1 . the other one has to bind.

which means that relaxing the constraint cannot reduce the value of the objective function. Each Lagrange multiplier can be . INEQUALITY CONSTRAINTS 58 The …rst set of conditions (@L=@xi = 0) are the same as in the standard case. The last condition says that the Lagrange multiplier cannot be negative. We are left with only one solution. Then x = 12. or (3) both. (2) 12 x = 0. or the constraint does not bind. The last two conditions are the ones that are di¤erent. and it is the correct one: x = 10. Plugging = 0 into the …rst equation yields 80 8x 0 = 0. s Case 1: = 0. To do this. So what we have to do is …nd the solution when = 0 and …nd the solution when 12 x = 0. The secondlast condition ( @L=@ = 0) guarantees that either the constraint binds. x which generates the Lagrangian L(x. let’ go back to our lamest example: s max 80x x 4x2 12 s. = 0.CHAPTER 5. If = 0 then the second and third conditions obviously hold. We have a set of …rst-order conditions. in which case = 0.t. So case 2 cannot be the answer. but this does not tell us how to solve them. Case 2: 12 x = 0. which violates the last condition. in which case @L=@ = 0. ) = 80x The …rst-order conditions are 4x2 + (12 x): @L = 80 8x =0 @x @L = (12 x) = 0 @ 0 Now what? The methodology for solving this system is tedious. but it works. and plugging this into the …rst equation yields = 80 96 = 16. The general methodology for multiple constraints is as follows: Construct a Lagrange multiplier for each constraint. The second equation ( (12 x) = 0) is true if either (1) = 0. Let’ see what happens. or x = 10.

the prices of the two s goods are px = py = 1. in which case its constraint is binding.t. If only one does not lead to a violation. INEQUALITY CONSTRAINTS 59 either zero. in which case that constraint is nonbinding. y) = x1=3 y 2=3 . Most of them will lead to violations. 2) = x3 y 3 + 1 2 1 (60 x y) + 2 (120 x y): .2. x + y x+y 60 120 Why is this a lame example? Because we know that the second constraint must be nonbinding.y 1 2 Let’ look at another lame example: s s. If you want to maximize the objective function. The utility function is Cobb-Douglas. x + y 60 This looks like a utility maximization problem. We want to solve it mechanically. and from what we learned in Section 4. or it can be positive.CHAPTER 5. If several combinations do not lead to violations.3 Multiple inequality constraints max x 3 y 3 x. After all if a number is smaller than 60.5 we know that x = 20 and y = 40. Assign a separate Lagrange multiplier to each constraint to get the Lagrangian L(x. that is the answer. though. as can be seen in Figure 5. 5. and the consumer has 60 to spend. then you must choose which one is the best. You can do this by plugging the values you …nd into the objective function. y. 1. to learn the steps. you choose the case that generates the highest value of the objective function. The solution to this problem will be the same as the solution to the problem max x 3 y 3 x. The consumer’ utility function is u(x.t. it must also be strictly smaller than 120.y 1 2 s. Then try all possible combinations of zero/positive multipliers.

which implies that y = 0. So Case 1 cannot be the answer. so that only the second constraint binds. (4) 1 > 0.2: A lame example with two budget constraints The …rst-order conditions are @L @x @L @y @L 1 @ 1 @L 2 @ 2 1 2 1 y 2=3 3 x2=3 2 x1=3 = 3 y 1=3 = = = 0 0 1 (60 2 (120 1 2 =0 =0 1 2 x x y) = 0 y) = 0 This time we have four possible cases: (1) 1 = 2 = 0. The second 3 2 equation reduces to 3 (x=y)1=3 = 0. Case 1: 1 = 2 = 0.CHAPTER 5. 2 = 0. (3) 1 = 0. so that both constraints bind. As before. so that neither constraint binds. (2) 1 > 0. we need to look at all four cases. so that only the …rst constraint binds. 2 > 0. 2 > 0. but this cannot be true if y = 0 because we are not allowed to divide by zero. There . INEQUALITY CONSTRAINTS y 120 60 60 40 E 20 60 120 x Figure 5. In this case the …rst equation in the …rst-order conditions reduces to 1 (y=x)2=3 = 0.

But then the …rst constraint. 1 = 1 2=3 2 . But this obviously violates the constraints. y = 80. y = 40. x + y 60 cannot possibly hold. and since there are no constraints we want both x and y to be as large as possible. and it corresponds to the case shown in Figure 5. 3 . If neither constraint binds. and 1 = 1 22=3 > 0. is violated because x + y = 120. The second condition implies that x + y = 120. and 2 = 1 22=3 > 0. 2 > 0.CHAPTER 5. Case 2: 1 > 0. The …rst of these conditions implies that the …rst constraint binds. There is an easier way to see this. y = 40. The 3 remaining constraint.2. The …rst-order conditions reduce to 1 y 2=3 3 x2=3 2 x1=3 3 y 1=3 60 x 1 = 0 = 0 1 y = 0 and the solution to this system is x = 20. the problem becomes 2 1 max x 3 y 3 x. Case 2 works. INEQUALITY CONSTRAINTS 61 is an easier way to see this. Only Case 2 works. x + y 120. though. So x ! 1 and y ! 1. x + y 60. is satis…ed because x + y = 60 < 120. so we know the solution: x = 20. so that x + y = 60. though. Case 3: 1 = 0. 2 = 0. If the second constraint binds. But then the inner budget constraint is violated. Now the …rst-order conditions reduce to 1 y 2=3 3 x2=3 2 x1=3 3 y 1=3 120 x 2 = 0 = 0 2 y = 0 The solution to this system is x = 40. so that we are on the outer budget constraint in Figure 5. and 2 = 0. just by looking at the constraints and exploiting the lameness of the example.y The objective function is increasing in both arguments. so Case 4 does not work. x + y = 120. The 3 remaining constraint.2. 2 > 0. Case 4: 1 > 0. so Case 3 does not work.

3x + 4y x y 60 0 0 This problem has three constraints. We have been doing this throughout this book. 3 0 Since there are three constraints. such as in the following example: max 2x + y x.y s. 0. It is useful in what follows to refer to the …rst constraint.4 A linear programming example A linear programming problem is one in which the objective function and all of the constraints are linear.CHAPTER 5. 1. The Lagrangian is L(x. as a budget constraint. that is (60 3x 4y) and (x 0) and (y 0) so that they are nonnegative. y. We are going to narrow some of them down intuitively before going on. so we must use the multiple constraint methodology from the preceding section. The other two constraints are nonnegativity constraints. The …rst-order conditions are @L @x @L @y @L 1 @ 1 @L 2 @ 2 @L 3 @ 3 1 = 2 = 1 = = = 3 4 1 (60 2x 3y 1 + + 3x 2 =0 =0 4y) = 0 1 3 =0 =0 2 0. there are 23 = 8 possible cases. The …rst .t. 3x + 4y 60. 3) = 2x + y + 1 (60 3x 4y) + 2 (x 0) + 3 (y 0): Notice that we wrote all of the constraint terms. INEQUALITY CONSTRAINTS 62 5. 2.

(3) We will 1 > 0. so x = 0. 2 > 0. 2 = 0. so that x > 0 and y = 0. 3 = 0. A binding budget constraint means that we cannot have both x = 0 and y = 0. So Case 2 is the solution. 3 > 0. 2 = 0. so that both x and y are positive. To see what we just did. The objective function then takes the value 2x + y = 2 20 + 0 = 40: Case 3: 1 > 0. because if we did then we would have 3x + 4y = 0 < 60. saying that the consumer cannot consume negative amounts of the goods. These cannot both hold. The only question is whether one of the other two constraints binds. but not in Case 3. so that x = 0 but y > 0. For the budget constraint to hold we must have y = 15. The other two constraints are nonnegativity constraints. and the second reduces to 1 = 1=4. 3 = 0. consider these one at a time. In Case 1 the maximized objective function took a value of 15 and in Case 2 it took a value of 40. 2 > 0.3. which means that 1 > 0. Case 1: 1 > 0. and 3 > 0. and the budget constraint would not bind. The …rst two equations in the …rst-order conditions become 2 1 3 4 1 1 = 0 = 0 The …rst of these reduces to 1 = 2=3. Since 2 > 0 the constraint x 0 must bind. We got solutions in Case 1 and Case 2. look at Figure 5. and the objective function is increasing in both of its arguments. We are now left with three possibilities: (1) 1 > 0 so the budget constraint binds. In this case both x and y are positive. so Case 3 does not work. Since there is only one budget-type constraint. and 3 = 0. This time we have y = 0. INEQUALITY CONSTRAINTS 63 constraint is like a budget constraint. (2) 1 > 0. 2 = 0. Case 1 is the corner solution where the budget line meets the y-axis. The three constraints identify a triangular feasible set. which is obviously higher. and the budget constraint implies that x = 20. and Case 2 is the corner solution where the budget line . 2 = 0.CHAPTER 5. So which is the answer? The one that generates a larger value for the objective function. and 3 = 0. This yields a value for the objective function of 2x + y = 2 0 + 15 = 15: Case 2: 1 > 0. it has to bind.

:::. and we chose the corner solution that gave us the larger value.W. The methodology for solving linear programming problems involves …nding all of the corners and choosing the corner that yields the largest value of the objective function. Tucker. The objective was to …nd the point in the feasible set that maximized the function f (x. This generates lots and lots of corners. 5. We did this by comparing the values we got at the two corner solutions. and it has more than one budget constraint. y) = 2x + y. The …rstorder conditions we will derive are known as the Kuhn-Tucker conditions.3: Linear programming problem meets the x-axis. The alternative formulation was developed by Harold Kuhn and A. and the real issue in the study of linear programming is …nding an algorithm that e¢ ciently checks the corners. INEQUALITY CONSTRAINTS y 64 15 indifference curve x 20 Figure 5. Case 3 is an "interior" solution that is on the budget line but not on either axis.5 Kuhn-Tucker conditions Economists sometimes structure the …rst-order conditions for inequalityconstrained optimization problems di¤erently than the way we have done it so far. Begin with the very general maximization problem. so it involves …nding x1 . letting x be the vector .CHAPTER 5. and it is known as the Kuhn-Tucker formulation. xn . A typical problem has more than two dimensions.

1 . g 1 (x) x1 b1 . . :::. :::. vn @g k + v1 = 0 k @x1 @g k + vn = 0 @xn k 0 There are 2n + k conditions plus the n + k nonnegativity constraints for the . :::. INEQUALITY CONSTRAINTS x = (x1 . :::. . @f @L @g 1 = ::: 1 @xn @xn @xn @L = 1 [b1 g 1 (x)] = 0 1 @ 1 . To solve this. . vk ) = f (x) + k X i=1 i [bi g i (x)] + n X j=1 vj xj : We get the following …rst-order conditions: @f @L @g 1 = ::: 1 @x1 @x1 @x1 . . k . xn 0: There are k "budget-type" constraints and n non-negativity constraints. :::.t. :::. @L = k [bk g k (x)] = 0 k @ k @L v1 = v1 x1 = 0 @v1 . @L = vn xn = 0 vn @vn 1 . :::. k . v1 . form the Lagrangian L(x.CHAPTER 5. xn ): max f (x) x 65 s. . . g k (x) bk 0. v1 .

n j = 1. :::. v1 . INEQUALITY CONSTRAINTS 66 multipliers. :::. n (5. :::. 1 . k i = 1. :::. :::. :::. k . :::. n j = 1.2) Suppose instead we had constructed a di¤erent Lagrangian: K(x. :::. n Pay close attention to the …rst and third equations. n @xi @xi @K @L = j = 0 for j = 1. vk ) = K(x. k j @ j @ j vi xi = 0 for i = 1. :::. :::. It is useful to have some shorthand to shrink this system down: @L = 0 for @xi @L = 0 for j @ j vi xi = 0 for 0 for j vi 0 for i = 1. n 0 for j = 1. k i = 1. The two Lagrangians are related.CHAPTER 5. If vi = 0 then the …rst equation yields @K @K vi = 0 =) = 0 =) xi = 0: @xi @xi On the other hand. if vi > 0 then the i-th inequality constraint. only has k multipliers for the k budget-type constraints.2) as @L @K = + vi = 0 for i = 1. 1 . known as the Kuhn-Tucker Lagrangian. :::. k) = f (x) + This Lagrangian. :::. with L(x. and no multipliers for the nonnegativity constraints. is binding which means that vi > 0 =) xi = 0 =) xi @K = 0: @xi . 1 . k j vi 0 for i = 1. :::. :::. xi 0. k) + n X j=1 k X i=1 i [bi g i (x)]: vj xj : We can rewrite the system of …rst-order conditions (5.

:::. …rst-order conditions is xi xi @K @xi @K j @ j j 67 The new set of = 0 for i = 1. Consider the following problem: = x2 y + 1 (24 2x 3y) + 2 (20 4x y) .CHAPTER 5. @xi The Kuhn-Tucker conditions use this information.t. Then the Kuhn-Tucker conditions reduce to zj @K=@zj = 0 and zj 0 for j = 1. :::. n. :::. This is fairly easy to remember. 5. n = 0 for j = 1. xn . it simpli…es the original set of conditions by removing n equations and n unknowns. :::. Remember that the Kuhn-Tucker Lagrangian is K(x1 . n (5. :::. 1 . It is also a very symmetriclooking set of conditions. is remembering that they are just a particular reformulation of the standard inequality-constrained optimization problem with multiple constraints. 1. INEQUALITY CONSTRAINTS Either way we have @K = 0 for i = 1. which is an advantage. :::. :::.3) xi This is a system of n + k equations in n + k unknowns plus n + k nonnegativity constraints. Instead of distinguishing between x’ and ’ let s s. :::. The key to Kuhn-Tucker conditions. s.y x2 y s. Thus. 2x + 3y 24 4x + y 20 The Lagrangian can be written L(x. zn+k ). n+k. them all be z’ in which case the Kuhn-Tucker Lagrangian is K(z1 . k 0 for j = 1. y.6 Problems maxx. though. :::. k 0 for i = 1. k ). 2) 1.

y x2 y s.CHAPTER 5.t. 2x + 3y = 24 Do the resulting values of x and y satisfy 4x + y (b) Solve the alternative problem maxx.t.t. 2x + 3y = 24 Do the resulting values of x and y satisfy 4x + y (b) Solve the alternative problem maxx. 4x + y = 36 Do the resulting values of x and y satisfy 2x + 3y 24? 36? .y x2 y s. INEQUALITY CONSTRAINTS (a) Solve the alternative problem maxx. which of the two constraints bind? What do these imply about the values of 1 and 2 ? (d) Solve the original problem. 4x + y = 20 Do the resulting values of x and y satisfy 2x + 3y 24? 20? 68 (c) Based on your answers to (a) and (b).t. 2.y x2 y s. Consider the following problem: maxx. 2x + 3y 24 4x + y 36 The Lagrangian can be written L(x. 2) = x2 y + 1 (24 2x 3y) + 2 (36 4x y) (a) Solve the alternative problem maxx. (e) Draw a graph showing what is going on in this problem. y.t.y x2 y s. 1.y x2 y s.

CHAPTER 5. x + 4y 5x + 2y The Lagrangian can be written L(x. 4. x + 4y = 36 Do the resulting values of x and y satisfy 5x + 2y (b) Solve the alternative problem max 4xy x. 5x + 2y = 45 Do the resulting values of x and y satisfy x + 4y 36? (c) Find the solution to the original problem. (e) Draw a graph showing what is going on in this problem. y. 2) = 4xy 3x2 + 1 (36 x 4y) + 2 (45 5x 2y) (a) Solve the alternative problem max 4xy x.y 45? 3x2 s. 3. y. including the values of 1 and 2 .y 8x 24 30 s.y 3x2 s. INEQUALITY CONSTRAINTS 69 (c) Based on your answers to (a) and (b).t.t. x + 4y 5x + 2y The Lagrangian can be written L(x. 1.t. Consider the following problem: max 3xy x. 1. which of the two constraints bind? What do these imply about the values of 1 and 2 ? (d) Solve the original problem.y 3x2 36 45 s. 2) = 3xy 8x + 1 (24 x 4y) + 2 (30 5x 2y) . Consider the following problem: max 4xy x.t.

y 30? 8x s. 4x + 2y 42 x 0 y 0 (a) Write down the Kuhn-Tucker Lagrangian for this problem.y 70 8x s. 5.t.y x2 y s. (c) Solve the problem.t. (c) Solve the problem. 6. (b) Write down the Kuhn-Tucker conditions.t. INEQUALITY CONSTRAINTS (a) Solve the alternative problem max 3xy x. y 12 0 (a) Write down the Kuhn-Tucker Lagrangian for this problem. Consider the following problem: maxx. .t. x + 4y = 24 Do the resulting values of x and y satisfy 5x + 2y (b) Solve the alternative problem max 3xy x. (b) Write down the Kuhn-Tucker conditions. including the values of 1 and 2 .y s.CHAPTER 5. x + y x. 5x + 2y = 30 Do the resulting values of x and y satisfy x + 4y 24? (c) Find the solution to the original problem. Consider the following problem: max xy + 40x + 60y x.

PART II    SOLVING SYSTEMS OF  EQUATIONS            (linear algebra)        .

They also behave di¤erently that ordinary real numbers. and they are useful for many things. The element in row i and 72 A matrix is a rectangular array of numbers. The matrix A above has 3 rows and 2 columns. 6. so it is a 3 2 matrix. This chapter tells how to work with matrices and what they are for. Matrix dimensions are always written as (# rows) (# columns). An element of a matrix is one of the entries. They have dimensions corresponding to the number of rows and number of columns.1 Matrix algebra Matrices are typically denoted by capital letters.CHAPTER 6 Matrices Matrices are 2-dimensional arrays of numbers. such as the one below: 0 1 6 5 3 A: A=@ 2 1 4 .

the order matters. @ . C . . MATRICES column j is denoted aij . and so in 0 a11 B a21 B A=B . . A tA = t @ . . . To multiply matrices A and B. . . . A . . . First. . Therefore. A: . . . . . . A @ . both matrices are n k. Matrix addition is done element by element: 0 1 0 1 0 1 a11 a1k b11 b1k a11 + b11 a1k + b1k B . as . . C: . . . . C: . In the above example. . A @ ... an1 ank tan1 tank The big deal in matrix algebra is matrix multiplication. A matrix in which the number of rows equals the number of columns is called a square matrix. an2 ank 73 The matrix A above is an n k matrix. . an1 general a matrix looks like 1 a12 a1k a22 a2k C C .... @ . . C B . . it is possible to multiply a matrix by a scalar. C + B .. though. In such a matrix. several things are important. . . This is done element by element: 0 1 0 1 a11 a1k ta11 ta1k B .. C=B . in matrix notation vectors are written vertically: 0 1 x1 B . elements of the form aii are called diagonal elements because they land on the diagonal of the square matrix. . it is important to make sure that the dimensions of the two matrices are identical. xn When we write a vector as a column matrix we typically leave o¤ the accent and write it simply as x.CHAPTER 6. A: . . An n-dimensional vector can be thought of as an n 1 matrix.. C x = @ . . an1 ank bn1 bnk an1 + bn1 ank + bnk Before one can add matrices. Just like with vectors. A=@ .. . .

@ .B= 3 4 2 4 : . As an example. Element cij is found by multiplying each member of row i in matrix A by the corresponding member of column j in matrix B and then summing. MATRICES 74 you will see. C: . . Second. . . For there to be the right number of elements for this to work. BA = 3 6+4 4 3( 1) + 4 3 ( 2)6 + 4 4 ( 2)( 1) + 4 3 = 34 9 4 14 : 6 3 + ( 1)( 2) 6 4 + ( 1)4 4 3 + 3( 2) 4 4+3 4 = 20 20 6 28 : 6 4 1 3 . . A@ . an1 Element c11 is we write the matrices A 10 a12 a1k b11 C B b21 a22 a2k C B . . an2 ank bk1 and B side-by-side: 1 b12 b1m b22 b2m C C .. with A an n k matrix and B a k m matrix. Then k X cij = ais bsj : s=1 This is easier to see when 0 a11 B a21 B C = AB = B . The result will be an n m matrix. element c11 is found by multiplying each member of row 1 in matrix A by the corresponding member of column 1 in matrix B and then summing.. bk2 bkm c11 = a11 b11 + a12 b21 + ::: + a1s bs1 + ::: + a1k bk1 : So. CB .CHAPTER 6. A . . . multiply the two matrices below: A= Then AB = However. Let C = AB. So. . . with the k’ canceling out. the number of columns in the …rst matrix must equal the number of rows in the second. . . the number of columns in A must equal the number of rows in B. . one must multiply an n k matrix on the left by a k m matrix on the right. . . The formula for multiplying matrices is s as follows.

IA = A. . multiplying a matrix by the identity matrix is the same as multiplying an ordinary number by 1. ank 1 C C C: A Obviously. .B= 3 4 2 4 : . .. To check this. C . Then AI = A. we get bij = 0 a1j + 0 a2j + ::: + 1 aij + ::: + 0 anj = aij . .CHAPTER 6. 0 0 1 The transpose is generated by switching the rows and columns of the original matrix. the transpose of an n k matrix is a k n matrix. Letting B = IA. AB 6= BA. an1 an2 ank is the matrix AT given by B B AT = B @ 0 a11 a21 a12 a22 . . A= 6 4 1 3 . A @ . To see why it is special. and let I be the n-dimensional identity matrix. and matrix multiplication is not commutative. and is called the identity matrix. . . . . .. . . C: . . a1k a2k an1 an2 . . . So. but you have to switch the order of the matrices. So. . we use the terminology that we left-multiply by B when we want BA and right-multiply by B when we want AB. Because of this. MATRICES 75 is special. The transpose of the matrix 1 0 a11 a12 a1k B a21 a22 a2k C C B A=B . . A @ . The same thing happens when we right-multiply A by a kdimensional identity matrix. consider any n k matrix A. . Note that (AB)T = B T AT so that the transpose of the product of two matrices is the product of the transposes of the two matrices. .. . . . consider the following example employing the same two matrices we used above. . . Because of this. The square matrix 0 1 1 0 0 B 0 1 0 C B C I = B ..

. . x = B . . C . and b = B . that is.CHAPTER 6. but AT B T = 6 4 1 3 3 4 2 4 = 34 4 9 14 = (BA)T : 3 4 2 4 20 20 6 28 6 4 1 3 . (AB)T = . C . when does a solution exist. As you have seen in the optimization examples. . A . Letting 0 0 0 1 1 1 a11 a1n x1 b1 B . how do we …nd the solution when it exists? The answers to both questions depend on the inverse matrix A 1 . an1 ann xn bn Ax = b: (6. . Equation (6. systems of equations regularly arise in economics. . : 76 6 4 1 3 20 6 20 28 6. which is a matrix having the property that A 1 A = AA 1 = I. .1) .2) we can rewrite the system of equations (6. First. A A=@ . such as this one: a11 x1 + ::: + a1n xn = b1 a21 x1 + ::: + a2n xn = b2 (6. C . @ . . A @ . when can we …nd a vector x such that Ax = b? Second.1) as The primary use of matrices is to solve systems of equations. . an1 x1 + ::: + ann xn = bn We can write this system easily using matrix notation. MATRICES AB = AT = B T AT = as desired.. .2) raises two questions.2 Uses of matrices Suppose that you have a system of n equations in n unknowns. BT = = 3 4 20 6 20 28 2 4 : = (AB)T .

3 Determinants Determinants of matrices are useful for determining (hence the name) whether a matrix has an inverse and also for solving equations such as (6. so we must devote considerable attention to whether or not a matrix has an inverse. When f is a function of several variables. every number except 0 has a multiplicative inverse. though. If A has an inverse (and that is a big if). MATRICES 77 that is.CHAPTER 6.. and that for ordinary numbers the inverse of y is the number y 1 such that y 1 y = 1. @f @xn @x1 . @2f @x1 @xn @2f @x2 @xn . . . . C C C: C A 6. however. and we will do this in Chapter 9. the second-order condition is determined by the sign of the second derivative f 00 . then (6.2).2) can be solved by left-multiplying both sides of the equation by the inverse matrix A 1 : A 1 Ax = A 1 b x = A 1b For ordinary real numbers. Recall that in a single-variable optimization problem with objective function f . C @ . The de- . . we can write the vector of …rst partial derivatives 0 @f 1 and the matrix of second partials 0 @2f @2f B B B B @ @x2 1 @2f @x2 @x1 B . . 1 @2f @xn @x1 @2f @xn @x2 @2f @x2 n The relevant second-order conditions will come from conditions on the matrix of second partials. this formula is exactly what we are looking for. . . A . @x1 @x2 @2f @x2 2 . the matrix that you multiply A by to get the identity matrix. . Remembering that the identity matrix plays the role of the number 1 in matrix multiplication. A second use of matrices is for deriving second-order conditions for multivariable optimization problems. Many more matrices than just one fail to have an inverse.

. De…ning it depends on the size of the matrix. Then jAj = a1j c1j + a2j c2j + ::: + anj cnj . it is the determinant of the submatrix Aij multiplied by 1 if i + j is odd and multiplied by 1 if i + j is even. and you can get that submatrix by eliminating the element’ row and column from the original s matrix. Note that there is one submatrix for each element. Speci…cally. Begin with the 3 0 1 a11 a12 a13 A = @ a21 a22 a23 A : a31 a32 a33 In general. MATRICES 78 terminant of a square matrix A (and the matrix must be square) is denoted jAj. that is. or any other square matrix for that matter. if we delete the second row and the …rst column we are left with the submatrix A21 = a12 a13 a32 a33 : 3 matrix we go through some more steps. There are two ways to do it.CHAPTER 6. Now look at a 2 2 matrix A= Here the determinant is de…ned as jAj = For a 3 matrix a11 a12 a21 a22 = a11 a22 a21 a12 : 3 a11 a12 a21 a22 : We can get a submatrix of A by deleting a row and column. It can be either positive or negative. Every element also has something called a cofactor which is based on the element’ submatrix. The determinant jAj is simply a11 . Using these de…nitions we can …nally get the determinant of a 3 3 matrix. so don’ confuse the determinant with t the absolute value. even though they both use the same symbol. submatrix Aij is obtained from A by deleting row i and column j. Start with a 1 1 matrix A = (a11 ). the cofactor of aij is the number s cij given by cij = ( 1)i+j jAij j. The most common is to choose a column j. For example.

Choosing column j = 1 gives us a11 a12 a21 a22 = a11 c11 + a12 c12 = a11 a22 + a12 ( a21 ). = a11 c11 + a21 c21 + a31 c31 = a11 (a22 a33 a32 a23 ) a21 (a12 a33 a32 a13 ) +a31 (a12 a23 a22 a13 ): We can also …nd determinants by choosing a row. where c11 = a22 because 1 + 1 is even and c21 = a12 because 1 + 2 is odd. let’ look at a 3 s a11 a12 a13 a21 a22 a23 a31 a32 a33 The freedom to choose any row or column allows one to use zeros strategically. .. A A=@ . .. MATRICES 79 Before we see what this means for a 3 3 matrix let’ check that it works s for a 2 2 matrix. j = 2: a11 a12 a21 a22 = a12 c12 + a22 c22 = a12 ( a21 ) + a22 a11 : 3 matrix. an1 ann . when evaluating the determinant 6 8 2 0 9 4 1 0 7 it would be best to choose the second row because it has two zeros. Begin with the matrix s 1 0 a11 a1n B . then the determinant is given by jAj = ai1 ci1 + ai2 ci2 + ::: + ain cin . .CHAPTER 6. We get exactly the same thing if we choose the second column. . For example. C: . and the determinant is simply a21 c21 = 2( 60) = 120: 6. If we choose row i. Finally. choosing j = 1.4 Cramer’ rule s The process of using determinants to solve the system of equations given by Ax = b is known as Cramer’ rule.

bn an2 B B Bi = B @ 0 a11 a21 . . so that 0 b1 a12 B b2 a22 B B1 = B ..CHAPTER 6. an1 a1(i a2(i . @ . MATRICES Construct the matrix Bi from A by replacing column vector b. . . .. . . ann 1 C C C A a1n a2n . . .. . . . . bn an(i+1) . 1 According to Cramer’ rule. . . s equations is 4x1 + 3x2 = 18 5x1 3x2 = 9 Adding the two equations together yields 9x1 = 27 x1 = 3 x2 = 2 Now let’ do it using Cramer’ rule... the solution to Ax = b is the column vector x s where jBi j : xi = jAj Let’ make sure this works using a simple example. . 1) b1 a1(i+1) b2 a2(i+1) . . an(i 1) 1) 80 the i-th column of A with the a1n a2n . . . . ann and . We have s s A= Generate the matrices B1 = 18 9 3 3 and B2 = 4 18 5 9 : 4 5 3 3 and b = 18 9 : C C C: A The system of .

and it helps to de…ne 0 1 x11 x1n B . Singular matrices do not have 6 inverses. if it exists. A A 1=@ .CHAPTER 6. A .5 Inverses of matrices One important implication of Cramer’ rule links the determinant of A to s the existence of an inverse. we know from Cramer’ rule that s xi = jBi j = jAj. To see how. Also. recall that the inverse is de…ned so that AA 1 = I. 0 . For this number to exist. xn1 xnn 1 x1i B . C xi = @ . . . and so 0 1 1 B 0 C B C e1 = B . but nonsingular matrices do. it must be the case that jAj = 0. . to the system Ax = b is x = A 1 b. MATRICES Now compute determinants to get jAj = 27. xni 0 Remember that the coordinate vector ei has zeros everywhere except for the i-th element. jB1 j = jB1 j jAj jB2 j jAj 81 81. . C .. and jB2 j = 81 27 54 27 54: Applying Cramer’ rule we get s x= ! = = 3 2 : 6. To see why. 6 This is su¢ ciently important that it has a name: the matrix A is singular if jAj = 0 and it is nonsingular if jAj = 0. A : . With some clever manipulation we can use Cramer’ rule to invert the s matrix A. and We want to …nd A 1 . C @ .. recall that the solution. which is 1.

. Let’ check to see if this works for a 2 2 matrix. . . .CHAPTER 6. . Note that only one element of the j-th i i column of Bj is non-zero. Then i-th column of the inverse matrix A 1 can be found by applying Cramer’ rule to the system s Axi = ei . . So. . B .. we can take the determinant of Bj by using the j-th column and getting i Bj = ( 1)i+j jAij j = cij . . .3) column of A with the a1n . . . a(i 1)n ain a(i+1)n . B2 = a 1 c 0 2 . . B B a(i 1)(j 1) 0 a(i 1)(j+1) B a(i 1)1 B i ai(j 1) 1 ai(j+1) Bj = B ai1 B a(i+1)1 a(i+1)(j 1) 0 a(i+1)(j+1) B B . .. Then 0 a11 a1(j 1) 0 a1(j+1) . Note that ei i j is the i-th column of the identity matrix.3) is Once again we can only get an inverse if jAj = 0. . . Use the matrix s A= We get 1 B1 = a b c d : 1 b 0 d 2 . 1 C C C C C C C C C A . . . . . . . @ . . . . . ... . So. . 6 We can simplify this further. . . i Construct the matrix Bj from A by replacing the j-th column vector ei . and B2 = a 0 c 1 : . B1 = 0 b 1 d 1 . MATRICES 82 and ei is the column vector with ei = 1 and ei = 0 when j 6= i. ann The solution to (6. which is a cofactor of the matrix A. an1 an(j 1) 0 an(j+1) i Bj : xji = jAj (6. . we get the formula for the inverse: xji = ( 1)i+j jAij j : jAj Note that the subscript on x is ji but the subscript on A is ij.

= 1 ad bc d c b a : 1 ad 1 ad bc 1 0 0 1 bc d c b a a b c d ba + ba cb + ad ad bc ca + ac : 6. Thus.6 Problems 6 3 4 2 3 9 1 0 2 3 6 5 1. Perform the following computations: 0 1 0 1 8 10 3 3 1 4 A (a) 6 @ 4 5 A 2 @ 6 1 2 6 2 3 2 1 4 4 1 0 1 2 2 1 0 0 B 3 3 2 1 0 AB @ 7 2 1 0 7 4 5 0 10 3 1 1 1 1 @ 2 1 1 A@ 1 2 2 1 1 0 C C 1 A 0 1 5 1 A 1 .CHAPTER 6. B2 = 2 c. Perform the following computations: (a) 4 (b) 0 5 2 1 1 4 2 (c) @ 3 1 (d) 2 2. MATRICES The determinants are 1 2 B1 = d. 83 We also know that jAj = ad A We can check this easily: A 1A = = = 1 bc. B1 = 1 b. and B2 = a.

CHAPTER 6. Use Cramer’ rule to solve the following system of equations: s 5x 2y + z = 9 3x y = 9 3y + 2z = 15 2 0 @ 1 3 (b) 0 6 1 1 0 A 1 . Find the determinants of the following matrices: (a) 0 3 4 6 1 3 1 (b) @ 2 7 2 0 1 0 2 A 6 5. Find the determinants of the following matrices: (a) 0 1 1 4 2 4 C C 3 4 A 6 4 10 1 5 A@ 1 A 3 4. Use Cramer’ rule to solve the following system of equations: s 6x 2y 3z = 1 2x + 4y + z = 2 3x z = 8 6. MATRICES 5 1 @ 4 0 (b) 1 2 (c) 0 10 3 0 A@ 1 6 1 2 0 5 B 0 6 1 3 0 B 1 5 2 1 @ 1 0 0 2 0 1 10 5 1 @ 1 3 1 5 0 2 2 1 4 5 1 4 5 A 5 84 (d) 3.

Invert the following matrices: (a) 0 2 3 2 1 85 8.CHAPTER 6. Invert the following matrices: (a) 0 4 2 1 4 1 1 1 2 (b) @ 0 1 1 A 1 1 0 5 1 @ 0 2 (b) 0 1 1 1 1 A 3 . MATRICES 7.

xn ). They use n = 2 to allow graphical analysis. :::. . This expands to a11 x1 + a12 x2 + ::: + a1n xn = b1 a21 x1 + a22 x2 + ::: + a2n xn = b2 . . an1 x1 + an2 x2 + ::: + ann xn = bn The task for this chapter is to determine (1) whether the system has a solution (x1 .1) 1 vectors. and (2) whether that solution is unique.CHAPTER 7 Systems of equations Think about the general system of equations Ax = b where A is an n n matrix and x and b are n the system of equations (7. 86 . We will use three examples to motivate our results.

Example 5 x 2y This one has no solution. ann 1 b1 . .1 7. that is. So.1) by A 1 yields x = A 1 b. what we really want to know is.. . and ending up with a matrix in which all the elements below the diagonal are 0. which is n (n+1). . x 2 1 ). when does an inverse exist? We already know that an inverse exists if the determinant is nonzero. y = 4 2x = 5 7. bn Write the augmented matrix . then left-multiplying both sides of (7. B = (Ajb) = @ . . A .1. y) = (1. SYSTEMS OF EQUATIONS Example 3 2x + y = 6 x y = 3 The solution to this one is (x.1. Example 4 x 4y 2y = 1 2x = 2 87 This one has an in…nite number of solutions de…ned by (x. 4). if jAj = 0. an1 0 a1n .1 Identifying the number of solutions The inverse approach If A has an inverse A 1 . y) = (x. C . This is the row-echelon form of the matrix.2 Row-echelon decomposition a11 B .CHAPTER 7.. 6 7. The goal is to transform the matrix B through operations that consist of multiplying one row by a scalar and adding it to another row.

CHAPTER 7. If there are rows with zeros everywhere except in the last column. Example 4 has an in…nite number of them. and Example 5 has no solution. De…nition 1 The rank of a matrix is the number of nonzero rows in its row-echelon form: Proposition 9 The n rank is n. n square matrix A has an inverse if and only if its . If the row-echelon form of the augmented matrix has only nonzero diagonal elements. SYSTEMS OF EQUATIONS Example 6 (3 continued) Form the augmented matrix B= Multiply the …rst row by R= 2 1 1 1 1 2 88 2 1 1 1 6 3 and add it to the second row to get 1 2 1 6 3 3 = 2 0 1 3 2 6 6 Example 7 (4 continued) Form the augmented matrix B= 1 2 2 4 1 2 Multiply the …rst row by 2 and add it to the second row to get R= 1 2 2+2 4 4 1 2+2 = 1 0 2 0 1 0 Example 8 (5 continued) Form the augmented matrix B= 1 2 1 2 4 5 Multiply the …rst row by 2 and add it to the second row to get R= 1 1 2+2 2 2 4 5+8 = 1 0 1 0 4 13 Example 3 has a unique solution. If it has some rows that are zero. there are in…nitely many solutions. there is a unique solution. there is no solution. These results correspond to properties of the row-echelon matrix R.

Each column of A is an n 1 vector.4 Graphing in column space This approach is completely di¤erent but really useful. So is b. The question becomes. What happens if we move to three dimensions? An equation reduces the dimension by one.y) space This is pretty simple and is shown in Figure 7. we get a unique solution if the planes intersect in a single point. where x and y are scalars.3 Graphing in (x. SYSTEMS OF EQUATIONS 89 y x –y = – 3 y y 2y –2x = 5 x 2x + y = 6 x x –2y = 1 4y –2x = – 2 x –y = 4 x Figure 7. is b in the column space.CHAPTER 7. We get a unique solution in example 3.1. In example 3 the equations constitute two di¤erent lines that cross at a single point. Two planes intersect in a line. So. and no solution in example 5.1. In example 4 the lines coincide. an in…nite number of solutions if the three planes intersect in a line or in a plane.1: Graphing in (x. the space spanned by the columns of A? A linear combination of vectors a and b is given by xa + y b. in…nite number of solutions when the lines coincide (center graph). that is. an in…nite number of them in example 4. In example 5 the lines are parallel. The equations are lines. The intersection of a line and a plane is a point. The span of the vectors x and y is the set of linear combinations of the . and no solutions when the lines are parallel (right graph) 7. and no solution if two or more of the planes are parallel.1. 7. so each equation identi…es a plane. y) space: One solution when the lines intersect (left graph).

2) is also on that line. the column vectors are (1.CHAPTER 7. there is a solution. In this case the vector b = (1. and there is no solution.2) x b = (1. This leads to our rule for a solution: If b is in the span of the columns of A. SYSTEMS OF EQUATIONS 90 y (– 2. so we can solve this. The vector b = (6. which span a single line. though.2. 4). 1) and (1.– 2) (1. 1). it is possible to …nd numbers x and y that solve x(2. so there is a solution. They are on the same line. Look at left graph in Figure 7. This time.– 3) y y b = (4. The column space is spanned by the two vectors (2.2.2. 1) + y(1. z2 ). 5) is not on that line. These span the entire plane. 2).4) (2. For any vector (z1 . 2) and ( 1.1) x (1. Example 11 (5 continued) In the right panel of Figure 7. 2) and ( 2. 1) = (z1 . 1).-1) b = (6. but when the vector b is not in the column space there is no solution (right graph) two vectors. the vector b = (4.– 2) x Figure 7. so they only span that line. Example 10 (4 continued) In the center graph in Figure 7. 3) lies in the span of the column vectors. Example 9 (3 continued) The column vectors are (2. 1) and (1. Written in matrix notation this is 2 1 x z1 = : 1 1 y z2 We already know that the matrix has an inverse. the column vectors are (1. .5) (– 1. z2 ).2: Graphing in the column space: When the vector b is in the column space (left graph and center graph) a solution exists.

given by r1 r2 =A 1 0 0 = 0 0 : So. SYSTEMS OF EQUATIONS 91 Two vectors a and b are linearly dependent if there exists a scalar r 6= 0 such that a = rb. and construct the matrix A by using the vectors a and b as its columns: A= Now we can write (7. Equivalently.CHAPTER 7.2) as A r1 r2 = 0 0 : a1 b 1 a2 b 2 : If A has an inverse. a and b are linearly independent if there do not exist scalars r1 and r2 . 7. such that r1 a + r2 b = 0: (7.2) One more way of writing this is that a and b are linearly independent if the only solution to the above equation has r1 = r2 = 0. there is a unique solution to this equation.2 Summary of results Ax = b The system of n linear equations in n unknowns given by has a unique solution if: . This last one has some impact when we write it in matrix form. invertibility of A and the linear independence of its columns are inextricably linked. not both equal to zero. Suppose that the two vectors are 2-dimensional. Proposition 10 The square matrix A has an inverse if its columns are mutually linearly independent. Two vectors a and b are linearly independent if there is no scalar r 6= 0 such that a = rb.

3 Problems 1. A has rank n. 2. an in…nite number of solutions. 7. The row-echelon form of the augmented matrix (Ajb) has rows with all zeros. 4. A 1 92 exists. Determine whether or not the following systems of equations have a unique solution. The columns of A are mutually linearly independent. 5.CHAPTER 7. 6 3. 2. 6. (a) 3x + 6y = 4 2x 5z = 8 x y z = 10 . or no solution. The row-echelon form of the augmented matrix (Ajb) has at least one row with all zeros except in the last column. The row-echelon form of A has no rows with all zeros. jAj = 0. The vector b is not contained in the span of the columns of A. The columns of A span n-dimensional space. 2. The vector b is contained in the span of a subset of the columns of A. The system of n linear equations in n unknowns given by Ax = b has no solution if: 1. SYSTEMS OF EQUATIONS 1. The system of n linear equations in n unknowns given by Ax = b has an in…nite number of solutions if: 1.

CHAPTER 7. (a) 6 2 (b) 0 1 a (c) 1 5 a 0 @ 4 2 1 A 1 3 1 5 3 3 a . Find the values of a for which the following matrices do not have an inverse. SYSTEMS OF EQUATIONS (b) 4x y + 8z = 160 17x 8y + 10z = 200 3x + 2y + 2z = 40 (c) 2x 3y = 6 3x + 5z = 15 2x + 6y + 10z = 18 (d) 4x y + 8z = 30 3x + 2z = 20 5x + y 2z = 40 (e) 6x y z = 3 5x + 2y 2z = 10 y 2z = 4 93 2.

CHAPTER 7. SYSTEMS OF EQUATIONS (d) 0 1 1 3 1 @ 0 5 a A 6 2 1 94 .

or spending.1 IS-LM analysis Y C I M = = = = C +I +G c((1 t)Y ) i(R) P m(Y.CHAPTER 8 Using linear algebra in economics 8. mR < 0 Here Y is GDP. with C 95 0 < c0 < 1 . which you can think of as production. income. I. The variables C. R) Consider the following model of a closed macroeconomy: with 1 t i0 < 0 mY > 0. and G are the spending components of GDP.

The model is useful for describing a closed economy in the short run. The variables G. and c0 > 0 means that it is an increasing function. before the price level has time to adjust. not what you do in the stock market). Because of this. C + I + G. and we want to see how they change when the policy variables change. s To …nd them. after-tax (or disposable) income is (1 t)Y . since money is cash and checking account balances. Everything else is endogenous. and when the interest rate increases borrowing becomes more expensive and the amount of investment falls. and so when interest rates rise people tend to move some of their assets into interestbearing accounts. investment depends on the interest rate R. When income Y increases people want to buy more stu¤.CHAPTER 8. t. so everything else is a function of G. t. USING LINEAR ALGEBRA IN ECONOMICS 96 standing for consumption. This means that money demand falls when interest rates rise. and G for government spending. I for investment (which is spending by businesses on new capital and by consumers on new housing. We are primarily interested in the variables Y and R. Let’ look for the comparative statics derivatives dY =dG and dR=dG. The …rst equation says that total spending Y is equal to the sum of the components of spending. The answer is that we want to do some comparative statics analysis. and the right-hand side of the fourth equation is money demand. M is money supply. P is the price level. …rst simplify the system of four equations to a system of two . Consequently the investment function i(R) is decreasing. The amount of consumption depends on consumers’after-tax income. Also. There are no imports or exports. The …rst three equations describe the IS curve from macro courses and the fourth equation describes the LM curve. P is predetermined and assumed constant for the problem. so money demand increases when income increases. At this point you should be wondering why we care. Investment is typically spending on large items. so this is a closed-economy model. and M are exogenous policy variables. which tend not to earn interest. known as the IS-LM model. and when Y is income and t is the tax rate. and it is often …nanced through borrowing. The four equations provide a model of the economy. and M . and when things become more expensive it takes more money to purchase the same amount of stu¤. that is. So the second equation says that consumption is a function c( ) of disposable income. and they need more money to do it with.

USING LINEAR ALGEBRA IN ECONOMICS equations: Y = c((1 t)Y ) + i(R) + G M = P m(Y. dR=dt. rises when government spending increases.CHAPTER 8. dR=dG > 0. . It is also possible to …nd the comparative statics derivatives dY =dt. R) Implicitly di¤erentiate with respect to G to get dY = (1 dG dY dR + i0 +1 dG dG dY dR 0 = P mY + P mR dG dG t)c0 97 Rearrange as dY dR i0 = 1 dG dG dY dR mY + mR = 0 dG dG We can write this in matrix form dY dG (1 t)c0 1 (1 t)c0 mY i0 mR dY dG dR dG = 1 0 Now use Cramer’ rule to solve for dY =dG and dR=dG: s dY = dG 1 i0 0 mR (1 t)c0 i0 mY mR mR : t)c0 ]mR + mY i0 GDP 1 = [1 (1 Both the numerator and denominator are negative. dY =dM . You should …gure them out yourselves. Thus. An increase in government spending increases both GDP and interest rates in the short run. so dY =dG > 0. and dR=dM . Now for interest rates: 1 dR = dG 1 (1 t)c0 mY (1 t)c0 mY 1 0 i0 mR mY : t)c0 ]mR + mY i0 = [1 (1 The numerator is negative and so is the denominator.

1 Least squares analysis We want to minimize the sum of the squared errors. The estimates will not be perfect. But notice that there is a and a T in the expression. and the matrix X contains the data on the independent. or explanatory.2.1) as e=y and note that e e= Then T X n X i=1 e2 : i eT e = (y X )T (y X ) T = yT y X T y yT X + T XT X We want to minimize this expression with respect to the parameter vector . From the equation we see that there are n observations and k explanatory variables. The fundamental problem in econometrics is to use data to estimate the coe¢ cients in in order to make the errors e small. Rewrite (8. 8. is to make the sum of the squared errors as small as possible. variables. The matrix is a vector of k coe¢ cients.CHAPTER 8. and each column is an explanatory variable. one for each of the n observations.2 Econometrics y (n 1) We want to look at the estimation equation = X (n k)(k 1) + e: (n 1) (8. and so the matrix e contains error terms. Let’ treat these as s two separate variables to get two FOCs: XT y + XT X = 0 T T T y X+ X X = 0 . USING LINEAR ALGEBRA IN ECONOMICS 98 8." and the one that is consistent with the standard practice in econometrics.1) The matrix y contains the data on our dependent variable. The notion of "small. Each row is an observation. one for each of the k explanatory variables.

1 2 Since e1 = 4 3 and e2 = 8 4 .2 A lame example Consider a regression with two observations and one independent variable.CHAPTER 8. because the …rst is the transpose of the second. Observation Dependent Independent number variable y variable X 1 4 3 2 8 4 There is no constant. The FOC is 88 + 50 = 0 88 44 = = : 50 25 . by the independent variable matrix X and the dependent variable vector y. So let’ use the …rst one because it has instead of T . Note that it is determined entirely by the data. s Solving for yields XT X = XT y ^ = (X T X) 1 X T y We call ^ the OLS estimator. Our two observations lead to the two equations 4 = 3 + e1 8 = 4 + e2 We want to …nd the value of that minimizes e2 + e2 . we get e2 + e2 = (4 1 2 = 16 = 80 3 )2 + (8 4 )2 24 + 9 2 + 64 64 + 16 88 + 25 2 : 2 Minimize this with respect to . 8. that is.2. with the data given in the table below. USING LINEAR ALGEBRA IN ECONOMICS 99 These are two copies of the same equation.

1.4) x1 Figure 8.CHAPTER 8. The key here is to think about what we are doing when we …nd the value of to minimize e2 + e2 .8) Xβ 100 X =(3. and X is the equation of the line through that point.3 Graphing in column space We want to graph the previous example in column space. shown by the vector X = (3. USING LINEAR ALGEBRA IN ECONOMICS x2 e y = (4.2. 1 = 3 4 ( )+ e1 e2 : 3 4 4 8 8. The two equations can be written s 4 8 The OLS estimator is ^ = (X T X) 1 X T y 3 3 4 = 4 44 = (25) 1 (44) = : 25 We get the same answer. The example is lame for precisely this reason –so I can graph it in two dimensions. 4) 1 2 in Figure 8. y is .1: Graphing the lame example in column space Now let’ do it with matrices. X is a point.

Remembering that e = y X . e is the vector connecting some point X ^ on the line X to the point y. Two vectors a and b are orthogonal if a b = 0. 8. Graphically.CHAPTER 8. which is exactly the answer we got before. and it projects the vector y onto the column space of X.2. and we do that by …nding the point on the line X that is closest to the point y. USING LINEAR ALGEBRA IN ECONOMICS 101 another point. we have a b = AT B: Thus we can rewrite the above expression as X T (y X ^ ) XT y XT X ^ XT X ^ ^ = = = = 0 0 XT y (X T X) 1 X T y. the closest point is the one that causes the vector e to be at a right angle to the vector X. 8) in the …gure. . so we want to minimize the length of e. shown by the vector y = (4.4 Interpreting some matrices ^ = (X T X) 1 X T y: We have the estimated parameters given by This tells us that the predicted values of y are y = X ^ = X(X T X) 1 X T y: ^ The matrix X(X T X) 1 X T is a projection matrix. This means we can …nd our coe¢ cients by having the vector e be orthogonal to the vector X. e2 + e2 is the 1 2 square of the length of e. we get X (y X ^ ) = 0: Now notice that if we write the two vectors a and b as column matrices A and B.

3 Stability of dynamic systems In macroeconomics and time series econometrics a common theme is the stability of the economy. USING LINEAR ALGEBRA IN ECONOMICS The residuals vector can be written e = y X^ = (I X(X T X) 1 X T )y: 102 The two matrices X(X T X) 1 X T and (I X(X T X) 1 X T ) have the special property that they are idempotent. If we apply the same projection matrix a second time. that is. Similarly. The initial period is period 0.CHAPTER 8. it doesn’ do anything t because y is already in the column space of X. The variable we are interested in is y. s . because then y1 = y2 = ::: = y0 . That projects y onto the column space of X. We want some movement in yt . so period t + 1 is the period that immediately follows period t. 8. Doing it a second time does nothing.1 Stability with a single variable yt+1 = ayt : A dynamic system takes the form of The variable t denotes the period number. so let’ assume that a 6= 0 and a 6= 1. and the initial value of yt is y0 . In this section I show how stability works and relate stability to matrices. we would like to know how y varies over time. because then y1 = y2 = y3 = ::: = 0. The process is not very interesting if a = 0. 8. which for the sake of argument we will assume is positive. they satisfy the property that AA = A. the matrix (I X(X T X) 1 X T ) projects y onto the space that is orthogonal to the column space of X. so that X(X T X) 1 X T y lies in the column space of X. in particular. and.3. because it is projecting into the same space a second time. Geometrically it is clear why this happens. Suppose we apply the projection matrix X(X T X) 1 X T to the vector y. and it’ also not very interesting if s a = 1.

What’ more. The reasoning is the same. look at t!1 lim yt = lim at y0 = y0 lim at : t!1 t!1 If a > 1 then at ! 1. in mathematical terms. The process y0 . if a < 1 the process also diverges.CHAPTER 8. and therefore y t ! 0. s the process is stable when a 2 [0. if 0 < a < 1. the value at < 1 and thus yt = at y0 < y0 . as t ! 1. and it follows the following process: y1 = ay0 y2 = ay1 = a2 y0 . Whether or not the process is stable is determined by the magnitude of the parameter a. which is unlikely. y t = at y 0 . 1) and limt!1 at = 0. respectively (because y0 is positive). . . 0) we have t a 2 ( 1. 1). When a 2 ( 1. USING LINEAR ALGEBRA IN ECONOMICS 103 The single-variable system is pretty straightforward (some might even say lame). Similarly. y1 . It also turns out to be stable when 1 < a < 0. Thus. . diverging to either +1 or 1. or. and the process cannot be stationary unless y0 just happens to be zero. if there exists a value y such that t!1 lim yt = y: If the process is not stable then it explodes. this time cycling between positive and negative values depending on whether t is positive or negative. This reasoning gives us two stability conditions: jaj < 1 lim yt = 0: t!1 . On the other hand. ::: is if it eventually converges to a …nite value. . To see how. the quantity at ! 0.

2) = a b c d t t y0 z0 . Then we would have two separate single-variable dynamic processes yt+1 = ayt zt+1 = dzt . = a3 + 2bca + bcd b (a2 + ad + d2 + bc) c (a2 + ad + d2 + bc) d3 + 2bcd + abc 10 is too big to …t on the page. (8. variables: yt+1 = ayt + bzt zt+1 = cyt + dzt Now both variables depend on the past values of both variables.2 Stability with two variables Let’ look at a dynamic system with two s All of that was pretty simple. that is. But this gives us a really complicated system. if b = c = 0. in shorthand notation. For example.CHAPTER 8.3. Things would be easy if the matrix was diagonal. When is this system stable? We can write the system in matrix form: yt+1 zt+1 or. yt+1 = Ayt . a b c d and a b c d 3 = a b c d yt zt . Sure. USING LINEAR ALGEBRA IN ECONOMICS 104 8. we know that yt zt but the matrix a b c d is really complicated.

USING LINEAR ALGEBRA IN ECONOMICS 105 and we already know the stability conditions for these. An eigenvalue of the square matrix A is a value such that det(A I) = a c d b = 0. so just remember the goal when we get there: we want to generate a system that looks like xt+1 = 1 0 2 0 xt (8. the eigenvalues are the solutions to the quadratic equation 2 (a + d) + (ad bc) = 0.3. call them For example. So let’ mess with things to get a system with a diagonal s matrix.CHAPTER 8. 3: . 8. 1 In general quadratic equations have two solutions. It will take a while. But the matrix is not diagonal. 3 6 2 1 : (3 1) + ( 3 2 2 12) = 0 15 = 0 = 5.3 Eigenvalues and eigenvectors Begin by remembering that I is the identity matrix. suppose that the matrix A is A= The eigenvalues satisfy the equation 2 and 2.3) because then we can treat the stability of the elements of the vector x separately. Taking the determinant yields a c d b = ad a d + 2 bc: So.

The …rst equation holds when v11 v21 because 2 6 2 6 1 1 = 0 0 .CHAPTER 8. USING LINEAR ALGEBRA IN ECONOMICS 106 These are the eigenvalues of A. which is what you get when you make the determinant equal to zero. = 1 1 and the second equation holds when v12 v22 because 6 2 6 2 1 3 = 0 0 : = 1 3 . where I made the second subscript denote the number of the eigenvector and the …rst subscript denote the element of that eigenvector. For our example. the two eigenvectors are the solutions to 2 2 v11 0 = 6 6 v21 0 and 6 2 6 2 v12 v22 = 0 0 . Look at the two matrices we get from the formula A I: 3 6 3 ( 3) 6 5 2 1 5 2 1 ( 3) = = 2 6 6 2 6 2 2 6 and both of these matrices are singular. An eigenvector of A is a vector v such that (A I)v = 0 where is an eigenvalue of A. These two equations have simple solutions.

If V has an inverse. USING LINEAR ALGEBRA IN ECONOMICS 107 The relationship between eigenvalues and eigenvectors is useful for our task.4) 0 2 This is the diagonal matrix we were looking for to make our dynamic system easy.CHAPTER 8. Recall that if is an eigenvalue and v is an eigenvector then (A Av I)v = 0 Iv = 0 Av = v: In particular. we have A and A v12 v22 = 2 v11 v21 = 1 v11 v21 v12 v22 : Construct the matrix V so that the two eigenvectors are columns: V = Then we have AV = = = = V A 1 v11 v12 v21 v22 : v11 v21 v11 v21 A 2 1 v12 v22 v12 v22 0 2 v11 v12 v21 v22 1 0 : 2 0 0 We are almost there. V 1 . . we can left-multiply both sides by V 1 to get 0 1 V 1 AV = : (8.

CHAPTER 8. ::: is stable if t!1 lim yt = 0: .3. y1 . This implies that y = V x: (8.4). We began with a matrix A. and we found the two corresponding eigenvectors and combined them to form the matrix V . s Let’ look back at what we have. USING LINEAR ALGEBRA IN ECONOMICS 108 8.4 Back to the dynamic system Go back to the dynamic system yt+1 = Ayt : Create a di¤erent vector x according to x=V 1 y. we say that the process y0 . All of this comes from the matrix A. t t But our original problem was about the vector yt . and our new problem is about the vector xt . we get xt+1 = V 1 (Ayt ) = V 1 A yt = (V 1 A) (V xt ) = V 1 AV xt : We …gured out the formula for V xt+1 = 1 1 yt+1 : AV in equation (8. It’ about time. where V is the matrix of eigenvectors we constructed above. We s found its two eigenvalues 1 and 2 .3). so we haven’ added anything that wasn’ in the problem. Using the intuition we gained from the section with a single-variable dynamic system.5) Then xt+1 = V Since yt+1 = Ayt . So we are there. which gives us 1 0 2 0 xt : But this is exactly the diagonal system we wanted in equation (8.

remember that we constructed xt according to (see equation (8. USING LINEAR ALGEBRA IN ECONOMICS 109 The dynamic process yt+1 = Ayt was hard to work with. if these two equations hold. But the dynamic process wt+1 xt+1 = 1 0 2 0 wt xt is easy to work with. because multiplying it out yields the two simple. So. Consider the following IS-LM model: . The …rst one is stable if j 1 j < 1. and the second one is stable if j 2 j < 1.5)) yt = V xt . we have t!1 lim xt = lim t!1 wt xt = 0 0 : Finally. That was a lot of steps. R) 1. singlevariable equations wt+1 = xt+1 = 1 wt 2 xt : And we know the stability conditions for single-variable equations. 8. and so t!1 lim yt = lim V xt = V lim xt = 0 t!1 t!1 and the original system is also stable.CHAPTER 8. and so it was di¢ cult to determine whether or not it was stable.4 Problems Y C I M = = = = C +I +G c((1 t)Y ) i(R) P m(Y. but it all boils down to something simple. and it works for more than two dimensions. The dynamic system yt+1 = Ayt is stable if all of the eigenvalues of A have magnitude smaller than one.

xR < 0 0. so that a \$1 increase in GDP leads to less than a dollar increase in spending. 2. so that a \$1 increase in GDP leads to less than a dollar increase in spending. (a) Assume that (1 t)c0 < 1. Y C I X M with c0 i0 xY mY > < < > 0 0 0. and M are exogenous policy variables. predetermined and assumed constant for the problem. and M are exogenous policy variables. this time for an open economy. P is predetermined and assumed constant for the problem. T . (b) Compute and interpret dY =dM and dR=dM . (c) Compute and interpret dY =dP and dR=dP . Compute and interpret dY =dt and dR=dt. R) The variables G.CHAPTER 8. (a) For this problem assume that c0 + xY < 1. mR < 0 110 The variables G. mR < 0 P is = = = = = C +I +G+X c(Y T ) i(R) x(Y. t. where X is net exports and T is total tax revenue (as opposed to t which was the marginal tax rate). Consider a di¤erent IS-LM model. Compute and interpret dY =dG and dR=dG. . R) P m(Y. USING LINEAR ALGEBRA IN ECONOMICS with c0 > 0 i0 < 0 mY > 0.

(a) Compute and …nd the signs of dY =dG. I. w) q D = qS The …rst equation says that the quantity demanded in the market depends on the price of the good p and household income I. X. (c) Compute and interpret dY =dM and dR=dM . USING LINEAR ALGEBRA IN ECONOMICS (b) Compute and interpret dY =dT and dR=dT . The second equation says that the quantity supplied in the market depends on . The following is a model of the long-run economy: Y C I X M Y with c0 i0 xY mY > < < > 0 0 0. (b) Compute and …nd the signs of dY =dM . dR=dG. I) qS = S(p. mR < 0 = = = = = = C +I +G+X c((1 t)Y ) i(R) x(Y. and Y is also exogenous but not a policy variable. R) Y 111 The variables G. xR < 0 0. and dP=dG. Consistent with this being a normal good. The variables Y. R) P m(Y. It is interpreted as potential GDP. and M are exogenous policy variables. and dP=dM . 3.CHAPTER 8. t. 4. C. P are all endogenous. we have Dp < 0 and DI > 0. or full-employment GDP. Consider the following system of equations: qD = D(p. dR=dM .

Two variables are exogenous: I and w. Consider a regression based on the following data table: x2 y 8 1 24 0 -16 -1 (a) Show that the matrix X T X is not invertible. Suppose that you are faced with the following data table: Observation number x1 1 1 2 1 3 1 4 1 x2 2 3 5 4 y 10 0 25 15 . why there is no unique coe¢ cient vector for this regression. (b) Show that the market price increases when the wage rate increases.CHAPTER 8. (b) Explain intuitively. 7. We have Sp > 0 and Sw < 0. w. 5. The third equation says that markets must clear. qS . and p. USING LINEAR ALGEBRA IN ECONOMICS 112 the price of the good and the wage rate of the work force. Find the coe¢ cients for a regression based on the following data table: Observation number x1 1 1 2 1 3 1 Observation number x1 1 2 2 6 3 -4 x2 9 4 3 y 6 2 5 6. Find the coe¢ cients for a regression based on the following data table: Observation number x1 1 5 2 2 3 3 x2 4 1 6 y 10 2 7 8. so that quantity demanded equals quantity supplied. Three variables are endogenous: qD . using the idea of a column space. (a) Show that the market price increases when income increases.

to the regression. 9. is the dynamic system stable? . is the dynamic system stable? . Find the eigenvalues and eigenvectors of the following matrices: (a) (b) 5 1 4 2 10 12 1 3 3 2 3 4 0 2 10 3 4 6 6 4 (c) C = (d) D = (e) E = 10. x3 . x3 is given by 0 1 24 B 36 C C x3 = B @ 60 A 48 Explain why this would be a bad idea. Consider the dynamic system yt+1 = Ayt : (a) If A = (b) If A = (c) If A = (d) If A = 1=3 0 4=3 1=3 1=4 0 0 1=5 1=4 1=4 1 2=3 . is the dynamic system stable? . USING LINEAR ALGEBRA IN ECONOMICS 113 You are thinking about adding one more explanatory variable.CHAPTER 8. is the dynamic system stable? 1=5 1 2 8=9 .

take the horizontal distance (x x0 ) and multiply it by the slope of the tangent line. This gives us the quantity f 0 (x0 )(x x0 ).1 Taylor approximations for R ! R Consider a di¤erentiable function f : R ! R. The linear approximation L(x) is a straight line that is tangent to the function f (x) at f (x0 ). then add that amount to f (x0 ). The linear approximation can be written L(x) = f (x0 ) + f 0 (x0 )(x x0 ). which must be added to f (x0 ) to get the right linear approximation. When x is not equal to x0 we multiply the di¤erence (x x0 ) by the slope at point x0 . 114 . To get the approximation at point x. as opposed to a linear approximation at the point. Note that this is a linear approximation to the function. All of this is shown in Figure 9. so L(x0 ) = f (x0 ). Since f is di¤erentiable at point x0 it has a linear approximation at that point.1. which is just f 0 (x0 ).CHAPTER 9 Second-order conditions 9. When x = x0 we get the original point.

So.1: A …rst-order Taylor approximation The Taylor approximation builds on this intuition. Then an n-th order approximation is f (x) f (x0 ) + f 0 (x0 ) (x 1! x0 ) + f 00 (x0 ) (x 2! x0 )2 + ::: + f (n) (x0 ) (x n! x0 )n . we can write the third-degree approximation of f (x) at x = x0 as g(x) = f (x0 ) + f (x0 )(x 0 f 00 (x0 ) x0 ) + (x 2 f 000 (x0 ) x0 ) + (x 6 2 x0 )3 : . for example. We care mostly about second-degree approximations. Suppose that f is n times di¤erentiable.CHAPTER 9. and so on. the second derivative of the second-order approximation equals the second derivative of the original function. or f (x) f (x0 ) + f 0 (x0 )(x x0 ) + f 00 (x0 ) (x 2 x0 )2 : The key to understanding the numbers in the denominators of the different terms is noticing that the …rst derivative of the linear approximation matches the …rst derivative of the function. SECOND-ORDER CONDITIONS f(x) 115 L(x) f’ 0)(x –x0) (x f(x0) L(x) f(x) x x0 x –x0 x Figure 9.

Take a second-order Taylor approximation: f (x) x0 ) + Since f is maximized when x = x0 . We need some notation: 0 1 f1 (x) B C . it must be the case that f 00 (x0 ) 0. In other words. . f 00 (x0 ) (x 2 for all x. We are left with f 00 (x0 ) (x x0 )2 : f (x) f (x0 ) + 2 If f is maximized. rf (x) = @ A . it must mean that any departure from x0 leads to a decrease in f . SECOND-ORDER CONDITIONS Di¤erentiate with respect to x to get f 000 (x0 ) g (x) = f (x0 ) + f (x0 )(x x0 ) + (x 2 g 00 (x) = f 00 (x0 ) + f 000 (x0 )(x x0 ) g 000 (x) = f 000 (x0 ) 0 0 00 116 x0 )2 9.2 Second order conditions for R ! R f (x0 ) + f 0 (x0 )(x f 00 (x0 ) (x 2 x0 )2 : Suppose that f (x) is maximized when x = x0 . since (x x0 )2 0. the …rst-order condition has f 0 (x0 ) = 0.CHAPTER 9. This is how we can get the second order condition from the Taylor approximation.3 Taylor approximations for Rm ! R This time we are only going to look for a second-degree approximation. Thus the second term in the Taylor approximation disappears. fm (x) . Simplifying gives us f (x0 ) + f 00 (x0 ) (x 2 x0 )2 f (x0 ) x0 )2 0 and. 9.

. which is what we want. s which are (1 m)(m m)(m 1) = (1 1). .CHAPTER 9. the second degree Taylor approximation of the function f : Rm ! R is given by f (x) f (x0 ) + (x 1 x0 )T rf (x0 ) + (x 2 x0 )T H(x0 )(x x0 ) . fm1 (x) fmm (x) 117 is called the Hessian. SECOND-ORDER CONDITIONS and is called the gradient of f at x. The …rst term is simply f (x0 ). A second-order Taylor approximation for a function of m variables can be written f (x) f (x0 ) + m X i=1 fi (x0 )(xi x0 ) + i 1 XX fij (x0 )(xi 2 i=1 j=1 m m x0 )(xj i x0 ) j The Let’ write this in matrix notation. It is the matrix of second derivatives. H(x) = @ A . First check the dimensions. to get m m XX i=1 j=1 0 T 0 x0 )T H(x0 ) fij (x0 )(xi x0 )(xj i x0 ): j So. . (x x0 )T H(x0 ) is a (1 m) matrix with element j given by m X fij (x0 )(xi x0 ): i i=1 To get (x x ) H(x )(x x0 ) we multiply each element of (x by the corresponding element of (x x0 ) and sum. . Also 0 1 f11 (x) f1m (x) B C . . . s second term is (x x0 ) rf (x0 ) = (x x0 )T rf (x0 ): The third term is 1 (x x0 )T H(x0 )(x x0 ): 2 Let’ check to make sure this last one works. . Then break the problem down.

Then the …rst-order condition is rf (x0 ) = 0 and the second term in the Taylor approximation drops out. Obviously. Theorem 11 Let A be a symmetric m m matrix. we have xT Ax 0. SECOND-ORDER CONDITIONS 118 9. (Note that this is di¤erent from the submatrix we used to …nd determinants in Section 6. In the form it is written in. :::. maximized when x = x0 it must be the case that 1 f (x0 ) + (x x0 )T H(x0 )(x x0 ) f (x0 ) 2 or (x x0 )T H(x0 )(x x0 ) 0.CHAPTER 9. . it is a di¢ cult thing to check.4 Suppose that f : Rm ! R is maximized when x = x0 . There are other corresponding notions: A is negative de…nite if xT Ax < 0 for all nonzero vectors x. :::. For f to be Second order conditions for Rm ! R 9.) The determinant of Ai is called the i-th leading principal minor of A. though. Form a submatrix Ai from the square matrix A by keeping the square matrix formed by the …rst i rows and …rst i columns. m. and this occurs if and only if its m leading principle minors alternate in sign so that ( 1)i jAi j > 0 for i = 1.5 Negative semide…nite matrices The matrix A is negative semide…nite if. the second order condition for a maximum is that H(x0 ) is negative semide…nite.3. for every column matrix x. Then A is negative semide…nite if and only if its m leading principle minors alternate in sign so that ( 1)i jAi j 0 for i = 1. m.

The requirements for H to be negative semide…nite are f11 f11 f12 f21 f22 0 = f11 f22 2 f12 0 0. A is positive semide…nite if xT Ax 0 for all vectors x. :::. and this occurs if and only if its m leading principle minors are nonnegative.1 Application to second-order conditions Suppose that the problem is to choose x1 and x2 to maximize f (x1 . x2 ).5. so that jAi j 0 for i = 1. m.CHAPTER 9. m. 9. and this occurs if and only if its m leading principle minors are positive. That is. Note that f21 appears in both o¤-diagonal elements. :::. so that jAi j > 0 for i = 1. Note that the two conditions together imply that f22 . which is okay because f12 = f21 . SECOND-ORDER CONDITIONS 119 A is positive de…nite if xT Ax > 0 for all nonzero vectors x. it doesn’ matter if you di¤erentiate t f …rst with respect to x1 and then with respect to x2 or the other way around. A is inde…nite if none of the other conditions holds. The FOCs are f1 = 0 f2 = 0 and the SOC is H= f11 f21 f21 f22 is negative semide…nite.

SECOND-ORDER CONDITIONS 120 9.2. When this property holds for every pair of points on the curve. f (xa )). In the single-variable case we require that the second derivative is nonpositive for a maximum and nonnegative for a minimum. Such a point is called convex combination of xa and xb .5. Figure 9. and draw the line segment connecting them. When t = 1 we get x = 1 xa + 0 xb = xa .6 Concave and convex functions All of the second-order conditions considered so far rely on the objective function being twice di¤erentiable. 6 1 A= is positive de…nite because a11 > 0 and a11 a22 a12 a21 = 1 3 17 > 0. that line segment always lies below the curve. as shown in Figure 9.2 A = Examples 3 3 is negative de…nite because a11 < 0 and a11 a22 3 4 a12 a21 = 3 > 0.CHAPTER 9. f (xb )). and point b has coordinates (xb . and when t = 0 we get 1 x = 0 xa + 1 xb = xb . It also has the following property. Any value x between xa and xb can be written as x = txa + (1 t)xb for some value t 2 [0. It is also possible to characterize a concave function mathematically. When t = 1 we get x = 1 xa + 2 xb which is the 2 2 midpoint between xa and xb . 1]. too.2 shows a function with a maximum. we say that the function is concave. But objective functions are not always di¤erentiable. and we would like to have some second order conditions that work for these cases. The points on the line segment connecting a and b in the …gure have . If you choose any two points on the curve. Point a in the …gure has coordinates (xa . such as points a and b. 9. In the many-variable case we require that the matrix of second and cross partials (the Hessian) is negative semide…nite for a maximum and positive semide…nite for a minimum. 5 3 A = is inde…nite because a11 < 0 and a11 a22 a12 a21 = 3 4 29 < 0.

CHAPTER 9. for all xa and xb . t)xb ) tf (xa ) + (1 t)f (xb ) . say t = 1 . SECOND-ORDER CONDITIONS f(x) 121 f(½xa + ½xb) ½f(xa) + ½f(xb) a b f(x) x xa ½xa + ½xb xb Figure 9. 2 1 1 f ( xa + xb ) 2 2 1 1 f (xa ) + f (xb ) 2 2 where the left-hand side is the height of the curve and the right-hand side is the height of the line segment connecting a to b. Concavity says that this is true for all possible values of t. According to the …gure. The point on the line s 2 segment is 1 1 1 1 xa + xb . tf (xa ) + (1 t)f (xb )) for t 2 [0. 2 De…nition 2 A function f (x) is concave if. Let’ choose one value of t. f (xa ) + f (xb ) 2 2 2 2 and it is the midpoint between points a and b. and we can evaluate f (x) at x = 1 xa + 2 xb . But 1 xa + 1 xb is just a value 2 2 1 of x. f (txa + (1 for all t 2 [0. 1]. not just t = 1 .2: A concave function coordinates (txa + (1 t)xb . 1].

De…nition 3 A function f (x) is convex if. SECOND-ORDER CONDITIONS f(x) 122 a b f(x) x Figure 9. Convexity is the appropriate assumption for minimization. we can instead assume that it is concave. t)xb ) tf (xa ) + (1 t)f (xb ) .3. We get a corresponding de…nition.3: A convex function Concave functions tend to be upward sloping. downward sloping. and if a convex function is twice di¤erentiable its second derivative is nonnegative. consider the standard pro…t-maximization problem. instead of assuming that a function has the right properties of its second derivative. for all xa and xb . Thus. or have a maximum (that is.CHAPTER 9. 1]. where output is q and pro…t is given by (q) = r(q) c(q). it has a nonpositive second derivative. f (txa + (1 for all t 2 [0. as in Figure 9. If a concave function is twice di¤erentiable. A convex function has the opposite property: the line segment connecting any two points on the curve lies above the curve. The same de…nition of concave works for both singlevariable and multi-variable optimization problems. And notice that nothing in the de…nition says anything about whether x is single-dimensional or multidimensional. As an example. upward sloping and then downward sloping).

CHAPTER 9. SECOND-ORDER CONDITIONS
\$ c(q) r(q)

123

q

Figure 9.4: Pro…t maximization with a concave revenue function and a convex cost function

where r(q) is the revenue function and c(q) is the cost function. The standard assumptions are that the revenue function is concave and the cost function is convex. If both functions are twice di¤erentiable, the …rst-order condition is the familiar r0 (q) c0 (q) = 0 and the second-order condition is r00 (q) c00 (q) 0:

This last expression holds if r00 (q) 0 which occurs when r(q) is concave, and if c00 (q) 0 which occurs when c(q) is convex. Figure 9.4 shows the standard revenue and cost functions in a pro…t maximization problem, and in the graph the revenue function is concave and the cost function is convex. Some functions are neither convex nor concave. More precisely, they have some convex portions and some concave portions. Figure 9.5 provides an example. The function is convex to the left of x0 and concave to the right of x0 .

CHAPTER 9. SECOND-ORDER CONDITIONS
f(x)

124

f(x)

x x0

Figure 9.5: A function that is neither concave nor convex

9.7

Quasiconcave and quasiconvex functions

The important property for a function with a maximum, such as the one shown in Figure 9.2, is that it rise and then fall. But the function depicted in Figure 9.6 also rises then falls, and clearly has a maximum. But it is not concave. Instead it is quasiconcave, which is the weakest second-order condition we can use. The purpose of this section is to de…ne the terms "quasiconcave" and "quasiconvex," which will take some work. Before we can de…ne them, we need to de…ne a di¤erent term. A set S is convex if, for any two points x; y 2 S , the point x + (1 )y 2 S for all 2 [0; 1]. Let’ break this into pieces using Figure 9.7. In the left-hand s graph, the set S is the interior of the oval, and choose two points x and y in S. These can be either in the interior of S or on its boundary, but the ones depicted are in the interior. The set fzjz = x + (1 )y for some 2 [0; 1]g is just the line segment connecting x to y, as shown in the …gure. The set S is convex if the line segment is inside of S, no matter which x and y we choose. Or, using di¤erent terminology, the set S is convex if any convex combination of two points in S is also in S. In contrast, the set S in the right-hand graph in Figure 9.7 is not convex. Even though points x and y are inside of S, the line segment connecting them passes outside of S. In this case the set is nonconvex (there is no such thing as a concave set).

CHAPTER 9. SECOND-ORDER CONDITIONS

125

f(x)

f(x)

x

Figure 9.6: A quasiconcave function

f(x)

f(x)

y x S

y

S x x x

Figure 9.7: Convex sets: The set in the left-hand graph is convex because all line segments connecting two points in the set also lie completely within the set. The set in the right-hand graph is not convex because the line segment drawn does not lie completely within the set.

CHAPTER 9. SECOND-ORDER CONDITIONS
f(x)

126

f(x) y

x x1 B(y) x2

Figure 9.8: De…ning quasiconcavity: For any value y, the set B(y) is convex

The graphs in Figure 9.7 are for 2-dimensional sets. The de…nition of convex, though, works in any number of dimensions. In particular, it works for 1-dimensional sets. A 1-dimensional set is a subset of the real line, and it is convex if it is an interval, either open, closed, or half-open/half-closed. Now look at Figure 9.8, which has the same function as in Figure 9.6. Choose any value y, and look for the set B(y) = fxjf (x) yg:

This is a better-than set, and it contains the values of x that generate a value of f (x) that is at least as high as y. In Figure 9.8 the set B(y) is the closed interval [x1 ; x2 ], which is a convex set. This gives us our de…nition of a quasiconcave function. De…nition 4 The function f (x) is quasiconcave if for any value y, the set B(y) = fxjf (x) yg is convex. To see why quasiconcavity is both important and useful, ‡ way back ip to the beginning of the book to look at Figure 1.1 on page 2. That graph depicted either a consumer choice problem, in which case the line is a budget line and the curve is an indi¤erence curve, or it depicted a …rm’ costs minimization problem, in which case the line is an isocost line and the curve

CHAPTER 9. SECOND-ORDER CONDITIONS
f(x)

127

f(x)

y

x x1 W(y) x2

Figure 9.9: A quasiconvex function

is an isoquant. Think about it as a consumer choice problem. The area above the indi¤erence curve is the set of points the consumer prefers to those on the indi¤erence curve. So the set of points above the indi¤erence curve is the better-than set. And it’ convex. So the appropriate second-order s condition for utility maximization problems is that the utility function is quasiconcave. Similarly, an appropriate second-order condition for costminimization is that the production function (the function that gives you the isoquant) is quasiconcave. When you take microeconomics, see where quasiconcavity shows up as an assumption. Functions can also be quasiconvex. De…nition 5 The function f (x) is quasiconvex if for any value y, the set W (y) = fxjf (x) yg is convex. Quasiconvex functions are based on worse-than sets W (y). This time the set of points generating values lower than y must be convex. To see why, look at Figure 9.9. This time the points that generate values of f (x) lower than y form an interval, but the better-than set is not an interval. Concave functions are also quasiconcave, which you can see by looking at Figure 9.2, and convex functions are also quasiconvex, which you can see by looking at Figure 9.3. But a quasiconcave function may not be concave, as

like we did in Figure 9. (a) (b) 3 2 2 1 1 2 2 4 . 9. the better-than set or the worse-than set? As shown in the …gure. We know that it is also quasiconcave. Tell whether the following matrices are negative de…nite. 2. positive de…nite.9. it’ the better-than set that is convex. 0). Find the second-degree Taylor approximations of the following functions at x0 = 1: (a) f (x) = 2x3 5x + 9 p 40 x + ln x (b) f (x) = 10x (c) f (x) = ex 3. and a quasiconvex function may not be convex. Find the gradient of 2. x3 ) = 2x1 x2 + 3x1 x2 3 2 and then evaluate it at the point (5. The easiest way to remember the de…nition for quasiconcave is to draw a concave function with a maximum.CHAPTER 9. Which set is convex. Find the second-degree Taylor approximation of the function f (x) = ax2 + bx + c at x0 = 0. or inde…nite. Choose a value of y and draw the horizontal line. Find the second-degree Taylor approximation of the function f (x) = 3x3 4x2 2x + 12 at x0 = 0.8 Problems f (x1 . 4x2 1 1. 5. negative semide…nite. SECOND-ORDER CONDITIONS 128 in Figure 9. If you draw a convex function with a minimum and follow the same steps you can …gure out the right de…nition for a quasiconvex function. so we get the right s de…nition. positive semide…nite.8. 4. x2 . as in Figure 9.8.

Is (6.y 5xy xy x2 y2 2y 2 3 2 @ 2 4 (h) 3 0 1 3 0 A 1 (b) maxx. 0)? Explain. and not the second derivative. State whether the second-order condition is satsi…ed for the following problems.CHAPTER 9. 2) a convex combination of (11. 8. Use the formula for convexity.y 6x2 + 3y 2 7.y 4y 2 (c) maxx.y 7 + 8x + 6y (d) minx. to show that the function f (x) = x2 is convex. 4) and ( 1. (a) minx. . SECOND-ORDER CONDITIONS (c) 0 3 4 4 3 0 3 2 129 4 (d) @ 0 1 (e) (f) (g) 0 6 1 1 3 4 16 16 4 2 1 1 4 1 1 2 A 1 6.

PART III    ECONOMETRICS            (probability and statistics)        .

(T. :::. so the sample space is = f0. biological. (T. T )g and each pair is an outcome. Another experiment is an exam. 100g: 131 . Scores are out of 100. for example. The sample space is = f(H. Suppose. The performance of an experiment is a trial.CHAPTER 10 Probability 10. H). or anything else. is denoted ! (lower case omega). or outcome. ?. The sample space is denoted (the Greek letter omega) and a typical element.1 Some de…nitions An experiment is an activity that involves doing something or observing something resulting in an outcome. T ). The impossible event is the empty set. that the experiment consists of tossing a coin twice. The sample space for an experiment is the set of possible outcomes of the experiment. Experiments can be physical. social. 1. (H. H).

PROBABILITY 132 We are going to work with subsets. which is a concept we will not get into). that is. Two events A and B are mutually exclusive if there is no outcome that is in both events. and we are going to work with some mathematical symbols pertaining to subsets. 10. A \ B = ?. If A and B are mutually exclusive then if event A occurs event B is impossible. (T. Our eventual goal is to assign probabilities to events. (H. Call it (for sigma-algebra. The function P has three key properties: Axiom 1 P (A) 0 for any event A 2 . a function mapping events into numbers between 0 and 1. T ). and so is the impossible event ?. Note that the entire sample space is an event. Axiom 2 P ( ) = 1 . The notation is given in the following table. If the experiment consists of tossing a coin twice. the event that there is at least one head can be written A = f(H. It is best to think of outcomes and events as occurring. and write ! instead of f!g. To do this we need notation for the set of all possible events. H)g. an event A occurs if there is some outcome ! 2 A such that ! occurs.2 De…ning probability abstractly A probability measure is a mapping P : ! [0. H).CHAPTER 10. and vice versa. 1]. that is. In math !2A A\B A[B A B A B AC In English omega is an element of A A intersection B A union B A is a strict subset of B A is a weak subset of B The complement of A An event is a subset of the sample space. Sometimes we want to talk about single-element events. For the latter.

Suppose that A and B are events with A B. Then A and C are mutually exclusive with A [ C = B. These three axioms imply some other properties. that is. ::: is a (possibly …nite) sequence of mutually exclusive events. The next result concerns the relation A is contained in B or A is equal to B. PROBABILITY 133 Axiom 3 If A1 . Theorem 12 P ( ?) = 0: Proof. = Theorem 14 P (AC ) = 1 P (A). The second one states that the probability that something happens is one. De…ne C = f! : ! 2 B but ! 2 Ag. A2 . For the next theorems. then P (A1 [ A2 [ :::) = P (A1 ) + P (A2 ) + ::: The …rst axiom states that probabilities cannot be negative. axiom 3 implies that \ ? = ?. AC = f! 2 : ! 2 Ag. . where A B means that either P (B). The events and ? are mutually exclusive since [ ? = . Theorem 13 If A B then P (A) . The third axiom states that when events are mutually exclusive.CHAPTER 10. Proof. Since 1 = P ( [ ?) = P ( ) + P (?) = 1 + P (?). and it follows that P ( ?) = 0. the probability of the union is simply the sum of the probabilities. and = axiom 3 implies that P (B) = P (A [ C) = P (A) + P (C) P (A). let AC denote the complement of the event A.

PROBABILITY 134 Proof.1) 10.1) yields P (A [ B) = P (A) + P (AC \ B) = P (A) + P (B) P (A \ B): P (A \ B): (10.1. The result follows. Then P (AC [ A) = P (AC ) + P (A) = 1. The events A and AC \ B are mutually exclusive. and by axiom 1 we have P (AC ) 0. …rst write P (A) = 1 P (AC ). Note that AC [ A = and AC \ A = ?. To see why. These two parts are mutually exclusive.3 De…ning probabilities concretely The previous section told us what the abstract concept probability measure means. Theorem 15 P (A [ B) = P (A) + P (B) P (A \ B). and therefore P (AC ) = 1 P (A). so P (B) = P (A \ B) + P (AC \ B). You can see this in Figure 10. Proof. the part that intersects A and the part that does not. Rearranging yields P (AC \ B) = P (B) Substituting this into equation (10. Once again this is clear in Figure 10. Sometimes we want to know actual probabilities. First note that A [ B = A [ (AC \ B).1. The points in A [ B are either in A or they are outside of A but in B. . but it says that we can separate B into two parts. Note that this theorem implies that P (A) 1 for any event A. How do we get them? The answer relies on the ability to partition the sample space into equally likely outcomes.CHAPTER 10. Axiom 3 says P (A [ B) = P (A) + P (AC \ B): Next note that B = (A\B)[(AC \B).

This theorem allows us to compute probabilities from experiments like ‡ ipping coins.1: Finding the probability of the union of two sets Theorem 16 Suppose that every outcome in the sample space = f! 1 .CHAPTER 10. By Axiom 3 we have P ( ) = P (f! 1 g) + ::: + P (f! n )g = 1: Since each of the outcomes is equally likely. . If event A contains k of the outcomes in the set f! 1 . For an exercise. if a die has six sides. Answer: There are 16 possible outcomes. By construction f! i g \ f! j g = ? when i 6= j. the probability is 1=4. f! n g are mutually exclusive. ! n g. So. Four of them have one head. :::. and so the events f! 1 g. the probability of the outcome 5 is 1=6. :::. and so on. The probability of the event f1. this implies that P (f! i g) = 1=n for i = 1. 2g is 1=6 + 1=6 = 1=3. We know that P ( ) = P (f! 1 g [ [ f! n g) = 1. it follows that P (A) = k=n. and so on. rolling dice. n. For example. :::. :::. ! n g is equally likely. …nd the probability of getting exactly one head in four tosses of a fair coin. Then the probability of any event A is the number of outcomes in A divided by n. Proof. PROBABILITY 135 A ∩ BC A∩B AC ∩ B A B Ω Figure 10.

Instead.00 0. as required. the probabilities of the events are given directly. The probability that y = 4 is the sum of the probabilities in the last column.02 0. or 0. based on the values of x and y. It is given by the formula P (AjB) = P (A \ B) : P (B) Think about what this expression means. The probability that y 2 and x 3 is the sum of the four cells . Note that all the entries are nonnegative and that the sum of all the entries is one.03 0. as required for a probability measure. so that the event B occurs with positive probability. P (x = 2jy = 4) = 0:11=0:38 = 11=38.11.04 0. The denominator is the probability that B occurs. PROBABILITY 136 In general events are not equally likely. The ratio can be interpreted as the fraction of the time when B occurs that A also occurs.11 0. The numerator is the probability that both A and B occur.15 0.09 0. Consider the probability distribution given in the table below. Clearly A \ B is a subset of B. Then P (AjB) is the conditional probability that event A occurs given that event B has occurred.02 0. Now …nd the conditional probability that y is either 1 or 2 given that x 3.01 0.10 0. 10. The probability that x = 2 and y = 4 is just the entry in a single cell.08 x=1 x=2 x=3 x=4 Find the conditional probability P (x = 2jy = 4). so we cannot determine probabilities theoretically in this manner. y=1 y=2 y=3 y=4 0.38. So. The probabilities of the di¤erent outcomes are given in the entries of the table. Outcomes are two dimensional.05 0. The formula is P (x = 2 and y = 4)=P (y = 4).10 0.4 Conditional probability Suppose that P (B) > 0.16 0.CHAPTER 10.02 0.02 0. and is 0. so the numerator is smaller than the denominator.

or 0. The probabilities are given in the following table: Test positive Test negative Condition A 0. PROBABILITY 137 in the lower left. 987=988 = 0:999: 10. Now look at a medical example. 10=12 = 0:833. A patient can have condition A or not. Then P (AjB) = Proof. The probability that x 3 is 0. We get the following conditional probabilities: P (Ajpositive) P (healthyjnegative) P (positivejA) P (negativejhealthy) = = = = 10=11 = 0:909. 987=989 = 0:998.66.CHAPTER 10.002 0.45. Also. Note that P (AjB) = and P (BjA) = Rearranging the second one yields P (A \ B) = P (BjA)P (A) .001 0. with only 12 people in 1000 having it. He takes a test which turns out positive or not. So. a positive test is ten times more likely to come from a patient with the condition than from a patient without the condition. the conditional probability is 45=66 = 15=22.5 Bayes’rule P (BjA)P (A) : P (B) P (A \ B) P (B) P (A \ B) : P (A) Theorem 17 (Bayes’rule) Assume that P (B) > 0.987 Healthy Note that condition A is quite rare.010 0.

Psychologists have also come up with studies in which subjects overweight the prior.5. If the ratio is smaller than one. People don’ seem to follow it. and according to Bayes’rule the posterior should be the same as the prior. 138 Let’ make sure Bayes’rule works for the medical example given above. Using Bayes’rule we get P (Ajpositive) = P (positivejA) P (A) = P (positive) 10 12 12 1000 11 1000 = 10 : 11 The interpretation of Bayes’rule is for responding to updated information. All subjects were then given the following description: Dick is a 30 year old man. He is married with no children. Bayes’rule tells us how to do this. P (positive) = 11=1000. so P (BjA)=P (B) = 1. and P (A) = 12=1000. The new information is uninformative. Some subjects are told that a group consists of 70 lawyers and 30 engineers. We start with the prior P (A). the posterior probability is lower. the posterior probability is higher than the prior probability. Bayes’rule is important in game theory. s We have P (positivejA) = 10=12. We are interested in the occurrence of event A after we receive some new information. Then we …nd out that B holds.CHAPTER 10. We should use this new information to update the probability of A occurring. ignoring the prior information. PROBABILITY and the theorem follows from substitution. He is well liked by his colleagues. The ratio is greater than one if A occurring makes B more likely. 1973 Psychological Review). and when they overweight the prior it is called conservatism. . This example has people overweighting the new information. and macro. A man of high ability and high motivation. When subjects overweight the new information it is called representativeness. The rest of the subjects are told that the group has 30 lawyers and 70 engineers. …nance. Subjects were then asked to judge the probability that Dick is an engineer. Subjects in both groups said that it is about 0. We multiply the prior P (A) by the likelihood P (BjA) : P (B) If this ratio is greater than one. he promises to be quite successful in his …eld. Here is a famous example (Kahneman t and Tversky. P (AjB) is called the posterior probability.

o¤ers s a contestant the choice among three doors. labeled A. Plugging this into Bayes’rule yields 6 3 2 P (prize Cj reveal B) = 1 1 2 1 3 2 = : 3 The probability that the prize is behind A given that he revealed door B is 1 P (prize Cj reveal B) = 1=3. There is a prize behind one of the doors. PROBABILITY 139 10. After the contestant chooses a door. Monty Hall. to build suspense Monty Hall reveals one of the doors with no prize. so P (prize A) = P (prize B) = P (prize C) = 1=3. Remember that Monty cannot reveal the door with the prize behind it or the door chosen by the contestant. the probability of revealing 3 door B is P (reveal B) = 1 + 1 = 1 . He then asks the contestant if she would like to stay with her original door or switch to the other one. We can write this as P (reveal B) = P (reveal Bj prize A) P (A) +P (reveal Bj prize B) P (B) +P (reveal Bj prize C) P (C): The middle term is zero because he cannot reveal the door with the prize behind it. The last term is 1=3 for the reasons given above. What should she do? The answer is that she should take the other door. the conditional probability P (reveal Bj prize A) = 1=2. and nothing behind the other two. Next …nd the conditional probability that Monty reveals door B given that the prize is behind door C. The contestant should switch. What is the probability that the prize is behind door C given that door B was revealed? Bayes’rule says we use the formula P (prize behind Cj reveal B) = P (reveal Bj prize C) P (prize C) : P (reveal B) Before revealing the door. and that Monty reveals door B. and C. B. Therefore Monty must reveal door B if the prize is behind door C. suppose she chooses door A. and the conditional probability P (reveal Bj prize C) = 1. To see why. .6 Monty Hall problem At the end of the game show Let’ Make a Deal the host. the prize was equally likely to be behind each of the three doors. assuming he does so randomly. The remaining piece of the formula is the probability that he reveals B. Consequently the 1 1 …rst term is 2 1 = 6 . If the prize is behind A he can reveal either B or C and.CHAPTER 10. Using all this information.

P (accident) = 0:06. Theorem 18 If A and B are independent then P (AjB) = P (A). We can prove an easy theorem about independent events. We have P (AjB) = P (A)P (B) P (A \ B) = = P (A): P (B) P (B) According to this theorem. Accident No accident Drunk driver 0. the probability of getting in an accident when drunk is the same as the overall probability of getting in an accident. accidents and drunk driving are not independent events.10 0. PROBABILITY 140 10. then P (accidentjdrunk) = P (accident).84 Sober driver Notice that from this table that half of the accidents come from sober drivers. but there are many more sober drivers than drunk ones. The question is whether accidents are independent of drunkenness. and …nally P (drunk) P (accident) = 0:13 0:06 = 0:0078 6= 0:03: So. P (drunk) = 0:13. Compute P (drunk \ accident) = 0:03.CHAPTER 10.03 0. if accidents and drunkenness were independent events.7 Statistical independence Two events A and B are independent if and only if P (A\B) = P (A) P (B). as we would expect drunkenness to be a contributing factor to accidents.03 0. This is not surprising. that is. Consider the following table relating accidents to drinking and driving. Note that P (accidentjdrunk) = 3=13 = 0:23 while P (accidentjsober) = 3=87 = 0:03 4. Proof. .

(c) Find the probability that b 6= 3. b) : b 2 and a = 5g? (f) Are the events a 2 f1.8 Problems 1. 4g statistically independent? y 4. Answer the questions using the table below. Answer the questions from the table below: a=1 a=2 a=3 a=4 a=5 b=1 0:02 0:03 0:01 0:00 0:12 b=2 0:02 0:01 0:01 0:05 0:06 b=3 0:21 0:05 0:01 0:00 0:00 b=4 0:02 0:06 0:06 0:12 0:14 a 4g or B = (b) List the largest impossible event. 20g and 2 (e) Find the probability that y (g) Are the events x (f) Verify Bayes’rule for P (y = 4jx = 20). 2. y=1 y=2 y=3 y=4 y=5 0:01 0:03 0:17 0:00 0:00 0:03 0:05 0:04 0:20 0:12 0:11 0:04 0:02 0:07 0:11 x=5 x = 20 x = 30 (a) What is the most likely outcome? (b) What outcomes are impossible? (c) Find the probability that x = 30. 20 and y 2 f1.CHAPTER 10. A = f(a. (d) Find P (b = 2ja = 5). 20. 4g statistically independent? . 3g and b 2 f1. (e) Find P (a 3jb 2 f1. (d) Find the probability that x 2 f5. 2 conditional on x 2. PROBABILITY 141 10. 4g): (a) Which event is more likely. b) : 3 f(a.

and negative for 95% of the people who do not have the disease. There are two age categories: below 40 (young) and above 40 (old). A disease hits 1 in every 20. You wanted to know the probability that an old person is an entrepreneur. There are three occupations: doctor. You have data that sorts individuals into occupations and age groups. the test is positive for 95% of people with the disease. PROBABILITY 142 3. .CHAPTER 10. Your grad student misunderstands you. and entrpreneur. lawyer. and presents you with the following information: 20% 40% 20% 70% of of of of the the the the sample are doctors and 30% are entrepreneurs doctors are young entrepreneurs are young lawyers are young Find the probability that the an old person is an entrepreneur. What is the probability he has it? 4. that is. Max just tested positive for the disease. A diagnostic test is 95% accurate. though.000 people.

CHAPTER 11 Random variables 11. P (~ = x) is the probability that the x 143 . The standard notation for a random variable is to place a tilde over the variable. ~ A random variable is discrete if it can take only discrete values (either a …nite number or a countable in…nity of values). if I toss a coin ten times and …nd four heads. the realization of the random variable is 4. For example. For example. When the random variable is denoted x its realization is denoted x. and the Dow-Jones Industrial Average is a random variable. Random variables have probability measures. and that number is determined by the outcome of an experiment. the number of heads in ten coin tosses is a random variable. A random variable is continuous if it can take any value in an interval.1 Random variables A random variable is a variable whose value is a real number. and we can use random variables to de…ne events. ~ The realization of a random variable is based on the outcome of an actual experiment. So x is a random variable. For example.

2 Distribution functions The distribution function for the random variable x with probability mea~ sure P is given by F (x) = P (~ x): x The distribution function F (x) tells the probability that the realization of the random variable is no greater than x. RANDOM VARIABLES 144 realization of the random variable x is x. and P (~ 2 [2. the density function is P (~ = x) x for each possible value of x. Theorem 19 Distribution functions are nondecreasing and take values in the interval [0. 1]. and ~ ~ by Theorem 13 we have F (x) = P (~ x x) P (~ x y) = F (y): 11. suppose x < y. Sometimes distribution functions are neither di¤erentiable nor discrete. 3]) is the probability ~ x that the realization of the random variable x falls in the interval [2. This causes headaches that we will not deal with here.CHAPTER 11. Note that it is possible to go from a density function to a distribution function: Z x F (x) = f (t)dt: 1 . The second part of the statement is obvious. 11. Proof. 3]. Then the event x x is contained in the event x y.3 Density functions f (x) = F 0 (x): If the distribution function F (x) is di¤erentiable. 3]. the density function is If the distribution function F (x) is discrete. Distribution functions are almost always denoted by capital letters. The ~ event in the latter example is the event that the realization of x falls in the ~ interval [2. For the …rst part.

f. Let the vector (outcome 1.4.d. but the de…nition given here is mathematically correct. the distribution function is the accumulated value of the density function. what is the probability of x successes in n trials? Let’ work this out for the case of two trials.1 Useful distributions Binomial (or Bernoulli) distribution The binomial distribution arises when the experiment consists of repeated trials with the same two possible outcomes in each trial. the set of points for which the density is positive. If we are checking lightbulbs to see if they work.f. These are just labels. The question is. The density function is often called the probability density function. A coauthor (Harold Winter) and I used the binomial distribution to model juror bias." or we could do it the other way around. which means that the probability of a failure is q = 1 p. 11. we could label a working lightbulb as a "success" and a nonworking bulb as a "failure." or we could do it the other way around. or p. RANDOM VARIABLES 145 So. The support of a distribution F (x) is the smallest closed set containing fxjf (x) 6= 0g. For our purposes there is no real reason for using a closed (as opposed to open) set. To get a general formula. For a discrete distribution this is just the set of outcomes to which the distribution assigns positive probability. s . The most obvious example is ‡ ipping a coin n times. that is.d. we could count a head as a "success" and a tail as a "failure. and the two possible outcomes were a juror biased toward conviction and a juror biased toward acquittal. assume that the trials are statistically independent.CHAPTER 11. The outcome is a series of heads and tails. so that a success in trial t has no e¤ect on the probability of success in period t + 1. We had to arbitrarily label one form of bias a "success" and the other a "failure. For a continuous distribution the support is the smallest closed set containing all of the points that have positive probability. This leads to some additional common terminology. or c. label one possible outcome of the trial a success and the other a failure." Suppose that the probability of a success is p in any given trial. Also.4 11. and the probability distribution governing the number of heads in the series of n coin tosses is the binomial distribution. We obviously cared about the total bias of the group. In coin tossing. The distribution function is often called the cumulative density function.

the probability of two successes is 3p2 q. and the probability of two failures is q 2 . s. f. s) = pq 2 q3 Thus the probability of three successes is p3 . s. we have P (s. f ) P (f. so b(x.CHAPTER 11. f ) = P (f. s) = P (f. where x is x ~ the random variable measuring the number of successes in n trials. f. The probability of a single. n. the probability of one success is 3pq 2 . which is given by x n x = 1 2 ::: n n! = x!(n x)! [1 ::: x][1 ::: (n x)] : Note that the function b(x. f ) = = = = p3 P (s. p) = P (~ = x). The number of possible con…gurations with x successes and n x failures is n . failure) = = = = p2 pq pq q2 The probability of two successes in two trials is p2 . p) is a density function. In general. s) = p2 q P (f. s. For example. n. and the probability of no successes is q 3 . success) P (success. f. The binomial distribution is a discrete distribution. failure) P (failure. the probability of one success in two trials is 2pq. s. s) P (s. n. particular con…guration of x successes and n x failures is px q n x . Using P as the probability measure. f ) P (s. we have P (success. the probability that the …rst x trials are successes and the last n x trials are failures is px q n x . . RANDOM VARIABLES 146 outcome 2) denote the event in which outcome 1 is realized in the …rst trial and outcome 2 in the second trial. p) = n x n p q x x There are two pieces of the formula. the rule for x successes in n trials when the probability of success in a single trial is p is b(x. success) P (failure. With three trials. f. letting s denote a success and f a failure.

so that it is the probability of getting x or fewer successes in n trials. Let’ go back to my jury example. p) is =BINOMDIST(x. b] = . and that 20% of them are biased toward acquittal. The probability of drawing 2 jurors biased toward acquittal and 10 biased toward conviction is b(2. p) is =BINOMDIST(x. with the rest biased toward conviction. 0) and the formula for B(x.2 Uniform distribution The great appeal of the uniform distribution is that it is easy to work with mathematically. It’ density function is given by s 8 1 x 2 [a. n. n. 0:2) = 0:283 The probability of getting at most two jurors biased toward acquittal is B(2. 12. n. Suppose we want to draw a pool s of 12 jurors from the population. RANDOM VARIABLES The binomial distribution function is B(x. 12. 0:2) = :558 The probability of getting a jury in which every member is biased toward conviction (and no member is biased toward acquittal) is b(0. The formula for b(x. p). 1) The last argument in the function just tells the program whether to compute the density or the distribution. 12. Microsoft Excel. n. 0:2) = 0:069 11. b] < b a f (x) = if : 0 x 2 [a.4.CHAPTER 11. make it easy to compute the binomial density and distribution. p. and probably similar programs. n. p. n. p) = x X i=0 147 b(x.

RANDOM VARIABLES The corresponding distribution function is 8 x<a < 0 x a if a x b F (x) = : b a 1 x>b 148 Okay.1. 1). as shown in Figure 11. The standard normal has mean 0 and standard deviation 1. Graphically.4. The distribution function is just a line with slope 1=(b a) in the support [a.CHAPTER 11. so these look complicated. Intuitively. For now. then the density function is the constant function f (x) = 1=(b a) and the distribution function is the linear function F (x) = (x a)=(b a). the uniform density is a horizontal line. 11. A more general normal distribution with mean and standard deviation has density function 1 2 2 f (x) = p e (x ) =2 2 and distribution function Z x 1 2 2 F (x) = p e (t ) =2 dt: 2 1 . if x is in the support interval [a. as we will see later in the course. It is also central to statistical analysis. which is why it has its name. it spreads the probability evenly (or uniformly) throughout the support [a. b]. The distribution function is Z x 1 2 e t =2 dt F (x) = p 2 1 and we write it as an integral because there is no simple functional form. But. its density function is 1 f (x) = p e 2 x2 =2 and its support is the entire real line: ( 1.3 Normal (or Gaussian) distribution The normal distribution is the one that gives us the familiar bell-shaped density function. b]. We begin with the standard normal distribution. b]. Later on we will …nd the mean and standard deviation for di¤erent distributions.

Changing the standard deviation has a di¤erent e¤ect. The thin curve in the …gure has a mean of 2. which is the thick curve in the …gure. and the thin curve has = 2. increasing the standard deviation lowers the peak and spreads the density out. RANDOM VARIABLES y 0.4.1 -5 -4 -3 -2 -1 0 1 2 3 4 x 5 Figure 11. 11. The exponential distribution is often used for the failure rate of equipment: the probability that a piece of equipment will fail .2 0.2.4 149 0.CHAPTER 11. As you can see.4 Exponential distribution 1 f (x) = e The density function for the exponential distribution is x= and the distribution function is F (x) = 1 e x= : It is de…ned for x > 0. The peak of the normal distribution is at the mean. so the standard normal peaks at x = 0.3.1: Density function for the standard normal distribution The e¤ects of changing the mean can be seen in Figure 11.5. as shown in Figure 11. moving probability away from the mean and into the tails.3 0. The thick curve is the standard normal with = 1.

5 x 5 Figure 11.1 0 -5 -2.3 0.5 x 5 Figure 11.5 0 2.5 0 2.2: Changing the mean of the normal density y 0. RANDOM VARIABLES 150 y 0.2 0.2 0.1 0 -5 -2.3: Increasing the standard deviation of the normal density .CHAPTER 11.3 0.

0 0 1 2 3 4 x 5 Figure 11. 11. Accordingly.4.4 0. It is shown in Figure 11. 11.CHAPTER 11. RANDOM VARIABLES y 151 0.6 Logistic distribution 1 1+e The logistic distribution function is F (x) = Its density function is f (x) = x : e x : (1 + e x )2 .4.5 Lognormal distribution The random variable x has the lognormal distribution if the random vari~ able y = ln x has the normal distribution.3 0.4: Density function for the lognormal distribution by time x is F (x). The density function is ~ ~ 1 h (ln x)2 =2 i 1 e f (x) = p x 2 and it is de…ned for x 0.4. f (x) is the probability of a failure at time x.1 0.2 0.5 0.6 0.

.5: Density function for the logistic distribution It is de…ned over the entire real line and gives the bell-shaped density function shown in Figure 11.15 0. RANDOM VARIABLES y 0.25 152 0.05 -5 -2.1 0.2 0.5 x 5 Figure 11.5.CHAPTER 11.5 0 2.

integration is the opposite of di¤erentiation. (12. you probably learned two things about integration: (1) integration is the opposite of di¤erentiation. We sometimes think about pro…t as the area between the price line and the marginal cost curve. of course. in economics we rarely talk about the area under a curve. The following two statements provide the fundamental relationship between derivatives and integrals: Z b f (x)dx = F (b) F (a). and we sometimes compute consumer surplus as the area between the demand curve and the price line. Unfortunately.1) a 153 . and (2) integration …nds the area under a curve. But this is not the primary reason for using integration in economics. we should …rst deal with the mechanics. As already stated. Both of these are correct. To make this explicit. There are exceptions. Before we get into the interpretation. suppose that the function F (x) has derivative f (x).CHAPTER 12 Integration If you took a calculus class from a mathematician.

while the solution in (12. notice that Z Z f (x)dx = 1 f (x)dx. or de…nite. To see why. and they can be checked by di¤erentiating the right-hand side. and its distinguishing feature is that the integral is taken over a …nite interval. INTEGRATION and Z 154 f (x)dx = F (x) + c.1) and (12. and we shift it upward by one unit. 1). Z b f (x)dx = a = [F (b) + c] [F (a) + c] = F (b) F (a): Z b f (x)dx 1 Z a f (x)dx 1 Some integrals are straightforward.2) where c is a constant. The integral in (12. then. call it F (x). This occurs because when we integrate the function f (x). From our point of view the most important ones follow.2) is not unique.1) is a de…nite integral. all we know is the slope of the function F (x). The role of the constant c in (12.2). is to account for the indeterminacy of the height of the curve when we take an integral. and it has no endpoints. but others require some work. The reason for the names is that the solution in (12.1) is unique. (12. The integral in (12.CHAPTER 12. If we choose one function that has slope f (x). Furthermore. Z xn+1 + c for n 6= 1 xn dx = n+1 Z 1 dx = ln x + c x Z erx erx dx = +c r Z ln xdx = x ln x x + c .2) are consistent with each other. and we do not know anything about its height.2) is an inde…nite integral. its slope is still f (x). The two equations (12. 1 so an inde…nite integral is really just an integral over the entire real line ( 1.

way after me. Repeat after me (okay. ow and our measurement times are t1 . We measure the ‡ once per time interval.3) .1 Interpreting integrals Integrals are used for adding. The total estimated volume for the year is then V (n) = tn X t=t1 T h(ti ) : n (12. INTEGRATION There are also two useful rules for more complicated integrals: Z Z af (x)dx = a f (x)dx Z Z Z [f (x) + g(x)] dx = f (x)dx + g(x)dx: 155 The …rst of these says that a constant inside of the integral can be moved outside of the integral.CHAPTER 12. Together they say that integration is a linear operation. At time t the volume of water ‡ owing past a particular point as measured by the Hydro‡ owometerTM is h(t). but we can measure the ‡ at any instant in time using our Acme Hydro‡ ow owometerTM . The second one says that the integral of the sum of two functions is the sum of the two integrals. but in economics the primary use is for addition. We use the measured ‡ at time ow ti to calculate the total ‡ for period i according to the formula ow h(ti ) T n where h(ti ) is our measure of the instantaneous ‡ and T =n is the length ow of time in the interval. suppose we wanted to do something strange like measure the amount of water ‡ owing in a particular river during a year. where T is the amount of time in a year. :::. 12. tn . We have not …gured out how to measure the total volume. because I wrote this in April 2008): They can also be used to …nd the area under a curve. To see why. Suppose that we break the year into n intervals of length T =n each.

4) a Note that the left-hand term is just the integral (with respect to x) of a derivative (with respect to x). Our correct formula for the volume would be Z T h(t)dt: V = 0 The expression dt takes the place of the expression T =n in the sum. b]: Z b a d [f (x)g(x)]dx = dx Z b f (x)g(x)dx + 0 a Z b f (x)g 0 (x)dx: (12. The nice thing about integration by parts is that it is simple to reconstruct. and it is the length of each measurement interval. ow Then T =n ! 0. Before going on. which is the topic of the next chapter. and if we tried to do the summation in equation (12. As we do this n becomes larger and T =n becomes smaller. But the total volume is not zero. though. 12. It’ s not. The idea behind an integral is adding an in…nite number of zeroes together to get something that is not zero. and combining those two operations leaves . so this cannot be the right approach.2 Integration by parts Integration by parts is a very handy trick that is often used in economics. Now suppose that we could measure the ‡ at every instant of time. INTEGRATION 156 We can make our estimate of the volume more accurate by taking more measurements of the ‡ ow. there are two useful tricks involving integrals that deserve further attention. Start with the product rule for derivatives: d [f (x)g(x)] = f 0 (x)g(x) + f (x)g 0 (x): dx Integrate both sides of this with respect to x and over the interval [a. which is zero. The right approach uses integrals. We can use this approach in a variety of economic applications.CHAPTER 12. each of which is multiplied by zero.3) we would add up a whole bunch of numbers. One major use is for taking expectations of continuous random variables. It is also what separates us from the lower animals.

u(x) must be nondecreasing. but you will someday. s 12. If she likes money. integration by parts was this mysterious thing. Now you know the secret –it’ just the product rule for derivatives. b is the highest possible payo¤. and F 0 (x) is the density function corresponding to the probability distribution function F (x). How can we do this? We must start with what we know about the individual. so u0 (x) 0. and u(x) is the utility of money. Suppose that an individual is choosing between two lotteries. and the individual’ objective function is s Z b u(x)F 0 (x)dx (12. (I use the derivative notation F 0 (x) instead of the density notation f (x) to make the use of integration by parts more transparent. INTEGRATION the function unchanged: Z b d [f (x)g(x)]dx = f (x)g(x)jb a dx a = f (b)g(b) f (a)g(a): Plugging this into (12.1 Application: Choice between lotteries Here is my favorite use of integration by parts. .5) When you took calculus.) The individual can choose between lottery F (x) and lottery G(x).CHAPTER 12. The individual likes money. u is a utility function de…ned over amounts of money.4) yields Z b Z b b 0 f (x)g(x)dx + f (x)g 0 (x)dx: f (x)g(x)ja = a a 157 Now rearrange this to get the rule for integration by parts: Z b Z b b 0 f (x)g 0 (x)dx: f (x)g(x)dx = f (x)g(x)ja a a (12.6) a where a is the lowest possible payo¤ from the lottery. You may not understand the economics yet. Lotteries are just probability distributions. and we would like to …nd properties of F (x) and G(x) that guarantee that the individual likes F (x) better.2.

a that is.7) a We can go through the same analysis for the other lottery. Since b is the highest possible payo¤ from the lottery. Since a is the lowest possible payo¤ from the lottery. and expression (12. and …nd b b u(x)G (x)dx = u(b) 0 u0 (x)G(x)dx: (12. she chooses F (x) over G(x) if Z b Z b 0 u(x)F (x)dx u(x)G0 (x)dx 0: a a . the distribution function must satisfy F (b) = 1. But the only thing we know about the individual is that u0 (x) 0.6). So let’ integrate s by parts. s Now look back at expression (12. Or written di¤erently. the above expression reduces to Z Z b u(x)F (x)dx = u(b) 0 a Z Z b b u0 (x)F (x)dx: (12. if the lottery F (x) generates a higher value of the objective function than the lottery G(x) does.CHAPTER 12.8) a a The individual chooses the lottery F (x) over the lottery G(x) if Z b u(x)F (x)dx 0 a Z u(x)G0 (x)dx. INTEGRATION 158 And that’ all we know about the individual. G(x). So. It is the integral of the product of u(x) and F 0 (x).6) does not have u0 (x) in it. the distribution function must satisfy F (a) = 0 (this comes from Theorem 19). Z b Z b b 0 u0 (x)F (x)dx: u(x)F (x)dx = u(x)F (x)ja a a To simplify this we need to know a little more about probability distribution functions.

CHAPTER 12. INTEGRATION Subtracting (12.8) from (12.7) yields Z b Z b 0 u(x)F (x)dx u(x)G0 (x)dx a a Z b Z b 0 u (x)F (x)dx u(b) u0 (x)G(x)dx = u(b) a a Z b Z b u0 (x)G(x)dx u0 (x)F (x)dx = a a Z b u0 (x) [G(x) F (x)] dx: =
a

159

The di¤erence depends on something we know about: u0 (x), which we know is nonnegative. The individual chooses F (x) over G(x) if the above expression is nonnegative, that is, if Z b u0 (x) [G(x) F (x)] dx 0:
a

We know that u0 (x) 0. We can guarantee that the product u0 (x) [G(x) F (x)] is nonnegative if the other term, [G(x) F (x)], is also nonnegative. So, we are certain that she will choose F (x) over G(x) if G(x) F (x) 0 for all x.

This turns out to be the answer to our question. Any individual who likes money will prefer lottery F (x) to lottery G(x) if G(x) F (x) 0 for all x. There is even a name for this condition –…rst-order stochastic dominance. But the goal here was not to teach you about choice over lotteries. The goal was to show the usefulness of integration by parts. So let’ look back and s see exactly what it did for us. The objective function was an integral of the product of two terms, u(x) and F 0 (x). We could not assume anything about u(x), but we could assume something about u0 (x). So we used integration by parts to get an expression that involved the term we knew something about. And that is its beauty.

12.3

Di¤erentiating integrals

As you have probably noticed, in economics we di¤erentiate a lot. Sometimes, though, the objective function has an integral, as with expression

CHAPTER 12. INTEGRATION

160

(12.6) above. Often we want to di¤erentiate an objective function to …nd an optimum, and when the objective function has an integral we need to know how to di¤erentiate it. There is a rule for doing so, called Leibniz’ rule, s named after the 17th-century German mathematician who was one of the two independent inventors of calculus (along with Newton). We want to …nd Z d b(t) f (x; t)dx: dt a(t) Note that we are di¤erentiating with respect to t, and we are integrating with respect to x. Nevertheless, t shows up three times in the expression, once in the upper limit of the integral, b(t), once in the lower limit of the integral, a(t), and once in the integrand, f (x; t). We need to …gure out what to do with these three terms. A picture helps. Look at Figure 12.1. The integral is the area underneath the curve f (x; t) between the endpoints a(t) and b(t). Three things happen when t changes. First, the function f (x; t) shifts, and the graph shows an upward shift, which makes the integral larger because the area under a higher curve is larger. Second, the right endpoint b(t) changes, and the graph shows it getting larger. This again increases the integral because now we are integrating over a larger interval. Third, the left endpoint a(t) changes, and again the graph shows it getting larger. This time, though, it makes the integral smaller because moving the left endpoint rightward shrinks the interval over which we are integrating. Leibniz’ rule accounts for all three of these shifts. Leibniz’ rule says s s d dt Z
b(t)

f (x; t)dx =

a(t)

Z

b(t)

a(t)

@f (x; t) dx + b0 (t)f (b(t); t) @t

a0 (t)f (a(t); t):

Each of the three terms corresponds to one of the shifts in Figure 12.1. The …rst term accounts for the upward shift of the curve f (x; t). The term @f (x; t)=@t tells how far upward the curve shifts at point x, and the integral Z
b(t)

a(t)

@f (x; t) dx @t

tells how much the area changes because of the upward shift in f (x; t). The second term accounts for the movement in the right endpoint, b(t). Using the graph, the amount added to the integral is the area of a rectangle

CHAPTER 12. INTEGRATION
f(x,t) f(x,t)

161

x a(t) x1 b(t) x1

Figure 12.1: Leibniz’ rule s

that has height f (b(t); t), that is, f (x; t) evaluated at x = b(t), and width b0 (t), which accounts for how far b(t) moves when t changes. Since area is just length times width, we get b0 (t)f (b(t); t), which is exactly the second term. The third term accounts for the movement in the left endpoint, a(t). Using the graph again, the change in the integral is the area of a rectangle that has height f (a(t); t) and width a0 (t). This time, though, if a(t) increases we are reducing the size of the integral, so we must subtract the area of the rectangle. Consequently, the third term is a0 (t)f (a(t); t). Putting these three terms together gives us Leibniz’ rule, which looks s complicated but hopefully makes sense.

12.3.1

Application: Second-price auctions

A simple application of Leibniz’ rule comes from auction theory. A …rsts price sealed bid auction has bidders submit bids simultaneously to an auctioneer who awards the item to the highest bidder who then pays his bid. This is a very common auction form. A second-price sealed bid auction has bidders submit bids simultaneously to an auctioneer who awards the item to the highest bidder, just like before, but this time the winning bidder pays the second-highest price. To model the second-price auction, suppose that there are n bidders and

CHAPTER 12. INTEGRATION

162

Let’ interpret this function. Bidder i wins if his is the highest bid, which s occurs if the highest other bid is between 0 (the lowest possible bid) and his own bid bi . If the highest other bid is above bi bidder i loses and gets a payo¤ of zero. This is why the integral is taken over the interval [0; bi ]. If bidder i wins he pays the highest other bid b, which is distributed according to the density function fi (b). His surplus if he wins is vi b, his value minus how much he pays. Bidder i chooses the bid bi to maximize his expected payo¤ Vi (bi ). Since this is a maximization problem we should …nd the …rst-order condition: Z bi d 0 (vi b)fi (b)db = 0: Vi (bi ) = dbi 0 Notice that we are di¤erentiating with respect to bi , which shows up only as the upper endpoint of the integral. Using Leibniz’ rule we can evaluate this s …rst-order condition: Z bi d 0 = (vi b)fi (b)db dbi 0 Z bi @ dbi d0 = [(vi b)fi (b)] db + (vi bi )fi (bi ) (vi 0)fi (0): dbi dbi 0 @bi The …rst term is zero because (vi b)fi (b) is not a function of bi , and so the partial derivative is zero. The second term reduces to (vi bi )fi (bi ) because dbi =dbi is simply one. The third term is zero because the derivative d0=dbi = 0. This leaves us with the …rst-order condition 0 = (vi bi )fi (bi ):

that bidder i values the item being auctioned at vi , which is independent of how much everyone else values the item. Bidders do not know their opponents’valuations, but they do know the probability distribution of the opponents’valuations. Bidder i must choose his bid bi . Let Fi (b) be the probability that the highest other bid faced by i, that is, the highest bid except for bi , is no larger than b. Then Fi (b) is a probability distribution function, and its density function is fi (b). Bidder i’ expected s payo¤ is Z
bi

Vi (bi ) =

(vi

b)fi (b)db:

0

CHAPTER 12. INTEGRATION

163

Since density functions take on only nonnegative values, the …rst-order condition holds when vi bi = 0, or bi = vi . In a second-price auction the bidder should bid his value. This result makes sense intuitively. Let bi be bidder i’ bid, and let b s denote the highest other bid. Suppose …rst that bidder i bids more than his value, so that bi > vi . If the highest other bid is in between these, so that vi < b < bi , bidder i wins the auction but pays b vi more than his valuation. He could have avoided this by bidding his valuation, vi . Now suppose that bidder i bids less than his value, so that bi < vi . If the highest other bid is between these two, so that bi < b < vi , bidder i loses the auction and gets nothing. But if he had bid his value he would have won the auction and paid b < vi , and so he would have been better o¤. Thus, the best thing for him to do is bid his value.

12.4

Problems

1. Suppose that f (x) is the density function for a random variable distributed uniformly over the interval [2; 8]. (a) Compute Z Z
8

xf (x)dx
8

2

(b) Compute

x2 f (x)dx

2

2. Compute the following derivative: Z 2 d t tx2 dx dt t2 3. Find the following derivative: d dt Z
4t2

t2 x3 dx
3t

4. Let U (a; b) denote the uniform distribution over the interval [a; b]. Find conditions on a and b that guarantee that U (a; b) …rst-order stochastically dominates U (0; 1).

CHAPTER

13 Moments

13.1

Mathematical expectation

Let x be a random variable with density function f (x) and let u(x) be a ~ real-valued function. The expected value of u(~) is denoted E[u(~)] and x x it is found by the following rules. If x is discrete taking on value xi with ~ probability f (xi ) then X E[u(~)] = x u(xi )f (xi ):
i

If x is continuous the expected value of u(~) is given by ~ x Z 1 E[u(~)] = x u(x)f (x)dx:
1

Since integrals are for adding, as we learned in the last chapter, these formulas really do make sense and go together.

164

10. it is the expected ~ x value of the function u(x) = x. The mean is E[~] = (4)(0:1) + (10)(0:2) + (12)(0:3) + (20)(0:4) = 14 x 13. that is.2 Normal distribution The mean of the general normal distribution is the parameter .1 Uniform distribution The mean of the uniform distribution over the interval [a. compute t Z b 1 x E[~] = x dx b a a Z b 1 xdx = b a a 1 2 = x b a 2 a 1 b 2 a2 = b a 2 b+a = : 2 1 b 13. b] is (a + b)=2.2. 0:2. If you don’ believe me. Consider the discrete distribution with outcomes (4. 20) and corresponding probabilities (0:1. draw it.2. Recall that the normal density function is 1 f (x) = p e 2 (x )2 =2 2 : . which means that E[au(~)] = aE[u(~)] x x E[u(~) + v(~)] = E[u(~)] + E[v(~)] x x x x 165 13. 0:3.2 The mean The mean of a random variable x is = E[~]. 12. MOMENTS The expectation operator E[ ] is linear. To …gure it out from the formula.CHAPTER 13. 0:4).

Note that E[(~ x )2 ] = = = = E[~2 2 x + 2 ] x ~ 2 E[~ ] 2 E[~] + x x 2 2 E[~ ] 2 + 2 x 2 E[~2 ] x . and like all densities its integral is 1. Thus. and dx = dy. (x )2 = 2 = y 2 . MOMENTS The mean is Z 166 x 2 2 p e (x ) =2 dx: 2 Use the change-of-variables formula y = x so that x = + y. 2 . we get E[~] = . so the expression is zero. Then y 2 = z 2 and dy = dz.3 Variance The variance of the random variable x is E[(~ ~ x )2 ].CHAPTER 13. where = E[~] is the x 2 mean of the random variable. The second integral can be split into two parts: Z 1 Z 0 Z 1 2 y 2 =2 y 2 =2 ye dy = ye dy + ye y =2 dy: E[~] = x 1 1 0 Use the change of variables y = z in the …rst integral on the right-hand side. The variance is denoted . Then we can rewrite Z x 2 2 p e (x ) =2 dx E[~] = x Z 1 2 + y 2 p e y =2 dy = 2 1 Z 1 Z 1 1 2 y 2 =2 p e dy + p ye y =2 dy: = 2 2 1 1 The …rst integral is the integral of the standard normal density. so Z 1 Z 0 2 y 2 =2 ze z =2 dz ye dy = 1 0 Plugging this back into the expression above it yields Z 1 Z 1 Z 1 y 2 =2 z 2 =2 ye dy = ze dz + ye 1 0 0 y 2 =2 dy: But both integrals on the right-hand side are the same. x 13.

0:2. The variance is E[(~ x )2 ] = (0:1)(4 14)2 + (0:2)(10 14)2 + (0:3)(12 14)2 + (0:4)(20 14)2 = (0:1)(100) + (0:2)(16) + (0:3)(4) + (0:4)(36) = 28:8 We can also …nd it using the alternative formula: E[~2 ] x 2 = (0:1)(42 ) + (0:2)(102 ) + (0:3)(12)2 + (0:4)(202 ) 142 : You should be able to show that V ar(a~) = a2 V ar(~): x x p The standard deviation of the random variable x is E[(~ ~ x )2 ]. 0:4). 13.CHAPTER 13. It is the square root of the variance. which means that the standard deviation is simply . The outcomes were (4. 0:3. 12.1 Uniform distribution The variance of the uniform distribution can be found from Z b 2 x 2 dx E[~ ] = x a a b = 1 3 x b a 3 a 1 b 3 a3 = b a 3 1 b . 10. 20) and the corresponding probabilities were (0:1. The mean was 14.3. MOMENTS 167 We can …nd the variance of the discrete random variable used in the preceding section.

3. compute Z 1 (x )2 (x )2 =2 2 2 p e dx E[(~ x ) ]= 2 1 Using the same change-of-variables trick as before. we get Z 1 (x )2 (x )2 =2 2 2 p e E[(~ x )] = dx 2 1 Z 1 ( + y )2 y2 =2 p = e dy 2 1 Z 1 y2 2 p e y =2 dy = 2 1 Z 1 2 y 2 p e y =2 dy: = 2 2 1 The integral is the variance of the standard normal. Consequently. The variance of the general normal distribution had better be the parameter 2 . To make sure.2 Normal distribution The variance of the standard normal distribution is 1. which we already said was 1. Let’ take that on s faith.CHAPTER 13. The value of the highest . MOMENTS and note that b3 2 168 a3 = (b a)(b2 + ab + a2 ). 13.4 Application: Order statistics Suppose that you make n independent draws from the random variable x ~ with distribution function F (x) and density f (x). 2 = E[~2 ] x b2 + ab + a2 (b + a)2 = 3 4 2 2 4b + 4ab + 4a 3b2 6ab = 12 b2 2ab + a2 = 12 (b a)2 = : 12 3a2 13.

but they didn’ know the di¤erence at the t time) is equally bright. As a side note. Hubble was clearly an overachiever. MOMENTS 169 of them is a random variable. The seller’ expected revenue in this auction is the exs pected value of the second-highest of the n bids. the value of the second highest is a random variable. Think about the …rst order statistic. Now think about the second-price sealed-bid auction. in particular. He was the …rst astronomer to use the giant 200-inch Hale telescope at Mount Palomar. the study of the cosmos. The seller’ expected revenue. He was a Rhodes scholar. the …rst order statistic is the expected value of the highest of the n draws. The n-th order statistic is the expected value of the n-th highest draw. In set the Illinois state high school record for high jump. Think about a …rst-price sealed bid auction in which n bidders submit their bids simultaneously and then the highest bidder wins and pays her bid. We want to …nd the order statistics. This was a problem because we know that stars that are farther away are dimmer. we can’ tell whether a particular star is dim because it’ far t s away or because it’ just not very bright (econometricians would call this an s identi…cation problem). It is the expected value of the highest of the n draws. order statistics have also played a role in cosmology. but the most straightforward one is auctions. Importantly for this story. . which is the second order statistic. But what is the distribution? We must construct it from the underlying distribution F . but not all stars have the same brightness. the second order statistic is the expected value of the second highest of the n draws. Astronomers before Hubble made the heroic (that is. In this auction n bidders submit their bids simultaneously. So. He was honored with a 41 cent postage stamp. which is the …rst order statistic. In other words. and. Hubble used the milder assumption that the brightest star in every nebula (or galaxy. then. he assumed that the …rst order statistic is the same for every nebula. and the highest of the n draws is a random variable with a distribution. To do this we have to …nd some distributions. is the expected value s of the highest of the n bids. the highest bid wins. though. So. and in particular they were used by Edwin Hubble. and so on. the …rst and second order statistics. and so on. unreasonable) assumption that all starts were the same brightness and worked from there. for n random variables. He has the Hubble Space Telescope named after him. the Milky Way. and the winner pays the second-highest bid.CHAPTER 13. he established that the universe extends beyond our galaxy. We use order statistics in a variety of settings.

We want to derive G(1) (x). and so G(1) (x) = F n (x): From this we can get the density function by di¤erentiating G(1) with respect to x: g (1) (x) = nF n 1 (x)f (x): Note the use of the chain rule. since we know the distribution and density functions for the highest of n draws. which is 1=2. When there are n draws the probability that all of them are less than or equal to x is (F (x))n . We have F (x) = x on [0. When n = 2 the probability that both draws are less than or equal to x is (F (x))2 . This makes it possible to compute the …rst order statistic.CHAPTER 13. Remember that G(1) (x) is the probability that the highest draw is less than or equal to x. For the highest draw to be less than or equal to x. The …rst order statistic is Z (1) s = xnF n 1 (x)f (x)dx Z 1 = xnxn 1 dx 0 Z 1 = n xn dx = n x n+1 n = : n+1 0 n+1 1 0 This answer makes some sense. We just take the expected value in the usual way: Z (1) s = xnF n 1 (x)f (x)dx: Example 12 Uniform distribution over (0. If n = 1 the …rst order statistic is just the mean. and f (x) = 1 on [0. it must be the case that every draw is less than or equal to x. 1]. 1]. MOMENTS 170 Let G(1) (x) denote the distribution for the highest of the n values drawn independently from F (x). 1). When n = 1 the probability that the one draw is less than or equal to x is F (x). And so on. If n = 2 and the distribution is uniform so that the .

. MOMENTS 171 draws are expected to be evenly spaced. The probability of getting draw 2 above y and the rest below is the same. we are looking for the probability that the second-highest draw is no greater than some number. In the last row all of the draws are below y.CHAPTER 13. If n = 3 the highest draw should be about 3=4. and so on for the …rst n rows of the table. . Draw n is above y and the rest are below y All the draws are below y Probability (1 F (y))F n 1 (y) (1 F (y))F n 1 (y) . Regarding the …rst line. The probability of all n draws being below y is just F n (y). s the probability that draws 2 through n are below y is the probability of getting n 1 draws below y. . beginning with identifying the distribution of the second-highest draw. then the highest draw should be about 2=3 and the lowest should be about 1=3. There are a total of n + 1 ways that we can get the second-highest draw to be below y. Multiplying these together yields the probability of getting draw 1 above y and the rest below. in which case both the highest and the second highest draws are below y. which is F n 1 (y). We also care about the second order statistic. To make this exercise precise. To …nd it we follow the same steps. (1 F (y))F n 1 (y) F n (y) Let’ …gure out these probabilities one at a time. the same as when we looked at the …rst order statistic. and they are listed below: Event Draw 1 is above y and the rest are below y Draw 2 is above y and the rest are below y . call it y. The probability that draw 1 is above y is 1 F (y). . Summing the probabilities yields the distribution of the second-highest draw: G(2) (y) = n(1 F (y))F n 1 (y) + F n (y): Multiplying this out and simplifying yields G(2) (y) = nF n 1 (y) (n 1)F n (y): The density function is found by di¤erentiating G(2) with respect to y: g (2) (y) = n(n 1)F n 2 (y)f (y) n(n 1)F n 1 (y)f (y): . and so on.

and the lowest at 1=4.CHAPTER 13. MOMENTS It can be rearranged to get g (2) (y) = n(n 1)(1 F (y))F n 2 (y)f (y): 172 The second order statistic is the expected value of the second-highest draw. the second highest at 2=4. If there are …ve draws. Z (2) s = yn(n 1)(1 F (y))F n 2 (y)f (y)dy Z 1 yn(n 1)(1 y)y n 2 dy = 0 Z 1 = n(n 1) y n 1 y n dy 0 yn = n(n 1) n(n n 0 n(n 1) = (n 1) n+1 n 1 : = n+1 1 1) y n+1 n+1 1 0 If there are four draws uniformly dispersed between 0 and 1. and so on. which is Z (2) s = yg (2) (y)dy Z = yn(n 1)(1 F (y))F n 2 (y)f (y)dy: Example 13 Uniform distribution over [0. . the highest draw is expected to be at 3=4. 1]. the highest is expected to be at 4=5 and the second highest is expected to be at 3=5.

x 10 15 20 30 100 f (x) 0.5 Problems 1.2 g(x) 0.10 -14 . The following table shows the probabilities for two random variable. (b) Verify that it is a distribution function.CHAPTER 13.1 0.05 0.3 (a) Compute the means of the two variables.10 4 . and F (1) = 1.30 0. (b) Compute the variances of the two variables. MOMENTS 173 13. that is.1 0. . Suppose that the random variable x takes on the following values with ~ the corresponding probabilities: Value Probability 7 . that F is increasing. 2. (b) Compute the variance. one with density function f (x). F (0) = 0.02 (a) Compute the mean. 1]. Consider the triangular density given by f (x) = 2x on the interval [0. and speci…cally for this case. (c) Find the mean.1 0. and one with density function g(x).15 -6 . (c) Compute the standard deviations of the two variables.20 0. 3.40 -2 .15 0. (a) Find its distribution function F .5 0.23 2 .

1 x 8 174 on the interval (b) Verify that it satis…es the properties of a distribution function. MOMENTS (d) Find the variance. respectively. F (0) = 0. ~ 8. Suppose that the random variable x takes the value 6 with probability ~ 1 and takes the value y with probability 1 . and F increasing. and if y = 3~ ~ x 1. 2 2 where 2 is the variance of x. then the variance 7. (c) Find the mean. Consider the triangular density given by f (x) = [0. 4]. Show that if the variance of x is ~ where a is a scalar. ~ 2 x 2 then the variance of a~ is a2 x 2 . 6. 5.CHAPTER 13. Show that if the variance of x is ~ 2 of y is 9 x . . (a) Find its distribution function F . Find the derivative d 2 =dy. F (4) = 1. (d) Find the variance. Show that G(1) …rst-order stochastically dominates G(2) . that is. Let G(1) and G(2) be the distribution functions for the highest and second highest draws. 4.

1) is the probability that x ~ x and y ~ 1.1 Bivariate distributions Let x and y be two random variables. rainfall. and so F (x. ~ 175 . A bivariate distribution function is de…ned over two random variables. The function F (x. y) is ~ ~ given by F (x. 1) is the univariate distribution function for the random variable x. A multivariate distribution function is de…ned over a vector of random variables. 14. what we normally refer to as "the weather" is comprised of several random variables: temperature. For example. The distribution function F (x. humidity.CHAPTER 14 Multivariate distributions Multivariate distributions arise when there are multiple random variables. The latter is sure to hold. Everything can be extended to multivariate distributions by adding more random variables. In this chapter I restrict attention to bivariate distributions. y) = P (~ x and y y): x ~ It is called the joint distribution function. etc.

1 0.CHAPTER 14. If they are both discrete then the density is given by f (x. and the random ~ variable y can take on two possible values. MULTIVARIATE DISTRIBUTIONS 176 Similarly.2 0. t)dsdt: 1 1 14. If they are both continuous the density is x ~ given by @2 F (x.2 x=1 ~ x=2 ~ x=3 ~ The random variable x can take on three possible values. y): f (x.2 Marginal and conditional densities Consider the following example with two random variables: y=1 y=2 ~ ~ 0.3 x=3 ~ 0.3 0.3 fy (y) 0. The probabilities in the table ~ are the values of the joint density function f (x. ~ The density function depends on whether the random variables are continuous or discrete.1 0.1 0. y) = @x@y This means that the distribution function can be recovered from the density using the formula Z Z y x F (x. y) = f (s.4 0. y) = P (~ = x and y = y).4 x=2 ~ 0.6 1 ~ .1 0.3 0.2 0.1 0. the function F (1.2 0. y) is the univariate distribution function for the random variable y . Now add a total row and a total column to the table: y = 1 y = 2 fx (x) ~ ~ ~ x=1 ~ 0.1 0. y).

2) = 0:2. We would like to have conditional densities. For a discrete bivariate random variable (~. and the ~ sum of the second column is the total probability that y = 2. The probability that x = 3 and ~ ~ ~ y = 2 is f (3. A) f (xjA) = P (A) . the rule is just what we would expect in ~ the discrete case. y)dydt: fx (t)dt = Fx (x) = ~ ~ 1 1 1 We have already discussed conditional probabilities. For the continuous bivariate ~ random variable we de…ne the marginal density of x by ~ Z 1 f (x. it doesn’ matter in this case whether the random variables are discrete t or continuous for us to …gure out what to do. look for the probability that x = 3 given y = 2. Let A be an event. y)dy: fx (x) = ~ 1 From this we can recover the marginal distribution function of x by integrat~ ing with respect to x: Z x Z 1 Z x f (t. to de…ne ~ the conditional density much more generally. ~ ~ To see that this is true. Both of these formulas require conditioning on a single realization of y . 2)=fy (2) = 0:2=0:6 = 1=3. What about the continuous case? The same formula works: f (xjy) = f (x. The conditional probability is f (~ = 3j~ = 2) = ~ x y f (3. and let P be the probability measure over events. Then we can write the conditional density f (x. It is possible. y)=f (y). These are ~ the marginal densities. From the table above. though. :::. ~ ~ The probability that y = 2 is fy (2) = 0:6. MULTIVARIATE DISTRIBUTIONS 177 The sum of the …rst column is the total probability that y = 1.CHAPTER 14. it is apparent that the conditional density of x given the realization of y is f (xjy) = f (x. y) : fy (y) ~ So. yi ) where the possible values of y are y1 . So. yn . y ) we x ~ de…ne the marginal density of x by ~ fx (x) = ~ n X i=1 f (x.

Marginal density F (x) fx (x) ~ Conditional density f (xjy) or fxjy (xjy) ~ F (x. y) = fx (x)fy (y). we know that P (A) = F (x0 ). Then x ~ the expected value of u(~. y ) is x ~ XX E[u(~. so that A is the event ~ that the realization of x is no greater than x0 . table to help with notation and terminology. y) 1 Continuous: f (x. Here is a Function Density Distribution Notation Formula f (x.CHAPTER 14. y ) is a univariate random variable. ~ ~ ~y ~ ~ In other words. y)dy 1 f (x. For example. A) denotes the probability that both x and A occur.~(x. y) or fx. y)=f (y) Continuous: Ry Rx 1 1 f (s. y) Discrete: P ~y P y y ~ x x ~ f (x. y) be a realx ~ valued function. y) ~y F (x. y) or Fx. y )] = x ~ u(x.~(x. 14. Let u(x. t)dsdt The random variables x and y are independent if fx. Independence implies that f (xjy) = fx (x) and f (yjx) = fy (y). written as a density function.~(x. y) Univariate dist. y)f (x.3 Expectations Suppose we have a bivariate random variable (~. if A = fx : x x0 g. y ). MULTIVARIATE DISTRIBUTIONS 178 where f (x. 1) P Discrete: f yR (x. ~ Therefore ( f (x) if x x0 F (x0 ) f (xj~ x0 ) = x 0 if x > x0 At this point we have too many functions ‡ oating around. the random variables are independent if the bivariate density is the product of the marginal densities. so that the conditional densities and the marginal ~ ~ densities coincide. y) y x . in which case u(~.

~ ~ . y ) = E[(~ x ~ x y x )(~ y )]: xy Note that xx is just the variance of x. 6) = 0 while fx (2) fy (6) = (0:2)(0:4) 6= 0. ~ ~ Consequently we can write xy = E[(~ x y x )(~ y )] = E[~ x x] E[~ y y )]: But each of the expectations on the right-hand side are zero. for example. Also. The following the~ ~ orems apply to correlation coe¢ cients. An example is given in the table below. Theorem 20 If x and y are independent then ~ ~ xy = xy = 0. It is important to remember that the converse is not true: sometimes two variables are not independent but still happen to have a zero covariance. y) = fx (x)fy (y). where ~ ~ x is the mean of x and y is the mean of y . and the result follows. The resulting expectation is called the covariance of x and y . We can do this using the marginal densities. y )] = x ~ Z 1 1 179 Z 1 1 u(x. Proof. y) = (x x )(y y ). in the table above the mean of y is (0:4)(1) + (0:6)(2) = 1:6: ~ A particularly important case is where u(x. and it is denoted ~ ~ xy = Cov(~. MULTIVARIATE DISTRIBUTIONS in the discrete case and E[u(~. it is easy to show that ~ = E[~y ] x~ x y. One can compute that xy = 0 but note that f (2. f (x. y)dxdy in the continuous case. When the random variables are independent. The quantity xy = xy x y is called the correlation coe¢ cient between x and y .CHAPTER 14. It is still possible to compute the means of the random variables x and y ~ ~ separately. So. y)f (x.

Because y = E[~ x 2 x 2t~y + t2 y 2 ] x~ ~ 2 x ( 2 x 2t 2 y x y + t2 2 y) x y E[~2 ] x + t2 2 y + t2 E[~2 ] y 2t xy : 2t E[~y ] x~ Since this is true for any scalar t. If xy = 1 it means that the two random variables are perfectly correlated. choose t= Substituting gives us 2 xy : 2 y 0 0 2 xy 2 2 x y xy x y 2 x + xy 2 y 2 xy 2 y 2 y 2 xy 2 y xy 2 x 1 1: The theorem says that the correlation coe¢ cient is bounded between 1 and 1. we have 0 = = 2 x ty 2 t~. MULTIVARIATE DISTRIBUTIONS y = 6 y = 8 y = 10 ~ ~ ~ x=1 ~ 0.2 0 0.2 Theorem 21 1: 180 xy Proof.2 0 0. If xy = 1 the random variables are perfectly negatively . where t is a scalar.CHAPTER 14.2 0 x=2 ~ x=3 ~ 0. Consider the random variable x ~ variances cannot be negative. and once you know the value of one of them you know the value of the other.2 0 0.

At that point he has three choices: (i) but the camera at that store. This would be the conditional mean. wants to buy a particular digital camera. The reason is that x is integrated out on the right-hand side. 14. Max. I will only consider the continuous case here. let u(x) be a function of x but not of y. Stores draw their prices independently from the distribution . 14. the conditional ~ mean of x given that y = 8 is 2.CHAPTER 14. but for di¤erent reasons this time. ~ To make this as general as possible. but y is still there.4 Conditional expectations When there are two random variables. the discrete case is similar. We can use the above table for an example. x and y . and that value is 2. y)dx: E[u(~)j~ = y] = x y u(x)f (xjy)dx = fy (y) 1 ~ 1 Note that this expectation is a function of y but not a function of x. MULTIVARIATE DISTRIBUTIONS 181 correlated. If you know that x has attained its highest possible value and x and y are ~ ~ ~ perfectly negatively correlated. The conditional expectation of u(~) given that y = y is given by x ~ Z 1 Z 1 1 u(x)f (x. ~ Finally. So.4. then y must have attained its lowest value. This contains just as much information as perfect correlation. or (iii) go back to a previous store and buy the camera there. He goes to a store and looks at the price. if xy = 0 the two variables are perfectly uncorrelated (and possibly independent). What is the expected value of x given that y = 8? x can ~ ~ ~ only take one value when y = 8. one might want to …nd the ~ ~ expected value of x given that y has attained a particular value or set of ~ ~ values.calculating the bene…t of search Consider the following search process. A consumer. (ii) go to another store to check its price. The conditional mean of x given that ~ ~ ~ y = 10 is also 2.1 Using conditional expectations .

a 30% chance of …nding the same price and saving nothing. MULTIVARIATE DISTRIBUTIONS F (p) given by p p p p = = = = 200 190 180 170 with with with with probability probability probability probability 0:2 0:3 0:4 0:1 182 We want to answer the following question: If the lowest price so far is q. where the expectation is taken over the random variable p. so that the lowest price found so far is the worst possible price. what is the expected bene…t from checking one more store? Let’ begin by answering this in the most straightforward way possible. So now we know the answers.CHAPTER 14. The probabiliyt of …nding such a store is 10%. and the expected saving from search is \$1. the expected bene…t of search is \$6. in which case the savings are zero. and let’ use these answers to …gure out s a general formula. Otherwise the bene…t is zero because he would be better o¤ buying the item at a store he’ already s found. which comes by …nding a store that charges a price of \$170. This "if" statement lends itself to a conditional expectation. s Suppose that q = 200. If Max searches one more time there is a 10% chance of …nding a price of \$170 and saving \$30. When q = 200. When the best price found so far is q = 190. Now suppose that q = 190. in which case he also saves nothing. a 40% chance of …nding a price of \$180 and saving \$20. the "if" statement pertains to the conditional expectation E[q pj~ < q]. The expected saving is (:1)(20) + (:4)(10) + (:3)(0) + (:2)(0) = 6. the expected bene…t of search is \$14. This ~p epxression tells us what the average bene…t is provided that the bene…t is . In particular. suppose that q = 180. and a 20% chance of …nding another store that charges the highest possible price of \$200. Now there is only one way to improve. Finally. Note that when Max …nds a price of p and the best price so far is q. The expected saving is (:1)(30) + (:4)(20) + (:3)(10) + (:2)(0) = 14. a 40% chance of …nding a price of \$180 and saving \$10. and a 20% chance of …nding a higher price of \$200. so that the best price found so far is \$190. Max has a 10% chance of …nding a price of \$170 and saving \$20. leading to a \$10 saving. a 30% chance of …ncing a price of \$190 and saving only \$10. speci…cally one involving conditional expectations. his bene…t is q p if the new price p is lower than the old price q.

so that term is in brackets. For the second term. 200]. To …nd the conditional expectation.99: Z q Z 182:99 1 [q p]f (p)dp = [182:99 p] dp = 10:883: 50 150 150 . q] because outside of this interval the value of the bene…t is zero. because that is the de…nition of the distribution function. The actual expected bene…t is Prf~ < qgE[q p pj~ < q]. We take the integral over the interval p [150. we multiply by the conditional density which is the density of the random variable p divided by the probability that the conditioning event (~ < q) occurs. Using it we can …nd the net bene…t of searching at one more store when the best price so far is \$182. let’ s s look at q = 190. The conditional expectation is E[190 pj~ < 190] = (:4)(190 ~p = 6. MULTIVARIATE DISTRIBUTIONS nonnegative. When we multiply the two terms on the right-hand side of the top line together. leaving us with the very simple bottom line. Let the corresponding distribution function be F (p) and the density function be f (p). Suppose that prices are drawn independently from the uniform distribution over the interval [150. The conditional expectation lets us work with more complicated distributions. note that we are taking the expectation of q p. The probability that p < q ~ is simply F (q). The expected bene…t from searching at another store when the lowest price so far is q is Z q f (p) dp Prf~ < qgE[q pj~ < q] = F (q) p ~p [q p] F (q) 150 Z q = [q p]f (p)dp: 150 To see why this works. In particular. 180) + (:1)(190 170) which is exactly what we found before.CHAPTER 14. That gives us the …rst term on the right-hand side. we …nd that the F (q) term cancels out. look at the top line. Let’ make sure this works using the above example. ~p 183 which is the probability that the bene…t is positive times the expected bene…t conditional on the bene…t being positive.

As we have already learned. It is called the Law of Iterated Expectations.6 1 ~ Begin by …nding the conditional expectations E[~j~ = 1] and E[~j~ = 2]. this is a function of y but not a function of x.2 0. The best thing to do here is to look at an example to see what’ going s on.3 0. so let’ look at what it means.1 0. The inside of s s the left-hand side is the conditional expectation of u(~) given that y takes x ~ some value y. and it goes like this.2 The Law of Iterated Expectations There is an important result concerning conditional expectations. The Law of Iterated Expectations says that y E[v(~)] = Ex [u(~)]. xy xy We get E[~j~ = 1] = 2 and E[~j~ = 2] = 11=6. It is ~ Ex [~] = fx (1) 1 + fx (2) 2 + fx (3) 3 x ~ ~ ~ = (0:4)(1) + (0:3)(2) + (0:3)(3) = 1:9: It works. Ey [Ex [u(~)j~ = y]] = Ex [u(~)]: x y x It’ a complicated statement.1 0. Doing that removes the conditional part. and v(~) is a random variable. MULTIVARIATE DISTRIBUTIONS 184 14. Let’ call it v(y).3 x=3 ~ 0.1 0.2 0.4. Let’ use one of our previous examples: s y = 1 y = 2 fx (x) ~ ~ ~ x=1 ~ 0.4 0.4 x=2 ~ 0. y x Another way of looking at it is taking the expectation of a conditional expectation. Now take the expectation xy xy over y to get Ey [Ex [u(~)j~ = y]] = fy (1) E[~j~ = 1] + fy (2) E[~j~ = 2] x y xy xy ~ ~ = (0:4)(2) + (0:6)(11=6) = 1:9: Now …nd the unconditional expectation of x.CHAPTER 14.3 fy (y) 0. . So now let’ s y s take the expectation of v(~).

~ (f) Find the mean of x conditional on y = 20. y)dydx: 1 1 185 Note that f (x. y) y = 10 y = 20 y = 30 ~ ~ ~ x=1 ~ :04 0 :20 x=2 ~ :07 0 :18 :02 :11 :07 x=3 ~ x=4 ~ :01 :12 :18 (a) Construct a table showing the distribution function F (x. y)dy dx = 1 1 Z 1Z 1 = u(x)f (x. ~ ~ (c) Find the marginal densities fx (x) and fy (y).5 Problems 1. MULTIVARIATE DISTRIBUTIONS Now let’ look at it generally using the continuous case. y) ~ ~ given by the following table. with joint density f (x. f (x. ~ ~ . x and y . Begin with s Z 1 Ex [u(~)] = x u(x)fx (x)dx ~ 1 Z 1 Z 1 u(x) f (x. so we can rewrite the above expression ~ Z 1Z 1 Ex [u(~)] = x u(x)f (xjy)fy (y)dydx ~ 1 1 Z 1 Z 1 = u(x)f (xjy)dx fy (y)dy ~ 1 1 Z 1 Ex [u(~)j~ = y]fy (y)dy x y = ~ = Ey [Ex [u(~)j~ = y]]: x y 1 14.CHAPTER 14. ~ ~ (d) Find the conditional density f (xj~ = 20). y) = f (xjy)fy (y). There are two random variables. y (e) Find the mean of y . (b) Find the univariate distributions Fx (x) and Fy (y). y).

x y (i) Find Cov(~. with joint density given by ~ ~ the following table: f (x.02 0.02 0.05 0. y) x=1 ~ x=2 ~ x=3 ~ x=4 ~ y=3 ~ 0. ~ ~ (g) Are x and y independent? ~ ~ (h) Find V ar(~) and V ar(~). ~ ~ (f) Find the mean of x conditional on y = 3.03 0. Show that F (xjx c) is the uniform distribution over [a. b).1 ~ a x = +1 0.11 (a) Construct a table showing the distribution function F (x. ~ ~ (k) Verify the Law of Iterated Expectations for …nding Ex [~] =. x (e) Find the means of x and y . (b) Find the univariate distributions Fx (x) and Fy (y).20 0.07 y=8 ~ 0. Let F (x) be the uniform distribution over the interval [a.01 0.Ey [Ex [~jy]]: x x 3.3 ~ b What values must a and b take for x and y to be independent? ~ ~ . y ). 4. x ~ (j) Find the correlation coe¢ cient between x and y . Consider the table of probabilities below: f (x. y). y) y = 10 y = 20 ~ ~ x = 1 0.21 0. ~ ~ (d) Find the conditional density f (yj~ = 3).11 y = 10 ~ 0. b]. ~ ~ (c) Find the marginal densities fx (x) and fy (y). 186 2.05 0. c]. There are two random variables. x and y . MULTIVARIATE DISTRIBUTIONS (g) Are x and y independent? ~ ~ (h) Verify that the Law of Iterated Expectations works.12 0. and suppose that c 2 (a.CHAPTER 14.

we want to infer properties of the entire population from the random sample. :::. :::. Instead. or the ACT scores of all students in the state. each ~ ~ ~ of which has the same distribution as the parent random variable x. identically distributed (IID) random variables x1 . Let x1 . A statistic is a 187 . A random sample from a population random variable x is a set of in~ dependent. Di¤erent members of the population have di¤erent values for the variable. This is statistics. its mean is and its variance is 2 .1 Some de…nitions The set of all of the elements about which some information is desired is called the population. Examples might be the height of all people in Knoxville. xn be the outcomes of the random sample. so we can treat the population variable as a random variable x. ~ The reason for random sampling is that sometimes it is too costly to measure all of the elements of a population. x2 . So far everything we have done in probability theory is about the ~ population random variable.CHAPTER 15 Statistics 15. In particular. xn . or the opinions about Congress of all people in the US.

x y Proof. let’ …gure out the variance of the sum of two independent random variables s x and y . Note that (x x +y 2 y) = (x 2 x) + (y 2 y) + 2(x x )(y y ): Take the expectations of both sides to get 2 V ar(~ + y ) = E[(~ x ~ x ~ x+y y) ] 2 2 = E[(~ x y x x ) ] + E[(~ y ) ] + 2E[(~ = V ar(~) + V ar(~) + 2Cov(~. Examples include the sample mean and the sample variance. y ): x y x ~ y x )(~ y )] . 15. ~ ~ Theorem 22 Suppose that x and y are independent. the expected value of the sample mean is the population mean. STATISTICS 188 function of the outcomes of the random sample which does not contain any unknown parameters. The variance of the sample mean can also be found. The expected value of the sample mean can be found as follows: 1X E[~i ] x E[x] = n i=1 n 1X = n i=1 n = 1 (n ) = : n So. To do this. though.2 Sample mean x= x1 + ::: + xn : n The sample mean is given by Note that we use di¤erent notation for the sample mean (x) and the population mean ( ). Then V ar(~ + y ) = ~ ~ x ~ V ar(~) + V ar(~).CHAPTER 15.

y ) = 0. recall that V ar(a~i ) = a2 V ar(~i ). you will almost always use s2 . So. What you should remember is to use m2 as the computed variance when the random sample coincides with the entire population. since x and y are independent. and many scienti…c calculators compute both of them. 15. We can use this theorem to …nd the variance of the sample mean. as shown in Theorem 20. bigger samples imply better …ts. the theorem applies.3 Sample variance 1X m = (xi n i=1 n 2 We are going to use two di¤erent pieces of notation here. ~ ~ x ~ and the result follows.CHAPTER 15. One is x)2 and the other is s = 2 1 n 1 Both of these can be interpreted as the estimates of variance. Since a random sample is a set of IID random variables. So. Also. x x ! n 1X V ar(x) = V ar xi ~ n i=1 n 1 X = V ar(~i ) x n2 i=1 = = n 2 n2 2 n : This is a really useful result. and we refer n X i=1 (xi x)2 . and to use s2 when the random sample is a subset of the population. Cov(~. It says that the variance of the sample mean around the population mean shrinks as the sample size becomes larger. which we all knew already but we didn’ t know why. STATISTICS 189 But. In other words.

note that since xi has mean and variance ~ have E[~2 ] = 2 + 2 : xi Plugging this all in yields E[m2 ] = = ( = n n 1X ( n i=1 n 2 2 2 + 2 2 ) ! 2 2 2 n + 1 2 ) n 2 . But. m2 is useful for what we are about to do. Finally. It is easier to …nd the expected value of m2 and note that s2 = We get " n # X 1 E[m2 ] = E (~i x)2 x n i=1 " n # X 1 E xi 2 ~ = E[x2 ] n i=1 which follows from a previous manipulation of variance: V ar(~) = E[(~ x x 2 2 2 ) ] = E[~ ] x .CHAPTER 15. We know that E[x] = V ar(x) = 2 =n. we But we already know some of these values. Rearranging that formula and applying it to the sample mean tells us that E[x2 ] = E[x]2 + V ar(x). so E[m2 ] = 1X E[~2 ] xi n i=1 n n n 1 m2 : ! E[x]2 V ar(x): and . We want to …nd the expected value of the sample variance s2 . STATISTICS 190 to s2 as the sample variance.

It may not be a very good or precise estimate. This may seem cryptic and obvious. STATISTICS Now we can get the mean of the sample variance s2 : E[s2 ] = E = = = n n n n 2 191 n n 1 1 : 1 m2 E[m2 ] n n 1 2 The reason for using s2 as the sample variance instead of m2 is that s2 has the right expected value. . but we only have n 1 degrees of freedom when we compute the sample variance because one degree of freedom was used to compute the sample mean. They are also unbiased because their expected values are equal to the population parameters. the parameters of the population distribution. that is. but it is still an estimate. Unbiasedness is an important and valuable property. The terminology used in statistics is degrees of freedom. the expected value of the sample variance is equal to the population variance. but now think about what we need in order to compute a sample variance. Since we can use the …rst observation to compute a sample mean.CHAPTER 15. We only need one observation to compute a sample mean. in expectation. and we need at least one observation to do this. That leaves us with n 1 observations to compute the sample variance. because they depend only on the observed values of the random sample and they have no unknown parameters. Before we can compute the sample variance. With n observations we have n degrees of freedom when we compute the sample mean. Both x and s2 are statistics. and the sample variance to match the population variance in expectation. We have found that E[x] = and E[s2 ] = 2 . In both calculations (sample mean and sample variance) we divide by the number of degrees of freedom. we can use all of the data to compute all of the data to compute a sample mean. we want statistics that match. Since we use random samples to learn about the characteristics of the entire population. The random sample has n observations in it. we need to compute the sample mean. n for the sample mean x and n 1 for the sample variance s2 . We want the sample mean to match the population mean in expectation. Is there any intuition behind dividing by n 1 in the sample variance instead of dividing by m? Here is how I think about it.

and it says that the distribution of the sample mean converges to a normal distribution. if the sample mean truly converges to the population mean. The results are often referred to as asymptotic properties. We have two main results. And. The Weak Law says that this is true no matter how small " is.1 Law of Large Numbers Let xn be the sample mean from a sample of size n. To understand this. What is the probability that the sample mean xn is within " of the population mean? As the sample size grows.4. this type of convergence is called almost sure convergence and it is written xn ! a:e: when n ! 1: . the sample mean converges to the population mean. the probability that the sample mean is within " of the population mean should get closer and closer to 1. 15. One is called the Law of Large Numbers.CHAPTER 15. STATISTICS 192 15. regardless of whether the population is normally distributed or not. The basic law of large numbers is xn ! when n ! 1: The only remaining issue is what that convergence arrow means. and it says that as the sample size grows without bound. the sample mean should get closer and closer to the population mean. take any small positive number ". and it is written P xn ! when n ! 1: The Strong Law of Large Numbers states that P n!1 lim xn = = 1: This one is a bit harder to understand. In fact. both concerning the sample mean. It says that the sample mean is almost sure to converge to the population mean. This type of convergence is called convergence in probability. The Weak Law of Large Numbers states that for any " > 0 n!1 lim P (jxn j < ") = 1. The second is the Central Limit Theorem.4 Convergence of random variables In this section we look at what happens when the sample size becomes in…nitely large.

15. 1. the distribution of the standardized sample mean converges to the standard normal distribution. that is.CHAPTER 15. 2 ) denote a normal distribution with mean and variance 2 . 1. 6. Thus we 2 < 1. 1)). Let N ( . STATISTICS 193 15. lim P (Zn z) = (z): In words. 8.2 Central Limit Theorem This is the most important theorem in asymptotic theory. 5. 8. You collect the following sample of size n = 12: Find the sample mean and sample variance. To state it. compute the standardized mean: xn Zn = p E[xn ] V ar(xn ) : 2 We know some of these values: E[xn ] = and V ar(xn ) = get xn p : Zn = = n The Central Limit Theorem states that if V ar(~i ) = x if the population random variable has …nite variance.5 Problems 10. This kind of convergence is called convergence in distribution.4. 10 1. 2. 3. 6. then n!1 =n. Let (z) be the distribution function (or cdf) of the normal distribution with mean 0 and variance 1 (or the standard normal N (0. . 4. and it is the reason why the normal distribution is so important to statistics.

We know from the Central Limit Theorem that the distribution of the sample mean converges to the normal distribution as the sample size grows without bound.CHAPTER 16 Sampling distributions Remember that statistics. The purpose of this chapter is to …nd the distributions for the mean and other statistics when the sample size is …nite. It turns out to be the distribution you get when you square a normally distributed random variable. are themselves random variables. The density function for the chi-square distribution with n degrees of freedom is x n (x=2) 2 1 e 2 f (x) = 2 (n=2) 194 . like the mean and the variance of a random variable. they have probability distributions. So.1 Chi-square distribution The chi-square distribution turns out to be fundamental for doing statistics because it is closely related to the normal distribution. 16. Chi-square random variables can only have nonnegative values.

The density for the chi-square distribution with di¤erent degrees of freedom is shown in Figure 16. One strange thing about the chi-square distribution is its mean and variance.2 0. the thin line has 3. then the random ~ variable x2 has the chi-square distribution with 1 degree of freedom.1: Density for the chi-square distribution. and the dashed line has 5. The relationship between the standard normal and the chi-square distribution is given by the following theorem.CHAPTER 16. The thick line has 1 degree of freedom. 2 We use the notation y ~ n to denote a chi-square random variable with n degrees of freedom. The thick line is the density with 1 degree of freedom.0 0. Theorem 23 If x has the standard normal distribution. ~ . SAMPLING DISTRIBUTIONS y 1. Changing the degrees of freedom radically changes the shape of the density function.0 0 1 2 3 4 x 5 Figure 16.1.6 0. and the dashed line has 5. the number of degrees of n freedom. The mean of a 2 random variable is n.4 0.8 0. and the variance is 2n. the thin line has 3 degrees of freedom.2 195 1. R1 where (a) is the Gamma function de…ned as (a) = 0 y a 1 e y dy for a > 0.

CHAPTER 16. then z = x + y is also normally distributed. and the formula is complete. if x and y are independent normally ~ ~ distributed random variables. Using Leibniz’ s’rule. di¤eren~ tiate this with respect to y to get the density of y : ~ p fy (y) = 2fx ( y) ~ ~ p 1 p 2 y where the last term is the derivative of formula for the normal density to get 1 fy (y) = p e ~ 2 y=2 y with respect to y. From here we can compute Z py Fy (y) = 2 fx (x)dx ~ ~ 0 where fx (x) is the standard normal density. Both the normal distribution and the chi-square distribution have the property that they are additive. But. If ~ ~ ~ x and y are independent chi-square random variables with nx and ny degrees ~ ~ of freedom. respectively. Similarly. SAMPLING DISTRIBUTIONS Proof. we use the notation . 16. Plug in the y=2 y 1=2 (y=2) 1=2 e p = 2 : This looks exactly like the p formula for the density of 2 except for the de1 nominator. 2 ) to denote a random variable that is distrib~ uted normally with mean and variance 2 . then z = x + y is has a chi-square distribution with ~ ~ ~ nx + ny degrees of freedom. (1=2) = . That is. The distribution function for the variable y = x2 is ~ ~ Fy (y) = ~ = = = P (~ y) y P (~2 y) x p p P( y x y) p 2P (0 x y) 196 where the last equality follows from the fact that the standard normal distribution is symmetric around its mean of zero.2 Sampling from the normal distribution We use the notation x N ( .

…rst note that since the sample variance is P (xi x)2 2 s = n 1 2 2 and E[s ] = . x)2 2 n ): 2 n 1: To …gure out the distribution of the sample variance. The next section discusses the sample distribution of s2 when 2 is unknown. 2 ). in which case you don’ really need the sample t 2 variance. 2 . xn are an IID random sample from the general normal distribution N ( . we get that (n 1)s2 2 2 n 1: This looks more useful than it really is.CHAPTER 16. xn be an IID random sample from the standard normal distribution N (0. If x1 . . In order to get it you need to know the population variance. Then 1 (a) x N (0. :::. x Also. n ): P (b) (xi x)2 2 n 1: (c) The random variables x and The above theorem relates to random sampling from a standard normal distribution. The next theorem describes the distribution of the sample statistics of the standard normal distribution. SAMPLING DISTRIBUTIONS 197 2 x ~ ~ n when x has the chi-square distribution with n degrees of freedom. then 2 P (xi x)2 are statistically independent. P (xi N( . Theorem 24 Let x1 . The properties to take away from this section are that the sample mean of a normal distribution has a normal distribution. :::. s . 1). and the sample variance of a normal distribution has a chi-square distribution (after multiplying by (n 1)= 2 ).

The F test is used to test if linear combinations of the coe¢ cients are equal to zero. its density looks like s the normal density but with fatter tails. 2 ). Construct the random variable t x t= : (y=n)1=2 ~ Then t has a t distribution with n degrees of freedom. :::. SAMPLING DISTRIBUTIONS 198 16. 1): = n P 2 We also know that ( (xi x)2 ) = 2 n 1 .3 t and F distributions In econometrics the two most frequently encountered distributions are the t distribution and the F distribution. One uses a t distribution to test whether the individual coe¢ cients b0 . To get a t distribution.1) .t + ::: + bk xk.2. as in Figure 16. If we sample from a standard normal. we get a t distribution when we use n the sample mean in the numerator and something like the sample variance in the denominator. bk are equal to zero. 2 ). If the test rejects the hypothesis. ~ tn N (0. 1) p : 2 =n n Now look back at Theorem 24. 2 =n). If x N ( .CHAPTER 16. bk " in ways we have described earlier in this book. To brie‡ say where they arise. the sample mean has an N (0. Let’ start with the t distribution. :::. and ~t N (0.t + ~t " where t is the observation number. that explanatory variable has a statistically signi…cant impact on the dependent variable. in which case ~ x p N (0. Assume that x and y are ~ ~ ~ according to the formula independent. So. xi. In shorthand. One then estimates the coe¢ cients b0 . The trick is to …gure out exactly what we need. 1) distribution and the sum of the squared deviations has a 2 distribution. Graphically. let x have a standard normal distribution and let y have a chi~ ~ square distribution with n degrees of freedom.t is an observation of the i-th explanatory variable. then we know that x N ( . in y econometrics one often runs a linear regression of the form yt = b0 + b1 x1. Putting this all together yields ~ t= r xp = n P (xi x)2 2 =(n 1) x =p s2 =n : (16.

2 0. Like the t distribution.1) tells us that the following sample statistic . In this case it is the ratio of two chi-square distributions. ~ The random variable t has a t distribution with n 1 degrees of freedom. The thick curve has 1 degree of freedom and the thin curve has 5.4 199 0. The statistic computed in expression (16.3 0.3.n : Applying this to expression (16. Fm.CHAPTER 16. The dashed curve is the standard normal density. It does depend on .1) is commonly referred to as a t-statistic.1 -5 -4 -3 -2 -1 0 1 2 3 4 x 5 Figure 16.2: Density for the t distribution. Also notice that it does not depend on 2 . Because chi-square distributions assign positive probability only to non-negative outcomes. F distributions also assign positive probability only to non-negative outcomes. The formula is 2 m =m . SAMPLING DISTRIBUTIONS y 0. the F distribution is a ratio of two other distributions.n 2 =n n and the density function is shown in Figure 16. ~ The F distribution and the t distribution are related. but we will discuss the meaning of this more in the next chapter. then ~ (tn )2 F1. If tn has the t distribution with n degrees of freedom.

3 0. Notice that x is also the sample frequency.4 Sampling from the binomial distribution Recall that the binomial distribution is used for computing the probability of getting no more than s successes from n independent trials when the probability of success in any trial is p. has an F distribution: )2 ~ (x F = s2 =n F1.7 0.20 .3: Density function for the F distribution. Let xi = 1 ~ if there was a success in trial i and xi = 0 if there was a failure in trial i. the fraction of successes in the sample.4 0.2 0. P P Then n xi is the number of successes in n trials. The sample frequency has the following properties: E[x] = p V ar(x) = p(1 n p) : .5 0. and we can use the random variable x to capture this.8 0.1 0. SAMPLING DISTRIBUTIONS y 0.n 1 : 16.0 0 1 2 3 4 200 x 5 Figure 16.9 0.20 .20 . and the thin curve is F10. A series of random trials is going to result in a sequence of successes or failures.CHAPTER 16. and x = ( xi ) =n is the i=1 average number of successes.6 0. that is. The thick curve is F2. the dashed curve is F5.

CHAPTER

17 Hypothesis testing

The tool of statistical analysis has two primary uses. One is to describe data, and we do this using such things as the sample mean and the sample variance. The other is to test hypotheses. Suppose that you have, for example, a sample of UT graduates and a sample of Vanderbilt graduates, both from the class of 2002. You may want to know whether or not the average UT grad makes more than the national average income, which was about \$35,700 in 2006. You also want to know if the two classes have the same income. You would perform hypothesis tests to either support or reject the hypotheses that UT grads have higher average earnings than the national average and that both UT grads and Vanderbilt grads have the same average income. In general, hypothesis testing involves the value of some parameter that is determined by the data. There are two types of tests. One is to determine if the realized value of is in some set 0 . The other is to compute two di¤erent values of from two di¤erent samples and determine if they are the same or if one is larger than the other.

201

CHAPTER 17. HYPOTHESIS TESTING

202

17.1

Structure of hypothesis tests

The …rst part of testing a hypothesis is forming one. In general a hypothesis takes the form of 2 0 , where 0 is a nonempty set of values for . It could be a single value, or it could be a range of values. The statement 2 0 is called the null hypothesis. The alternative hypothesis is that 2 0 . = We typically write these as H0 (Null hypothesis): 2 0 vs. H1 (Alternative hypothesis): 2 =

0.

The form of the alternative hypothesis is determined completely by the form of the null hypothesis. So, if H0 is = 0 , H1 is 6= 0 . If H0 is > 0 . And so on. 0 , H1 is Another issue in hypothesis testing is that hypotheses can be rejected but they cannot be accepted. So, you can establish that something is false, but not that something is true. Because of this, empirical economists often make the null hypothesis something they would like to be false. If they can reject the null hypothesis, that lends support to the alternative hypothesis. For example, if one thinks that the variance of returns to the Dow-Jones Industrial Average is smaller than the variance of returns to the S&P 500 index, one would form the null hypothesis that the variance is at least as great for the Dow and then try to reject it. When running linear regressions, one tests the null hypothesis that the coe¢ cients are zero. One uses a statistical test to either reject or support the null hypothesis. The nature of the test is as follows. First we compute a test statistic for . Let’ call it T . For example, if is the mean of the population distribution, s T would be the sample mean. As we know, the value of T is governed by a random process. The statistical test identi…es a range of values A for the random variable T such that if T 2 A the null hypothesis is "accepted" and if T 2 A the null hypothesis is rejected. The set A is called the critical region, = and it is important to note that A and 0 are two completely di¤erent things. For example, a common null hypothesis is H0 : = 0. In that case 0 = f0g. But, we do not reject the null hypothesis if T is anything but zero, because then we would reject the hypothesis with probability 1. Instead, we reject the hypothesis if T is su¢ ciently far from zero, or, in our new terminology, if T is outside of the critical region A.

CHAPTER 17. HYPOTHESIS TESTING

203

Statistical tests can have errors because of the inherent randomness. It might be the case that 2 0 , so that the null hypothesis is really true, but T 2 A so we reject the null hypothesis. Or, it might be the case that = 2 0 so that the null hypothesis is really false, but T 2 A and we "accept" = it anyway. The possible outcomes of the test are given in the table below. Value of test statistic T T 2A T 2A = Correctly "accept" null Incorrectly reject null Type I error Incorrectly "accept" null Correctly reject null Type II error

True value of parameter

2 2 =

0

0

A type I error occurs when one rejects a true null hypothesis. A type II error occurs when a false null hypothesis is not rejected. A problem arises because reducing the probability of a type I error generally increases the probability of a type II error. After all, reducing the probability of a type I error means rejecting the hypothesis less often, whether it is true or not. Let F (zj ) be the distribution of the test statistic z conditional on the value of the parameter . The entire previous chapter was about these distributions. If the null hypothesis is really true, the probability that the null is "accepted" is F (Aj 2 0 ). This is called the con…dence level. This probability of a type I error is 1 F (Aj 2 0 ). This probability is called the signi…cance level. The standard is to use a 5% signi…cance level, but 10% and 1% signi…cance levels are also reported. The 5% signi…cance level corresponds to a 95% con…dence level. I usually interpret the con…dence level as the level of certainty with which the null hypothesis is false when it is rejected. So, if I reject the null hypothesis with a 95% con…dence level, I am 95% sure that the null hypothesis is really false. Here, then, is the "method of proof" entailed in statistical analysis. Think of a statement you want to be true. Make this the alternative hypothesis. The null hypothesis is therefore a statement you would like to reject. Construct a test statistic related to the null hypothesis. Reject the null hypothesis if you are 95% sure that it is false given the value of the test statistic.

CHAPTER 17. HYPOTHESIS TESTING

204

For example, suppose that you want to test the null hypothesis that the mean of a normal distribution is 0. This makes the hypothesis H0 : H1 =0 vs. : 6= 0

You take a sample x1 ; :::; xn and compute the sample mean x and sample variance s2 . Construct the test statistic x t= p (17.1) s2 =n

which we know from Chapter 16 has a t distribution with n 1 degrees of freedom. In this case is the hypothesized mean, which is equal to zero. Now we must construct a critical range A for the test statistic t. Our critical range will be an interval around zero so that we reject the null hypothesis if t is too far from zero. We call this interval the 95% con…dence interval, and it takes the form (tL ; tH ). Let Tn 1 (t) be the distribution function for the t distribution with n 1 degrees of freedom. Then the endpoints of the con…dence interval satisfy Tn 1 (tL ) = 0:025 Tn 1 (tH ) = 0:975 The …rst line says that the probability that t is lower than tL is 2.5%, and the second says that the probability that t is higher than tH is 2.5%. Combining these means that the probability that the test statistic t is outside of the interval (tL ; tH ) is 5%. All of this is shown in Figure 17.1. The probability that t tL is 2.5%, as shown by the shaded area. The probability that t tH is also 2.5%. The probability that t is between these two points is 95%, and so the interval (tL ; tH ) is the 95% con…dence interval. The hypothesis is rejected if the value of t lies outside of the con…dence interval. Another way to perform the same test is to compute Tn 1 (t), where t is the test statistic given in (17.1) above. Reject the hypothesis if Tn 1 (t) < 0:025 or Tn 1 (t) > 0:975: If the …rst of these inequalities hold, the value of the test statistic is outside of the con…dence interval and to the left, and if the one on the right holds

CHAPTER 17. HYPOTHESIS TESTING
y

205

2.5% area

2.5% area

tL

0 95% confidence interval

tH

x

Figure 17.1: Two-tailed hypothesis testing

the test statistic is outside of the con…dence interval and to the right. The null hypothesis cannot be rejected if 0:025 Tn 1 (t) 0:975.

17.2

One-tailed and two-tailed tests

The tests described above are two-tailed tests, because rejection is based on the test statistic lying in one of the two tails of the distribution. Twotailed tests are used when the null hypothesis is an equality hypothesis, so that it is violated if the test statistic is either too high or too low. A one-tailed test is used for inequality-based hypotheses, such as the one below: H0 : H1 0 vs. : <0

In this case the null hypothesis is rejected if the test statistic is both far from zero and negative. Large test statistics are compatible with the null hypothesis as long as they are positive. This contrasts with the two-sided tests where large test statistics led to rejection regardless of the sign.

the desired signi…cance level.025. while the one-tailed criterion puts the whole 5% in the lower tail. with Tn 1 (t) either close to zero or close to one. compute the test statistic given in (17.1). we reject the null hypothesis at the 95% con…dence level (or 5% signi…cance level) if Tn 1 (t) < 0:05: There are two di¤erences between the one-tailed criterion for rejection and the two-tailed criterion in the previous section. while the two-tailed criterion has a lower cuto¤ point half as big at 0. .05. The following table gives the rules for the one-tailed and two-tailed tests with signi…cance level and con…dence level 1 . Letting Tn 1 (t) be the cdf of the t distribution with n 1 degrees of freedom. We know from Chapter 16 that it has the t distribution with n 1 degrees of freedom. H1 : 6= 0 H0 : 0 Upper one-tailed vs. The test statistic is z with distribution function G(z). while the two-tailed criterion can be satis…ed two ways. and the hypotheses concern some parameter . The reason for this is that the two-tailed test splits the 5% probability mass equally between the two tails. H1 : < 0 Type of test Reject H0 if G(z) < 2 or G(z) > 1 2 G(z) > 1 p-value 2[1 G(jzj)] 1 G(z) G(z) < G(z) The p-value can be thought of as the exact signi…cance level for the test. The null hypothesis is rejected if the p-value is smaller than . The second di¤erence is that the one-tailed criterion has a cuto¤ point of 0. HYPOTHESIS TESTING 206 To test the null hypothesis that the mean of a normal distribution is nonnegative. H1 : > 0 H0 : 0 Lower one-tailed vs. One di¤erence is that the one-tailed criterion can only be satis…ed one way. Hypothesis H0 : = 0 Two-tailed vs. with Tn 1 (t) small.CHAPTER 17.

7% sure that the null hypothesis is false.27. 42. degrees of freedom. 66. we can reject the null hypothesis at the 5% signi…cance level. (If the sample mean had been above 65. 64. We cannot reject this null hypothesis. In fact.00324 Thus. 47. 68. 24) = 0:99838: The p-value is 2(1 0:99838) = 0:00324. 79. The sequence is: 56.) This is a one-tailed test based on the same test statistic which we have already . 56. Now test the null hypothesis that = 60.3 17. We get x t= p s2 =n = 58:73 65 = 9:59=5 3:27: The next task is to …nd the p-value using T24 (t). which is the same answer we got from Excel.1 Examples Example 1 The following sequence is a random sample from the distribution N ( . but it only allows positive values of t.3. 54. 56. 51. 2 ). Maple allows for both positive and negative values of t. 48. there is no way we could reject the hypothesis.514. 49. 75. degrees of freedom)] TDist(3:27. The task is to test hypotheses about . number of tails) = TDIST(3. 72.CHAPTER 17. 51. This is a two-tailed test based on the statistic in equation (17. So. 2) = 0. the p-value can be found using the formula 2 [1 TDist( jtj . use the command = TDIST( jtj . we are 99. Test the null hypothesis that = 65. 24. so we should still do the test. 74. HYPOTHESIS TESTING 207 17. This time the test statistic is t = 0:663 which yields a p-value of 0. the t distribution with n 1 = 24 degrees of freedom. 62. 57. 55. Using the table above. Excel allows you to do this. 55. and n = 25. s = 9:59. which is less than 65. We compute x = 58:73. 48. 64.1). What about the hypothesis that 65? The sample mean is 58. 59.73. 61.

so we can reject the hypothesis. We have to change the Excel command to reduce the number of tails: =TDIST(3.975. 99) = 0:992 97 and we reject if this number is either less than 0. 24. and we can …nd TDist(2:5. Another way to see it is by computing the p-value p = 2(1 TDist(2:5. as you can …gure out by looking at the above table. 99)) = 0:014 which is much smaller than the 0. 17. You compute the sample mean and sample variance.975.3 Example 3 Use the same information as example 2.3. HYPOTHESIS TESTING 208 computed.27. 17.CHAPTER 17. Notice. It is greater than 0. 1) = 0. however.2 Example 2 You draw a sample of 100 numbers drawn from a normal distribution with mean and variance 2 . This is as it should be.05 required for rejection. but instead test the null hypothesis H0 : = 10:4: .025 or greater than 0. and they are x = 10 and s2 = 16. t = 3:27.00162 Once again we reject the hypothesis at the 5% level. The t-statistic has 100 1 = 99 degrees of freedom. that the p-value is twice what it was for the two-tailed test.3. We must plug it into the appropriate t distribution. The null hypothesis is H0 : =9 Do the data support or reject the hypothesis? Compute the t-statistic t= 10 9 x p =p p = 2:5 s= n 16= 100 This by itself does not tell us anything.

17. Do the data support or reject the = 30. 71 39. 42. It is not. 17. 98. 19. Consider the following random sample from a normal distribution with mean and standard deviation 2 : 134. 43. 34. 44. 67. 13. so we cannot reject the hypothesis. 99)) = 0:32 which is much larger than the 0. 37. 20. 115. 2. 55. 53. 149. 9 44. 30. 65. 21. 99. 86: (a) Test the hypothesis that hypothesis? (b) Test the hypothesis that (c) Test the hypothesis that (d) Test the hypothesis that = 0. 71. HYPOTHESIS TESTING Do the data support or reject this hypothesis? Compute the t-statistic t= We can …nd TDist( 1. 75 78. 47. 51. 61 . 114. 100.975. 26. 98. 30. 97. 26. 9. 4.CHAPTER 17. 12. Answer the following questions based on this random sample generated from a normal distribution with mean and variance 2 . 38. 65. 47. 99) = 0:16 10 10:4 x p =p p = s= n 16= 100 1:0 209 :and we reject if this number is either less than 0. Another way to see it is by computing the p-value p = 2(1 TDist(1.4 Problems 1.025 or greater than 0. 89. 24. 122. 38. 32. 54.05 required for rejection. 87. 52.

Do the data support or reject .CHAPTER 17. Do the data support or reject = 60. HYPOTHESIS TESTING (a) What are the best estimates of (b) Test the hypothesis that the hypothesis? (c) Test the hypothesis that the hypothesis? and 2 210 ? = 40.

CHAPTER 18 Solutions to end-of-chapter problems Solutions for Chapter 2 1. (a) f 0 (x) = 24x (b) g 0 (x) = (c) h0 (x) = 1 4x3 ln 3x = (1 2)=(3x2 x 2 ln 3x)=(4x3 ) 2x + 1)5 4(6x x)e (d) f 0 (x) = (1 211 . (a) f 0 (x) = 72x2 (x3 + 1) + (b) f 0 (x) = (c) f 0 (x) = (d) f 0 (x) = (e) f (x) = 0 20 (4x 2)6 3 x + 20 x5 e2x 9 14x3 (42x2 ln x ax2 ) 2) 2a d xcx x1: 3 c (d cx)2 2: 7 x1: 3 (b 24 1 2x3 2.

Then f 0 (x) = lim = = = = f (x + h) f (x) h!0 h 2 (x + h) x2 lim h!0 h x2 + 2xh + h2 x2 lim h!0 h 2 2xh + h lim h!0 h lim (2x + h) h!0 = 2x: 4. so it is decreasing.CHAPTER 18. (a) Compute f 0 (3) = (b) Compute f 0 (13) = (c) Compute f 0 (4) = (d) Compute f 0 (2) = 18 < 0. so it is decreasing. so it is increasing. . so it is increasing. > 0. Let f (x) = x2 . 4 5:0e 9 16 < 0. 1 13 1 x2 > 0. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 212 (e) g 0 (x) = 9 (9x 8)2 2x2 3 p 5x3 + 6 4 x 9x 8 p 5x3 + 6 15 x2 2x2 3 p 2 9x 8 5x3 + 6 3. Use the formula f (x + h) h!0 h lim f (x) = lim 1 x+h 1 x x+h x(x+h) h!0 h x x(x+h) = lim = lim h!0 h x(x+h) h h h!0 h h!0 xh(x + h) 1 = lim = h!0 x(x + h) = lim 5.

. Also. 3 (c) The FOC is f 0 (x) = 4 x = 0 so x = 3=4. It is decreasing. Compute 0:3 f 00 (141=0:3 ) = 2: 721 7 10 4 < 0. 3 8. f 00 (x) = 20=x2 < 0. so this is a minimum. (c) The foc is f 0 (x) = x+1 (x + 2)2 1 + 6 = 0. It is 1 x ln2 x and f 0 (e) = < 0. (b) f 0 (x) = 3 x2 +4x (3x 2) (x2x+4 2 and f 0 ( 1) = 2 +4x) 1 e 1 9 > 0. (a) The foc is f 0 (x) = 8x 24 = 0. (c) f 0 (x) = 10x + 16 and f 0 ( 6) = 7. and f 00 ( 13 ) = 432 while 6 f 00 ( 11 ) = 432. Compute f 00 (3=4) = 16 > 0. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 213 6. (b) The FOC is f 0 (x) = x84 6 = 0 so x = 141=0:3 . so this is a maximum. (a) f 0 (x) = increasing. 10. Compute f 00 (5=4) = 8 < 0. 6 9. x+2 which has two solutions: x = 13 and x = 11 . (a) The FOC is f 0 (x) = 10 8x = 0 so x = 5=4. which is solved when x = 3. f 00 (x) = 8 > 0.CHAPTER 18. (a) Compute f 00 (x) = 2a. so this is a maximum. 44 < 0. The second 6 6 x+1 2 derivative is f 00 (x) = (x+2)2 2 (x+2)3 . Also. so it is a minimum. Need a < 0: (b) Need a > 0. (b) The foc is f 0 (x) = 20=x 4 = 0. which is solved when x = 5. so it is a maximum. It is decreasing. (a) The problem is max b(m) m c(m) The FOC is b0 (m) = c0 (m) Interpretation is that marginal bene…t equals marginal cost. The function has a local maximum when x = 13 6 6 and a local minimum when x = 11 .

It produces W = 20(1)1=2 = 20 widgets and G = 30(59) = 1770 gookeys. y y = 135. x x = 65. (b) . (a) (26. and (x + y) (x + y) = 354. 25) (b) x (c) 77 (d) Yes. 11. 48. (d) c00 (m) 0. 8. which is smaller than 65 + 26 = 13: 161. This is diminishing marginal bene…t and increasing marginal cost. and (x + y) (x + y) = p 77 = 8: 775 0. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 214 (b) We need b00 (m) c00 (m) 0. y y = 26 p 5: 099. We have p p p p x x + y y = 65 + 135 = 19:681 and p (x + y) (x + y) = p 354 = 18: 815: y and x < y and x y 2. Solutions for Chapter 3 1. Bilco devotes 1 unit of labor to widget production and the other 59 to gookey production. 33. or increasing marginal e¤ort cost.CHAPTER 18. (c) The problem is max wm m c(m) The FOC is w = c0 (m) Interpretation is that marginal e¤ort cost equals the wage. (a) (18. 20) 7 p p p p p x = (c) p x = 65 = 8: 062 3. 3. The …rst-order condition is 0 90 (L) = p L 90 = 0: Solving for L gives us L = 1. A better way is to assume b00 (m) 0 and c00 (m) 0.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 215 3. y) = 8x (b) fy (x. y) = 16y (b) fy (x.CHAPTER 18. 8 4 12y + 18 12x 4. (a) (q) = p(q)q cq = 120q 4q 2 3=x = 2=y cq 3y 2x (b) The FOC is 120 8q q c = 0 = 15 c 8 1=8 < 0 (c) Using the answer to (b). Plugging this into the second expressions 4 yields 16x 2 = 0 y2 16x = 2 ( 1 )2 4 = 32 x = 2 1 The critical point is (x. we have dq =dc = . y) = 16x 4: 2=y 2 . y) = 6y (c) 9 9 . The …rst one s implies that y = 1 . (a) fx (x. (c) The two foc’ are 16y 4 = 0 and 16x 2=y 2 = 0. (a) fx (x. (a) 3 ln x + 2 ln y = k (b) Implicitly di¤erentiate 3 ln x + 2 ln y(x) = k with respect to x to get 3 2 dy + =0 x y dx dy = dx 6. 4 ). 5. y) = (2.

CHAPTER 18. which is why this works. (a) Implicitly di¤erentiate to get 30x dx dx + 3a + 3x da da 5 dx 5x + = 0: a da a2 Solving for dx=da yields 30x + 3a 5 a dx = da dx = da 5x a2 3x + 5x a2 30x + 3a 3x + 5 a = x 3a2 + 5 a 3a2 + 30xa 5 (b) Implicitly di¤erentiate to get 12xa dx + 6x2 = 5 da 5a2 dx da 10xa: Solving for dx=da yields 12xa + 5a2 dx = 5 10xa 6x2 da dx 5 10xa 6x2 = da 12xa + 5a2 . Note that q is also the partial derivative of (q) = p(q)q cq with respect to c. 7. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 216 (d) Plug q = 15 c 8 into (q) to get c 8 4 15 900 + 15c c2 16 c 8 2 (q) = 120 15 = 1800 = 900 (e) Di¤erentiating yields 0 c 15 c2 16 15c + c 8 c2 8 15c 15c + (c) = 15 + c 8 (f) Compare the answers to (b) and (e).

. (a) Implicitly di¤erentiate to …nd dK=dL: FK (K. This happens for two reasons. (a) The foc is 30 p L and solving it for L gets p 30 = w L p 30 L = w 900 L = w2 (b) Since L = we have 900 w2 w=0 dL 1800 = <0 dw w3 The …rm uses fewer workers when the wage rises. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 217 8. L) = dL FK (K. and the other is that it uses fewer workers (see part b) and produces less output. 9. L) FL =FK < 0 and the (b) Both FL and FK are positive. One is that the …rm must pay workers more.CHAPTER 18. so dK=dL = isoquant slopes downward. (c) Plugging L into the pro…t function yields r 900 900 900 = 30 4 w = 2 2 w w w and from there we …nd d = dw 900 < 0: w2 Pro…t falls when the wage rises. L) dK + FL (K. L) = 0 dL dK FL (K.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 218 Solutions for Chapter 4 1. y. Here is another. Set up the Lagrangian L(x. There are many ways to do this. Solve the third equation for x: 120 2x 4y = 0 x = 60 2x 4y): 2y Substitute this into the …rst two equations 24xy 4 48x2 y 3 to get 24(60 2y)y 4 48(60 2y)2 y 3 Multiply the top equation by equation to get 48(60 The terms with 2y)2 y 3 4 2 4 = 0 = 0 2 4 = 0 = 0 2 and add the result to the second 48(60 2y)y 4 4 =0 in them cancel out. so now we solve for the values of x. y. and . and we are left with 48(60 2y)2 y 3 48(60 2y)y 4 = 0 . and one of them can be found on page 39.CHAPTER 18. ) = 12x2 y 4 + (120 The foc’ are s @L = 24xy 4 2 = 0 @x @L = 48x2 y 3 4 = 0 @y @L = 120 2x 4y = 0 @ This is three equations in three unknowns.

CHAPTER 18. 000: 2y = 20 Substituting Plugging into the earlier expressions. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 219 Divide both sides by 48(60 (60 2y)y 3 to get 2y) y = 0 60 3y = 0 y = 20 Substitute this back into things we know to get x = 60 and = 12(60 2. The Lagrangian is L(a. b. into the third equation gives us 400 12 3 12 14 400 2 = 0 14 3 2 = 0 400 = = 5 5 1 = 400 80 12a 14b) 2y)y 4 = 12(20)(204 ) = 38. 3 240 3 = = = 20 a= 12 12=80 12 and b= 2 2 160 80 = = = : 14 14=80 14 7 . ) = 3 ln a + 2 ln b + (400 The FOCs are 3 @L = 12 = 0 @a a 2 @L = 14 = 0 @b b @L = 400 12a 14b = 0 @ Solving the …rst two yields a = 3=12 and b = 2=14 . 400.

so the other two FOCs simplify to = 64x and 4 = y: 3 Setting these equal to each other gives us 4 y = 64x 3 y = 48x: Plugging this into the third FOC yields x1=4 y 3=4 = 1 x1=4 (48x)3=4 = 1 x = We can then solve for y = 48x = 2 31=4 and = 8 31=4 4y = : 3 3 1 31=4 = : 483=4 24 x1=4 y 3=4 ): .CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 220 3. y. ) = 16x + y + (1 The FOCs are @L 1 y 3=4 x1=4 y 3=4 = 16 =0 = 16 @x 4 x 4 x 1=4 3 x 3 x1=4 y 3=4 @L = 1 =1 =0 @y 4 y 4 y @L = 1 x1=4 y 3=4 = 0 @ From the third FOC we know that x1=4 y 3=4 = 1. The Lagrangian is L(x.

Set up the Lagrangian L(x. ) = 5x + 2y + (80 The foc’ are s @L = 5 3 2y = 0 @x @L = 2 2x = 0 @y @L = 80 3x 2xy = 0 @ 3x 2xy) . Set up the Lagrangian L(x.CHAPTER 18. ) = 3xy + 4x + (80 The foc’ are s @L = 3y + 4 4 = 0 @x @L = 3x 12 = 0 @y @L = 80 4x 12y = 0 @ The …rst equation reduces to y = 4( 1)=3 and the second equation tells us that x = 4 . y. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 221 4. y. Substituting these into the third equation yields 80 4x 12y = 12(4)( 1)=3 = 96 32 = = 0 0 0 3 4x 12y): 80 4(4 ) Plugging this into the equations we already derived gives us the rest of the solution: x = 4 = 12 y = 4( 1)=3 = 8=3: 5.

so x = 4.CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 222 5 80 3 2 3x 2y = 0 2x = 0 2xy = 0 Now we solve these equations. Substituting into the other two equations yields y= and = 80 2x 3x = 17 2 1 1 = : x 4 . The third one reduces to 80 3x 2xy = 0 2xy = 80 80 y = 3x 3x 2x and the second one reduces to 2 2x = 0 1 : = x Substitute these into the …rst one to get 5 3 80 3x 2x 2y 1 x = 0 = 0 5 3 1 x 2 Multiplying through by x2 yields 5x2 3x 80 + 3x 5x2 x2 x = 0 = 80 = 16 = 4 Note that we only use the positive root in economics.

especially since L is only equal to 4. This is questionable. is why the lame problem is useful. (a) 0 (x) = 400 + 4x > 0. ) = 30 4L 00 (c) The Lagrangian is 10L + (4 L) . so is increasing and more land is better. But. so this works.CHAPTER 18. (b) (L) = 15=L3=2 which is negative. 00 (c) The Lagrangian is L(x. We would hope for an upward-sloping pro…t function. Pro…t is maximized subject to the constraint. (f) No. Note that the answers to (d) and (e) are the same. which seems sensible when the farm starts o¤ small. (b) (x) = 4. ) = 400x + 2x2 + (10 The FOCs are @L = 400 + 4x @x @L = 10 x = 0 @ =0 x): The second one tells us that x = 10 and the …rst one tells us that = 400 + 4x = 440: (d) It is the marginal value of land. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 223 6. p 7. (a) 0 (L) = 30= L 10 which is positive when L < 9. p L(L. constrained optimization will require a di¤erent set of second order conditions than unconstrained optimization does. by the way. remember that 0 (x) > 0. This. Pro…t grows at a decreasing rate. It says that increasing the size of the farm leads to increased pro…t. it could arise because of increasing returns to scale or because of …xed inputs. (e) That would be 0 (10) = 440. which makes sense. Obviously.

(a) The Lagrangian is L(x. (e) This is 0 p (4) = 30= 4 10 = 5: Note that this matches the answer from (c). SOLUTIONS TO END-OF-CHAPTER PROBLEMS 224 The foc’ are s @L 30 = p 10 =0 @L L @L = 4 L=0 @ The second equation can be solved to get L = 4. y. The …rst derivative of the pro…t function is positive (and equal to 5) when L = 4. which means that pro…t is increasing when L is 4.CHAPTER 18. ) = x y 1 The FOCs are @L = x @x @L = (1 @y @L = M @ 1 1 + (M px x py y): y px = 0 py = 0 py y = 0: )x y px x . In this case the objective function is the pro…t function. Plugging L = 4 into the …rst equation yields 30 p 10 = 0 L 30 p 10 = 0 L 5 = (d) The Lagrange multiplier is always the marginal value of relaxing the constraint. 8. so the Lagrange multiplier measures the marginal pro…t from adding workers. (f) No. The second derivative does not tell us whether we are at a maximum or minimum when there is a constraint. where the value comes from whatever the objective function measures. and the constraint is on the number of workers the …rm can use at one time.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 225 Rearrange the …rst two to get y 1 x px (1 ) py Set them equal to each other to get y 1 x x y = = : (1 = = (1 ) x y y x 1 px y x y (1 = x (1 y = py ) px py ) px py ) px x: py Now substitute this into the budget constraint to get px x + py y = M (1 ) px px x + py x = M py (1 ) px x + px x = M px x = M ) 1 + (1 M x = : px ) px x py ) px M py px )M : = M Substituting this back into what we found for y yields y = = = (1 (1 (1 py .CHAPTER 18.

Plugging this into the …rst 79 = 0 p 90 = 90 w w = 1 The third equation then implies that g = 60 w = 59. 9. ) = (9)(20w1=2 ) + (3)(30g) The foc’ are s 1 @L = 90w 2 11 =0 @w @L = 90 11 =0 @g @L = 60 w g = 0 @ = >0 px 1 = > 0: py (11)(w + g) + (60 w g): The second equation says that equation yields 90 p w 11 = 79. g. these are easy. @x @M @y @M (c) Again. The Lagrangian is L(w. These are the same as the answers to the question 4 on the …rst homework. @x M = <0 @px p2 x @y = 0: @px The demand curve for good x is downward-sloping. (a) Denote labor devoted to widget production by w and labor devoted to gookey production by g.CHAPTER 18. . and it is independent of the price of the other good. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 226 (b) These are easy.

and is the marginal impact of lengthening the shortest side while keeping the total . 2L + 2W = F W = S (b) The Lagrangian is L(L. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 227 (b) The Lagrange multiplier is the marginal value of adding workers.t.W s. The marginal impact on area comes directly from the Lagrange multipliers. is the marginal impact of having a longer fence while keeping the shortest side …xed. W. (a) The farmer’ problem is s max LW L. .CHAPTER 18. 10. ) = LW + (F The …rst-order conditions are @L @L @L @W @L @ @L @ = W = L = F = S 2 =0 2 2L =0 2W = 0 2L 2W ) + (S W ): W =0 We must solve this set of equations: W = S (from fourth equation) L = F=2 S (from third equation) = S=2 (from second equation) = F=2 2S (from …rst equation) (c) It depends.

2x + 3y = 24 4x + y = 20 This is easy to solve because there is only one point that satis…es both constraints: (x. (a) The solution to the alternative problem is (x. 20 ). so the …rst constraint does not hold. and so 1 > 0. part (b) shows us that the …rst constraint must also bind. Note 3 that 4 8 + 8 = 34 2 > 20. in which case 2 > 0.CHAPTER 18. keeping the total fence length …xed. Now …nd the Lagrange 5 5 . Note 3 3 that 2 10 + 3 20 = 80 > 24. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 228 fence length constant. the second constraint would have been nonbinding and its Lagrange multiplier would have been zero. the problem becomes maxx. S=2 or F=2 2S.y x2 y s. so the second constraint must bind. though. 3 3 (b) The solution to the alternative problem is (x. We can …nd S=2 5S=2 S F=2 2S F=2 F=5: When the shortest side is more than one-…fth of the total amount of fencing. y) = ( 18 . y) = ( 10 . 8 ). so the second constraint does not hold. Solutions for Chapter 5 1. Similarly. the farmer would rather lengthen the fence than lengthen the shortest side. she would rather lengthen that side. 3 3 3 (c) If the solution to the alternative problem in (a) had satis…ed the second constraint. y) = (8. We want to know which is greater. This is not what happened. (d) Because both constraints bind. When the shortest side is smaller than a …fth of the fence lenght. 28 ).t.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 229 multipliers. Plug those values into the …rst two to get two equations in two unknowns: 2 3 1 4 1 2 = + 2) 2 1008 25 324 = 25 The solution to this is ( 1 . y) = (8. y) = (8.y x2 y s. so the …rst constraint does not hold. (b) The solution to the alternative problem is (x. 3 ). y) = (6. plug 3 x = 8 and y = 8 into the …rst equation to get the same thing. (d) Because only the …rst constraint binds. 1188 ). the problem becomes maxx. Or. (a) The solution to the alternative problem is (x. so the second constraint does hold this time. 2x + 3y = 24 8 We know from part (a) that (x. Therefore 1 > 0 and 2 = 0. = ( 144 . The FOCs for the equality-constrained problem are 2xy 2 1 4 2 x2 3 1 2 24 2x 3y 20 4x y = = = = 0 0 0 0 We already used the last two to …nd x and y. (c) The solution to the alternative problem in (a) satis…es the second constraint.t.CHAPTER 18. 3 ). We also know that To …nd 1 use the FOCs for the equality-constrained 2 = 0. 125 125 8 2. 12). Note that 2 6 + 3 12 = 48 > 24. so the second contrainst is nonbinding. 3 . problem: 2xy 2 1 = 0 x2 3 1 = 0 24 2x 3y = 0 Plug x = 8 into the second equation to get 1 = 64 . Note 2 8 that 4 8 + 3 = 34 3 < 36.

(b) Setting 1 45 2 = 9 2 + 63 4 = 153 4 < 45 and the second constraint = 0 in the original Lagrangian.y= . (a) Setting conditions 2 = 0 in the original Lagrangian we get the …rst-order @L = 4y @x @L = 4x @y @L = 36 @ 1 6x 4 x 1 1 =0 =0 4y = 0 We solve these for x. 2= 13 13 13 45 720 765 We then have x + 4y = 13 + 13 = 13 > 36 and the …rst constraint is not satis…ed. x= (c) Part (a) shows that we can get a solution when the …rst constraint binds and the second doesn’ and part (b) shows that we cannot t. 2 8 We then have 5x + 2y = is satis…ed. So.CHAPTER 18. with 9 63 x= . and 1 and get 1 63 9 x= . the foc’ are s @L = 4y @x @L = 4x @y @L = 45 @ 2 6x 2 5x 2 5 =0 2 =0 2y = 0 45 180 90 . 2 2 = 0: . 2 8 1 The solution is 9 = . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 230 3.y= .y= . get a solution when the second constraint binds but the …rst does not. y. the answer comes from part (a).

(b) The foc’ are s 20 13 . they must both bind. (a) The foc’ are s @L = 3y @x @L = 3x @y @L = 24 @ 1 The solution is 100 3 8 4 x 1 1 =0 =0 4y = 0 x= and since 5x + 2y = satis…ed.y= . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 231 4. The place where they both bind is the intersection of the two "budget lines." or where the following system is solved: x + 4y = 24 5x + 2y = 30 .CHAPTER 18. (c) Since when one constraint binds the other fails.y= . 2= 15 6 10 > 24 the …rst constraint is not satis…ed. 1=5 3 3 26 + 3 = 42 > 30 the second constraint is not @L = 3y @x @L = 3x @y @L = 30 @ 2 3y 8 2 5x 2 5 2 =0 =0 2y = 0 8 5 2 = 0 3x 2 2 = 0 30 5x 2y = 0 The solution is 189 5 x= and since x + 4y = 37 53 37 .

The other. and so > 0: The KuhnTucker conditions reduce to 2xy x2 42 4x 4 = 0 2 = 0 2y = 0 . y = 5. 9 (c) First notice that the objective function is x2 y. Now we have to …nd the values of s 1 and 2 . ) = x2 y + [42 (b) x @K @x @K y @y @K @ = x(2xy = y(x2 = (42 x. budget-like constraint is binding because x2 y is increasing in both arguments. neither x 0 nor y 0 can be binding. y. The solution is = 23 9 and 2 = 8. y. 4 )=0 2 )=0 4x 0 2y) = 0 4x 2y]: and 8 12 4 1 1 5 2 2 2 = 0 = 0 1 1 2. which is zero if either x or y is zero.CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 232 The solution is x = 4. go back to the foc’ for the entire original Lagrangian: @L = 3y 8 5 2=0 1 @x @L = 3x 4 1 2 2 = 0 @y @L = 24 x 4y = 0 @ 1 @L = 30 5x 2y = 0 @ 2 Plug the values for x and y into the …rst two equations to get 15 and solve for 5. (a) K(x. To do this. Consequently.

y = 12. and = 40 from the …rst equation. and (iii) both x and y are positive. and = 60 from the second equation.CHAPTER 18. Case (ii): y = 0. We get y + 40 x + 60 12 x = 0 = 0 y = 0 4. which is legal since x > 0. The …nal solution is case (i): x = 0. Divide both sides of the …rst K-T condition by x. 49 ). (ii) x is positive and y is zero. 7. divide the second by y. Then x = 12 from the third equation. 2 6. y = 16. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 233 Solving yields (x. (a) K(x. The value of the objective function is 480. The only thing to do is try them one at a time. though. = 60. Case (i): x = 0. y. The solution to this system of equations is x = This is not allowed. This case is not as good as case (i). Case (iii): x. ) = xy + 40x + 60y + (12 (b) x @K @x @K y @y @K @ = x(y + 40 = y(x + 60 = (12 x. . y. Then y = 12 from the third equation. The value of the objective function is xy + 40x + 60y = 720. y. because we can identify three potential solutions: (i) x is zero and y is positive. and divide the third by . because x < 0. x 0 )=0 )=0 y) = 0 x y) (c) This one is complicated. = 56. ) = (7. y > 0. so it cannot be the answer.

(a) 9 0 (d) 39 0 1 1 2 2 2 A 36 1 1 23 32 A 11 1 7 0 (b) @ 12 14 A 0 9 (c) (d) 84 33 7 13 32 11 36 3. In matrix form the system of equations is 0 10 1 0 1 6 2 3 x 1 @ 2 4 1 A@ y A = @ 2 A: 3 0 1 z 8 .CHAPTER 18. (a) 21 (b) 12 5. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 234 Solutions for Chapter 6 1. (a) 14 (b) 134 4. (a) 19 2 16 3 22 61 1 5 (b) 0 10 0 13 18 7 @ 19 (c) 23 14 @ 21 2.

CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 235 Using Cramer’ rule we get s 0 6. In matrix form the system 0 5 2 @ 3 1 0 3 1 1 2 3 1 A det @ 2 4 8 0 1 80 0 1 = x= = 40 2 6 2 3 1 A det @ 2 4 3 0 1 0 1 6 1 3 2 1 A det @ 2 3 8 1 97 0 1= y= 2 6 2 3 @ 2 4 A 1 det 3 0 1 0 1 6 2 1 2 A det @ 2 4 3 0 8 224 0 1= z= = 112 2 6 2 3 1 A det @ 2 4 3 0 1 of equations is 10 1 0 1 1 x 9 0 A@ y A = @ 9 A: 2 z 15 0 1 1 0 A 2 60 1 = 11 1 A 0 2 Using Cramer’ rule we get s 9 2 @ 9 1 det 15 3 0 x= 5 2 @ 3 1 det 0 3 .

(a) 1 14 1 1 1 A 1 4 2 0 1 5 4 3 1 5 A (b) 25 @ 0 15 0 5 10 Solutions for Chapter 7 1. and so there is a unique solution. (a) 1 8 1 2 0 3 2 2 2 0 1 4 1 1 (b) 2 @ 1 1 8. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 236 5 @ 3 det 0 0 y= 5 det @ 3 0 0 5 @ 3 det 0 0 z= 5 @ 3 det 0 0 9 9 15 2 1 3 2 1 3 2 1 3 1 1 0 A 2 81 1= 11 1 0 A 2 1 9 9 A 15 39 1 = 11 1 A 0 2 7.CHAPTER 18. (a) The determinant of the matrix 0 3 6 @ 2 0 1 1 is 1 0 5 A 1 33. .

0 1 4 1 8 160 @ 17 8 10 200 A 3 2 2 40 1 0 160 4 1 8 15 @ 0 480 A 24 4 5 8 160 0 4 0 1 4 1 8 160 15 @ 0 24 480 A 4 0 0 0 0 Since the bottom row is zeros all the way across. get the augmented matrix in row-echelon form. and so there is not a unique solution.CHAPTER 18. there are in…nitely many solutions. (c) The determinant of the matrix 0 1 2 3 0 @ 3 0 5 A 2 6 10 is 0. and so there is not a unique solution. To …nd out whether there is no solution or an in…nite number. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 237 (b) The determinant of the matrix 0 1 4 1 8 @ 17 8 10 A 3 2 2 is 0. To …nd out whether there is no solution or an in…nite number. get the augmented matrix in row-echelon form. 0 1 2 3 0 6 @ 3 0 5 15 A 2 6 10 18 Multiply the …rst row by 2 and add it to the third row: 0 1 2 3 0 6 @ 3 0 5 15 A 6 0 10 30 .

there is no solution. (e) The determinant of the matrix 0 6 1 @ 5 2 0 1 is 1 1 2 A 2 3 and subtract it from the bottom 1 1 8 30 0 2 20 A 0 0 10 27. so there is an in…nite number of solutions. To …nd out whether there is no solution or an in…nite number. 0 1 4 1 8 30 @ 3 0 2 20 A 5 1 2 40 Add the top row to the bottom 0 4 1 @ 3 0 9 0 row: 8 2 6 1 30 20 A 70 Multiply the middle row by row: 0 4 @ 3 0 Since the bottom row is zeros all the way across except for the last column. there is a unique solution. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 238 Multiply the second row by 2 and subtract it from the third row: 0 1 2 3 0 6 @ 3 0 5 15 A 0 0 0 0 There is a row of all zeros.CHAPTER 18. get the augmented matrix in row-echelon form. (d) The determinant of the matrix 0 4 1 @ 3 0 5 1 1 8 2 A 2 is 0. and so there is not a unique solution. .

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 239 2. R) Implicitly di¤erentiate with respect to t to get dY dt dY dR + i0 dt dt dY dR 0 = P mY + P mR dt dt = c0 Y + (1 t)c0 (1 t)c0 i0 mY mR dY dt dR dt Write in matrix form: 1 = c0 Y 0 . which leads to the equation 6a + 2 = 0 a = 1 : 3 (b) Setting the determinant equal to zero and solving for a yields 5a 5 = 0 a = 1 (c) There is no inverse if the determinant is zero. (a) Rewrite the system as Y = c((1 t)Y ) + i(R) + G M = P m(Y.CHAPTER 18. (a) There is no inverse if the determinant is zero. which leads to the equation 5a + 9 = 0 a = 9 : 5 (d) Setting the determinant equal to zero and solving for a yields 20a 35 = 0 7 a = 4 Solutions for Chapter 8 1.

dR = dt = (1 t)c0 c0 Y mY 0 0 0 (1 t)c i mY mR mY c0 Y (1 (1 t)c0 )mR + mY i0 1 which is c0 Y times the derivative from the lecture. so an increase in the tax rate reduces GDP. It is negative. so an increase in the tax rate reduces the interest rate. (b) Implicitly di¤erentiate the system with respect to M to get dY dM dY dR + i0 dM dM dR dY + P mR 1 = P mY dM dM = (1 t)c0 t)c0 mY Use Cramer’ rule to get: s dY dM 0 i0 1 mR = (1 t)c0 i0 mY mR i0 = (1 (1 t)c0 )mR + mY i0 i0 mR dY dM dR dM Write in matrix form: (1 = 0 1 . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 240 Use Cramer’ rule to get: s dY dt c0 Y i0 0 mR 0 (1 t)c i0 mY mR c0 Y (1 (1 mR t)c0 )mR + mY i0 = = which is c0 Y times the derivative from the lecture.CHAPTER 18. It is negative.

(c) Implicitly di¤erentiate the system with respect to P to get dY dP dY dR + i0 dP dP dY dR 0 = m + P mY + P mR dP dP = (1 t)c0 Write in matrix form: 1 (1 t)c0 i0 mY mR dY dP dR dP = 0 m The derivatives are m times the derivatives from part (b). R) Implicitly di¤erentiate with respect to G to get dY dY dR dY dR = c0 + i0 + 1 + xY + xR dG dG dG dG dG dY dR 0 = P mY + P mR dG dG . and so an increase in money supply increases GDP. making the derivative negative. (a) First simplify to two equations: Y = c(Y T ) + i(R) + G + x(Y. and so an increase in the price level reduces GDP and increases the interest rate. R) M = P m(Y.CHAPTER 18. t)c0 0 mY 1 dR = 0 dt (1 t)c i0 mY mR (1 t)c0 = (1 (1 t)c0 )mR + mY i0 (1 The numerator is positive. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 241 Both the numerator and denominator are negative. 2. and so an increase in money supply reduces the interest rate. making the derivative positive.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 242 Rearrange as dY dG c0 dY dG i0 dR dG dY dR xR = 1 dG dG dR dY + mR = 0 mY dG dG xY dY dG dR dG We can write this in matrix form 1 c0 xY mY i0 x R mR = 1 0 Now use Cramer’ rule to solve for dY =dG and dR=dG: s i0 x R mR dY = 0 dG 1 c xY i0 x R mY mR mR = (1 c0 xY )mR + mY (i0 + xR ) 1 0 The numerator is negative. The denominator is negative. so an increase in tax revenue reduces both GDP and interest rates. Thus. dR=dG > 0. c0 xY 1 mY 0 dR = 0 0 dG 1 c xY i xR mY mR mY = (1 c0 xY )mR + mY (i0 + xR ) 1 The numerator is negative and so is the denominator. . (b) In matrix form we get 1 c0 xY mY i0 x R mR dY dG dR dG = c0 0 Thus. the derivatives are c0 times those in part (a). So. dY =dG > 0.CHAPTER 18. An increase in government spending increases both GDP and interest rates in the short run.

R) + P mY + P mR dG dG dG dY = 0 dG t)c0 The last line implies. let’ go through s dY = (1 dG . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 243 (c) dY dM i0 x R mR = 0 1 c xY i0 x R mY mR 0 i + xR = (1 c0 xY )mR + mY (i0 + xR ) 1 0 Both the numerator and denominator are negative. making the derivative negative. and so an increase in money supply reduces the interest rate. R) Y = Y Implicitly di¤erentiate with respect to G to get dY dR dY dR + i0 + xY + xR +1 dG dG dG dG dP dY dR 0 = m(Y. making the derivative positive. (a) Rewrite the system as Y = c((1 t)Y ) + i(R) + x(Y.CHAPTER 18. Even so. This makes sense because Y is …xed at the exogenous level Y . that dY =dG = 0. obviously. dR dM c0 xY 0 mY 1 = (1 t)c0 i0 mY mR 1 c0 xY = (1 c0 xY )mR + mY (i0 + xR ) 1 The numerator is positive. and so an increase in money supply increases GDP. 3. R) + G M = P m(Y.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 244 the e¤ort of writing the equation in 0 1 (1 t)c0 xY i0 x R @ P mY P mR 1 0 Use Cramer’ rule to get s matrix notation: 10 1 0 1 0 dY =dG 1 m A @ dR=dG A = @ 0 A 0 dP=dG 0 dY = dG 1 1 i0 x R 0 0 P mR m 0 0 0 0 0 (1 t)c xY i xR 0 P mY P mR m 1 0 0 =0 where the result follows immediately from the row with all zeroes.CHAPTER 18. 1 dP = dG t)c0 P mY 1 (1 t)c0 P mY 1 (1 xY i0 x R P mR 0 0 i xR P mR 0 1 0 0 0 m 0 1 xY = mi0 P mR > 0: mxR An increase in government spending leads to an increase in the price level. 1 dR = dG t)c0 xY 1 0 P mY 0 m 1 0 0 0 0 (1 t)c xY i xR 0 P mY P mR m 1 0 0 (1 1 = m mi0 mxR = i0 1 > 0: + xR Increased government spending leads to an increase in interest rates in the long run. . Finally. As for interest rates. The increase in government spending has no long-run impact on GDP.

R) dM dM dM dY = 0 dM Write the equation in matrix notation: 0 10 1 0 1 1 (1 t)c0 xY i0 x R 0 dY =dM 0 @ P mY P mR m A @ dR=dM A = @ 1 A 1 0 0 dP=dM 0 dY dM = (1 t)c0 Use Cramer’ rule to get s 0 i0 x R 0 1 P mR m 0 0 0 (1 t)c0 xY i0 x R 0 P mY P mR m 1 0 0 dY = dM 1 =0 The increase in money supply has no long-run impact on GDP.CHAPTER 18. either. 1 dP = dM t)c0 P mY 1 (1 t)c0 P mY 1 (1 xY i0 x R P mR 0 0 i xR P mR 0 0 1 0 0 m 0 1 xY = i0 mi0 xR 1 = > 0: mxR m . 1 dR = dM t)c0 xY 0 0 P mY 1 m 1 0 0 0 0 (1 t)c xY i xR 0 P mY P mR m 1 0 0 (1 1 = 0 mi0 mxR = 0: Increasing the money supply has no long-run impact on interest rates. As for interest rates. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 245 (b) This time implicitly di¤erentiate with respect to M to get dY dR dY dR + i0 + xY + xR dM dM dM dM dY dR dP + P mY + P mR 1 = m(Y. Finally.

CHAPTER 18. and Sp > 0. Dp < 0. (b) Implicitly di¤erentiate the system with respect to w: dqD dp = Dp dw dw dqS dp = Sp + Sw dw dw dqD dqS = dw dw Write it in matrix form: 0 10 1 0 1 1 0 Dp dqD =dw 0 @ 0 1 Sp A @ dqS =dw A = @ Sw A 1 1 0 dp=dw 0 . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 246 An increase the money supply leads to an increase in the price level. 4. (a) Implicitly di¤erentiate the system with respect to I: dp dqD = Dp + DI dI dI dqS dp = Sp dI dI dqD dqS = dI dI Write it in matrix form: 0 10 1 0 1 1 0 Dp dqD =dI DI @ 0 1 Sp A @ dqS =dI A = @ 0 A 1 1 0 dp=dI 0 Solve for dp=dI using Cramer’ rule: s 1 0 DI 0 1 0 1 1 0 1 0 Dp 0 1 Sp 1 1 0 dp = dI = DI >0 DP SP where the result follows because DI > 0. That’ the only long-run impact of an increase in money s supply.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 247 Solve for dp=dw using Cramer’ rule: s 1 0 0 0 1 Sw 1 1 0 1 0 Dp 0 1 Sp 1 1 0 dp = dw = Sw DP SP > 0.CHAPTER 18. and Sp > 0. (a) The matrix is XT X = 2 6 8 24 4 16 0 2 @ 6 4 1 8 24 A = 16 56 224 224 896 146 23 : and its determinant is 0. ^ = (X T X) 1 X T y = 1 138 205 64 . 5. and so the two vectors span the same column space. where the result follows because Sw < 0. but there are in…nitely-many ways to write the resulting projection as a combination of the two column vectors. (b) The second column of x is a scalar multiple of the …rst. We can write the regression as y = X + e where 0 1 0 1 6 1 9 y = @ 2 A and X = @ 1 4 A : 5 1 3 The estimated coe¢ cients are given by ^ = (X T X) 1 X T y = 1 62 6. The regression projects the y vector onto this column space. Dp < 0. 7.

CHAPTER 18. this is 4 1 4 1 v1 v2 = 0 0 : 2 1 v1 v2 = 0 0 : There are many solutions. To see this. Also. All it does is make the solution indeterminant. note that if we add the column 0 1 0 1 1 2 24 4 14 168 B 1 3 36 C T C @ 14 54 648 A X=B @ 1 5 60 A and X X = 168 648 7776 1 4 48 The determinant of X T X is 0. the third column is 12 times the second column. 1 2 1 = 0: Eigenvectors satisfy 5 4 When = 1. the equation is 1 4 1 4 v1 v2 = 0 0 : = 1 4 : . 9. and so the new variable does not expand the space spanned by the columns of the data matrix. x3 = 12x2 . and the matrix X T X will not be invertible. but one of them is v1 v2 When = 6. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 248 8. (a) The eigenvalues are given by the solution to the problem 5 4 Taking the determinant yields (5 )(2 ) 4 = 0 6 7 + 2 = 0 = 6.

3). x2 . The eigenvalues are = 2 and = 41=45. 1). both of which (b) No. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 249 Again there are many solutions. The …rst one is larger than 1. (a) Yes. 2. = 1=3 and = 1=5. Solutions for Chapter 9 1. 1 2x2 + 3x2 8x1 3 2 A 6x1 x2 rf (x1 . 0) = @ 60 A 0 0 . When = 7 an eigenvector is (3. When p = p 2 + 2 13 an eigenvector is ( 2 13 8. so the system is unstable. (c) Yes. (c) The eigenvalues are = 7 and = 0. p p 13 (e) The eigenvalues are = 2 + 2 p and = 2 2 13.CHAPTER 18. 4). 3). both of which (d) No. 1). 2) and when = 0 an eigenvector is (2. = 2=3. The eigenvalues are = 5=4 and = 1=3. 3) and when = 2 2 13 p an eigenvector is (2 13 8. 10. so the system is unstable. 0) and when = 2 an eigenvector is ( 4. When = 3 an eigenvector is (1. x3 ) = @ 4x1 x3 0 1 28 rf (5. and when = 6 an eigenvector is (1. The eigenvalues are = 1=4 and have magnitude less than one. but one of them is v1 v2 = 1 1 : (b) Use the same steps as before. The eigenvalues are are less than one. The …rst one is larger than 1. (d) The eigenvalues are = 3 and = 2. The eigenvalues are = 7 and = 6: When = 7 an eigenvector is (1.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 250 2.CHAPTER 18. (a) The second-order Taylor approximation is f (x0 ) + f 0 (x0 )(x We have f (1) f 0 (x) f 0 (1) f 00 (x) f 00 (1) = 2 = 6x2 = 11 = 12x = 12 x0 ) + f 00 (x0 ) (x 2 x0 )2 5 and so the Taylor approximation at 1 is 2 (b) We have f (1) = 30 20 1 p + x x 1 x2 11(x 1) 12 (x 2 1)2 = 6x2 + x + 7 f 0 (x) = 10 f 0 (1) = f 00 (x) = 9 10 3 2 x f 00 (1) = 9 and the Taylor approximation at 1 is 30 (c) We have f (x) = f 0 (x) = f 00 (x) = ex f (1) = f 0 (1) = f 00 (1) = e and the Taylor approximation is e + e(x e 1) + (x 2 1 1)2 = e x2 + 1 2 9(x 9 1) + (x 2 9 1)2 = x2 2 18x 33 2 .

f (x) f (x0 ) + f 0 (x0 )(x = c + bx + ax2 The second-degree Taylor approximation gives you a second-order polynomial. jA2 j = 8 > 0. and 4 0 1 0 3 2 a12 a21 = a12 a21 = 0. 5. and if you begin with a second-order polynomial you get a perfect approximation. and jA3 j = fx = y fy = 8y x and the matrix of second partials is fxx fxy fyx fyy = 0 1 1 8 : 44 < 0: 6.CHAPTER 18. the …rst partials are This matrix is inde…nite and so the second-order conditions for a minimum are not satis…ed. (a) Negative de…nite because a11 < 0 and a11 a22 a12 a21 = 7. y) denote the objective function. 4 0 0 3 = 12. 1 x0 ) + f 00 (x0 )(x 2 x0 )2 2x 4x2 1 x0 ) + f 00 (x0 )(x 2 x0 )2 (b) Positive semide…nite because a11 > 0 but a11 a22 (c) Inde…nite because a11 > 0 but a11 a22 (d) Inde…nite because ja11 j > 0. f (x) f (x0 ) + f 0 (x0 )(x = 12 4. (a) Letting f (x. . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 251 3. 25: 1 2 1 25: (e) Positive de…nite because jA1 j = 6 > 0 and jA2 j = 17 > 0: (f) Inde…nite because jA1 j = (g) Negative de…nite because jA1 j = 4 < 0 but jA2 j = 240 < 0: 2 < 0 and jA2 j = 7 > 0: (h) Inde…nite because jA1 j = 3 > 0.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 252 (b) Letting f (x. If it is a convex combination there must be some number t 2 [0. 7. (c) We have rf = and H= 0 5 5 4 25 < 0: 5y 5x 4y This matrix is inde…nite because jH1 j = 0 but jH2 j = The second-order condition is not satis…ed. the …rst partials are fx = 8 fy = 6 and the matrix of second partials is fxx fxy fyx fyy = 2 0 0 2 2x 2y which is negative de…nite. y) denote the objective function. The second-order conditions are satis…ed. (d) We have rf = and H= 12 0 0 6 12x 6y This matrix is positive de…nite because jH1 j = 12 > 0 and jH2 j = 72 > 0: The second-order condition is satis…ed.CHAPTER 18. 1] such that 6 11 1 =t + (1 t) : 2 4 0 Writing these out as two equations gives us 6 = 11t + ( 1)(1 t) .

(d) 0:03 + 0:17 + 0:00 + 0:05 + 0:04 + 0:20 = 0:49. we want to show that f (txa + (1 t)xb ) tf (xa ) + (1 t)f (xb ) Looking at the left-hand side. These are not the same so it is not a convex combination. t. Let t be a scalar between 0 and 1. (b) There are two of them: x = 15 and y = 4. (c) Sum the probabilities along the row to get 0:35. f (txa + (1 t)xb ) = (txa + (1 = (xb + txa t)xb )2 txb )2 Looking at the right-hand side. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 253 and 2 = 4t + 0(1 t): Solving the …rst one yields t = 7=12 and solving the second one yields t = 1=2. and x = 15 and y = 5. tf (xa ) + (1 t)f (xb ) = tx2 + (1 a 2 = xb + tx2 a t)x2 b 2 txb Subtracting the left-hand side from the right-hand side gives us x2 + tx2 b a tx2 b (xb + txa txb )2 = t(1 t)(xa xb )2 which has to be nonnegative because t. (a) x = 20 and y = 4.CHAPTER 18. 8. (e) P (y 2jx 20) = P (y 0:23 23 2 and x 20) = = P (x 20) 0:79 79 . Given xa and xb . 1 all nonnegative. and anything squared are Solutions for Chapter 10 1.

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 254 (f) Bayes’rule says P (y = 4jx = 20) = P (x = 20jy = 4) P (y = 4) : P (x = 20) We have P (y = 4jx = 20) = 0:20=0:44 = 5=11. (c) 0:73. 2. 4g) = 0:36 0:73 = 0:2628 6= 0:14: They are not statistically independent. We have P ((a 2 f1. 3g)P (b 2 f1. 3). (b) The numbers in parentheses are (a. 4g) = (0:65)(0:42) = 0:273 P (x 20 and y 2 f1. 3)g. 3g and b 2 f1. 1).CHAPTER 18. 4g) = 0:45 P (b 2 f1. 4g) = 0:73. . b) pairs: f(4. 4g) = 0:42 P (x 20) P (y 2 f1. 2. 4g) = 0:14. 2. P (x = 20jy = 4) P (y = 4) (0:20=0:27) (0:27) 20 5 = = = : P (x = 20) 0:44 44 11 (g) Two events are statistically independent if the probability of their intersection equals the product of their probabilities. 4g) = 0:58 P (a 3jb 2 f1. 2. 4g) = 0:45 = 0:77586 0:58 (f) P (a 2 f1. (a) P (A) = 0:26 and P (B) = :18. Also. but P (a 2 f1. 3g) = 0:36 and P (b 2 f1. so A is more likely. (4. We have P (x 20) = 0:65 P (y 2 f1. (d) P (b = 2ja = 5) = P (b = 2 and a = 5)=P (a = 5) = 0:06=0:32 = 3=16 = 0:1875: (e) P (a and so 3 and b 2 f1. 4g) = 0:24 They are not statistically independent. (5.

But she didn’ tell you P (old). P (positive) In spite of the positive test. it is still very unlikely that Max has the disease. 000 = 0:0000475 + 0:0499975 = 0:050045 Now we get P (disease j positive) = P (disease and positive) P (positive) 0:0000475 = 0:050045 = 0:000949 P (disease and positive) . 4. We want P (disease j positive). so you must t . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 255 3.CHAPTER 18. 000 20. 000 = 0:0000475 and P (positive) = P (positive j disease) P (disease) + P (positive j healthy) P (healthy) 19. 999 1 + 0:05 = 0:95 20. Use Bayes’rule: P (entrepreneur j old) = P (old j entrpreneur)P (entrepreneur) P (old) Your grad assistant told you that P (old j entrepreneur) = 0:8 and that P (entrepreneur) = 0:3. which is P (disease j positive) = Note that P (disease and positive) = P (positive j disease) P (disease) 1 = 0:95 20.

t)dx = dx + b0 (t)f (b(t). (a) Plugging in f (x) = 1=6 gives us Z Z 8 1 8 xdx xf (x)dx = 6 2 2 8 1 2 = x 12 2 64 4 = =5 12 12 (b) Following the same strategy.CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 256 calculate it: P (old) = P (old j doctor)P (doctor) +P (old j lawyer)P (lawyer) +P (old j entrpreneur)P (entrpreneur) = (0:6)(0:2) + (0:3)(0:5) + (0:8)(0:3) = 0:51 Plugging this into Bayes’rule yields P (entrepreneur j old) = 47% of old people are entrepreneurs. Leibniz’rule says Z Z b(t) d b(t) @f (x. t): . (0:8)(0:3) = 0:47 0:51 Solutions for Chapter 12 1. Z 8 Z 1 8 2 2 x f (x)dx = x dx 6 2 2 8 1 3 = x 18 2 512 8 = = 28 18 18 2. t) dt a(t) @t a(t) a0 (t)f (a(t). t) f (x.

b). t) a0 (t)f (a(t). t) = t (t2 )2 = t5 . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 257 Here f (x. Then 8 x<a < 0 x a for a x < b F (x) = : b a 1 x b . and f (a(t). t) = x2 . 1). t) = tx2 . f (b(t). t) dt a(t) @t a(t) Z 4t2 Z 4t2 d t2 x3 dx = 2 tx3 dx + (8t) t2 (4t2 )3 ( 3) t2 ( 3t)3 dt 3t 3t = 2 4 tx 4 4t2 + 512t9 3t 81t5 81t5 = 128t9 = 640t9 81 5 t + 512t9 2 243 5 t 2 4. t)dx = dx + b0 (t)f (b(t). Use Leibniz’rule: Z Z b(t) d b(t) @f (x. t) = t ( t2 )2 = t5 . and a(t) = so @f (x. and let G(x) denote the distribution function for U (0. b(t) = t2 . t) f (x. Let F (x) denote the distribution function for U (a. @t Leibniz’rule then becomes Z Z 2 d t 2 tx dx = dt t2 = = t2 t2 x2 dx + (2t)(t5 ) t2 t2 ( 2t)(t5 ) x3 3 6 + 2t6 + 2t6 t2 t6 t + 4t6 3 3 14 6 = t: 3 3.CHAPTER 18.

if 0 a 1 b we can write 8 0 > > > > x < a x x a for G(x) F (x) = b > > 1 x a > b a > : 0 Note that x x b bx a = a 0 a 1 ax x + a a(1 x) (b 1)x = + b a b a b a 1 b a = a x a b a 0. and x+a b = a b x a which is positive when b 1 x b which is positive when b x: So G(x) F (x) as desired. from looking at the x<0 x<a x<1 x<b x b First-order stochastic dominance requires F (x) ments on a and b are G(x). Solutions for Chapter 13 1. The require- The easiest way to see this is by graphing it.CHAPTER 18. equations. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 258 and 8 x<0 < 0 x for 0 x < 1 G(x) = : 1 x 1 a b 0 1 But. (a) = (:10)(7)+(:23)(4)+(:40)(2)+(:15)( 2)+(:10)( 6)+(:02)( 14) = 1: 24 (b) = (:10)(7 1: 24)2 + (:23)(4 1: 24)2 + (:40)(2 (:15)( 2 1: 24)2 + (:10)( 6 1: 24)2 + (:02)( 14 16: 762: 2 1: 24)2 + 1: 24)2 = .

SOLUTIONS TO END-OF-CHAPTER PROBLEMS 259 2. 0 2 x t 0 2 .CHAPTER 18. (a) The means are f = (10)(:15) + (15)(:5) + (20)(:05) + (30)(:1) + (100)(:2) = 33 and g = (10)(:2) + (15)(:3) + (20)(:1) + (30)(:1) + (100)(:3) = 41: 5: (b) The variances are 2 f = (10 33)2 (:15) + (15 33)2 (:5) + (20 33)2 (:2) 33)2 (:05) +(30 33)2 (:1) + (100 = 1148:5 and 2 g = (10 41:5)2 (:2) + (15 41:5)2 (:3) + (20 41:5)2 (:3) 41:5)2 (:1) +(30 41:5)2 (:1) + (100 = 1495:3 (c) The standard deviations are p 1148:5 = 33: 890 f = and g = p 1495:3 = 38: 669 3. (a) F (x) = = = Z x f (t)dt x Z 0 2tdt = x: (b) All those things hold.

= 2 Then note that 2 = E[~2 ] x 2 = 1 2 4 1 = : 9 18 4. (a) For x 2 [0.CHAPTER 18. . SOLUTIONS TO END-OF-CHAPTER PROBLEMS 260 (c) = Z 1 x 2xdx Z 1 0 = 2 x2 dx 1 0 0 2 3 = x 3 2 = : 3 (d) First …nd E[~ ] = x 2 Z 1 x2 2xdx Z 1 0 = 2 = x3 dx 1 0 0 2 4 x 4 1 . 4] we have F (x) = 1 2 = t 16 0 1 2 = x 16 Z x 0 1 tdt 8 x Outside of this interval we have F (x) = 0 when x < 0 and F (x) = 1 when x > 1.

the second factors out the a. We know that E[(x We want to …nd 2 y 2 x) ] = 2 x = E[(y 2 y) ] Note that yields y =3 x 1. 6. and the third is the de…nition of 2 . where ~ The variance is V ar(a~) = E[(a~ a )2 ] x x = E[a2 (~ x )2 ] a2 E[~ x ]2 = a2 2 : The …rst line is the de…nition of variance. and F 0 (x) = x=8 0. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 261 (b) Using F (x) = x2 =16. (c) = (d) 2 Z 4 x 0 1 8 x dx = 8 3 8 3 2 = Z 4 x 0 1 8 x dx = 8 9 is the mean of x. Substituting these in = E[(y = = = = = E[(3x E[(3x E[9(x 9E[(x 9 2: x 2 y) ] 1 3 (3 2 x) ] 2 x) ] 2 x) ] x 1))2 ] . F (4) = 1.CHAPTER 18. and that y = 3x 2 y 1. we get F (0) = 0. The mean of the random variable x is a . ~ 5. the third works because the expectations operator is a linear operator.

fx (4) = :31. and f (4j~ = 20) = 12=23. We have [F n (x)] G(1) (x) = nF n 1 (x)(1 F (x)) + F n (x) = nF n 1 (x)(1 F (x)) 0: Solutions for Chapter 14 1. All we have to do is show that G(2) (x) G(2) (x) 3: G(1) (x) for all x. fy (30) = :63: ~ ~ ~ Similarly. fy (20) = :23. the mean is y = (:14)(10) + (:23)(20) + (:63)(30) = 24:9 . ~ ~ ~ ~ fy (10) = :14. y y f (3j~ = 20) = 11=23. 20)=fy (20). fx (2) = :25. (c) fx (1) = :24. y ~ which gives us f (1j~ = 20) = 0=:23 = 0. and Fy is given by the ~ ~ bottom row of part (a).CHAPTER 18. fx (3) = :20. f (2j~ = 20) = 0. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 262 7. (a) F (x. (d) The formula for conditional density is f (xj~ = 20) = f (x. The variance is therefore 2 2 1 [6 2 1 2 = y 4 = 1 1 (3 + y)]2 + [y 2 2 3y + 9: 1 (3 + y)]2 2 The derivative with respect to y is d 2 1 = y dy 2 8. y) y = 10 y = 20 y = 30 ~ ~ ~ x=1 ~ :04 :04 :24 :11 :11 :49 x=2 ~ :13 :24 :69 x=3 ~ x=4 ~ :14 :37 1:00 (b) Fx is given by the last column of part (a). The mean is = 3 + 1 y. y y (e) Using the marginal density from part (c).

Fx (4) = 1:00 and ~ ~ ~ ~ Fy (3) = 0:17.44 0. For example. ~ ~ (d) f (~ = 3j~ = 1) = 0:12.71 1.43 y = 10 ~ 0.19 0. fy (8) = 0:26. x It works. (a) f (x.CHAPTER 18. Fx (3) = 0:71. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 263 (f) Using part (d). y) = fx (x)fy (y).05 0. using the marginal density from part (c) yields Ex [~] = (:24)(1) + (:25)(2) + (:20)(3) + (:31)(4) = 2:58.03 0. ~ ~ ~ (c) fx (1) = 0:25. (e) x = 2:6 and y = 8:29. ~ This does not hold. y) x=1 ~ x=2 ~ x=3 ~ x=4 ~ y=3 ~ 0. we have f (3.17 y=8 ~ 0. Fx (2) = 0:44. Fy (10) = 1:00. fx (4) = 0:29 and fy (3) = ~ ~ ~ ~ ~ 0:17. 2. and f (~ = 10j~ = y x y x y x 1) = 0:80.25 0. fy (10) = 0:57. Fy (8) = 0:43. 20) = :11. .25 0. fx (3) = ~ :20.10 0. and fy (20) = :23.05 0. which makes fx (3)fy (20) = :046 6= :11. For the two to be independent we need f (x. fx (2) = 0:19. xjy=20 = (0)(1) + (0)(2) + (11=23)(3) + (12=23)(4) = 81=23 (g) No. fx (3) = 0:27.00 (b) Fx (1) = 0:25. f (~ = 8j~ = 1) = 0:08. ~ ~ ~ (h) We have :04(1) + :07(2) + :02(3) + :01(4) = 2:0 :14 0(1) + 0(2) + :11(3) + :12(4) = 3:52 Ex [~j~ = 20] = xy :23 :2(1) + :18(2) + :07(3) + :18(4) Ex [~j~ = 30] = xy = 2:36 :63 Ey [Ex [~jy] = (:14)(2:0) + (:23)(3:52) + (:63)(2:36) = 2:58 x Ex [~j~ = 10] = xy Finally.

046 0. (h) (j) 2 x = 1:32 and = :176: 2 y = 6:45: (i) Cov(~. c].143 0. y ) = x ~ xy 0:514 (k) We have Ex [~j~ = xy Ex [~j~ = xy Ex [~j~ = xy (:03)(1) + (:02)(2) + (:05)(3) + (:07)(4) = 2: 94 :17 (:02)(1) + (:12)(2) + (:01)(3) + (:11)(4) 8] = = 2: 81 :26 (:2)(1) + (:05)(2) + (:21)(3) + (:11)(4) 10] = = 2:40 :57 3] = Ey [Ex [~jy]] = (:17)(2:94) + (:26)(2:81) + (:57)(2:40) = 2:6 x which is the same as the mean of x found above. it is 1 for x > c.043 0.049 0. . ~ 3.CHAPTER 18. b] is F (x) = x b a a The conditional when x 2 [a.032 0. it is 1 for x > b.075 y = 10 ~ 0. y) table. The following table shows the entries for fx (x)fy (y): ~ ~ fx (x)fy (y) ~ ~ x=1 ~ x=2 ~ x=3 ~ x=4 ~ y=3 ~ 0.049 y=8 ~ 0. But this is just the uniform distribution over [a.108 0. and 0 for x < a. and 0 for x < a. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 264 (f) E[~j~ = 3] = [(0:03)(1) + (0:02)(2) + (0:05)(3) + (0:07)(4)]=0:17 = xy 2: 94: (g) No. distribution is F (xjx F (x) c) = = F (c) x b c b a a a a = x c a a for x 2 [a.154 0. c].065 0. b].070 0. The uniform distribution over [a.165 None of the entries are the same as those in the f (x.

(b) The t-statistic is 3. Use the Excel formula =TDIST(7. and the hypothesis is rejected.706. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 265 4.614.27. (c) The t-statistic is -0. 29.0000000365.000882.1 Solutions for Chapter 15 1. But x y f (~ = x 1j~ = 20) = y a : a+b We also know that a + b must equal 0.6 so that the probabilities sum to one. 2) to get the p-value of 0. This answer uses the latter formulation. We can see that f (~ = 1j~ = 10) = 1=4. x and y are independent if f (x. Solutions for Chapter 17 1. .CHAPTER 18. and for x and y to be independent it x y ~ ~ must also be the case that f (~ = 1j~ = 20) = 1=4. Thus. y) = fx (x)fy (y) or.41. and the hypothesis is supported. equivalently. if ~ ~ ~ ~ f (xjy) = f (x). (a) Compute the t-statistic t= x 60:02 0 p = p = 7:41 s= n 44:37= 30 which has 29 degrees of freedom. a a 1 = = a+b 0:6 4 :6 = 0:15 a = 4 b = 0:6 a = 0:45: 18. The data reject the hypothesis. the p-value for the one-tailed test is 0. x = 4 and s2 = 24. the p-value is 0.

. 2.CHAPTER 18. SOLUTIONS TO END-OF-CHAPTER PROBLEMS 266 (d) The sample mean is 60. x = 44:2 s2 = 653:8 (b) Compute the t-statistic t= 44:2 40 x p = p = 0:73 s= n 25:6= 20 and the t-statistic has 19 degrees of freedom. so the hypothesis is supported. 19)) = :01 25 < 0:05 and the data reject the hypothesis.02 which is smaller than 100. (c) Compute the t-statistic t= 44:2 60 x p = p = s= n 25:6= 20 2: 76 and again the t-statistic has 19 degrees of freedom. From here compute the p-value of 2(1 TDist(2:76. 19)) = 0:47 > 0:05 and the data support the hypothesis. From here compute the p-value of 2(1 TDist(0:73. (a) The best estimate of is the sample mean x and the best estimate of 2 is the sample variance s2 .

29 asymptotic theory. 5. 145 Excel command. 78. 202 astrology. 192 Central Limit Theorem. 138 posterior. 43 production function. 145 better-than set. 89. 46 utility function. 31. 87 Bayes’rule. 194 relationship to normal distribution. 192 Law of Large Numbers. 193 chain rule for functions of one variable. 147 sampling from. 200 capacity constraint. 82 coin ‡ ipping. 137 likelihood. 192 almost sure convergence. 43 cofactor. 29. 54 binomial distribution. 54 cdf. 100 comparative statics analysis. 45. 126 binding constraint. 138 prior. 193 convergence in probability. 161 augmented matrix. 25 chi-square distribution. 15. 138 Bernoulli distribution. 96 267 . 145 column space.INDEX alternative hypothesis. 192 auctions. 13 for functions of several variables. 193 convergence in distribution. 145 Central Limit Theorem. 195 Cobb-Douglas function.

122 convex set. 97 s critical region. 15 for equality-constrained optimization. 104 e. 192 almost sure. 143 distribution function. 199 …rst-order condition for multidimensional optimization. 24 product rule. 124 coordinate vector. 165 conditional. 17 econometrics. 106 Euclidean space. 10 continuous random variable. 202 cumulative density function. 81 correlation coe¢ cient. 104 in matrix form. 99 eigenvalue. 40 for inequality-constrained optimization.INDEX implicit di¤erentiation approach. 144 dot product. 179. 25. 203 constant function. 120 convex function. 180 covariance. 18 268 F distribution. 30 total di¤erential approach. 177 con…dence level. 120 de…nition. 177 general formula. 164 conditional. 132. 199 relation to t distribution. 79. 147 t distribution. 23 event. 32 complementary slackness. 22 concave function. 164. 144 derivative. 9 chain rule. 29 discrete random variable. 136. 14 partial. implicit. 177 discrete case. 149 exponential function. 179 Cramer’ rule. 65 Kuhn-Tucker conditions. 28 …rst-order condition (FOC). 98. 181 conditional probability. 181 expectation operator. 181 experiment. 191 density function. 165 expected value. 122 de…nition. 78 di¤erentiation. 121 conditional density. 67 . 177 continuous case. 193 in probability. 105 eigenvector. 12 determinant. 192 convex combination. 17 derivative. 57. 57 component of a vector. 131 exponential distribution. 23 dynamic system. 102. 177 conditional expectation. 143 convergence of random variables. 192 in distribution. 81. 145 degrees of freedom. 13 division rule. 207 expectation. 143 Excel commands binomial distribution.

37. 31. 177 inner product. 23 martrix integration by parts. 66 alternative. 117 joint distribution function. identically distrib. 160. 91. 87. separation from. 39 hypothesis Kuhn-Tucker. 102 Kuhn-Tucker conditions. 101 p-value. 156 binding. 151 inequality constraint lower animals. 59. 73 existence. 92 formula. 187 implicit di¤erentiation. 53. 202 Law of Large Numbers. 175 idempotent. 75 IID (independent. 192 critical region. 202 Weak Law. 97 linearly dependent vectors. 184 hypothesis testing. 55 marginal density function. 99. 72 2 2. 140 logarithm. identically distributed. 158 diagonal elements. 92 independence. 138 two-tailed. 43. 205 linear approximation. 17 independent random variables. 117 .linear operator. 72 Hessian. 151 187 lognormal distribution. 17 independent. 29. 39 interpretation. 89 identity matrix. 81 augmented. 103 null. 162 s signi…cance level. 54 nonbinding. 203 likelihood. 192 con…dence level.INDEX …rst-order stochastic dominance. 55 Hessian. 207 slack. 192 errors. 62 uted). 55 Maple commands. 78 IS-LM analysis. 91 linearly independent vectors. 67 gradient. 159 Kuhn-Tucker Lagrangian. 82 addition. 95 dimensions. 203 Strong Law. 100 one-tailed. 66 269 Lagrange multiplier. 206 Leibniz’ rule. 82 determinant. 203 least squares. 73 inverse matrix. 76 matrix. 114 linear combination. 202 lame examples. 178 derivative. 48. logistic distribution. 202 Law of Iterated Expectations. 205 projection matrix. 157. 98. statistical. 117 Lagrangian. 155 linear programming.

187 standard normal. 145 probability distributions binomial. 143 null hypothesis. 148 t. 175 expectation. 76 left-multiplication. 132 orthogonal. 24 quasiconcave. 101 outcome (of an experiment). 92 row-echelon form. 126 normal distribution. 206 partial derivative. 118. 187 positive semide…nite matrix. 88. 164 variance. 88 right-multiplication. 119 nonsingular. 118. 143 objective function. 193 Monty Hall problem. 139 multivariate distribution. 3 row-echelon decomposition. 87. 55 probability measure. 75 scalar multiplication. 30 one-tailed test. 199 logistic. 168 continuous. 143. 151 lognormal. 143 nonconvex set. 25 second partial. 127 mean. 172 IID. 81 square.INDEX identity matrix. 138 prior probability. 87 . 170 for uniform distribution. 148 random variable. 132 norm. 24 cross partial. 188 standardized. 119 uniform. 147 nonbinding constraint. 124 properties. 145 population. 194 F. 169 …rst order statistic. 81 positive semide…nite. 92 realization of a random variable. 138 probability density function. 187 rank (of a matrix). 75 mean. 75 inverse. 172 second order statistic. 88. 75 multiplication. 143 rigor. 131 270 p-value. 169. 73 negative semide…nite. 198 negative semide…nite matrix. 119 posterior probability. 178 mutually exclusive events. 205 order statistics. 132. 148 quasiconvex. 170. 165 sample mean. 151 normal. 73 transpose. 171. 165 sampling from. 202 discrete. 73 singular. 197 random sample. 119 rank. 145 chi-square. 25 pdf.

191 uniform distribution. 161. 86 Cramer’ rule. 187 sample frequency. 91 graphing in (x. 75. 172 variance. 76. 167 univariate distribution function. 88 t distribution. 145 system of equations. 23 scalar multiplication. 116 for a function of m variables. 198 relation to F distribution. 140. 78 support. 181 second-order condition (SOC). 199 Taylor approximation. 103 using eigenvalues. 24 worse-than set. 31 transpose. 165 second order statistic. 119 second-price auction. 92 stability conditions. 179 statistical test. 97 inverse approach. 89 in matrix form. 205 type I error. 73 inequalities for ordering. 170 mean.INDEX sample. 188 sample space. 103 standard deviation. 117 test statistic. 16. 200 sample mean. 199 t-statistic. 193 statistic. 23 in matrix notation. 127 Young’ Theorem. 98 trial. 89 graphing in column space. 193 variance of. 147 …rst order statistic. 178 and correlation coe¢ cient. 22 coordinate. 25 dimension. 89. 203 type II error. 131 sample variance. 87 271 row-echelon decomposition approach. 166 vector. 11. 191 scalar. 203 span. 175 variance. 202 told you so. 115 for a function of m variables. 23 search. 179 and covariance. 97 s existence and number of solutions. 118. 131 two-tailed test. 109 stable process. y) space. 163 signi…cance level. 25 s . 48 total di¤erential. 81. 190 mean of. 79. 76. 202 submatrix. 189. 203 unbiased. 167 standardized mean. 187 statistical independence. 188 standardized.