© 2012 Jeffrey A. Miron

Review of Calculus Tools

Outline
1. Derivatives
2. Optimization of a Function of a Single Variable
3. Partial Derivatives
4. Optimization of a Function of Several Variables
5. Optimization Subject to Constraints

1 Derivatives

The basic tool we need to review is derivatives. The intuitive definition of a
derivative is that it is the rate of change of a function in response to a change in its
argument. Let's take an example and look at it more slowly.
Say we have some variable y that is a function of another variable x, e.g.,
y = f (x)
For example, we could have
y = x2
or
y = 7x + 3
or
y = ln x
Graphically, I am just assuming that we have something that looks like the following:

Graph: A Standard Differentiable Function with a Maximum
y = -(x - 3)^2 + 8

[Figure: the function plotted for x from 0 to 10, peaking at (3, 8)]

Now say that we are interested in knowing how y will change if we change x.
Let’s say that y is test scores, and x is hours of studying.
Assume we are initially at some amount of x, e.g., you have been in the habit of
studying 20 hours per week. You want to know how much higher your test scores
would be at some other amount of x, x + h.
One thing you could do, if you know the formula, is take this alternate x + h, and
compute f(x) as well as f(x + h). You could then look at the difference:
f(x + h) - f(x)

This would be the change in y. For some purposes, that might be exactly what you
care about.
In other instances, however, you might care about not just how much of a change
there would be, but how much per amount of change in x, i.e., per h:
That is also easy to calculate:
[f(x + h) - f(x)] / h

Now look at this graphically:

Graph: Calculating the Rate of Change in f Over a Discrete Interval
y = 4x^(1/2)

[Figure: the points (x, f(x)) and (x + h, f(x + h)) on the curve; the vertical leg f(x + h) - f(x) and the horizontal leg h form a triangle whose hypotenuse is the secant line]

As you can see, we are just calculating the ratio of two legs of a triangle; that
ratio is the slope of the line that connects the two points, as seen above.
The problem is that this calculation would have a different answer if we calculated
it at a different point:

Graph: Calculating the Rate of Change at a Different Point
y = 4x^(1/2)

[Figure: the same construction over the interval from x to x + h', where the secant has a different slope]

So, what if we calculated all this, but for a smaller h?
Then, we would get a different rate of change (slope):


Graph: Rate of Change as h Shrinks
y = 4x^(1/2)

[Figure: as h shrinks, the secant line through (x, f(x)) and (x + h, f(x + h)) rotates toward the tangent line at x]

So, let’s think about the limiting case of this. Say we examine
lim_{h→0} [f(x + h) - f(x)] / h

At one level, this thing might seem a bit confusing or ill-defined. The numerator
obviously goes to zero as h gets small. The denominator also goes to zero. So,
why should we expect the limit to converge to anything?
The proof is outside this course. But, looking at the graph, we can see that it
seems "plausible" that as we let h go to zero, the ratio should approach the slope of
the line that is tangent to the function.
This is indeed the case, and it can be proven, but we will just accept it as
reasonable.
To summarize, we have "shown" that the rate of change of a function at a given
point (assuming it has a well-defined rate of change) is equal to the slope of a line
that is tangent to the curve at that point.
So, we simply want to define the derivative as

dy/dx = f'(x) = lim_{h→0} [f(x + h) - f(x)] / h

The key thing to keep in your head is that the derivative is both:
1) the rate of change of the function at that point, and
2) the slope of the tangent line at that point.
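To make the limit concrete, here is a small numerical sketch (my own example, not from the notes): the difference quotient for f(x) = x^2 at x = 3, whose true derivative is 2x = 6.

```python
# The difference quotient [f(x + h) - f(x)] / h approaches the derivative
# as h shrinks. Example function and point are illustrative choices:
# f(x) = x**2 at x = 3, where the true derivative is 2x = 6.

def f(x):
    return x ** 2

def difference_quotient(f, x, h):
    """Rate of change of f over the interval [x, x + h]."""
    return (f(x + h) - f(x)) / h

for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(f, 3.0, h))
```

The printed ratios approach 6, the slope of the tangent line at x = 3 (for this f, the quotient works out to exactly 6 + h).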
Here are a few additional things to consider:
1) The derivative is usually different at different points.
2) Some functions do not have derivatives at all points:

Graph: Functions with Non-Differentiabilities

[Figure: three panels showing functions that are not differentiable at some point x'; the third plots y = (x + 3)/x + 4, which blows up at x' = 0]

3) We know the formula for the derivatives of a lot of functions:
constant
linear
polynomial
x to any power
ln x
ex
and many more, but we will only need the ones above.
4) We also know some rules about “combinations of functions.”
The product rule: if
f(x) = g(x)h(x)
then
f'(x) = g(x)h'(x) + h(x)g'(x)
The chain rule: if
f(x) = g(h(x))
then
f'(x) = g'(h(x))h'(x)
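Both rules can be checked numerically with a central difference quotient. The functions g and h below are my own illustrative choices, not from the notes.

```python
# Numerical check of the product and chain rules using illustrative
# functions g(x) = x**3 and h(x) = x + 2.

def g(x): return x ** 3
def h(x): return x + 2
def g_prime(x): return 3 * x ** 2
def h_prime(x): return 1.0

def numeric_derivative(f, x, eps=1e-6):
    # central difference quotient approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 1.5

# Product rule: (g*h)'(x) = g(x)h'(x) + h(x)g'(x)
product = lambda t: g(t) * h(t)
assert abs(numeric_derivative(product, x)
           - (g(x) * h_prime(x) + h(x) * g_prime(x))) < 1e-4

# Chain rule: (g(h(x)))' = g'(h(x)) * h'(x)
composite = lambda t: g(h(t))
assert abs(numeric_derivative(composite, x)
           - g_prime(h(x)) * h_prime(x)) < 1e-4
```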

2 Optimization of a Function of a Single Variable

So far we have talked about the idea that the change in a variable y that depends
on a variable x, per unit of x, might be a useful thing to measure in some settings.
And, we have seen that the derivative we have defined (the change in y per unit
of x, for small changes in x) seems to measure that concept.
But we have not been that explicit about why derivatives are useful in economics.
We’ll take a step in that direction now.
So, imagine that we have some y that depends on x, and we control x. We know
that di¤erent values of x lead to di¤erent values of y, and we want to choose the x
that gives us the highest y.
For example, assume y is a measure of "happiness," and x is the number of pints
of Ben and Jerry's that a consumer eats each night. You might think that for
small values of x, y increases with x. But at some point, as x increases, happiness
decreases (because you can feel your arteries clogging as you eat your 8th pint that
night).
Graphically, we have

Graph: A Single-Peaked Function
y = -(x - 3)^2 + 8

[Figure: the parabola peaks at x = 3, where y = 8]

So, graphically, it’s easy to pick the right point.
The key thing about this point, other than the fact that it seems to be where y
is highest, is that the slope at that point, i.e., the derivative, is zero.
So, this suggests a strategy for finding the x that leads to the maximum y: take
the derivative, set it equal to zero, and then solve.
That is, compute
f'(x)
set this to zero
f'(x) = 0
and solve for x.
This kind of equation is known as the first-order condition (FOC) for a maximum.


The phrase "first-order" is important; it suggests that this is not the whole story,
and that there may be "second-order" things we have to worry about. Let's leave
that aside for a second.
Intuitively, it seems clear (and one can prove rigorously under some assumptions)
that the x that satisfies this condition is the x at which the maximum y occurs.
There are some caveats, but ignore them for a moment and look at an example.
Let's say
y = f(x) = -x^2 + 6x + 4.
Then the problem we want to solve can be written as
max_x  -x^2 + 6x + 4
We therefore compute the derivative
-2x + 6
set this to zero
-2x + 6 = 0
and solve for x; x turns out to be 3.
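The example can be verified in a few lines (a sketch, using the function from the notes: nearby values of x should give lower f).

```python
# Checking the FOC solution for f(x) = -x**2 + 6*x + 4:
# the derivative -2*x + 6 is zero at x = 3, and f is lower nearby.

def f(x):
    return -x ** 2 + 6 * x + 4

def f_prime(x):
    return -2 * x + 6

x_star = 3.0
assert f_prime(x_star) == 0.0   # the FOC holds at x = 3
assert f(x_star) > f(2.9)       # small moves in either direction...
assert f(x_star) > f(3.1)       # ...lower the value of f
print(f(x_star))                # the maximized value, 13.0
```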

2.1 Caveats

The graph that I drew, and the example I considered, had "nice" features:
1. exactly one peak
2. definitely had a max
3. everywhere differentiable
This is not true for all functions:

Graph: Functions Without Well-Defined Maxima

[Figure: three panels: a function with no max or min, a function with infinitely many points where max = min, and a function with a min but no max]

So, the condition we have stated, the FOC, is not sufficient for a point to be a
maximum.
Indeed, it is not even necessary, if we allow for functions that are not differentiable.
There is a standard approach that handles these weird cases for differentiable
functions. This method is known as the second-order conditions (SOC's).
It basically says that the second derivative has to be negative for a maximum.
What is a second derivative? It's just a derivative of a derivative. And you
probably remember, or can at least see intuitively, why this makes sense: if the
second derivative is negative, the derivative is getting smaller.
Don’t worry about this for now. I will review it again in a few examples where
it is relevant later.
Most, although not all, of the problems we examine are "nice." For now, I want
you to be aware of the fact that some problems are not "nice." We will see some
examples where it is relevant later. But it's not the key thing to focus on now; just
be sure to understand the intuition and mechanics of the FOC.
To be clear, it is very important that you be aware that the FOC is not a sufficient
condition; there are special cases where the point that satisfies the FOC is not the
maximizing point. But we're not going to worry about the details yet or to a
significant degree in this course overall.
NB: everything I've said is applicable for finding minima instead of maxima. That
is one reason we have to check the SOC's. But again, in most applications that we
will consider, this will take care of itself.
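A minimal sketch of the SOC idea, with my own example functions: both satisfy the FOC at x = 0, but the sign of a (numerically approximated) second derivative separates the max from the min.

```python
# Both f and g below have a zero derivative at x = 0, so the FOC alone
# cannot tell them apart; the sign of the second derivative can.

def second_derivative(fn, x, eps=1e-4):
    # standard central-difference approximation of the second derivative
    return (fn(x + eps) - 2 * fn(x) + fn(x - eps)) / eps ** 2

f = lambda x: -x ** 2   # has a maximum at x = 0
g = lambda x: x ** 2    # has a minimum at x = 0

assert second_derivative(f, 0.0) < 0   # negative: a maximum
assert second_derivative(g, 0.0) > 0   # positive: a minimum
```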

3 Partial Derivatives

The next, and basically last, calculus topic that we need is partial derivatives.
The reason is that many interesting economics examples relate one variable, say
y, to two (or more) other variables, say k and l. A common example can be found
in a production function:
y = f(k, l)
or, in a utility function,
u = u(x1, x2)
So, the standard calculus of one variable is not sufficient.
Imagine that we have a function of two variables, e.g.,
y = f(x, z)
Now, this is a bit more of a pain graphically.


But, in principle, we can draw this:

Graph: A Function of Two Variables
z = -(x - 4)^2/8 - (y - 4)^2/8 + 8

[Figure: a single-peaked surface plotted over x and y from 0 to 10]
So, y changes in response to both x and z.
If we held one variable constant (that is, looked at a particular slice of this
picture in either the x or z direction) we would see a univariate function.
If we were only working with that, then we might just apply the standard approach from before.
So, we might consider the rate of change of y with respect to either one of those
variables.
It is therefore natural to define what are called partial derivatives:

∂y/∂x = lim_{h→0} [f(x + h, z) - f(x, z)] / h

Now, this might look messy. But it simply treats z as a constant, and then takes
a standard derivative.
This is easiest to see by considering examples. Assume

y = xz
Then
∂y/∂x = z.
Why? Because if we treat z as a constant, then y equals just a constant times x,
and we know how to take that derivative.
What exactly is this partial telling us? It is telling us the rate at which y changes
as we change x, holding z constant.
Furthermore, it makes sense that this depends on the value of z. Take z = 0:
then changing x has no effect on y.
Of course, we could also think about the effect of z on y. To calculate that, we
take the derivative of y with respect to z, treating x as a constant:
∂y/∂z = x.
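The "treat the other variable as a constant" recipe can be mimicked numerically: vary one argument at a time in a difference quotient. The sketch below uses the y = xz example from the notes; the step size h is my own choice.

```python
# Approximating the partials of y = x*z: hold z fixed to estimate
# the partial with respect to x, and hold x fixed for the partial
# with respect to z.

def f(x, z):
    return x * z

def partial_x(f, x, z, h=1e-6):
    # vary x only, holding z constant
    return (f(x + h, z) - f(x, z)) / h

def partial_z(f, x, z, h=1e-6):
    # vary z only, holding x constant
    return (f(x, z + h) - f(x, z)) / h

x, z = 2.0, 5.0
print(partial_x(f, x, z))  # approximately z = 5
print(partial_z(f, x, z))  # approximately x = 2
```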
So, if we have a function
y = f(x1, x2, ..., xn)
i.e., a function of n variables, there will be exactly n partial derivatives.
More examples: Let
y = ax + bz + cq
Then
∂y/∂x = a
∂y/∂z = b
∂y/∂q = c.


Now say
y = x^2 z^3
Then
∂y/∂x = 2xz^3
∂y/∂z = 3x^2 z^2.
@x
Or, let
u(x1, x2) = x1 x2
Then
∂u(x1, x2)/∂x1 = x2
∂u(x1, x2)/∂x2 = x1

3.1 Discussion

You need to know two things about partials.
First, given a general function or some specific function, you should know how to
calculate them.
That should be pretty straightforward: once you understand the approach (treat
all other variables as constants, then apply standard rules from univariate calculus),
it's a totally mechanical exercise.
Second, you need to know how to interpret partials.
This again should not be hard; it is just a tad different from the univariate case,
but in a way that matters.

In words, the partial of a function with respect to one argument is the rate of
change in the function in response to a small change in that argument, holding the
other arguments fixed.
This is different from adjusting both arguments.
For example, increasing a consumer's consumption of goods 1 and 2 is normally
going to have a different effect on utility than just increasing, say, good 1.
As a second example, increasing both K and L will have a different effect than,
say, increasing L and holding K constant.
We’ll see this in practice soon.
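As a quick numerical illustration of this point (my own numbers, using the u = x1 x2 utility function from earlier): raising both goods changes u by more than raising good 1 alone.

```python
# Changing both arguments vs. changing one argument, for u = x1*x2.

def u(x1, x2):
    return x1 * x2

x1, x2 = 4.0, 4.0
both = u(x1 + 1, x2 + 1) - u(x1, x2)   # increase both goods by one unit
only1 = u(x1 + 1, x2) - u(x1, x2)      # increase good 1, hold good 2 fixed
print(both, only1)  # 9.0 vs 4.0: clearly different effects
```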

4 Optimization of Functions of Several Variables

The last topic we need to consider is how to find the maximizing values for functions
of several variables.
Indeed, this is the case of real interest, since key examples in economics are of
this variety.
That is what creates all the tension about how much math to use in intermediate
courses.
Everyone agrees that it’s nice to be able to use calculus. But it turns out that
we need just a little bit of multivariate calculus.
Virtually all basic calculus courses, however, focus only on univariate, rather than
multivariate, calculus; in particular, they do not teach partial derivatives. Thus, in
most sequences, you do linear algebra and then multivariate calculus. This makes
sense, since you need linear algebra (but only a tiny amount) for some parts of
multivariate calculus. But this standard approach makes life difficult.
So, the key tool we need to do micro theory with calculus is partial derivatives.
That means that if we cannot use partials, the benefits of using calculus are not
large; that's why most books put it in an appendix, or skip it entirely.

That’s also why many departments do not require calculus for an econ major;
Harvard did not until 10 or 15 years ago.
But this seems nutty to me: for good students who have had some introduction
to basic calculus, learning a partial derivative is not a big deal; it’s really just a baby
step away from what you already know. Indeed, if you think about it the right way,
you already know what a partial is, as we have seen.
Now we can see why it is useful.
Let's first consider an abstract example, because there's one small wrinkle, which
comes in when we get to the economics examples, that I want to leave aside for the
moment.
Say we have
y = f(x1, x2)
We know that if f is a "smooth" function, it could look something like:

Graph: A Smooth Function of Two Variables
z = -(x - 4)^2/8 - (y - 4)^2/8 + 8

[Figure: a single-peaked surface plotted over x and y from 0 to 10]
We also know we could think about this in just one of its two dimensions.
Then this would look like:

Graph: A Slice of the Graph Above
z = -(x - 4)^2/8 - (4 - 4)^2/8 + 8

[Figure: holding y = 4, the slice is a single-peaked parabola in x]

So, intuitively, we want to make sure we're at a peak from either angle.
Well, looking from either angle is like holding one of x1 or x2 fixed.
So, say we do the following: calculate
∂y/∂x1
and
∂y/∂x2
set both to zero, and find the combination of x1 and x2 that simultaneously
solves the two equations.
I am assuming you can see intuitively that this is analogous to the univariate
case.
In words, if we are at the combination of x1 and x2 that produces the maximum
y, then two things must be true:
1) A small change in x1 leads to a decrease in y, whichever direction we go in.
2) A small change in x2 leads to a decrease in y, whichever direction we go in.
Thus we have two conditions involving two unknowns, and we can solve these.
These two equations need not be linear.
Under some regularity assumptions, there will be an x1 and an x2 that works.
Take for an example
y = -3x1^2 - 2x2^2 + 5x1x2 + x1 + x2
Then, the FOC's are
∂y/∂x1 = -6x1 + 5x2 + 1 = 0
∂y/∂x2 = -4x2 + 5x1 + 1 = 0
This is just two linear equations in two unknowns. We can easily solve for x1 and
x2.
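The system can be solved mechanically; below is a sketch of my own (not from the notes) using Cramer's rule, where the +1 constants come from differentiating the x1 and x2 terms of the objective. The code confirms the algebra by checking that both partials vanish at the solution.

```python
# Differentiating y = -3*x1**2 - 2*x2**2 + 5*x1*x2 + x1 + x2 gives
# the linear FOC system:
#   -6*x1 + 5*x2 + 1 = 0
#    5*x1 - 4*x2 + 1 = 0

def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

x1, x2 = solve_2x2(-6.0, 5.0, 5.0, -4.0, -1.0, -1.0)
print(x1, x2)  # the solution is x1 = -9, x2 = -11

# Both partial derivatives are zero at the solution:
assert abs(-6 * x1 + 5 * x2 + 1) < 1e-9
assert abs(5 * x1 - 4 * x2 + 1) < 1e-9
```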
Of course, these are just first-order conditions. As with the univariate case, we
have to worry about whether we're getting a max or a min; we also have to worry
about kinks and boundaries, etc.
Ignore all this for the time being, but be aware that there could be an issue. We’ll
look more carefully at some particular cases as needed.

5 Constraints

So far, we have discussed the multivariate case without allowing for the possibility
of constraints.
We are going to finesse that issue for the most part, in ways you will see shortly.
So, it is again something we will have to worry about a bit, but it's best handled
case-by-case with specific examples, rather than with general theory.