Math Language

Fundamentals of Mathematical Economics
Muhammed Halim Dalgin

March 15, 2007
Chapter 1
Mathematics as a Language
Mathematics is a deductive science, i.e., conclusions are drawn from what has
been assumed. In doing so, mathematics uses symbols, which represent
different things in different contexts. But symbols have to be put together
and operated on consistently in order to make meaningful statements about
given premises. Mathematics therefore has its own rules for conveying
information and meaning. In this class we will learn these rules, as well as the
logical processes needed to make meaningful statements in the context of
economics. Mathematics greatly helps with the process of building theories.
1.1 Building Theories
What is a theory? It is a thing with the following three parts:
1. definitions
2. assumptions (have to be consistent with one another)
3. hypotheses (if ... then statements; that is, predictions). These follow
from the assumptions.
For example, one might have a theory to explain the effects of fiscal policy.
The following is not a theory, just examples of a definition, an assumption, and
a prediction.
e.g. of a definition: aggregate income is the dollar value of everything produced
in the year
e.g. of an assumption: tax revenues are an increasing function of aggregate
income
e.g. of a prediction: if government expenditures, G, increase, aggregate
income, Y, will increase.
A theory must have all three parts, and the hypotheses must follow logically
(be deducible) from the definitions and assumptions. The example definition,
assumption and hypothesis above do not form a theory, because the hypothesis
does not follow from the definition and the one assumption.
Consider the following example of a theory. Definitions of "men", "cry" and
"Rambo". Assume all men cry. Also assume Rambo is a man. What logically
follows? Rambo cries. This is a theory. Note that a prediction should not follow
from one and only one assumption; otherwise it is just a restatement of that
assumption. For example, the statement "it will rain tomorrow" is not a theory.
Consider the Rambo example again.
define Rambo, men, cry, etc.
Assumptions: (1) All men cry (2) Rambo is a man
prediction: Rambo cries
This is a theory; note that the prediction required more than just one of
the assumptions. Consider an alternative "theory". Assumption: Rambo cries.
Prediction: Rambo cries. Is this a theory? No, because the prediction is just a
restatement of the assumption. Each prediction must require more than one
assumption.
Is the following statement a hypothesis (a prediction in the scientific sense of
the word)? "The end of the world is coming."
Consider now the process of deriving predictions from the assumptions and
definitions. The process of deduction is often difficult, but it can be made
easier by describing the assumptions mathematically, in which case the body of
mathematical logic can be applied to the problem. Doing so has three advantages:
1. mathematical symbolism is precise
2. things are expressed neatly and compactly, so it's easier not to get confused
3. there are many mathematical theorems available to help us deduce the
predictions
E.g. of an assumption in words: the aggregate level of consumption increases
as the level of aggregate income increases. To express this assumption
mathematically we will use the symbol C to represent aggregate consumption and
Y to represent aggregate income. So mathematically we will write our assumption as
\[ C = f(Y), \qquad \frac{dC}{dY} = f'(Y) > 0. \]
1.2 Necessary and Sufficient Conditions
An understanding of these two terms is a necessary condition for understanding
economic models and theories. Consider two things denoted by A and B.
A is necessary for B if the existence of B requires the existence of A.
Said in symbols: (A is necessary for B) $\equiv$ (not A $\Rightarrow$ not B)
$\equiv$ (B $\Rightarrow$ A) $\equiv$ (B is sufficient for A). What does
$\Rightarrow$ mean? It means "implies". $\Leftrightarrow$ means the implication
goes in both directions. If A $\Leftrightarrow$ B, A and B are equivalent. Said
another way, (A $\Leftrightarrow$ B) $\equiv$ (A iff B).
A is sufficient for B if the existence of A guarantees the existence of B.
Said in symbols: (A is sufficient for B) $\equiv$ (not B $\Rightarrow$ not A)
$\equiv$ (A $\Rightarrow$ B) $\equiv$ (B is necessary for A).
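The equivalences in this section can be checked mechanically by enumerating every truth assignment. The following is a minimal sketch; the helper name `implies` is an illustrative choice, not notation from these notes.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q is false only when p is true and q is false."""
    return (not p) or q

# Every truth assignment for the pair (A, B).
rows = list(product([False, True], repeat=2))

# "A is sufficient for B" (A => B) agrees with (not B => not A) on every row.
sufficient_matches_contrapositive = all(
    implies(a, b) == implies(not b, not a) for a, b in rows
)

# "A is necessary for B" (B => A) agrees with (not A => not B) on every row.
necessary_matches_contrapositive = all(
    implies(b, a) == implies(not a, not b) for a, b in rows
)

# Necessity and sufficiency are different claims: they disagree on some row.
necessary_differs_from_sufficient = any(
    implies(a, b) != implies(b, a) for a, b in rows
)

print(sufficient_matches_contrapositive)
print(necessary_matches_contrapositive)
print(necessary_differs_from_sufficient)
```

All three lines print `True`: each condition matches its contrapositive, while "necessary" and "sufficient" are not interchangeable.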
1.3 Set Theory
A set is a collection of things. We use uppercase letters to denote a set.
For example, $S \equiv$ {the set of all dogs who live on the Yeditepe campus}.
Some symbols are reserved for special sets: $\emptyset$ denotes the empty set
(a set with no members), and $U$ denotes the universal set (the set that
includes everything).
Now consider the following set that might be important to a firm that uses
only two inputs, l and k, to produce its output:
\[ A \equiv \{(l, k) : wl + rk \le m,\ l \ge 0,\ k \ge 0\} \]
where l is the quantity of labor, w, the wage rate, is its price, k is the quantity
of capital, r is the rental price of capital, and m is some amount of money.
Define this set in words: all those bundles of labor and capital that the firm
can purchase for m dollars or less. Graph the set. What do we call this set? To
picture this set, solve $wl + rk = m$ for k. The solution is
$k = \frac{1}{r}(m - lw)$ if $r \ne 0$. Graph this set for the values m = 100,
w = 25, and r = 40, for which $k = \frac{1}{40}(100 - 25l)$.
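As a small sketch of the set just described, the code below tests whether a bundle belongs to the set and computes points on its boundary line for m = 100, w = 25, r = 40. The function names are hypothetical helpers introduced only for this illustration.

```python
def in_budget_set(l: float, k: float, w: float, r: float, m: float) -> bool:
    """True if the bundle (l, k) satisfies w*l + r*k <= m with l, k >= 0."""
    return l >= 0 and k >= 0 and w * l + r * k <= m

def budget_line_k(l: float, w: float, r: float, m: float) -> float:
    """Capital on the boundary w*l + r*k = m, i.e. k = (m - l*w) / r, r != 0."""
    if r == 0:
        raise ValueError("r must be nonzero to solve for k")
    return (m - l * w) / r

w, r, m = 25, 40, 100  # values used in the text

# Intercepts of the boundary line
print(budget_line_k(0, w, r, m))  # k when all money is spent on capital
print(m / w)                      # l when all money is spent on labor

print(in_budget_set(2, 1, w, r, m))  # cost 25*2 + 40*1 = 90
print(in_budget_set(3, 1, w, r, m))  # cost 25*3 + 40*1 = 115
```

The two intercepts, together with the axes, bound the triangle that one would draw when graphing the set.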
Now we are going to look at a more general input requirement set. Assume a
firm produces product x using the inputs l and k. Further assume the production
function
\[ x = f(k, l) \]
which identifies the maximum number of units of output that can be produced
using k units of capital and l units of labor. Consider the following set:
\[ I(x) = \{(l, k) : f(k, l) \ge x,\ l, k \ge 0\} \]
Contrast the above set with
\[ I_x(x) = \{(l, k) : f(k, l) = x,\ l, k \ge 0\} \]
This set is the isoquant for output level x. Define the isoquant in words. For
example, if $x = f(k, l) = kl^{.5}$
There are operations on sets, such as union and intersection, but I assume the
reader is already familiar with these.
1.4 Functions
What is a function? What does it mean to say that y = f(x), where x and y
are variables? A function associates with each value of the variable x a unique
value of the variable y. We will mostly stick with the variables x and y, but keep
in mind that nothing is special about those letters. We could write a = h(b).
To define a function we need three things: (1) the domain of the function, say
X; (2) the range of the function, say Y; (3) a rule that associates with each
member of X a unique member of Y, say f. Hence, y = f(x) is a mapping
from X to Y, and the value that y takes for a given x is called the image of x,
denoted f(x). If the mapping is not unique, it is not a function.
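For example, $y = f(x) = x^2$ is a function. In contrast,
\[ y = f(x) = \begin{cases} 3 & \text{if } x \le 5 \\ 7 & \text{if } x \ge 5 \end{cases} \]
is not a function, because it assigns two different values to x = 5.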
All functions are relationships, but not all relationships are functions. Being a
function is sufficient to be a relationship. Being a relationship is necessary but
not sufficient to be a function. For example, $\{(x, y) : y \le x\}$ is a
relationship, but does not define a function y = f(x). Why? Graph this
relationship with x on the horizontal axis and y on the vertical axis. Does this
relationship define a function x = g(y), that is, x as a function of y? Is the
following a function?
\[ \{(x, y) : y = 5 \text{ if } x \le 5 \text{ and } y = 2x \text{ if } x > 5\} \]
Chapter 2
Matrix Algebra
2.1 Matrices
Consider a system of m linear equations in n unknowns:
\[
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\
&\ \ \vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n.
\end{aligned}
\tag{2.1}
\]
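There are three sorts of elements here:
The constants: $\{y_i : i = 1, \ldots, m\}$,
The unknowns: $\{x_j : j = 1, \ldots, n\}$,
The coefficients: $\{a_{ij} : i = 1, \ldots, m,\ j = 1, \ldots, n\}$;
and they can be gathered into three arrays:
\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad
A = \begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]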
The arrays y and x are column vectors of order m and n, respectively, whilst
the array A is a matrix of order $m \times n$, which is to say that it has m rows
and n columns. In matrix terms the above system can be represented as follows:
\[ y = Ax \tag{2.2} \]
There are two objects on our initial agenda. The first is to show, in detail,
how the summary matrix representation corresponds to the explicit form in
(2.1). For this purpose we need to define, at least, the operation of matrix
multiplication. The second object is to describe a method for finding the values
of the unknown elements. Each of the m equations is a statement about a linear
relationship amongst the n unknowns. The unknowns can be determined if and
only if there can be found, amongst the m equations, a subset of n equations
which are mutually independent in the sense that none of the corresponding
statements can be deduced from the others.
Elementary Matrix Operations
It is often useful to display the generic element of a matrix together with the
symbol for the matrix in the summary notation. Thus, to denote the $m \times n$
matrix A, we write $A = [a_{ij}]$. Likewise, we can write $y = [y_i]$ and
$x = [x_j]$ for the vectors. In fact, the vectors y and x may be regarded as
degenerate matrices of orders $m \times 1$ and $n \times 1$ respectively. The
purpose of this is to avoid having to enunciate rules of vector algebra alongside
those of matrix algebra.
Matrix Addition If $A = [a_{ij}]$ and $B = [b_{ij}]$ are two matrices of order
$m \times n$, then their sum is the matrix $C = [c_{ij}]$ whose generic element
is $c_{ij} = a_{ij} + b_{ij}$.
The sum of A and B is defined only if the two matrices have the same order,
$m \times n$, in which case they are said to be conformable with respect to
addition. Notice that the notation reveals that the matrices are conformable by
giving the same indices $i = 1, \ldots, m$ and $j = 1, \ldots, n$ to their
generic elements.
The operation of matrix addition is commutative, such that A + B = B + A,
and associative, such that A + (B + C) = (A + B) + C. These results are, of
course, trivial since they amount to nothing but the assertions that
$a_{ij} + b_{ij} = b_{ij} + a_{ij}$ and that
$a_{ij} + (b_{ij} + c_{ij}) = (a_{ij} + b_{ij}) + c_{ij}$ for all i, j.
Scalar Multiplication of Matrices The product of the matrix $A = [a_{ij}]$ with
an arbitrary scalar, or number, $\lambda$ is the matrix
$\lambda A = [\lambda a_{ij}]$.
Matrix Multiplication The product of the matrices $A = [a_{ij}]$ and
$B = [b_{jk}]$ of orders $m \times n$ and $n \times p$ respectively is the matrix
$AB = C = [c_{ik}]$ of order $m \times p$ whose generic element is
\[ c_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}. \]
The product AB is defined only if B has a number n of rows equal to the
number of columns of A, in which case A and B are said to be conformable
with respect to multiplication. Notice that the notation reveals that the matrices
are conformable by the fact that j is, at the same time, the column index of
$A = [a_{ij}]$ and the row index of $B = [b_{jk}]$.
The operation of matrix multiplication is not commutative in general. Thus,
whereas the product of $A = [a_{ij}]$ and $B = [b_{jk}]$ is well-defined by
virtue of the common index j, the product BA is not defined unless the indices
$i = 1, \ldots, m$ and $k = 1, \ldots, p$ have the same range, which is to say
that we must have m = p. Even if BA is defined, there is no expectation that
AB = BA, although this is a possibility when A and B are conformable square
matrices with equal numbers of rows and columns.
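The definition of the matrix product translates directly into code. The sketch below stores matrices as plain Python lists of rows (a representation chosen here for illustration) and shows a pair of square matrices for which AB and BA differ.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, k) is the sum over the common index j of a_ij * b_jk.
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)  # post-multiplying by B swaps the columns of A
BA = matmul(B, A)  # pre-multiplying by B swaps the rows of A
print(AB)
print(BA)
print(AB == BA)
```

Both products are defined because A and B are square of the same order, yet they are not equal, which illustrates the non-commutativity noted above.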
Transposition The transpose of the matrix $A = [a_{ij}]$ of order $m \times n$
is the matrix $A' = [a_{ji}]$ of order $n \times m$ which has the rows of A for
its columns and the columns of A for its rows.
The basic rules of transposition are as follows:
1. The transpose of $A'$ is A, that is, $(A')' = A$
2. If C = A + B, then $C' = A' + B'$
3. If C = AB, then $C' = B'A'$
Matrix Inversion If $A = [a_{ij}]$ is a square matrix of order $n \times n$,
then its inverse, if it exists, is a uniquely defined matrix $A^{-1}$ of order
$n \times n$ which satisfies the condition $AA^{-1} = A^{-1}A = I$, where
$I = [\delta_{ij}]$ is the identity matrix of order $n \times n$ which has units
on its principal diagonal and zeros elsewhere. A matrix that has an inverse is
said to be nonsingular.
In this notation, $\delta_{ij}$ is Kronecker's delta, defined by
\[ \delta_{ij} = \begin{cases} 0, & \text{if } i \ne j; \\ 1, & \text{if } i = j. \end{cases} \]
The basic rules affecting matrix inverses are as follows:
1. The inverse of $A^{-1}$ is A itself, that is, $(A^{-1})^{-1} = A$
2. The inverse of the transpose is the transpose of the inverse, that is,
$(A')^{-1} = (A^{-1})'$
3. If C = AB, then $C^{-1} = B^{-1}A^{-1}$
To prove the above statements, note that the first one comes from the definition
of the inverse. The second one comes from comparing the equations
$I = (AA^{-1})' = (A^{-1})'A'$ and $I = (A^{-1}A)' = A'(A^{-1})'$ with the
definitions of the left and right inverses of $A'$. The third rule, which is the
reversal rule of matrix inversion, can be understood by considering the two
equations which define the inverse of the product AB, namely
$(AB)(B^{-1}A^{-1}) = I$ and $(B^{-1}A^{-1})(AB) = I$.
One application of matrix inversion is to the problem of finding the solution
of a system of linear equations such as the system under (2.1), which is expressed
in summary matrix notation under (2.2). Here we are looking for the value of
the vector x of unknowns given the vector y of constants and the matrix A of
coefficients. The solution is indicated by the fact that
\[ \text{if } y = Ax \text{ and if } A^{-1} \text{ exists, then } A^{-1}y = A^{-1}Ax = x. \]
Now we are going to give a technique to obtain the inverse of a matrix. In
order to do that we will need a grounding in determinants. But as a last item,
let's note two things about vectors: linear independence and vector
multiplication. A set of vectors is said to be linearly dependent if (and only
if) any one of them can be expressed as a linear combination of the remaining
vectors; otherwise, they are said to be linearly independent.
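Example 2.1 The three vectors
$v_1 = \begin{bmatrix} 2 \\ 7 \end{bmatrix}$,
$v_2 = \begin{bmatrix} 1 \\ 8 \end{bmatrix}$, and
$v_3 = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$
are linearly dependent because $v_3$ is a linear combination of $v_1$ and $v_2$:
\[ 3v_1 - 2v_2 = 3\begin{bmatrix} 2 \\ 7 \end{bmatrix} - 2\begin{bmatrix} 1 \\ 8 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} = v_3. \]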
Note that we can also express the last equation as $3v_1 - 2v_2 - v_3 = 0$,
where 0 is the null vector. Therefore we can, alternatively, define linear
independence as follows: given a set of vectors $v_1, v_2, \ldots, v_n$, if the
equation $c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0$ holds only when all the
coefficients are zero, then the vectors are said to be linearly independent;
otherwise they are said to be linearly dependent. Consequently, the rank of a
matrix is defined as the maximum number of linearly independent rows or
columns, if we think of each row or column of the matrix as a vector. Note that
a matrix has a unique rank because the maximum number of linearly independent
rows must be equal to the maximum number of linearly independent columns.
Moreover, a square matrix is said to be of full rank if all of its rows (or
columns) are linearly independent, which determines whether the matrix is
invertible or not, as we will see later. Finally, we are going to state an
important theorem about the existence of the inverse of a matrix.
Theorem 2.1 A matrix is nonsingular if and only if it has full rank.
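The two equivalent statements of linear dependence in Example 2.1 can be verified numerically. This is only a sketch; `combine` is a hypothetical helper that forms a linear combination of vectors stored as lists.

```python
def combine(coeffs, vectors):
    """Linear combination sum_i c_i * v_i of equal-length vectors (lists)."""
    length = len(vectors[0])
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) for t in range(length)]

v1, v2, v3 = [2, 7], [1, 8], [4, 5]

# v3 written as a combination of v1 and v2
print(combine([3, -2], [v1, v2]))

# Equivalent statement: coefficients that are not all zero give the null vector
print(combine([3, -2, -1], [v1, v2, v3]))
```

The first line reproduces $v_3$ and the second reproduces the null vector, so the three vectors are linearly dependent.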
2.2 Determinants
Informally, the determinant of a square matrix can be defined as a scalar that
summarizes information in the matrix. So, if we view the determinant as a
function, its domain will be the set of all square matrices and its range will be
the real line. For a $2 \times 2$ matrix A its determinant is defined as follows:
\[ \det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \]
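Similarly, the determinant of a $3 \times 3$ matrix A can be found as follows:
\[
\det(A) = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= a\begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b\begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c\begin{vmatrix} d & e \\ g & h \end{vmatrix}
\]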
Note that the smaller-order determinants on the right-hand side are called the
minors of a, b, and c, respectively. In general, given a matrix $A = [a_{ij}]$,
the subdeterminant obtained by deleting the ith row and jth column of A is
called the ijth minor of A and is denoted by $|M_{ij}|$. Another concept related
to the minor is that of the cofactor, which is a signed minor. More formally it
is defined as
\[ |C_{ij}| = (-1)^{i+j}|M_{ij}| \]
In terms of cofactors we can rewrite the third-order determinant calculation
as follows:
\[ \det(A) = \sum_{j=1}^{3} a_{1j}|C_{1j}| \]
In the calculation above we calculated the determinant by expanding it by the
cofactors of its first row. As a matter of fact, the determinant can be calculated
by expanding it by the cofactors of any of its rows. This particular technique of
calculating determinants is called the Laplace expansion. It is easy to extend
this technique to the nth order, although it gets quite messy beyond third-order
determinants. Some of the basic properties of determinants are as follows:
1. Property I The interchange of rows and columns does not affect the value
of the determinant. In other words, $\det(A) = \det(A')$.
2. Property II The interchange of any two rows (or any two columns) will
alter the sign but not the numerical value of the determinant.
3. Property III The multiplication of any one row (or any one column) by
a scalar k will change the value of the determinant k-fold.
4. Property IV The addition (subtraction) of a multiple of any row to
(from) another row does not change the value of the determinant. The
same is true for columns as well.
5. Property V The expansion of a determinant by alien cofactors (the
cofactors of a "wrong" row or column) always yields a value of zero.
Note that from the last property we can deduce that if one row is a multiple of
another row, the value of the determinant will be zero. Moreover, if a row is a
multiple of another row, or is a linear combination of other rows, then not all
the rows of the matrix are linearly independent; hence the matrix is not of full
rank. The theorem we stated in the last section about the existence of the
inverse of a matrix can now be stated in terms of determinants as follows: a
matrix is nonsingular if and only if its determinant is not equal to zero.
Furthermore, because $|A| = |A'|$, it is really not important whether we are
talking about column dependence or row dependence, which must mean the same
thing anyway.
Finally, a geometric interpretation of the determinant can be given. If we think
of the columns of the determinant as vectors in their proper space, then the
absolute value of the determinant is the area (in a two-dimensional space) of
the parallelogram spanned by the column vectors. The interpretation extends
similarly to more than two dimensions.
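The Laplace expansion described in this section is naturally recursive, and a short sketch makes that explicit. The code expands along the first row and then checks Properties I and II on a small matrix; the matrix entries and helper names are illustrative assumptions, and the recursion is only practical for small orders.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1) ** j is the cofactor sign (-1)^(i+j) for the first row (i = 0).
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]

print(det(A))                    # expansion along the first row
print(det(transpose(A)))         # Property I: same value as det(A)
print(det([A[1], A[0], A[2]]))   # Property II: sign flips after a row swap
print(det([[1, 2], [2, 4]]))     # second row is twice the first
```

The last matrix has linearly dependent rows, so its determinant is zero, in line with the restated theorem on nonsingularity.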
2.3 Finding The Inverse Matrix
Now that we know how to test whether a given matrix A has an inverse, by the
criterion $|A| \ne 0$, the next question to answer is: how can we find the
inverse of A? To be able to do so we need to introduce a few more concepts.
Let A be a given square matrix. If we replace each entry of A by its
cofactor, we obtain the cofactor matrix of A; call it C. Therefore,
the cofactor matrix will look like $C = [c_{ij}]$ with $c_{ij} = |C_{ij}|$. The
transpose of the cofactor matrix of A is called the adjoint of A. If
$A = [a_{ij}]$ is given, then its adjoint will look like
\[
C' \equiv \operatorname{adj} A \equiv
\begin{bmatrix}
|C_{11}| & \cdots & |C_{n1}| \\
|C_{12}| & \cdots & |C_{n2}| \\
\vdots & \ddots & \vdots \\
|C_{1n}| & \cdots & |C_{nn}|
\end{bmatrix}
\]
According to the last property of determinants we listed above, expansion
by alien cofactors yields zero. Symbolically we can write that as
\[ \sum_{j=1}^{n} a_{ij}|C_{i'j}| = 0 \quad (i \ne i') \]
or
\[ \sum_{i=1}^{n} a_{ij}|C_{ij'}| = 0 \quad (j \ne j') \]
where the first expansion is by the ith row and the cofactors of the $i'$th row;
the second expansion is by the jth column and the cofactors of the $j'$th column.
Note that in the above formulation expansion by either alien row cofactors or
alien column cofactors yields the same result. The reason is simply that
expansion by alien cofactors is like expanding a determinant in which two rows
or two columns are identical. Now let's try to find out what happens when we
multiply a matrix by its adjoint.
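\[
AC' =
\begin{bmatrix}
\sum_{j=1}^{n} a_{1j}|C_{1j}| & \sum_{j=1}^{n} a_{1j}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{1j}|C_{nj}| \\
\sum_{j=1}^{n} a_{2j}|C_{1j}| & \sum_{j=1}^{n} a_{2j}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{2j}|C_{nj}| \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{n} a_{nj}|C_{1j}| & \sum_{j=1}^{n} a_{nj}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{nj}|C_{nj}|
\end{bmatrix}
=
\begin{bmatrix}
|A| & 0 & \cdots & 0 \\
0 & |A| & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & |A|
\end{bmatrix}
= |A|I_n
\]
Each diagonal entry is an expansion of $|A|$ by a row's own cofactors, while
each off-diagonal entry is an expansion by alien cofactors and therefore equals
zero. This result is very suggestive of a method to find the inverse of a
matrix. Because $AC' = |A|I_n$, we can divide both sides by $|A|$ (which is
nonzero when A is nonsingular), and we get
\[ \frac{AC'}{|A|} = I \quad \Rightarrow \quad A^{-1} = \frac{1}{|A|}\operatorname{adj} A. \]
Therefore, we have found a way to invert a matrix.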
Example 2.2 Find the inverse of $A = \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix}$.
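The adjoint method of this section can be sketched in a few lines and applied to the matrix of Example 2.2. The helper names below are illustrative, and exact rational arithmetic from the standard `fractions` module is an implementation choice made here to avoid rounding.

```python
from fractions import Fraction

def minor(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor(A, i, j):
    """Signed minor (-1)^(i+j) |M_ij|."""
    if len(A) == 1:
        return 1  # convention for a 1 x 1 matrix
    return (-1) ** (i + j) * det(minor(A, i, j))

def adjoint(A):
    """Transpose of the cofactor matrix: entry (i, j) is the cofactor of a_ji."""
    n = len(A)
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]

def inverse(A):
    """Inverse as adj(A) / |A|; assumes integer (or Fraction) entries."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular: |A| = 0")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(A)]

A = [[3, 2],
     [1, 0]]
print(det(A))
print(adjoint(A))
print(inverse(A))
```

Multiplying A by the printed inverse gives the identity matrix, which is a quick way to check the result by hand.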
2.4 Cramer's Rule
There is a convenient method for solving systems of linear equations, which is
known as Cramer's rule. Since we have done much of the grunt work, we can
easily derive the method. Given a system of linear equations Ax = y, where A
is an $n \times n$ nonsingular matrix, we can write the solution as
\[ x = A^{-1}y = \frac{1}{|A|}(\operatorname{adj} A)\,y \]
Once we carry out the matrix multiplication we get the solution for each
component of x as
\[ x_i = \frac{1}{|A|}\sum_{j=1}^{n} y_j|C_{ji}| \]
The sum is the expansion, along its ith column, of the determinant of A with
that column replaced by y, so we can simplify the solution by writing it as
\[ x_i = \frac{|A_i|}{|A|} \]
where $A_i$ is the system matrix whose ith column has been replaced by the
column vector y.
Example 2.3 (National Income Determination) Using the simplest macroeconomic
model, where the behavioral equations are given as follows,
\[ Y = C + I_0 + G_0 \quad \text{(National Income Identity)} \]
\[ C = a + bY \quad (a > 0,\ 0 < b < 1) \]
find the equilibrium values of Y and C using Cramer's rule.
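One way to set up Example 2.3 is to move the endogenous variables Y and C to the left-hand side, giving Y - C = I_0 + G_0 and -bY + C = a, and then apply Cramer's rule to the resulting 2 x 2 system. The sketch below does this for hypothetical parameter values (a = 50, b = 3/4, I_0 = 100, G_0 = 50), which are assumptions made purely for illustration.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, y):
    """Solve the 2 x 2 system A x = y by Cramer's rule: x_i = |A_i| / |A|."""
    d = det2(A)
    if d == 0:
        raise ValueError("coefficient matrix is singular")
    A1 = [[y[0], A[0][1]], [y[1], A[1][1]]]  # first column replaced by y
    A2 = [[A[0][0], y[0]], [A[1][0], y[1]]]  # second column replaced by y
    return Fraction(det2(A1)) / Fraction(d), Fraction(det2(A2)) / Fraction(d)

# Hypothetical parameter values, chosen only for illustration
a, b = 50, Fraction(3, 4)
I0, G0 = 100, 50

# Rearranged system:   Y - C = I0 + G0
#                    -bY + C = a
A = [[1, -1],
     [-b, 1]]
y = [I0 + G0, a]

Y_star, C_star = cramer2(A, y)
print(Y_star, C_star)
```

The determinant of the coefficient matrix is 1 - b, so the restriction 0 < b < 1 guarantees that the system has a unique solution.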
Note on Homogeneous Equation Systems
Suppose
Ax = 0
If A is of full rank and y = 0, then only the trivial solution, x = 0, exists
(why?). The only way to get a non-trivial solution is to have |A| = 0, i.e., A
must have less than full rank. In general, if the number of unknowns is greater
than the number of linearly independent rows of the system matrix, then there
will exist a non-trivial solution, in fact infinitely many of them.
2.5 Equilibrium
Equilibrium is difficult to define. In the text we are using, the author uses
the definition that equilibrium is "a constellation of selected interrelated
variables so adjusted to one another that no inherent tendency to change
prevails in the model which they constitute." This definition looks a little bit
mystical and poetic, but it is not of much help if we want to operationalize
it. He also says that equilibrium for a specific model is a situation that is
characterized by a lack of tendency to change. That is more straightforward,
but what he is really talking about is static equilibrium, not equilibrium in
general. Let us begin by explaining how we will view the concept of static
equilibrium. We will do it in the context of models.
Static Equilibrium and Comparative Statics
Variables are things that vary. The variables whose values are determined within
our model (the variables we are trying to explain) are called the endogenous
variables. The variables whose values are determined outside of our model,
that is, exogenously, are called exogenous variables.
Their values are given from the perspective of our model (models have to start
somewhere; that is, they start by taking some things as given). The values
of the endogenous variables are related to some of the exogenous variables by
behavioral assumptions, which can be described by mathematical functions.
Definition 2.1 A system is in static equilibrium if, for given values of the
exogenous variables, there is no tendency for any of the endogenous variables to
change values.
That is, if the endogenous variables are all at their equilibrium values, for the
given values of the exogenous variables, then they will remain at those values
until the system is perturbed (one or more of the significant exogenous
variables change values). We will refer to this type of equilibrium as a static
equilibrium, because the equilibrium values of the endogenous variables are
static: they do not change. If a system is in equilibrium and one or more of the
exogenous variables change, the equilibrium values of the endogenous variables
will often change; that is, there will be a new equilibrium.
One of the things that we are often trying to do in economics is to figure out
how the equilibrium values of the endogenous variables change when the values
of the exogenous variables change, e.g. what happens to the equilibrium level
of aggregate income if the exogenous level of government expenditure increases.
The process of doing this is called comparative statics, because one is comparing
two static equilibria. For example, if we determine that whenever a particular
exogenous variable increases in value, the equilibrium value of a particular
endogenous variable decreases in value, we have derived a qualitative
comparative static prediction from our model/theory. It is a hypothesis that can
be tested. Now that we understand the notion of a static equilibrium, let's
contrast static with dynamic equilibrium.
Dynamic equilibrium
In a dynamic equilibrium, the equilibrium values of the endogenous variables do
not have to be stationary (static), but rather just have to be on their
equilibrium paths. For example, the planets are in motion around the sun; one
can be in dynamic equilibrium while biking or skiing (sometimes not). If, for
given values of the exogenous variables (or given paths of the exogenous
variables), there is no tendency for the paths of the endogenous variables to
change, then the system is in equilibrium.
Partial Market Equilibrium vs General Market Equilibrium
Suppose in an economy we have only two goods. Equilibrium in any of these
markets would mean that
\[ S_i = D_i, \quad i = 1, 2 \]
or, letting $E_i$ be the excess demand in the ith market, the ith market is in
equilibrium iff $E_i = 0$. Let's first consider the partial market equilibrium.
In this kind of equilibrium, we focus on only one market as if it were isolated
from the rest of the economy, so that demand and supply depend only on the price
of the good which is traded in that market. So the market demand and supply are
given as follows:
\[
\left.
\begin{aligned}
S_1 &= -a + bP_1 \\
D_1 &= c - dP_1
\end{aligned}
\right\} \quad \text{Behavioral Equations}
\]
And the market equilibrium condition is given as
\[ S_1 = D_1 \quad \text{or} \quad E_1 = 0 \]
The equilibrium condition should yield the price and quantity. The elimination
method yields
\[ P_1 = \frac{c + a}{b + d}, \qquad Q_1 = \frac{bc - ad}{b + d} \]
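The closed-form solution can be checked against the behavioral equations. The sketch below assumes the supply intercept enters with a negative sign, $S_1 = -a + bP_1$, which is the form the stated solution implies, together with $D_1 = c - dP_1$; the parameter values are hypothetical.

```python
from fractions import Fraction

def partial_equilibrium(a, b, c, d):
    """Equilibrium of S = -a + b*P and D = c - d*P (requires b + d != 0)."""
    if b + d == 0:
        raise ValueError("b + d must be nonzero")
    P = Fraction(c + a, b + d)
    Q = Fraction(b * c - a * d, b + d)
    return P, Q

# Hypothetical parameter values, chosen only for illustration
a, b, c, d = 2, 3, 10, 1
P, Q = partial_equilibrium(a, b, c, d)
print(P, Q)

# Both behavioral equations give the same quantity at the equilibrium price
print(-a + b * P == Q, c - d * P == Q)
```

For the quantity to be positive the parameters must satisfy bc > ad, which holds for the values used here.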
General Market Equilibrium
In this context we will suppose that demand for one good will depend on the
price of the other good as well as on its own price.
\[
\left.
\begin{aligned}
S_1 &= a_0 + a_1P_1 + a_2P_2 \\
D_1 &= b_0 + b_1P_1 + b_2P_2 \\
S_2 &= \alpha_0 + \alpha_1P_1 + \alpha_2P_2 \\
D_2 &= \beta_0 + \beta_1P_1 + \beta_2P_2
\end{aligned}
\right\} \quad \text{Behavioral Equations}
\]
The market equilibrium conditions are given as
\[ E_i = D_i - S_i = 0, \quad i = 1, 2. \]
To find the equilibrium in both markets we need to solve for price and quantity
in both markets simultaneously. Let $c_i = b_i - a_i$ and
$\gamma_i = \beta_i - \alpha_i$, i = 0, 1, 2; then we can write the equilibrium
condition in matrix terms as follows:
\[
\begin{bmatrix} c_1 & c_2 \\ \gamma_1 & \gamma_2 \end{bmatrix}
\begin{bmatrix} P_1 \\ P_2 \end{bmatrix}
=
\begin{bmatrix} -c_0 \\ -\gamma_0 \end{bmatrix}
\]
We will use Cramer's rule to solve the above system. Note first of all that,
in order to be able to solve the system, we need the determinant of the
coefficient matrix to be different from zero:
\[ \begin{vmatrix} c_1 & c_2 \\ \gamma_1 & \gamma_2 \end{vmatrix} = c_1\gamma_2 - c_2\gamma_1 \ne 0 \]
We will impose this nonzero-determinant requirement. Note that this is more or
less a condition on the slopes. Hence, using Cramer's rule we obtain
\[ P_1 = \frac{-c_0\gamma_2 + \gamma_0c_2}{c_1\gamma_2 - c_2\gamma_1} \]
and
\[ P_2 = \frac{-c_1\gamma_0 + \gamma_1c_0}{c_1\gamma_2 - c_2\gamma_1} \]
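The two-market solution can be sketched numerically. The code assumes the equilibrium conditions have been reduced to two linear excess-demand equations, c0 + c1 P1 + c2 P2 = 0 and g0 + g1 P1 + g2 P2 = 0, where `g` stands in for the second market's coefficients; the numerical values are hypothetical.

```python
from fractions import Fraction

def two_market_prices(c, g):
    """Equilibrium prices from c1*P1 + c2*P2 = -c0 and g1*P1 + g2*P2 = -g0.

    c = (c0, c1, c2) and g = (g0, g1, g2) hold the coefficients of the
    two excess-demand equations; the solution follows Cramer's rule.
    """
    c0, c1, c2 = c
    g0, g1, g2 = g
    det = c1 * g2 - c2 * g1
    if det == 0:
        raise ValueError("coefficient determinant is zero")
    P1 = Fraction(c2 * g0 - c0 * g2, det)
    P2 = Fraction(c0 * g1 - c1 * g0, det)
    return P1, P2

# Hypothetical excess-demand coefficients, chosen only for illustration:
#   market 1: 10 - 3*P1 + 1*P2 = 0
#   market 2: 15 + 1*P1 - 2*P2 = 0
c = (10, -3, 1)
g = (15, 1, -2)
P1, P2 = two_market_prices(c, g)
print(P1, P2)

# Both excess demands vanish at the solution
print(c[0] + c[1] * P1 + c[2] * P2, g[0] + g[1] * P1 + g[2] * P2)
```

Substituting the prices back into any of the behavioral equations would then give the equilibrium quantities.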
Chapter 3
Calculus
So far we have covered some topics from linear algebra. In this chapter we will
summarize some topics from calculus that will be important for us in our study
of mathematical economics. First of all, we can comfortably say that calculus
is the mathematics of change, indeed of infinitesimal change, as we will see.
This may not be so surprising, as it was invented by Isaac Newton.
3.1 Limits and Continuity
If $x \in R^n$ and $y \in R^n$, then the Euclidean distance between
$x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, denoted d(x, y), is given by
\[ d(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}. \]
Given this definition of distance, for any $\varepsilon > 0$ we can define the
$\varepsilon$-neighborhood of a particular $x^0 \in R^n$, denoted
$N_\varepsilon(x^0)$, as all the points in $R^n$ whose distance from $x^0$ is
less than $\varepsilon$. Thus an $\varepsilon$-neighborhood of $x^0 \in R^n$ is
given by
\[ N_\varepsilon(x^0) = \{x \in R^n \mid d(x^0, x) < \varepsilon\}. \]
Hence, we can easily define the limit of a function y = f(x) as the variable x
approaches a particular value $x^0$, written $\lim_{x \to x^0} f(x)$, as the
value that y approaches as x gets closer to $x^0$. More formally, $y^0$ is the
limit of f(x) as x approaches $x^0$ iff for any $\varepsilon > 0$, however small,
there exists a $\delta > 0$ such that
$x \in N_\delta(x^0) \setminus \{x^0\}$ implies that $f(x) \in N_\varepsilon(y^0)$.
Once we have the definition of the limit we can define continuity in terms of it
as follows: the function $f : R^n \to R$ is continuous at $x^0 \in R^n$ iff
$\lim_{x \to x^0} f(x) = f(x^0)$. A function is continuous iff it is continuous
at every point in its domain.
3.2 Calculus of One Variable
If $f : R \to R$ is a function of one variable, then the derivative of the
function, denoted by df/dx or $f'(x)$, is defined as
\[ f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \]
If this limit exists at a particular $x_0 \in R$, then the function is
differentiable at $x_0$ and $f'(x_0)$ is the derivative of the function evaluated
at the point $x_0$. Geometrically, the derivative is the slope of the tangent to
the function at the point. A function is continuously differentiable on a
certain domain if it is differentiable and if $f'(x)$ is continuous at all the
points in that domain.
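Some Rules of Differentiation
\[
\begin{aligned}
\frac{da}{dx} &= 0 && a \text{ is a constant} \\
\frac{d(ax^b)}{dx} &= bax^{b-1} && a \text{ and } b \text{ are constants } (b \ne 0) \\
\frac{d(\log x)}{dx} &= \frac{1}{x} \\
\frac{d(e^x)}{dx} &= e^x \\
\frac{d(e^{f(x)})}{dx} &= e^{f(x)}f'(x) \\
\frac{d(a^x)}{dx} &= a^x \log a && a \text{ is a positive constant} \\
\frac{d[f(x) + g(x)]}{dx} &= f'(x) + g'(x) \\
\frac{d[f(x)g(x)]}{dx} &= f'(x)g(x) + f(x)g'(x) && \text{product rule} \\
\frac{d[f(x)/g(x)]}{dx} &= \frac{g(x)f'(x) - f(x)g'(x)}{[g(x)]^2} && \text{quotient rule}
\end{aligned}
\]
y = f(x) and x = g(z) imply that
\[ \frac{dy}{dz} = \frac{dy}{dx}\frac{dx}{dz} = f'(x)g'(z) \qquad \text{chain rule} \]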
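A few of these rules can be spot-checked numerically by comparing a difference quotient with the formula. The sketch below uses a central difference rather than the one-sided quotient in the definition of the derivative, because it is more accurate for a given step; the step size `h`, the test point, and the test functions are assumptions chosen for illustration.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) with a small step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
f = lambda t: t ** 3        # f'(t) = 3 t^2
g = lambda t: math.exp(t)   # g'(t) = e^t

# Power rule: numerical estimate next to the formula
print(derivative(f, x), 3 * x ** 2)

# Product rule: (f g)' = f' g + f g'
print(derivative(lambda t: f(t) * g(t), x),
      3 * x ** 2 * g(x) + f(x) * g(x))

# d(a^x)/dx = a^x log a, with a = 3
a = 3.0
print(derivative(lambda t: a ** t, x), a ** x * math.log(a))
```

In each printed pair the numerical estimate should agree with the formula to several decimal places; a check like this can catch a misremembered rule but does not prove one.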

Higher Order Derivatives
The derivative of a function that is already a derivative is a second-order
derivative. If y = f(x), then $f'(x)$ is the first derivative and
$f''(x) = d[f'(x)]/dx$ is the second derivative, denoted either $f''(x)$ or
$d^2f/dx^2$. Derivatives of a higher order (third, fourth, etc.) are computed in
a similar manner.
Taylor Series
Let $f : R \to R$ be a one-variable function with continuous derivatives of all
orders. The function f(x) can always be written as the following Taylor
polynomial expansion around any point $x_0 \in R$:
\[ f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + R_n(x), \]
where $R_n(x)$ is the remainder of the series. For some special types of
functions, called analytic functions, the remainder term $R_n(x)$ will approach
zero as $n \to \infty$. However, most of the time we will be using only the
linear approximation to f:
\[ f(x) \cong f(x_0) + \frac{f'(x_0)}{1!}(x - x_0). \]
Example 3.1 Find the Taylor series expansion of $f(x) = e^x$ around $x = x_0$
and around x = 0.
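For the exponential function every derivative equals $e^x$, so the kth term of the expansion around $x_0$ is $e^{x_0}(x - x_0)^k/k!$. The sketch below evaluates the resulting Taylor polynomial for several orders and shows how the error behaves; the evaluation point x = 0.5 and the chosen orders are illustrative assumptions.

```python
import math

def taylor_exp(x, x0=0.0, order=5):
    """Taylor polynomial of e^x around x0, up to the given order.

    Every derivative of e^x is e^x, so the k-th term is
    e^{x0} * (x - x0)^k / k!.
    """
    return sum(math.exp(x0) * (x - x0) ** k / math.factorial(k)
               for k in range(order + 1))

x = 0.5
for order in (1, 2, 5, 10):
    approx = taylor_exp(x, x0=0.0, order=order)
    # order, approximation, absolute error against math.exp
    print(order, approx, abs(approx - math.exp(x)))
```

The order-1 row is the linear approximation used most often in these notes, and the shrinking error in later rows illustrates the remainder term going to zero.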
3.3 Maxima and Minima
Throughout this section, we will assume that the function f is a one-variable
function defined on a subset of the real numbers, that is, $f : D \to R$ with
$D \subseteq R$.
A point $x^* \in D$ is a global maximum of f if
\[ f(x^*) \ge f(x) \text{ for all } x \in D. \]
A point $x^* \in D$ is a strict (unique) global maximum of f if
\[ f(x^*) > f(x) \text{ for all } x \in D,\ x \ne x^*. \]
A point $x^* \in D$ is a local maximum of f if it is a maximum for all x within
some neighborhood of $x^*$, that is, if
\[ f(x^*) \ge f(x) \text{ for all } x \in N_\varepsilon(x^*). \]
A point $x^* \in D$ is a strict (unique) local maximum of f if it is a strict
maximum for all x within some neighborhood of $x^*$, $x \ne x^*$, that is, if
\[ f(x^*) > f(x) \text{ for all } x \in N_\varepsilon(x^*),\ x \ne x^*. \]
Global minima, strict global minima, local minima, and strict local minima are
all defined by reversing the inequalities in the above definitions.
If f is differentiable, then the first-order (necessary) condition for $x^*$ to
be an interior optimum (maximum or minimum) is that $x^*$ be a critical point of
the function, i.e., $f'(x^*) = 0$. However, this is only a necessary condition;
it does not guarantee that $x^*$ is either a maximum or a minimum. A graphical
argument shows that a sufficient condition for a strict local maximum at a
critical point (the second-order condition) is that $f''(x^*) < 0$. Similarly, a
sufficient condition for a strict local minimum is given by $f''(x^*) > 0$.
3.4 Multivariate Calculus
Partial Derivatives
If $f : R^n \to R$ is a real-valued function of n real variables
\[ y = f(x_1, x_2, \ldots, x_n), \]
then the partial derivative with respect to variable $x_i$, denoted
$\partial f/\partial x_i$, is given by
\[ \frac{\partial f}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \ldots, x_i + \Delta x_i, \ldots, x_n) - f(x_1, \ldots, x_n)}{\Delta x_i}. \]
Thus the partial derivative is the instantaneous rate of change in the function
when variable $x_i$ changes and all other variables in the function are held
constant. Geometrically, the partial derivative is the slope of the tangent to
the function parallel to the $yx_i$ plane.
Example 3.2 Find the partial derivatives of $z = f(x, y) = x + 2x^{1/2}y^{1/2}$.
It is often useful to characterize the n partial derivatives of an n-variable
function in vector notation. Such a vector of partial derivatives is called the
gradient of the function, and it is denoted $\nabla f$. Thus,
\[ \nabla f = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right] \]
is the gradient of the function $f : R^n \to R$. If the partial derivatives are
to be evaluated at a particular point $x^0 \in R^n$, then we write
\[ \nabla f(x^0) = \left[ \frac{\partial f(x^0)}{\partial x_1}, \ldots, \frac{\partial f(x^0)}{\partial x_n} \right]. \]
First-order (necessary) conditions for the maximum or minimum of an n-
variable function are perfectly analogous to the one-variable case: vanishing
derivatives. If x
= (x
1
, . . . , x
n
) is a maximum (local or global) of an n-variable
dierentiable function, then f/x
i
= 0 i = 1, . . . , n. Given the gradient
notation above, if x
R
n
is a maximum of f : R
n
R, then
f(x
) = 0
As in one variable-case, this rst-order condition applies to either a maximum
or a minimum.
The Total Differential
When $f(x)$ is a differentiable function of one variable, the differential is given by
$$dy = f'(x)\,dx.$$
The differential $dy$ is the vertical change along the tangent to the function. The change in the independent variable, $\Delta x = dx$, causes the value of the function to change by $\Delta y$, while $dy$ is the estimate of this change along the tangent line. The smaller $\Delta x$ is, the closer the approximation (or differential) $dy$ is to $\Delta y$.
If $f$ is a differentiable function of $n$ variables, then the differential (or total differential) $dy$ is
$$dy = \frac{\partial f}{\partial x_1}dx_1 + \cdots + \frac{\partial f}{\partial x_n}dx_n = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}dx_i.$$
If the gradient notation is used, then the total differential of an $n$-variable function is
$$dy = \nabla f(x)\,dx,$$
where $dx = (dx_1, \ldots, dx_n)^T$ is a column vector.
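A short numerical sketch may help show how the total differential approximates the actual change. It reuses the function of Example 3.2, $f(x, y) = x + 2x^{1/2}y^{1/2}$; the base point and the two steps are hypothetical values chosen for illustration.

```python
import math


def f(x, y):
    return x + 2 * math.sqrt(x * y)


def gradient(x, y):
    # Closed-form partial derivatives of f
    return (1 + math.sqrt(y / x), math.sqrt(x / y))


def differential(point, step):
    """dy = sum_i (df/dx_i) dx_i, evaluated at point."""
    fx, fy = gradient(*point)
    return fx * step[0] + fy * step[1]


def actual_change(point, step):
    """Delta y = f(point + step) - f(point)."""
    return f(point[0] + step[0], point[1] + step[1]) - f(*point)


point = (4.0, 9.0)              # hypothetical base point
big_step = (0.01, -0.02)
small_step = (0.001, -0.002)    # same direction, ten times smaller

gap_big = abs(differential(point, big_step) - actual_change(point, big_step))
gap_small = abs(differential(point, small_step) - actual_change(point, small_step))
```

Shrinking the step by a factor of ten shrinks the gap between $dy$ and $\Delta y$ by much more than ten, which is the sense in which $dy$ becomes a better estimate as $dx$ gets small.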
Chapter 4
Economic Applications of
One Variable Calculus
In economics the use of marginal analysis is paramount. In fact, we can speak of a marginal revolution during the 1870s that changed how economics was done afterwards. In what follows we will see how this marginalism is applied in economics.
4.1 Introductory Economics Applications
Demand and Marginal Revenue
Consider the specific linear demand curve given by
$$q = -p + 5, \tag{4.1}$$
where $p$ is the market price and $q$ is the quantity demanded. Total revenue (TR) is given by $TR = pq$. To be able to find the total revenue function we need to first find the inverse demand function, that is, price as a function of quantity. Hence, $p = -q + 5$. Now we can find the total revenue function as
$$TR = TR(q) = p(q)\,q = (-q + 5)q = -q^2 + 5q.$$
In introductory economics courses, the marginal revenue (MR) is usually defined as the change in total revenue caused by 1 more unit of output. Thus for this definition, the marginal revenue is
$$MR = TR(q + 1) - TR(q) = \left. \frac{\Delta TR}{\Delta q} \right|_{\Delta q = 1}.$$
For instance, for the demand function in (4.1), the marginal revenue associated with increasing output from 4 to 5 units is $-4$.
This 1-more-unit definition serves the purpose for an introductory class, but let's consider a definition based on the derivative. We will define marginal revenue not for 1 more unit of output but rather for an infinitesimal change in output. Let's then define MR as
$$MR = \frac{dTR(q)}{dq}.$$
Geometrically, MR is the slope of the tangent to the TR curve at $q$. For our specific demand function in (4.1),
$$MR = \frac{dTR(q)}{dq} = -2q + 5.$$
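The two definitions of marginal revenue can be compared numerically. The sketch below reads (4.1) as $q = -p + 5$, so that $TR(q) = -q^2 + 5q$; the function names are assumptions made for this illustration.

```python
def total_revenue(q):
    # Inverse demand p = -q + 5 from (4.1), so TR = p(q) * q
    return (-q + 5) * q


def mr_one_more_unit(q):
    """Introductory definition: TR(q + 1) - TR(q)."""
    return total_revenue(q + 1) - total_revenue(q)


def mr_derivative(q):
    """Derivative definition: dTR/dq = -2q + 5."""
    return -2 * q + 5


discrete_mr_at_4 = mr_one_more_unit(4)    # moving from 4 to 5 units
derivative_mr_at_4 = mr_derivative(4)     # slope of the TR curve at q = 4
```

The discrete value is the one quoted in the text for the move from 4 to 5 units; the derivative at $q = 4$ differs from it because TR is curved, so the slope of the tangent is not the slope of the chord over a full unit.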
Example 4.1 Find the MR curve for $q = -aP + b$, $a > 0$, $b > 0$. In general, find it for $TR = p(q)\,q$.
Elasticity
Consider the production function
$$y = f(k, l), \quad k, l \ge 0.$$
One of the problems with marginal product functions is that the marginal product is sensitive to the units in which $y$, $l$, and $k$ are measured. For example, changing the output from pounds of stuff to tons of stuff will drastically change the numerical values of the marginal products. It would be nice to be able to express the productivity of a marginal unit of input in a way that is not sensitive to the units in which the inputs and output are measured. Can we come up with such a measure? Yes.
Consider expressing changes in percentage terms. E.g.,
$$\frac{\%\Delta y}{\%\Delta l}$$
is the percentage change in output $y$ given a one percent change in the input $l$, holding $k$ constant. We call this the elasticity of output with respect to labor. Also note that $\Delta y / y = \%\Delta y$; therefore
$$\frac{\%\Delta y}{\%\Delta l} = \frac{\Delta y / y}{\Delta l / l} = \frac{\Delta y / \Delta l}{y / l}.$$
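Definition 4.1 The point elasticity of $y$ with respect to $l$ is
$$\frac{\%\Delta y}{\%\Delta l} = \lim_{\Delta l \to 0} \frac{\Delta y}{\Delta l} \frac{l}{y} = \frac{l}{y} \lim_{\Delta l \to 0} \frac{\Delta y}{\Delta l}.$$
Example 4.2 What is the point elasticity of output with respect to labor and capital for the Cobb-Douglas production function with constant returns to scale?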
Now let me give you an alternative way to calculate elasticities. It is the case that
$$\frac{\%\Delta y}{\%\Delta l} = \frac{l}{y} \frac{\partial y}{\partial l} = \frac{\partial \log y}{\partial \log l}.$$
Use this formula to find the two Cobb-Douglas output elasticities.
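The equivalence of the two elasticity formulas can be checked numerically. The sketch below assumes a constant-returns Cobb-Douglas function $y = A k^{\alpha} l^{1-\alpha}$ with hypothetical parameter values; both routines estimate the elasticity of output with respect to labor, one through $(l/y)\,\partial y/\partial l$ and one through $\partial \log y/\partial \log l$.

```python
import math

A, ALPHA = 2.0, 0.3        # hypothetical technology parameters


def output(k, l):
    return A * k**ALPHA * l**(1 - ALPHA)


def elasticity_direct(k, l, h=1e-6):
    """(l / y) * dy/dl, with dy/dl from a central difference."""
    dy_dl = (output(k, l + h) - output(k, l - h)) / (2 * h)
    return dy_dl * l / output(k, l)


def elasticity_log(k, l, h=1e-6):
    """d log y / d log l: perturb log l by +/- h and difference log y."""
    log_up = math.log(output(k, l * math.exp(h)))
    log_down = math.log(output(k, l * math.exp(-h)))
    return (log_up - log_down) / (2 * h)


k0, l0 = 5.0, 2.0          # hypothetical input bundle
e_direct = elasticity_direct(k0, l0)
e_log = elasticity_log(k0, l0)
```

Under these assumptions both estimates agree with the labor exponent $1 - \alpha$, and they do so at any input bundle, which is the unit-free property the elasticity was introduced to deliver.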
Example 4.3 Assume that
$$Y = C + I_0 + G$$
$$C = a + b(Y - T_0), \quad a > 0,\ 0 < b < 1$$
$$G = gY, \quad 0 < g < 1,\ (g + b) < 1$$
Determine what happens to the equilibrium level of income if $g$ increases by 1%.
Recall that the price elasticity of demand is a measure of the relative price sensitivity of the demand curve. It is measured by the elasticity coefficient $\varepsilon$, where
$$\varepsilon = \frac{\%\Delta q}{\%\Delta p}.$$
Since most demand curves slope downward, the numerator and denominator will have opposite signs, making $\varepsilon < 0$. For the particular demand function given in (4.1), let's calculate the price elasticity of demand if we start at the quantity-price combination $(1, 4)$ and move down to $(2, 3)$. We get $\varepsilon = -4$; but if we reverse the direction of the calculation, i.e., if we go from $(2, 3)$ to $(1, 4)$, then $\varepsilon = -3/2$. Because this arc definition of elasticity depends on the direction of the move, it is not very precise; therefore we usually use the point elasticity of demand or output.
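The direction dependence of the arc calculation is easy to reproduce. In the sketch below, percentage changes are measured relative to the starting point, which is the convention that yields the two values quoted above; the function names are assumptions for illustration, and the point elasticity uses $dq/dp = -1$ from reading (4.1) as $q = -p + 5$.

```python
import math


def arc_elasticity(start, end):
    """Percentage change in q over percentage change in p, based at start."""
    (q0, p0), (q1, p1) = start, end
    pct_q = (q1 - q0) / q0
    pct_p = (p1 - p0) / p0
    return pct_q / pct_p


def point_elasticity(q, p):
    """(dq/dp) * (p / q) with dq/dp = -1 for the demand curve q = -p + 5."""
    return -1 * p / q


moving_down = arc_elasticity((1, 4), (2, 3))   # from (q, p) = (1, 4) to (2, 3)
moving_up = arc_elasticity((2, 3), (1, 4))     # the same arc, reversed
```

Because this demand curve is linear, each arc value coincides with the point elasticity at its own starting point, which makes plain that the two "arc" numbers are really measuring elasticity at two different points.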
Actually the concept of elasticity is so general in economics that we can give it a generic mathematical definition. If $m = f(x, y, z)$, then the point elasticity of $m$ with respect to $x$ is
$$\frac{\%\Delta m}{\%\Delta x} = \frac{\partial m}{\partial x} \frac{x}{m}.$$
Example 4.4 In macroeconomic monetary theory, it has been suggested that the demand for real money balances $m$ is given by
$$m = e^{-\alpha \pi},$$
where $\pi$ is the expected rate of inflation and $\alpha$ is a strictly positive parameter.
1. Find the elasticity of the demand for real cash balances with respect to the expected rate of inflation, $\varepsilon_{M,\pi}$.
2. If the demand for real cash balances is unit-elastic, then what is the relationship between the parameter $\alpha$ and the expected rate of inflation $\pi$?
Cost Functions
A firm's total cost function TC relates its total cost of production to various levels of output ($y$ or $q$). Usually this cost function is divided into variable costs VC, which are a function of the level of output, and fixed costs FC, which are independent of the level of output. According to the way long- and short-run periods are defined, in the long run there are no fixed costs, whereas the short run involves variable as well as fixed costs. Accordingly, we will define the short-run total-cost function as
$$TC(y) = VC(y) + FC,$$
where $y$ is the firm's output level. Let us consider a specific example of a short-run total-cost function. Suppose $VC = 4y^2$ and $FC = 100$; then the total-cost function is $TC = 4y^2 + 100$. You may also recall that there are three types of average short-run cost functions:
$$ATC = \frac{TC(y)}{y}, \quad AVC = \frac{VC(y)}{y}, \quad \text{and} \quad AFC = \frac{FC}{y}.$$
For our specific example these functions become
$$ATC = 4y + \frac{100}{y}, \quad AVC = 4y, \quad \text{and} \quad AFC = \frac{100}{y}.$$
Now let's consider marginal cost (MC). In introductory economics, marginal cost is defined as the cost associated with 1 more unit of output. But we will use the derivative definition
$$MC = \frac{dTC(y)}{dy}.$$
Since $TC(y) = VC(y) + FC$ and $dFC/dy = 0$, we can write MC as
$$MC = \frac{dVC(y)}{dy}.$$
Let's look at the relationship between average total cost and marginal cost. Since $ATC = TC/y$, by the quotient rule we have
$$\frac{dATC(y)}{dy} = \frac{y\,(dTC/dy) - TC}{y^2} = \frac{MC - ATC}{y}.$$
From this result we obtain the following relations between ATC and MC when $y > 0$:
$$MC > ATC \ \Rightarrow\ \frac{dATC}{dy} > 0,$$
$$MC < ATC \ \Rightarrow\ \frac{dATC}{dy} < 0,$$
$$MC = ATC \ \Rightarrow\ \frac{dATC}{dy} = 0.$$
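These relations can be verified on the specific cost function used above, $TC = 4y^2 + 100$. The sketch below evaluates the slope of ATC through the formula $(MC - ATC)/y$ at three output levels chosen for illustration.

```python
def total_cost(y):
    return 4 * y**2 + 100


def atc(y):
    return total_cost(y) / y


def mc(y):
    return 8 * y              # dTC/dy for TC = 4y^2 + 100


def atc_slope(y):
    """dATC/dy written as (MC - ATC) / y."""
    return (mc(y) - atc(y)) / y


slope_low = atc_slope(2.0)    # MC = 16 < ATC = 58, so ATC is falling
slope_min = atc_slope(5.0)    # MC = ATC = 40, so ATC is flat
slope_high = atc_slope(10.0)  # MC = 80 > ATC = 50, so ATC is rising
```

For this cost function MC crosses ATC at $y = 5$, which is therefore the output level at which average total cost is lowest.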
Example 4.5 If the short-run total cost is given by the general cubic function
$$TC(y) = ay^3 + by^2 + cy + d,$$
where $a, b, c, d$ are scalars, find restrictions that make MC and AVC U-shaped and that make both MC and AVC strictly positive.
4.2 Optimization Examples from Introductory
Economics
Profit-Maximizing Output: Perfect Competition
Consider a perfectly competitive firm that sells its output $y$ at market price $p > 0$ and has the differentiable total-cost function $TC(y)$. Since the firm operates in a perfectly competitive market, it is a price taker. The firm has to decide on the right output, i.e., its profit-maximizing output. The profit function for a perfectly competitive firm with price $p$ and total-cost function $TC(y)$ is given by
$$\pi(y) = py - TC(y).$$
Formally, the perfectly competitive firm's problem is
$$\max_{y}\ \pi(y), \tag{4.2}$$
where $\pi(y)$ is the profit function. The first-order (necessary) condition for $y^*$ to solve the problem is that $y^*$ be a critical point of the profit function, i.e.,
$$\pi'(y^*) = p - \frac{dTC}{dy} = 0.$$
Since we know from the earlier discussion of costs that $dTC/dy = MC$, the first-order condition can be written in a more familiar form as
$$p = MC(y^*). \tag{4.3}$$
We need to interpret this equation properly. We already said that the FOC, i.e., (4.3), yields a necessary condition. In other words, $p = MC$ does not by itself yield a profit-maximizing output. It simply says that if profit is being maximized at some output $y^*$, then it must be the case that $p = MC(y^*)$. Hence, a solution of this condition is not guaranteed to be a maximizer. It is a maximizer only when it is combined with the sufficient (second-order) condition for problem (4.2), which is
$$\pi''(y^*) < 0.$$
Since $\pi''(y) = -dMC/dy$, this means that
$$\frac{dMC}{dy} > 0.$$
To summarize the FOC and the SOC: if $p = MC(y^*)$ at a particular $y^*$ and MC slopes upward at that $y^*$, then $y^*$ solves (4.2), at least locally, and it is the profit-maximizing output for the perfectly competitive firm.
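As a worked illustration of the FOC and SOC, the sketch below applies them to the earlier cost function $TC = 4y^2 + 100$. The market price of 80 is a hypothetical value introduced only for this example.

```python
PRICE = 80.0                 # hypothetical market price (not from the notes)


def total_cost(y):
    return 4 * y**2 + 100


def marginal_cost(y):
    return 8 * y


def profit(y):
    return PRICE * y - total_cost(y)


# FOC: p = MC(y) = 8y, so y* = p / 8
y_star = PRICE / 8

# SOC: dMC/dy = 8 > 0, so MC slopes upward at y*
mc_slope = 8
soc_holds = mc_slope > 0

max_profit = profit(y_star)
```

With these numbers the FOC picks out $y^* = 10$, and since MC slopes upward everywhere, profit at $y^*$ exceeds profit at nearby output levels on either side.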
Example 4.6 A firm that is perfectly competitive in both the product and input markets can produce output $y$ by using labor $l$ according to the production function $y = -l^2 + 10l$. If the firm's output sells for \$10 a unit and it pays a wage of \$40 per unit, what is the profit-maximizing level of employment $l^*$ for this firm?
Profit Maximizing Output: Monopoly
Now consider the case of a profit-maximizing monopolist firm. For it, as well as for a perfectly competitive firm, the profit function is given by
$$\pi(y) = py - TC(y).$$
Given this, how does the behavior of a monopolist differ from that of a perfectly competitive firm? A perfectly competitive firm faces a small fraction of the market demand curve and can sell as much as it wants at the going market price; that is, a competitive firm is a price taker. On the other hand, a monopolist faces the entire market demand curve. Assuming a downward-sloping demand curve, the monopolist has to choose between lower output at a higher price and higher output at a lower price. The monopolist makes his decision as follows. First he has to decide on the profit-maximizing output. So far a competitive firm and the monopolist behave the same way; but at this point, although the competitive firm is done, the monopolist has to set the price himself so that buyers will demand his profit-maximizing output. Let's now derive this profit-maximizing output mathematically.
Since the price will be determined by the quantity demanded, the monopolist will have the market (inverse) demand function $p = p(y)$ and will try to maximize his profit function
$$\pi(y) = TR(y) - TC(y),$$
where $TR(y) = p(y)\,y$. Suppose all the functions above are differentiable. The first-order condition for the monopolist is
$$\pi'(y^*) = \frac{dTR(y^*)}{dy} - \frac{dTC(y^*)}{dy} = 0,$$
or it can be written as
$$MR(y^*) = MC(y^*).$$
Note that the perfectly competitive firm is a special case of the above condition: when the price is constant, $MR = p$.
Example 4.7 Suppose a monopolistic firm has the short-run production function $q = l^{1/2}$, where $q$ is the output and $l$ is the labor input. If the wage $w$ per unit of labor is \$4 and fixed cost is \$100, find the following:
1. Total cost (as a function of $q$, not $l$)
2. Average total cost
3. Marginal cost
4. The profit-maximizing output $q^*$ when demand is given by $P = -8q + 96$.
Consumer Choice
Suppose a consumer purchases two goods, $x$ and $y$, and she has money income $M$ to spend on these goods. The given prices of the goods are $p_x$ and $p_y$. Hence, the consumer is not completely free to buy whatever amounts she pleases, because of the money constraint on her spending. Therefore the consumer problem will be presented as
$$\max\ U(x, y) \quad \text{subject to} \quad M = p_x x + p_y y. \tag{4.4}$$
As a matter of fact, we are not yet prepared to handle such constrained optimization problems. However, we can still solve this one by a substitution trick. Clearly, the constraint gives us a relationship between $y$ and $x$ that must be satisfied by the maximizing values of $y$ and $x$; hence we can eliminate one of these variables from the utility function by virtue of this constraint. Let's work with a more concrete example. Suppose the problem is given as follows:
$$\max\ U(x, y) = xy \quad \text{subject to} \quad M = p_x x + p_y y. \tag{4.5}$$
Now, although the problem looks like a two-variable maximization, we can substitute for $y$ in the utility function from the constraint. First, rearrange the constraint so that $y$ is a function of $x$:
$$y = -\frac{p_x x}{p_y} + \frac{M}{p_y}.$$
So, the utility function becomes
$$U(x) = -\frac{p_x x^2}{p_y} + \frac{M x}{p_y}.$$
The FOC is given as
$$U'(x) = -\frac{2 p_x x}{p_y} + \frac{M}{p_y} = 0.$$
That yields the solution for $x^*$ and $y^*$ as
$$x^* = \frac{M}{2 p_x}, \quad y^* = \frac{M}{2 p_y}.$$
Notice that, because of the specific form of the utility function, both consumption goods $x$ and $y$ enter the utility symmetrically. Hence it is natural that the consumer will divide her money income between the two goods equally.
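The substitution technique can be checked numerically. The sketch below uses hypothetical values for income and prices, substitutes the budget constraint into $U(x, y) = xy$, and compares a brute-force grid search with the closed-form solution $x^* = M/(2p_x)$, $y^* = M/(2p_y)$.

```python
M, PX, PY = 100.0, 2.0, 5.0      # hypothetical income and prices


def utility_along_budget(x):
    """U(x, y) = x * y after substituting y from the budget constraint."""
    y = (M - PX * x) / PY
    return x * y


# Closed-form solution from the first-order condition
x_star = M / (2 * PX)
y_star = M / (2 * PY)

# Brute-force check over affordable quantities of x (0 to M / PX)
grid = [i / 100 for i in range(0, 5001)]
x_best = max(grid, key=utility_along_budget)

spending_on_x = PX * x_star
spending_on_y = PY * y_star
```

Under these assumptions the grid search lands on the same bundle as the FOC, and spending on each good is half of income, as the symmetry argument suggests.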
Example 4.8 Consider the consumer choice problem given in (4.4) for the case where $U(x, y) = x + \log y$.
1. Use the substitution technique discussed in the chapter to find the utility-maximizing consumption of both goods, $x^*$ and $y^*$.
2. From the expressions found in (1), find the own-price, cross-price, and income elasticities for $x^*$.
3. From the computation in (2), determine the circumstances under which the demand for good $x$ will be unit elastic.
4.3 Introduction to Concavity and Convexity
Some classes of functions are very useful in economics because they provide a mathematical characterization of important economic properties. Concave and convex functions are two such classes.
A function of a single variable $y = f(x)$, $f : \mathbb{R} \to \mathbb{R}$, is strictly concave iff for all $x^0, x^1 \in \mathbb{R}$ with $x^0 \ne x^1$ and all $\lambda$ with $0 < \lambda < 1$,
$$f(\bar{x}) > \lambda f(x^0) + (1 - \lambda) f(x^1),$$
where $\bar{x} = \lambda x^0 + (1 - \lambda) x^1$. If the strict inequality above is replaced by $\ge$, then the function is merely concave rather than strictly concave. Obviously, all strictly concave functions are concave, but not vice versa. Although the definition above was given only for the one-variable case, it can easily be extended to the case when $f : \mathbb{R}^n \to \mathbb{R}$ and $x^0, x^1 \in \mathbb{R}^n$. This definition has a nice geometric interpretation: the chord joining any two points on the graph lies below the graph. A similar definition can be given for convex functions, where we need to reverse the inequality above. More importantly, if $f(x)$ is a concave function then $-f(x)$ is a convex function. So all the properties we state for concave functions carry over to convex functions with the inequalities reversed.
There is a second way to characterize differentiable functions, through their first derivative. A function of a single variable, $y = f(x)$, $f : \mathbb{R} \to \mathbb{R}$, is strictly concave if for all $x^0, x^1 \in \mathbb{R}$ with $x^0 \ne x^1$,
$$f'(x^0)(x^1 - x^0) + f(x^0) > f(x^1).$$
Yet it is easier to see the above condition if we write it as
$$x^1 - x^0 > 0 \ \text{ implies } \ f'(x^0) > \frac{f(x^1) - f(x^0)}{x^1 - x^0},$$
$$x^1 - x^0 < 0 \ \text{ implies } \ f'(x^0) < \frac{f(x^1) - f(x^0)}{x^1 - x^0}.$$
The above characterization is also valid for concave functions if we replace the strict inequalities by weak ones.
The third and final characterization of strict concavity involves the second derivative of the function. Suppose that a function $f : \mathbb{R} \to \mathbb{R}$ is at least twice differentiable and has a strictly negative second derivative ($f''(x) < 0$) for all $x \in \mathbb{R}$. Because the second derivative is the rate of change of the first derivative and because it is always negative, the slope of the function is decreasing everywhere. Furthermore, this means that the tangent is always above the function, and so the function must be strictly concave. Likewise, if $f''(x) > 0$ for all $x \in \mathbb{R}$, then the function is strictly convex. Finally, if weak inequalities are allowed on the second derivative ($\le$ or $\ge$), then the functions are concave and convex, respectively.
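The three characterizations can be checked side by side on a simple strictly concave function. The sketch below uses $f(x) = -x^2$ and a handful of test points, all of which are assumptions chosen for illustration; a finite set of checks illustrates the definitions but is of course not a proof.

```python
def f(x):
    return -x**2


def f_prime(x):
    return -2 * x


def f_second(x):
    return -2.0


def chord_below_graph(x0, x1, lam):
    """Definition: f(x_bar) > lam f(x0) + (1 - lam) f(x1)."""
    x_bar = lam * x0 + (1 - lam) * x1
    return f(x_bar) > lam * f(x0) + (1 - lam) * f(x1)


def tangent_above_graph(x0, x1):
    """First-derivative form: f'(x0)(x1 - x0) + f(x0) > f(x1)."""
    return f_prime(x0) * (x1 - x0) + f(x0) > f(x1)


pairs = [(-3.0, 2.0), (0.5, 4.0), (-1.0, -6.0)]   # hypothetical x0 != x1
weights = [0.25, 0.5, 0.75]

definition_ok = all(chord_below_graph(a, b, w) for a, b in pairs for w in weights)
tangent_ok = all(tangent_above_graph(a, b) for a, b in pairs)
second_derivative_ok = all(f_second(a) < 0 for a, _ in pairs)
```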
Concavity and Convexity in Economics
Concavity and convexity can often be given an economic interpretation. For instance, consider a short-run production function $y = f(L)$ and suppose that it is twice differentiable. The first derivative $f'(L)$ gives us the marginal product of labor, $MP_L$. If $f''(L) < 0$, then the marginal product of labor is decreasing. Economically, the property of decreasing marginal product is called diminishing returns, and it is a standard assumption on short-run production functions. A similar interpretation can be applied to utility functions.
We will end this section by giving a theorem called the local-global theorem. This theorem shows the importance of concave functions for maximization problems.
Theorem 4.1 Point $x^*$ is the global maximum of the differentiable concave function $f : \mathbb{R} \to \mathbb{R}$ iff $f'(x^*) = 0$. When $f$ is strictly concave, the global maximum is also unique.
Proof Since the "only if" part of the theorem holds for all differentiable functions, whether the function is concave or not, we only need to prove the "if" part. Because $f$ is concave, we know from our second characterization of concave functions that
$$f'(x^*)(x - x^*) + f(x^*) \ge f(x) \quad \forall x \in \mathbb{R}.$$
Now if $f'(x^*) = 0$, this condition implies that $f(x^*) \ge f(x)$, $\forall x \in \mathbb{R}$, proving that $x^*$ is a global maximum of $f$. If $f$ is strictly concave, the above inequality is strict for all $x \ne x^*$, proving that $x^*$ is the unique global maximum of $f$.
This is a very powerful theorem; it states that, for a concave function, the FOC is not only a necessary condition for $x^*$ to be a maximum but also a sufficient condition. By reversing inequalities and changing concave to convex and maximum to minimum, we can state the same theorem for a global minimum.
Example 4.9 Suppose that a particular economy produces only two outputs, good $x$ and good $y$. The only input in the economy is labor $L$, and the total labor supply is fixed at 1; so $L_x + L_y = 1$, where $L_x$ is the labor used in the production of $x$ and $L_y$ is the labor used in the production of $y$. The production functions for goods $x$ and $y$ are given by $x = 2L_x^{1/2}$ and $y = 2L_y^{1/2}$, respectively. Find the equation for the production possibilities curve of this economy and graph it. Is this production possibilities curve a concave function?
Exercises
1. Consider a firm trying to minimize its long-run total cost. Its problem is given by
$$\min_{L, K}\ wL + rK \quad \text{subject to} \quad y = LK/100.$$
(a) For the values $w = r = \$1$, use the substitution technique discussed in the chapter to find the cost-minimizing quantities of labor and capital, $L^*$ and $K^*$.
(b) The long-run total cost is $TC = wL^* + rK^*$, that is, the total cost of employing the cost-minimizing quantities of the two inputs. Find TC for this firm, using your answers to (a).
(c) Find the long-run average total cost (ATC) and long-run marginal cost for this firm. Sketch the shapes of these curves.
Chapter 5
Multivariate Optimization
The Profit Maximization Problem for the Competitive Firm
$$\max_{\text{wrt } x}\ \pi(x) = px - c(x, w, r)$$
The solution is of the form
$$x = x(p, w, r),$$
which is the firm's supply function. The supply function identifies the amount of output the firm wants to produce (and sell) to maximize its profits, given $p$, $w$, and $r$. The functional form of the supply function is completely determined by the functional form of the cost function.
In Terms of the Production Function
$$\max_{\text{wrt } k, l}\ \pi(k, l) = p f(k, l) - wl - rk$$
The solution is of the form
$$l = l(p, w, r)$$
$$k = k(p, w, r)$$
These are the firm's input demand functions. They identify the amounts of labor and capital the firm will want to hire to maximize its profits as a function of $p$, $w$, and $r$. The functional form of these demand functions is completely determined by the functional form of the production function.
Production Manager's Problem
The production manager's problem is to
$$\min_{\text{wrt } k, l}\ e = wl + rk$$
subject to
$$y = f(k, l).$$
That is, the production manager wants to minimize expenditures $e$ subject to the output constraint. The solution to this problem is the conditional input demand functions for labor and capital,
$$l^c = l^c(y, w, r)$$
and
$$k^c = k^c(y, w, r).$$
These two demand functions identify the amounts of labor and capital the production manager wants to hire to minimize the cost of producing $y$ units of output given input prices $w$ and $r$. They are conditional because they are conditional on a given output level. Therefore,
$$e^* = w\,l^c(y, w, r) + r\,k^c(y, w, r) = c(y, w, r)$$
is the firm's cost function.
5.1 Economic Applications of Multivariate Calculus
Total differentials are very useful for getting a better understanding of many of the graphs that we are familiar with in economics, such as isoquants, indifference curves, and isocost lines.
5.1.1 Production Theory
Let's start with the production function and isoquants. Consider the production function $y = f(k, l)$. Visualize it in three-dimensional space. Now let's define an isoquant.
Definition 5.1 The isoquant for the output level $y_0$ identifies all those combinations of $k$ and $l$ that are just capable of producing $y_0$ units of output.
Note that the isoquants are the boundaries of input requirement sets. The input requirement set for producing $y_0$ units of output is all combinations of $k$ and $l$ that are capable of producing $y_0$. That is,
$$I(y_0) \equiv \{(k, l) : f(k, l) \ge y_0,\ l \ge 0,\ k \ge 0\}.$$
Note that as one moves along an isoquant, $dy = 0$. Because of this, the concept of a total differential is very useful for deriving the isoquant from a production function. If $y = f(k, l)$,
$$dy = f_k\,dk + f_l\,dl.$$
But along the isoquant $dy = 0$, so along the isoquant
$$0 = f_k\,dk + f_l\,dl.$$
Therefore, as one moves along the isoquant,
$$f_k\,dk = -f_l\,dl.$$
Rearranging,
$$\left. \frac{dk}{dl} \right|_{dy=0} = -\frac{f_l}{f_k} = -\frac{MP_l(k, l)}{MP_k(k, l)} = -MRTS_{lk}.$$
$MRTS_{lk}$ is the marginal rate of technical substitution of labor for capital, that is, the rate at which one can substitute labor for capital holding output constant. $\left. \frac{dk}{dl} \right|_{dy=0}$ is the slope of the isoquant, so $MRTS_{lk}$ is the negative of the slope.
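For example, suppose $y = f(k, l) = A l^{\alpha} k^{\beta}$, with $A > 0$, $\alpha > 0$, $\beta > 0$. Given that it is always the case that
$$\left. \frac{dk}{dl} \right|_{dy=0} = -\frac{f_l}{f_k} = -\frac{MP_l(k, l)}{MP_k(k, l)},$$
one just needs to find $MP_l(k, l)$ and $MP_k(k, l)$, then divide and simplify:
$$f_l = MP_l(k, l) = \alpha A l^{\alpha - 1} k^{\beta}$$
and
$$f_k = MP_k(k, l) = \beta A l^{\alpha} k^{\beta - 1},$$
so
$$\left. \frac{dk}{dl} \right|_{dy=0} = -\frac{MP_l(k, l)}{MP_k(k, l)} = -\frac{\alpha A l^{\alpha - 1} k^{\beta}}{\beta A l^{\alpha} k^{\beta - 1}} = -\frac{\alpha k}{\beta l}.$$
That is, the slope of the isoquant for this Cobb-Douglas production function at the point $(k_0, l_0)$ is $-\frac{\alpha k_0}{\beta l_0}$, and $MRTS_{lk} = \frac{\alpha k_0}{\beta l_0}$.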
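What have we determined about the slope of this isoquant? Given the assumptions on the parameters, it has a negative slope for positive values of $k$ and $l$.
Is it a straight line? No. How do we know this? The slope is a function of $k$ and $l$.
What happens to the slope of the isoquant as $l$ increases? To answer this, take the partial derivative of the slope with respect to $l$:
$$\frac{\partial}{\partial l} \left( \left. \frac{dk}{dl} \right|_{dy=0} \right) = \frac{\partial}{\partial l} \left( -\frac{\alpha k}{\beta l} \right) = \left( \frac{\alpha}{\beta} \right) k l^{-2} > 0.$$
The slope is negative and rises toward zero, so the isoquant becomes flatter as $l$ increases.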
Let's do a numerical example. Assume
$$y = f(k, l) = 5 l^{.3} k^{.7}.$$
We know that for Cobb-Douglas in general
$$\left. \frac{dk}{dl} \right|_{dy=0} = -\frac{\alpha k}{\beta l},$$
so
$$\left. \frac{dk}{dl} \right|_{dy=0} = -\frac{.3k}{.7l}.$$
If $k_0 = 5$ and $l_0 = 2$, one unit of labor substitutes for how many units of capital?
$$MRTS_{lk}(k_0, l_0) = MRTS_{lk}(5, 2) = \frac{.3(5)}{.7(2)} \approx 1.07.$$
At $(k_1, l_1) = (2, 5)$, $MRTS_{lk}(k_1, l_1) = MRTS_{lk}(2, 5) = \frac{.3(2)}{.7(5)} \approx .171$. That is, at this more labor-intensive combination of labor and capital, one unit of labor substitutes for only .171 units of capital.
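The two MRTS values can be reproduced directly from the marginal products of $y = 5 l^{.3} k^{.7}$. The sketch below computes the ratio $MP_l / MP_k$ without using the simplified formula, so it also serves as a check on the algebra; the function names are assumptions for illustration.

```python
import math


def mp_l(k, l):
    """Marginal product of labor for y = 5 l^0.3 k^0.7."""
    return 0.3 * 5 * l**(-0.7) * k**0.7


def mp_k(k, l):
    """Marginal product of capital for y = 5 l^0.3 k^0.7."""
    return 0.7 * 5 * l**0.3 * k**(-0.3)


def mrts_lk(k, l):
    """MRTS of labor for capital: the negative of the isoquant slope."""
    return mp_l(k, l) / mp_k(k, l)


capital_intensive = mrts_lk(5.0, 2.0)   # (k, l) = (5, 2)
labor_intensive = mrts_lk(2.0, 5.0)     # (k, l) = (2, 5)
```

Both values match the simplified expression $.3k/(.7l)$, and the MRTS is smaller at the labor-intensive bundle, consistent with the isoquant flattening as $l$ increases.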
5.1.2 Consumer Theory
Suppose a consumer is maximizing her utility by choosing among $n$ different commodities. The problem can be written as
$$\max\ u(x) \quad \text{subject to} \quad w = \sum_{i=1}^{n} p_i x_i. \tag{5.1}$$
Marginal utility of good $i$ is defined as
$$mu_i = \frac{\partial u}{\partial x_i}.$$
Example 5.1 Find the marginal utilities for the Cobb-Douglas utility function $u(x_1, x_2) = x_1^{1/2} x_2^{1/2}$.
Another property usually attributed to economic agents is diminishing marginal utility. The utility function is said to exhibit this property if
$$u_{ii} = \frac{\partial\, mu_i}{\partial x_i} = \frac{\partial^2 u}{\partial x_i^2} < 0 \quad \text{for all } i.$$
Example 5.2 Does the Cobb-Douglas utility function exhibit the diminishing marginal utility property?
Indifference Curves
Initially assume a world of only two goods, $x_1$ and $x_2$. Further assume that Ada's preferences can be represented by the utility function $u = u(x_1, x_2)$. Now let's define an indifference curve.
Definition 5.2 The indifference curve for utility level $u_0$ consists of all the combinations of $x_1$ and $x_2$ that, if consumed, will achieve the utility level $u_0$. Formally,
$$I(u_0) = \{(x_1, x_2) : u(x_1, x_2) = u_0\}.$$
Consider the total differential of the utility function,
$$du = u_{x_1} dx_1 + u_{x_2} dx_2,$$
where $u_{x_i} = \partial u/\partial x_i$. But as one moves along the indifference curve, $du = 0$. So
$$0 = u_{x_1} dx_1 + u_{x_2} dx_2.$$
Rearranging,
$$\left. \frac{dx_2}{dx_1} \right|_{du=0} = -\frac{u_{x_1}}{u_{x_2}} \equiv -MRS_{x_1 x_2},$$
where $MRS_{x_1 x_2}$ is the marginal rate of substitution of $x_1$ for $x_2$. It tells us how much the consumption of $x_2$ must decrease to hold utility constant when $x_1$ increases by one unit. It is the rate at which the individual is willing to substitute one good for another. In other words, MRS gives the consumer's private valuation of good $x_1$ in units of good $x_2$.
Example 5.3 Suppose Ada's utility function is given by $u(x_1, x_2) = x_1 x_2$. Hence
$$\left. \frac{dx_2}{dx_1} \right|_{du=0} = -\frac{x_2}{x_1}.$$
Note that her $MRS_{x_1 x_2}$ depends on the amounts of the two goods she is currently consuming. For example, if $x_1^0 = 4$ and $x_2^0 = 2$, $MRS_{x_1 x_2}(4, 2) = 1/2$. That is, she would substitute 1/2 unit of $x_2$ for one unit of $x_1$. However, if $x_1^0 = 2$ and $x_2^0 = 4$, $MRS_{x_1 x_2}(2, 4) = 2$. That is, she would substitute 2 units of $x_2$ for one unit of $x_1$.
Example 5.4 Now consider another example. Freya is a little more complicated than Ada, and her utility is given by
$$u(x_1, x_2) = a_1 x_1 + a_2 x_2.$$
What is her $MRS_{x_1 x_2}$?
Now let's consider one more consumer application of total differentials. Consider the consumer's budget constraint, assuming two goods and exogenous income and prices:
$$w \ge p_1 x_1 + p_2 x_2.$$
If more is always preferred to less, the individual will operate on the boundary of her budget set. Solve for $x_2$ as a function of $x_1$:
$$x_2 = \frac{w}{p_2} - \left( \frac{p_1}{p_2} \right) x_1.$$
For example, if $w = 100$, $p_1 = 5$, and $p_2 = 10$, then $x_2 = 10 - 0.5x_1$ (draw the budget line). One can derive the slope of the budget line by taking the derivative of the last function with respect to $x_1$. This tells us the rate at which the market allows the individual to substitute (trade) $x_1$ for $x_2$:
$$\left. \frac{dx_2}{dx_1} \right|_{dw=0} = -\frac{p_1}{p_2}.$$
It is the negative of the price ratio. This makes sense. For example, if $p_2 = 2$ and $p_1 = 1$, one unit of $x_1$ trades for 1/2 unit of $x_2$.
One could also derive the slope of the budget line using the concept of a total differential. Remember that $p_1$ and $p_2$ are exogenous, so along the budget line $dw = 0$; hence
$$0 = dw = p_1\,dx_1 + p_2\,dx_2.$$
Rearranging, one gets
$$\left. \frac{dx_2}{dx_1} \right|_{dw=0} = -\frac{p_1}{p_2}.$$
Compare $\left. \frac{dx_2}{dx_1} \right|_{dw=0}$ and $\left. \frac{dx_2}{dx_1} \right|_{du=0}$. The first is the rate at which the market allows the individual to substitute one good for another. This rate is exogenous to the individual, and we will call it the market valuation of good $x_1$ in units of $x_2$. The second is the rate at which the individual is willing to substitute one good for another; this is the private valuation, as we interpreted the MRS above. It is a function of individual preferences.
5.2 Homogeneous Functions
Because of certain mathematical properties, the class of homogeneous functions is very important in economics. A real-valued function of $n$ variables $y = f(x)$, $f : \mathbb{R}^n \to \mathbb{R}$, is homogeneous of degree $r$ (HDr) iff for all $x \in \mathbb{R}^n$ and for all $\lambda \in \mathbb{R}_+$,
$$f(\lambda x) = \lambda^r f(x).$$
Note that in multiplying $x$ by $\lambda$ we multiply each component of $x$, so that $\lambda x$ varies along a line in $\mathbb{R}^n$. The most useful theorem about homogeneous functions is the so-called Euler's Theorem, which we state next.
Theorem 5.1 A differentiable function of $n$ variables $f(x)$, $f : \mathbb{R}^n \to \mathbb{R}$, is HDr iff for all $x \in \mathbb{R}^n$
$$r f(x) = \sum_{i=1}^{n} f_i(x)\,x_i, \quad \text{where } f_i = \frac{\partial f}{\partial x_i}.$$
Our second theorem is about the derivatives of homogeneous functions; it can be obtained directly from Euler's theorem.
Theorem 5.2 If a twice differentiable function of $n$ variables $f(x)$, $f : \mathbb{R}^n \to \mathbb{R}$, is HDr, then its partial derivatives $f_i(x) = \partial f(x)/\partial x_i$ are homogeneous of degree $r - 1$ for all $i = 1, 2, \ldots, n$.
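The definition and both theorems can be spot-checked numerically on a Cobb-Douglas function $f(l, k) = A l^{a} k^{b}$, which is homogeneous of degree $a + b$. The parameter values, the test point, and the scale factor below are hypothetical, and the three "gap" quantities should be zero up to rounding error.

```python
A, a, b = 2.0, 0.3, 0.5     # hypothetical parameters; degree r = a + b
r = a + b


def f(l, k):
    return A * l**a * k**b


def f_l(l, k):
    return a * A * l**(a - 1) * k**b


def f_k(l, k):
    return b * A * l**a * k**(b - 1)


l0, k0, lam = 3.0, 4.0, 2.5  # hypothetical point and scale factor

# Definition: f(lam * x) = lam^r f(x)
scaling_gap = f(lam * l0, lam * k0) - lam**r * f(l0, k0)

# Theorem 5.1 (Euler): r f(x) = f_l l + f_k k
euler_gap = r * f(l0, k0) - (f_l(l0, k0) * l0 + f_k(l0, k0) * k0)

# Theorem 5.2: each partial derivative is homogeneous of degree r - 1
derivative_gap = f_l(lam * l0, lam * k0) - lam**(r - 1) * f_l(l0, k0)
```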
Homogeneity and Production Theory
Note that the production function is said to exhibit constant returns to scale (CRS) if it is HD1, decreasing returns to scale (DRS) if it is homogeneous of degree less than 1, and increasing returns to scale (IRS) if it is homogeneous of degree greater than 1. In the first case doubling all the inputs will exactly double the output; in the second it will less than double the output; and in the third it will more than double the output.
Homogeneous production functions are also closely related to the distribution of factor income. Suppose the production function $y = f(l, k)$ is HD1. Given this homogeneity, Euler's theorem implies that
$$y = f_l\,l + f_k\,k.$$
If the factors are paid the value of their marginal product and if the price of the firm's output is $p$, then multiplying both sides by the price of the firm's output we see that the firm's total revenue is divided among the factors as follows:
$$py = p f_l\,l + p f_k\,k.$$
In the above production function each factor's share of income depends on the level of production.
Example 5.5 What are the factor shares for the Cobb-Douglas production function? Do they depend on the level of production?
Homogeneity and Demand Theory
The most important use of homogeneity in this context is the zero-degree homogeneity of consumer demand functions. The general form of the consumer choice problem is given by
$$\max\ u(x) \quad \text{subject to} \quad w = \sum_{i=1}^{n} p_i x_i.$$
Suppose the general solution to the above problem is given as
$$x_i = x_i(p, w), \quad i = 1, \ldots, n.$$
Now consider multiplying the prices and wealth by a scalar $\lambda > 0$. The impact of such a scalar on the budget constraint is
$$\lambda w = \sum_{i=1}^{n} \lambda p_i x_i, \quad \text{which implies} \quad w = \sum_{i=1}^{n} p_i x_i.$$
This shows that the budget constraint is not altered by a scale increase in all the parameters. Since the budget constraint and the utility function remain the same, the utility-maximizing demands above will remain the same:
$$x_i(\lambda p, \lambda w) = x_i(p, w), \quad i = 1, \ldots, n,\ \lambda > 0.$$
That is, consumer demand functions are HD0 in prices and money income. For instance, doubling all the prices and income does not change the consumer's optimal choices. In micro- and macroeconomics this important result is known as "no money illusion."
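The HD0 property is easy to see on a concrete demand system. The sketch below uses the demands $x = w/(2p_x)$, $y = w/(2p_y)$ derived earlier for $U(x, y) = xy$ (with income written as $w$); the numerical prices, income, and scale factor are hypothetical.

```python
def demand(px, py, w):
    """Demands for U(x, y) = x * y: half of income is spent on each good."""
    return w / (2 * px), w / (2 * py)


px, py, w = 2.0, 5.0, 100.0     # hypothetical prices and income
lam = 2.0                       # double all prices and income

base_bundle = demand(px, py, w)
scaled_bundle = demand(lam * px, lam * py, lam * w)

# Scaling income alone does change the bundle, so the result is specific
# to scaling prices and income together.
income_only_bundle = demand(px, py, lam * w)
```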
Homothetic Functions
This is another useful class of functions. Their usefulness comes from the fact that they share similar properties with homogeneous functions but are mathematically less restrictive.
A function is homothetic if it can be written as a monotonic transformation of a homogeneous function. In the language of mathematics, the function $z = F(x)$, $F : \mathbb{R}^n \to \mathbb{R}$, is homothetic if there exist two functions $f$ and $G$, where $f : \mathbb{R}^n \to \mathbb{R}$ is homogeneous of degree $r$ and $G : \mathbb{R} \to \mathbb{R}$ with $G' > 0$, such that $F(x) = G[f(x)]$.
First of all, we should note that every homogeneous function is homothetic: since the identity transformation is a monotonic transformation, every homogeneous function is also a homothetic function. However, the converse is not true. As an example, take a Cobb-Douglas function and see that its log transformation is a homothetic function that is not a homogeneous function.
Another important property of homotheticity is that level sets (isoquants for production functions, indifference curves for utility functions) are radial blowups of each other. If a production function is homothetic, then the slope of the tangent to any isoquant at a point $x^0$, $MRTS(x^0)$, is the same as the slope of the tangent at $\lambda x^0$, $MRTS(\lambda x^0)$, for all $\lambda > 0$. The same holds for homothetic utility functions.
Exercises
1. Consider the utility function u(x, y) = x
1/2
y
1/2
(a) Does this utility function exhibit diminishing marginal utility?
(b) Does this utility function exhibit diminishing MRS?
2. Suppose a particular good is produced by using capital K and labor L
inputs according to the production function
y = f(k, l) = AL
2
K
2
BL
3
K
3
,
where A > 0, B > 0, and f : D R
2
(a) How should D be restricted to make economic sense?
(b) Find the following:
AP
L
=
y
l
, MP
l
=
f
l
, AP
k
=
y
k
, MP
k
=
f
k
.
(c) Does the MP cuts the AP at its maximum point?
(d) Find the MRTS and interpret it.
(e) If k is xed at some k, for what values of l does f exhibit diminishing
returns to labor? If L is xed at some l, for what values of k does f
exhibit diminishing returns to capital?
3. Are the following production functions homogenous? If so, to what degree?
(a) $y = f(l, k) = A l^a k^b$ with $A > 0$, $a > 0$ and $b > 0$.
(b) $y = f(l, k) = (A l^{1+a} k^{1+b})/(Bl + Ck)$ with $A > 0$, $B > 0$, $C > 0$,
$0 < a < 1$, and $0 < b < 1$.
4. You are given the demand functions $x = x^*(p_x, p_y, w)$ and
$y = y^*(p_x, p_y, w)$ that solve
\[ \max_{x, y} \; u(x, y) \quad \text{subject to} \quad w = p_x x + p_y y \]
for the case where $u(x, y)$ is homothetic.
(a) What can be said regarding the shape of the Engel curves associated
with x and y?
(b) Find the income elasticities of demand for the two goods.
5. Assume that Ada's preferences can be described by
\[ u(x, y) = ax + by^{1/2} \]
subject to the usual budget constraint. Assume also that more is always
preferred to less. What does this assumption imply about $a$ and $b$? Derive
Ada's demand function for $y$. What happens to the demand for $y$ if the
price of $x$ increases? Does it increase or decrease, and by how much?
6. Assume that an individual's preferences can be described by
\[ u(x, y) = ax + by \]
subject to the usual budget constraint. Assume more is always preferred
to less. Derive the individual's demand functions. Explain in words what you
are doing and the logic of what you are doing. Think before you leap.
Use a graph or graphs to make your argument.
7. Assume a competitive firm produces product $y$ using $k$ and $l$. Further
assume that $y$ is sold at the parametric price $p$ and that $r$ and $w$ are the
parametric prices of $k$ and $l$.
(a) Derive the firm's long run demand function for labor, $l^d$, and for
capital services, $k^d$, assuming
\[ y = f(k, l) = k^{1/2} l^{1/2}. \]
Do not worry about the second order conditions. Is there something
wrong? Explain.
(b) Now use the production function
\[ y = f(k, l) = k^{.5} l^{.2} \]
to derive the long run demand for labor and capital.
8. Assume a competitive firm sells its output $y$ at the parametric price $p$ and
that it can purchase labor and capital at the parametric prices $w$ and $r$.
Further assume that the firm's cost function is
\[ c = c(y, w, r) = y^{.5} w r. \]
(a) Determine the profit maximizing level of output, $y^*$. Derive a
quantitative comparative static prediction about what happens to $y^*$ if $r$
increases. Is your solution an interior solution? Explain.
(b) Now use the following cost function
\[ c = c(y, w, r) = y^2 w r. \]
Determine the profit maximizing level of output, $y^*$. Derive a
quantitative comparative static prediction about what happens to $y^*$ if $r$
increases. Is your solution an interior solution now? Explain.
Chapter 6
Constrained Optimization
In this chapter we will discuss the Lagrange multiplier technique for solving
constrained optimization problems, which are fundamental in economics. Utility
maximization and cost minimization are two such problems, and the Lagrangian
technique is a very useful tool for analyzing them.
6.1 The Lagrangian Technique:
First- and Second-Order Conditions
Consider the n-variable constrained maximization problem
\[ \max_{x} \; f(x) \quad \text{subject to} \quad g(x) = 0. \tag{6.1} \]
In (6.1) $x \in \mathbb{R}^n_+$, and we assume both $f$ and $g$ to be at least
twice differentiable. Previously we solved this kind of problem by converting it
into an unconstrained optimization problem, substituting for one of the
variables from the constraint equation. The Lagrangian technique achieves the
same goal but in a more symmetric way, and at the same time it reveals, as we
will see, very important information about the constraints.
We define the following Lagrange function (or simply Lagrangian) in
$n + 1$ variables:
\[ L(x, \lambda) = f(x) + \lambda g(x). \tag{6.2} \]
In the above equation we introduced a new variable $\lambda$ called the Lagrange
multiplier. Now consider the problem of finding a critical point of the
Lagrangian function
\[ L(x, \lambda). \tag{6.3} \]
From our previous notes we know that if $(x^*, \lambda^*)$ is a critical point
of (6.3) in the interior of the domain, then the following necessary conditions
must hold:
\[
\begin{aligned}
L_i(x^*, \lambda^*) &= f_i(x^*) + \lambda^* g_i(x^*) = 0 \quad \text{for all } i = 1, 2, \ldots, n, \\
L_\lambda(x^*, \lambda^*) &= g(x^*) = 0.
\end{aligned}
\tag{6.4}
\]
The fundamental result of the Lagrange multiplier technique is that a solution
to (6.1) also satisfies the conditions in (6.4).
Theorem 6.1 If $x^* \in \mathbb{R}^n_+$ is a local maximum for (6.1), then there
exists a $\lambda^*$ such that the conditions in (6.4) hold.
This Lagrangian result is very useful, as we will see later. It provides the
first-order (necessary) conditions for a local constrained maximum of the
objective function. Formally, the result states that if $f(x^*) \geq f(x)$ for
all $x$ within an $\epsilon$-neighborhood of $x^*$ satisfying $g(x) = 0$, then
there exists $\lambda^*$ such that conditions (6.4) hold. As for unconstrained
problems, these first-order conditions hold for a constrained minimization
problem as well: the necessary conditions are the same for both problems. The
differences between minimization and maximization are contained in the
second-order conditions, which we will discuss later.
This result can be extended to the case of $m$ independent equality constraints
(where $n > m$). For instance, the first-order condition for $x^*$ to solve
\[
\max_{x} \; f(x) \quad \text{subject to} \quad g^1(x) = 0, \; \ldots, \; g^m(x) = 0
\tag{6.5}
\]
is that there must exist $\lambda^* \in \mathbb{R}^m$ such that
\[
\begin{aligned}
f_i(x^*) + \sum_{j=1}^{m} \lambda_j^* g^j_i(x^*) &= 0 \quad \text{for all } i = 1, 2, \ldots, n, \\
g^j(x^*) &= 0 \quad \text{for all } j = 1, 2, \ldots, m.
\end{aligned}
\]
Example 6.1 Consider the utility function $u(x, y) = \log(x) + \log(y)$, subject
to the usual budget constraint. What are the utility maximizing demand functions
for $x$ and $y$?
Solution Let's form the Lagrangian
\[ L = \log(x) + \log(y) + \lambda (w - p_1 x - p_2 y). \]
Now we form the FOC:
\[ L_x = \frac{1}{x} - \lambda p_1 = 0, \qquad L_y = \frac{1}{y} - \lambda p_2 = 0. \]
The above conditions give us two equations in three unknowns. Eliminating
$\lambda$ from them gives $y = \frac{p_1}{p_2} x$. Plugging this into the budget
constraint $w = p_1 x + p_2 y$ for $y$ will yield $x = \frac{w}{2 p_1}$ and
$y = \frac{w}{2 p_2}$.
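As a sanity check on this example, the sketch below plugs hypothetical numbers (income 100 and prices 2 and 5, chosen only for illustration) into the demands $x = w/(2p_1)$ and $y = w/(2p_2)$ implied by the first-order conditions and the budget constraint, and compares the result with a brute-force search along the budget line. The function names are assumptions, not part of the notes.

```python
import math


def demands(w, p1, p2):
    """Demands implied by the first-order conditions and the budget line."""
    return w / (2 * p1), w / (2 * p2)


def utility(x, y):
    return math.log(x) + math.log(y)


# Hypothetical income and prices, chosen only for illustration.
w, p1, p2 = 100.0, 2.0, 5.0
x_star, y_star = demands(w, p1, p2)
lam = 1 / (p1 * x_star)  # multiplier recovered from L_x = 0

# Residuals of L_x = 0, L_y = 0 and the budget constraint (all should be ~0).
foc_residuals = (
    1 / x_star - lam * p1,
    1 / y_star - lam * p2,
    w - p1 * x_star - p2 * y_star,
)
print(foc_residuals)

# Brute-force comparison with other bundles on the budget line.
grid = [0.5 * i for i in range(1, 100)]  # x from 0.5 to 49.5
best_x = max(grid, key=lambda x: utility(x, (w - p1 * x) / p2))
print(best_x, x_star)
```

Each good receives half of income, which is what equal weights on $\log(x)$ and $\log(y)$ would lead us to expect.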
Second Order Conditions
Theorem 6.2 Suppose $x^*$ is a critical point for the problem
\[ \max_{x} \; f(x_1, x_2) \quad \text{subject to} \quad g(x_1, x_2) = 0, \tag{6.6} \]
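i.e., $(x^*, \lambda^*)$ is a solution of $\nabla L(x_1, x_2, \lambda) = 0$. Then
$x^* = (x_1^*, x_2^*)$ is a local maximizer if the bordered Hessian
\[
H =
\begin{pmatrix}
0 & g_1 & g_2 \\
g_1 & L_{11} & L_{12} \\
g_2 & L_{21} & L_{22}
\end{pmatrix},
\]
evaluated at $(x^*, \lambda^*)$, has a positive determinant.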
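To see the second-order condition at work, the sketch below evaluates a bordered Hessian for the log utility problem of Example 6.1. It assumes the standard two-variable, one-constraint layout (a zero in the corner, the constraint derivatives on the border, and the second derivatives of the Lagrangian inside), with hypothetical numbers for income and prices; the function names are illustrative only.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def bordered_hessian(x, y, p1, p2):
    """Bordered Hessian of L = log x + log y + lam * (w - p1 x - p2 y)."""
    g1, g2 = -p1, -p2                            # derivatives of the constraint
    l11, l12, l22 = -1 / x**2, 0.0, -1 / y**2    # second derivatives of L
    return [[0.0, g1, g2],
            [g1, l11, l12],
            [g2, l12, l22]]


# Hypothetical income and prices; the critical point is x = w/(2 p1), y = w/(2 p2).
w, p1, p2 = 100.0, 2.0, 5.0
x_star, y_star = w / (2 * p1), w / (2 * p2)
det_h = det3(bordered_hessian(x_star, y_star, p1, p2))
print(det_h)  # positive, so the critical point is a local constrained maximum
```

For this utility function the determinant works out to $p_1^2/y^2 + p_2^2/x^2$, which is positive at any interior bundle.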
Economic Interpretation of Lagrange Multipliers
We solved the consumer's problem above analytically. But we can also propose
an intuitive solution using the marginal principle. Suppose you derive utility
from only two things: drinking tea and going to movies. Let $x$ denote the number
of cups of tea, $y$ the number of movies you go to, and $u(x, y)$ the utility
function. Since you have a budget, you face a trade-off between the two. How
should you decide on the right amount of each of these consumption activities?
The marginal principle says that when you want to decide on the level of a
variable, you should choose the level at which marginal cost is equal to
marginal benefit. Take, for example, the problem of drinking one more cup of
tea. What are the marginal benefit and the marginal cost of that cup? Note that
both will depend on the amount of tea you have already consumed. According to
the marginal principle, you will drink another cup of tea if its marginal
benefit is greater than its cost. The cost of one more cup of tea is clear: it
is its price, say $P_x$. But how can we put a price on the wonderful zest that
is left after drinking a good glass of tea? How can we measure it in impersonal
dollars? We need some measure that converts dollars into units of utility.
There is a theoretical way, if not a very practical one. Suppose someone gives
you a dollar to spend. Because you now have an extra dollar to spend, your
utility has certainly increased. Let's call this marginal change in utility, in
units of utility, $\lambda$. Hence $\lambda$ is the utility of a dollar or, as
it is usually called, the marginal utility of income. If we want to know the
utility of $P_x$, we simply multiply it by $\lambda$. Our decision is now very
simple: choose the level of $x$ such that
$\frac{\partial u}{\partial x} = \lambda P_x$. We can similarly derive the same
result for $y$, the number of movies you should go to in a month. How does this
compare with the result we derived analytically above? It is now easy to see
why Lagrange multipliers are sometimes called shadow prices: they are
conversion factors that turn prices in dollars into prices in units of utility.
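The reading of $\lambda$ as the marginal utility of income can be checked on the log utility example, $u(x, y) = \log(x) + \log(y)$, whose first-order conditions and budget constraint imply $x = w/(2p_1)$ and $y = w/(2p_2)$. The sketch below, with hypothetical income and prices and illustrative function names, compares the multiplier from the first-order condition with a finite-difference estimate of how maximized utility responds to an extra dollar.

```python
import math


def indirect_utility(w, p1, p2):
    """Maximized utility V(w) for u = log x + log y."""
    x, y = w / (2 * p1), w / (2 * p2)
    return math.log(x) + math.log(y)


def multiplier(w, p1, p2):
    """lambda from the first-order condition 1/x = lambda * p1."""
    x = w / (2 * p1)
    return 1 / (p1 * x)


# Hypothetical income and prices, chosen only for illustration.
w, p1, p2 = 100.0, 2.0, 5.0
h = 1e-4
dV_dw = (indirect_utility(w + h, p1, p2) - indirect_utility(w - h, p1, p2)) / (2 * h)
print(dV_dw, multiplier(w, p1, p2))
```

Both numbers equal $2/w$ here, so the multiplier is indeed the utility value of one more dollar of income.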
Now let's try our hand at a similar problem, this time from production
theory. Suppose a competitive firm, i.e. a firm that takes product and
factor prices as given, is trying to maximize its short run profit by deciding
on the number of workers to hire. What is the marginal benefit of hiring one
more worker? It is the worker's marginal product, i.e.
$\frac{\partial F}{\partial L}$. And what is the marginal cost? It is just
the wage, $w$, the firm pays to the worker. But the question here is how we can
compare the physical marginal product with the wage in dollars. We can turn
the marginal product into the value of the marginal product by multiplying it
by the price of output. This lets us compare dollars with dollars.
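A small numerical illustration of this hiring rule follows. Everything specific in it is an assumption made for the sketch: the production function $F(L) = \sqrt{L}$, the output price 10, the wage 1, and the grid search used in place of calculus.

```python
import math


def output(L):
    """Hypothetical short-run production function F(L) = sqrt(L)."""
    return math.sqrt(L)


def marginal_product(L, h=1e-6):
    """Central-difference approximation of dF/dL."""
    return (output(L + h) - output(L - h)) / (2 * h)


def profit(L, p, w):
    return p * output(L) - w * L


p, w = 10.0, 1.0  # hypothetical output price and wage
# Search over a grid of employment levels for the profit-maximizing one.
grid = [0.25 * i for i in range(1, 401)]  # L from 0.25 to 100
L_star = max(grid, key=lambda L: profit(L, p, w))
print(L_star, p * marginal_product(L_star), w)
```

At the profit-maximizing employment level, the value of the marginal product and the wage coincide, both measured in dollars.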
Chapter 7
Calculus of Variations
Well, indeed I am not sure what should be going in here now. But you know
this is the craze of the day. Once upon a time I was telling myself that we
need to write essays on everything and take down our notes. Back then I
certainly did not dream of writing these things on the computer with that
wonderful thing called LaTeX. Yes, indeed I would like to see what will come
out of this supposedly big endeavor for us. Soon we will see. In this chapter I
am hoping to collect the results that will be useful. I have done all these once
upon a time and I will not go over the examples once again. I will try to
compress a lot of chapters into sections; the goal is to reach the Maximum
Principle. Yet there are some important chapters, such as the Infinite Planning
Horizon, to which I should pay special attention. We will pay due respect, but I
will make this mostly a review. Everything here serves only to understand the
Maximum Principle. We are going to state the fundamental problem:
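\[
\begin{aligned}
\max(\min) \quad & V[y] = \int_0^T F[t, y(t), y'(t)] \, dt \\
\text{subject to} \quad & y(0) = A \quad (A \text{ given}) \\
\text{and} \quad & y(T) = Z \quad (T, Z \text{ given})
\end{aligned}
\tag{7.1}
\]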
The problem here is to choose the optimal path $y^*$ such that $V[y]$ is
maximized (or minimized). Such a path is called an extremal. To attack the
problem we will use the so-called variational technique. Note that the
condition found by this technique is only a necessary condition. But before I
get there, let me put down an aside which will be very useful.
Leibniz Rule for Differentiating a Definite Integral Consider the definite
integral
\[ I(x) = \int_a^b F(t, x) \, dt \]
where $F(t, x)$ is assumed to have a continuous derivative $F_x(t, x)$ on the
interval of integration $[a, b]$. Because the integral is a function of $x$, the
effect of a change in $x$ is given by the derivative formula known as Leibniz's
rule:
\[ \frac{dI}{dx} = \int_a^b F_x(t, x) \, dt \]
The intuition behind this formula is simplest to see in terms of area. If we
change $x$ a little bit, then $\Delta I = \int_a^b \Delta F(t, x) \, dt$. But we
can approximate $\Delta F(t, x)$ by $F_x(t, x) \Delta x$, so the formula is
clear. The rule can be generalized by making the upper limit of the integral
depend on $x$ as well. This time the limit as well as the integrand of the
definite integral will change, so in terms of area the change is in two
dimensions. The first term is as discussed above; the second is the
displacement of the upper limit times the height there, $F[b(x), x] \Delta b$.
Here $\Delta b$ can again be approximated by its differential $b'(x) \Delta x$.
So we get the more general Leibniz's rule as
\[ \frac{dI}{dx} = \int_a^{b(x)} F_x(t, x) \, dt + F[b(x), x] \, b'(x) \]
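The general rule can be checked numerically. The sketch below is only an illustration under stated assumptions: the integrand $F(t, x) = x t^2$, the upper limit $b(x) = x$, the lower limit $a = 0$, and the midpoint quadrature are all hypothetical choices, not taken from the notes.

```python
def integrate(func, a, b, n=2000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Hypothetical integrand and upper limit, chosen only for illustration.
def F(t, x):
    return x * t**2


def F_x(t, x):
    return t**2


def b(x):
    return x


def b_prime(x):
    return 1.0


A_LOWER = 0.0  # fixed lower limit a


def I(x):
    """I(x) = integral of F(t, x) dt from a to b(x)."""
    return integrate(lambda t: F(t, x), A_LOWER, b(x))


def leibniz(x):
    """Right-hand side of the general rule: integral of F_x plus F[b(x), x] b'(x)."""
    return integrate(lambda t: F_x(t, x), A_LOWER, b(x)) + F(b(x), x) * b_prime(x)


x0, h = 1.5, 1e-4
finite_difference = (I(x0 + h) - I(x0 - h)) / (2 * h)
print(finite_difference, leibniz(x0))
```

For this choice $I(x) = x^4/3$, so both numbers should be close to $4x^3/3$ evaluated at $x = 1.5$.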
Now, once this preliminary is over we can continue with our variational problem.
7.1 The Euler Equation
There is no question how ingenious Euler's method is. Yet it is so
fundamental, so commonplace from another point of view. I guess this is what
genius is: to use old techniques in quite new ways. Euler says: suppose we
already have one extremal, say $y^*(t)$. Then he wonders what kind of properties
this extremal could have. To compare it with other paths, he devises a
perturbation curve $p(t)$ with $p(0) = p(T) = 0$, just to comply with the
boundary conditions given in the fundamental problem. Then he generates a whole
neighborhood of paths as
\[ y(t) = y^*(t) + \epsilon p(t) \quad [\text{implying } y'(t) = y^{*\prime}(t) + \epsilon p'(t)] \]
with the property that as $\epsilon \to 0$, $y(t) \to y^*(t)$. As we see here,
for each $\epsilon$ there is only one $y(t)$, and instead of thinking of $V[y]$
as a functional we can think of it as a simple function of $\epsilon$.
Therefore, $V(\epsilon)$ must have its maximum at $\epsilon = 0$ and it should
satisfy the first-order condition:
\[ \left. \frac{dV}{d\epsilon} \right|_{\epsilon = 0} = 0 \]
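The perturbation argument can be made concrete with a small numerical experiment. The sketch assumes a hypothetical minimization problem with $F = (y')^2$, boundary values $y(0) = 1$ and $y(2) = 3$, the straight line as candidate extremal, and $p(t) = \sin(\pi t / T)$ as perturbation curve; none of these choices comes from the notes.

```python
import math

# Hypothetical horizon and boundary values, chosen only for illustration.
T, A, Z = 2.0, 1.0, 3.0


def y_star_dot(t):
    """Slope of the candidate extremal y*(t) = A + (Z - A) t / T."""
    return (Z - A) / T


def p_dot(t):
    """Slope of the perturbation p(t) = sin(pi t / T), with p(0) = p(T) = 0."""
    return (math.pi / T) * math.cos(math.pi * t / T)


def V(eps, n=4000):
    """Midpoint-rule value of the integral of (y')^2 along y = y* + eps * p."""
    h = T / n
    return h * sum(
        (y_star_dot((i + 0.5) * h) + eps * p_dot((i + 0.5) * h)) ** 2
        for i in range(n)
    )


step = 1e-4
dV_deps_at_zero = (V(step) - V(-step)) / (2 * step)
print(dV_deps_at_zero)          # approximately zero
print(V(0.0), V(0.5), V(-0.5))  # V(0) is the smallest of the three
```

Since this example is a minimization, $V(\epsilon)$ has its minimum rather than its maximum at $\epsilon = 0$, but the first-order condition is the same.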
Well, the method is ingenious as it is, but it is not of much use yet, simply
because the result depends on two arbitrary elements: the perturbation curve
itself and $\epsilon$. But surely Euler knows how to get around this as well.
The argument from here is very standard and we can safely write the final
product:
\[ F_y - \frac{d}{dt} F_{y'} = 0 \quad \text{for all } t \in [0, T] \quad [\text{Euler equation}] \]
It is all right to use the general form, but sometimes it is easier to appeal
to special cases. It is also very useful to reproduce the expanded form of the
Euler equation:
\[ F_{y'y'} y''(t) + F_{yy'} y'(t) + F_{ty'} - F_y = 0 \quad \text{for all } t \in [0, T] \]
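A short worked example may help to see how the expanded form, $F_{y'y'} y'' + F_{yy'} y' + F_{ty'} - F_y = 0$, is used. The integrand and the boundary values below are hypothetical, chosen only because the resulting differential equation is easy to solve.

```latex
\begin{align*}
% Hypothetical problem, chosen only for illustration
F[t, y, y'] &= 12ty + (y')^2, \qquad y(0) = 0, \quad y(2) = 8. \\
% Partial derivatives entering the expanded Euler equation
F_y &= 12t, \quad F_{y'} = 2y', \quad F_{y'y'} = 2, \quad F_{yy'} = 0, \quad F_{ty'} = 0. \\
% Substituting into the expanded form
2y''(t) - 12t &= 0 \;\Longrightarrow\; y''(t) = 6t
  \;\Longrightarrow\; y(t) = t^3 + c_1 t + c_2. \\
% Boundary conditions: y(0) = 0 gives c_2 = 0, and y(2) = 8 gives c_1 = 0
y^*(t) &= t^3.
\end{align*}
```

The two constants of integration are pinned down by the two boundary conditions, which is why the fundamental problem specifies both $y(0)$ and $y(T)$.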