
EC 720 - Math for Economists

Lecture Notes
Samson Alva∗
Department of Economics, Boston College

Fall 2011


e-mail: samson.alva@bc.edu.

1 Overview of Course
This is a course on optimization, with an emphasis on applications. We will work
through the important theorems, sketching heuristic proofs when illuminating; you should
understand their results and be comfortable using them. The relevant mathematical tools will be
developed as needed.
Optimization underpins modern economic theory. Actors in economic models are
assumed to be optimizers, and the canonical approach to solving models involves formulating
a concept of equilibrium, whereby the actors solve their particular optimization problems in
a mutually consistent manner as prescribed by the equilibrium concept.
The essential element of an optimization problem is the objective function f , which maps
a domain U to a range V , where the range space has associated with it some notion of
an ordering so that the values of the function f at different points in the domain can be
compared. Most commonly, this range space V is a subset of the real line R, which I will
assume henceforth. I will couch results in terms of a maximization problem, but the results
for minimization problems can easily be extracted from these. There is little we can say at
this level of generality. Perhaps the most famous result is the Weierstrass Theorem, which
ensures that global extrema exist if f is continuous and U is compact.
In the context of a differentiable f with U ⊂ R and V ≡ R, the quintessential
result from optimization theory is the First Order Necessary Condition (FONC) for local
extrema: the value of the derivative at any (interior) local extremum must be zero. This
condition generalizes to differentiable finite-dimensional real-valued functions, as one would
have seen in a course on multivariable calculus. Other important conditions include the First
Order Sufficient Condition (FOSC), the Second Order Necessary Condition (SONC), and the
Second Order Sufficient Condition (SOSC) for extrema (see Simon, Blume for details).
These introductory optimization problems are called unconstrained optimization prob-
lems, because the choice variables are independent of each other. However, most interesting
optimization problems involve one or more of the following: constraints, implying that the
choice variables are not independent; time, implying that the optimum sought involves a
sequence of interlinked choices; and uncertainty, implying that the outcome from a particular
choice is not determined by that choice alone. As we will see, all three of these wrinkles can be
handled by the Karush-Kuhn-Tucker (KKT) approach1 to constrained optimization, which is a
very general theory that we shall begin with in this course. However, optimization problems
with time and with uncertainty often have a great deal of structure that can be used to
obtain sharper (and more useful) results; we will explore these later in the course.

1
You may have heard of the Kuhn-Tucker Theorem, but the necessary condition described by the theorem
was actually discovered by William Karush many years before (see the Wikipedia article: http://
en.wikipedia.org/wiki/Karush-Kuhn-Tucker_conditions). Given the customary omission of Karush’s
name, this could be considered an example of Stephen Stigler’s Law of Eponymy: “No scientific discovery is
named after its original discoverer.” (see http://en.wikipedia.org/wiki/Stigler’s_law_of_eponymy).

Another important set of theorems (other than the First/Second Order Neces-
sary/Sufficient Conditions) are the various Envelope Theorems, which provide machinery for
the comparative statics2 of equilibrium. The distinction between endogenous and exogenous
variables in an economic model corresponds to the distinction between choice variables
and parameters in optimization theory. One can think of two main questions
for any optimization problem: 1) Given a point in the parameter space, what are the optima
of the problem? 2) What are the properties of the set of optima/policy function? Envelope
Theorems answer the second question, while the FONC, etc. answer the first.
In summary, as we study optimization, it is useful to understand the assumptions that
are used for results under each of the following categories:

• Existence: Does the optimization problem have a solution?

• Uniqueness: Does the optimization problem have exactly one solution, or multiple solutions?

• Characterization: Are there conditions that solutions satisfy that are both necessary
and sufficient?

• Regularity: Do the solutions depend continuously on the parameters of the problem?

• Sensitivity: Does the optimal value (and the optimal choices) depend smoothly on
the parameters of the problem, i.e. what is the effect of a marginal change in the
parameters?

2
Paul Samuelson was an early proponent of the argument that the meaningfulness of a model comes from
its comparative statics properties.

Part I

Mathematical Preliminaries
Notation

⊂ - subset (or set inclusion)
⊊ - strict subset (or strict set inclusion)
⊃ - superset
⊋ - strict superset
∈ - element of
∋ - contains
N - the set of natural numbers
Z - the set of integers
R - the set of real numbers

2 Sets, Functions, and Proofs


2.1 Sets and Functions
A set is a collection of objects. We will work with this intuitive definition of a set without
limiting what a set may contain3 .
A set that contains no elements is called the empty set and is denoted by ∅ or {}. The
union of two sets A and B, denoted A ∪ B, is a set that contains all elements that are con-
tained in A and all elements that are contained in B, without repetition. The intersection
of two sets A and B, denoted A ∩ B, is a set that contains only those elements that are
contained in both A and in B. A set C is a subset of another set A if every element in C
is also an element in A. The universal set U is the set containing all elements of every
possible set. Note well that the universal set can be different depending on the context,
and that the universal universal set is an ill-defined concept that leads to paradoxes (see
previous footnote). Given two sets A and B, we can define the set difference A − B or A\B
to be the set of all elements in A that are not in B. Then, A = (A − B) ∪ (A ∩ B). The
complement of a set A, denoted Ā, is the set of all elements not in A, that is, the set of
all elements that are in the universal set but not in A. Then, Ā = U − A. The power set
3
This is naïve set theory, and suffers from the (Bertrand) Russell paradox: consider the set of all sets
that do not contain themselves. Now, does this set contain itself?

of a set A, denoted P (A), is the collection of all subsets of A. Notice that if the cardinality
(see below for definition) of the set A is finite (and equal to a), then the number of subsets
of A, i.e. the cardinality of the power set of A, is 2a .
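The 2^a count can be confirmed by brute force for a small set; the following Python sketch (my illustration, with a hypothetical helper `power_set`, not part of the notes) enumerates every subset:

```python
# Enumerate the power set of a finite set A and confirm that its
# cardinality is 2^|A|, as claimed above.
from itertools import chain, combinations

def power_set(A):
    elems = list(A)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))
    return [set(s) for s in subsets]

A = {1, 2, 3}
P = power_set(A)
print(len(P))  # 2**3 = 8, counting the empty set and A itself
```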
Next, we (intuitively) define a map from one source set (the domain) to another target
set (the codomain). Imagine the two sets written as a list, with one list written above the
other. Now, imagine that there are some element-to-element links between the two sets but
not within a set, where links are unique. This is the intuitive definition of a map. There are
four types of maps, which describe the nature of the links between the two sets. A map is
one-to-one if every element in the domain and every element in the codomain has no
more than one link. A map is many-
to-one if the domain has the property that every element of that set has no more than one
link while the codomain contains at least one element that has more than one link. A map
is one-to-many if every element in the codomain has no more than one link, but there is at
least one element in the domain that has more than one link. A map is many-to-many if
both domain and codomain contain at least one element that has more than one link. Given
some map f , with domain A, the range of f is the set of all elements in the codomain that
have a link to some element of the domain. Thus, the range is a subset of the codomain.
A function is a map from one set, called the domain of the function, to another set,
called the codomain of the function, with the restriction that each element in the domain
be mapped to exactly one element in the codomain. A function can be one-to-one or many-
to-one (but not many-to-many or one-to-many).
The cardinality of a set is the mathematical equivalent of the “size” of a set. For finite
sets (set containing a finite number of elements) the cardinality is the same as the number
of elements. For non-finite sets, the notion of size has to be extended to the realm of infinite
numbers. In order to do this, we need to have a formal way of counting the elements in a
set.
Consider pairing an element from the set of unknown cardinality with an element of the
set of natural numbers (the counting numbers 1, 2, 3...) denoted by N, following the natural
order of the counting numbers. For example, if I had a basket of apples and I wished to
know how many I had, I could have written down in ascending order the natural numbers
and placed one and only one apple on each number starting from 1. The number of apples
(that is, the cardinality of the set of apples) would then be given by the last number upon
which I placed an apple. This pairing of an element from one set to another is captured by
the definition of a function.
Finally, we can formalize the notion of counting, and thus of cardinality. For any set A,
we can define a counting function as a one-to-one function with domain A and codomain N
with the restriction that for any element of the natural numbers to which there is a link, every

lesser natural number has a link. Now, if the counting function is surjective, in addition to
being injective, i.e. bijective, then we say that the set is countably infinite. If the counting
function has a range with a least upper bound, then the set is finite and has cardinality
equal to the least upper bound. The empty set is considered to have cardinality 0. If there is
no way to construct a counting function, because no one-to-one function exists, then the set
is uncountably infinite. A set is countable if it is finite or countably infinite. Otherwise
it is uncountable.
Suppose A and B are subsets of some set X. Denote as A − B the set of all points in A
that are not contained in B i.e. A − B ≡ {a ∈ A : a ∉ B}.

Definition 2.1. Suppose X is the universal set. If A ⊂ X, the complement of A, denoted
A^c , is the set X − A.

2.2 Proofs
In mathematics, a statement is a sentence that is either true or false. A proof is a
sound argument for the truth of a particular statement expressed in mathematical language.
An implication is a statement of the form “If A is true, then B is true”, where A and B
are statements. Here, A is the hypothesis and B is the conclusion. Implications are often
written as “If A, then B” or “A implies B”. There is a frequently used symbolic notation
for the implication: A =⇒ B.
A proposition is a true statement of interest to be proved – the proof would accept the
truth of some number of statements (the premises) and logically and cogently argue for
the truth of the proposition. A truth table is a useful method for determining the truth
value of complex statements. A theorem is a proposition that is subjectively considered
to be of great import or value. Sometimes, because of the length of an argument for a
theorem, the proof is broken into stages, with each linking proposition being proved as a
lemma. So, lemmata (plural of lemma) are propositions whose subjective import derives not
necessarily from their statements but from their role as stages in the overarching construction
of a proof of a theorem. However, occasionally a lemma has importance independent of
the theorem for which it was constructed. Lastly, corollaries are propositions that follow
almost immediately from a theorem; the proof of such a statement is usually trivial, but the
subjective value of the knowledge of its truth is not.
A little reflection will reveal that to be able to employ mathematical logic fruitfully
one needs to know the truth value of some statements. Logic describes the relationships
between statements, and describes the rules by which the truth value of a statement can be
ascertained given a particular set of premises. However, to ground a particular systematic

body of knowledge one needs axioms. An axiom is a statement whose truth value is accepted
without formal proof. The defense of the choice of a particular axiom, and the consequences
of accepting or rejecting a particular statement as axiomatically true, are the bread and
butter of theoretical economics.4 Axioms are the atoms of a particular knowledge system,
just as certain mathematical concepts that are without formal definition (see for example
the definition of a set above) are atoms of a mathematical system.
By definition, every statement is either true or false. Then, we can define logical operators
on statements, analogous to the arithmetic operators plus and multiply. The operator AND
denoted ∧ is a binary operator such that A ∧ B is true if and only if A is true and B is
true. The operator OR denoted ∨ is a binary operator such that A ∨ B is true if and only
if at least one of A and B is true. The operator NOT denoted ¬ is a unary operator such
that ¬A is true if and only if A is false. Using the language of operators, we can see now
that implies, denoted =⇒ , is a binary logical operator. What is the truth table for this
operator?
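The truth table asked for can be tabulated mechanically using the standard equivalence that A =⇒ B is false only when A is true and B is false (i.e. A =⇒ B ≡ ¬A ∨ B); a small Python sketch:

```python
# Tabulate the truth table of the implication operator.
# A => B is logically equivalent to (not A) or B.
def implies(a: bool, b: bool) -> bool:
    return (not a) or b

print("A      B      A => B")
for a in (True, False):
    for b in (True, False):
        print(f"{a!s:<6} {b!s:<6} {implies(a, b)}")
```

The only row yielding False is A true, B false; a false hypothesis makes the implication vacuously true.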
Now, consider an arbitrary implication A =⇒ B. We can define three operations that
take an implication and produce another implication. The converse of A =⇒ B denoted
CON V is B =⇒ A. The inverse of A =⇒ B denoted IN V is ¬A =⇒ ¬B. The
contrapositive of A =⇒ B denoted CON T R is ¬B =⇒ ¬A. What is the converse of
the inverse of an implication? What is the inverse of the contrapositive? What is the inverse
of the inverse of an implication?
Suppose we have to prove that the implication A =⇒ B is true. We could attempt a
direct proof, where we would assume A holds true and produce a chain of implications ending
with the desired outcome i.e. A =⇒ C =⇒ D =⇒ . . . =⇒ E =⇒ B. Alternatively
we could attempt an indirect proof, which comes in two varieties. First, we could directly
prove the contrapositive ¬B =⇒ ¬A, which is equivalent to A =⇒ B. Second, we could
assume that A =⇒ B is false i.e. A ∧ ¬B is true, and then show that this assumption
leads to a contradiction of a previously proved (or assumed) statement, a technique known
as reductio ad absurdum or proof by contradiction.
A statement that is true or false conditional on the value of one or more variables is a
conditional statement. E.g. x2 + 3y = 5. Most statements one encounters are conditional
statements. It is important to note that a conditional statement has a determinate truth
value, conditional on the values of each of the variables upon which it depends.
Intimately connected with conditional statements and implications are quantifiers, which
4
John von Neumann is often credited with introducing the axiomatic method in economic theory (for
example, expected utility theory), particularly due to his previous work on the foundations of logic and set
theory in mathematics and on the foundations of quantum mechanics in physics. Kenneth Arrow was an
early (and successful) proponent of this approach, made especially famous in his work on social choice theory.

delineate the scope or domain in which the truth of a conditional statement or implication
holds. There are two types of quantifiers: existential and universal. The existential
quantifier can be recognized by the use of words such as “there exists” or “there is/are”, and
can be denoted by ∃. When such a quantifier is present, the truth of the (sub-)statement
to which it is attached is determined by the possibility of constructing or otherwise proving
the existence of at least one object satisfying the conditions of the statement. E.g. (There
exist x, y such that x2 + 3y = 5) is a statement (and not a conditional one), and furthermore
is a true statement, since x = 1, y = 4/3 makes the conditional statement true.
Therefore, the only way for a statement with a single, existential quantifier to be false is for
there to be no object that satisfies the conditional statement. Notice that there is a hidden
assumption in the previous example. I argued that the statement is true by constructing
an example (proof by construction). However, I assumed that x, y ∈ R. This is neither
allowed nor disallowed by the statement, which implies that the truth value of the statement
is itself conditional on the domain of the variables x and y. Suppose the statement read
“There exist x, y such that x2 + 3y = 5 and x, y ∈ N”. Then, the statement would be false, since
there is no pair of values for the variables that would satisfy the expression and the domain
restrictions.
The universal quantifier can be recognized by the use of words such as “for all/every/any”
and can be denoted by ∀. N.B. “For some” is not a universal quantifier, but an existential
quantifier, even though the word “for” appears. When a universal quantifier is present,
the (sub-)statement to which it is attached is true only if no object fails the conditions
of the statement; constructing or otherwise proving the existence of even one such object
(a counterexample) proves the statement false. E.g. (For all
x, y, x2 + 3y = 5) is false since x = 1, y = 1 yields a conditionally false statement.
Sometimes, theorems or other statements involve the negation of quantifiers. Any state-
ment of the form “¬(∀x, A(x))”, where A(x) is a conditional statement, can be written as
“∃x, such that ¬A(x)”. Thus, when negated, the universal quantifier becomes an existential
quantifier, with the attached conditional statement becoming negated. A similar algorithm
allows for the negation of an existential quantifier: “¬(∃x, such that A(x))” is equivalent to
“∀x, ¬A(x)”.
One common method of proof that may be employable is the proof by induction. The
fundamental principle behind induction is that if S ⊆ N is such that (1 ∈ S) ∧ (n ∈ S =⇒
(n + 1) ∈ S), then S = N. Thus, proof by induction can be used whenever the statement
to be proved has a universal quantifier with domain N. The proof requires two steps. The
first step (the base case) is to show the truth of the conditional statement for some particular
m ∈ N. The second step (the inductive step) is to show that the conditional statement is true
for an arbitrary n ∈ N whenever it is true for n − 1 (or, in strong induction, for all m < n, m ∈ N).
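As an illustration (my example, not from the notes), the classic induction target 1 + 2 + · · · + n = n(n + 1)/2 for all n ∈ N can at least be spot-checked numerically; the check is of course no substitute for the inductive proof:

```python
# Spot-check the induction-provable identity 1 + 2 + ... + n = n(n+1)/2
# for a range of n; a numerical check is not a proof, but catches typos.
def gauss_sum(n: int) -> int:
    return n * (n + 1) // 2

for n in range(1, 101):
    assert sum(range(1, n + 1)) == gauss_sum(n)
print("identity verified for n = 1, ..., 100")
```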

2.3 Useful Facts


Theorem 2.1 (DeMorgan’s Laws). Let X be some set, and suppose Va ⊂ X for every a ∈ A,
where A is some index set. Then,

1. (∪_{a∈A} Va)^c ≡ ∩_{a∈A} Va^c

2. (∩_{a∈A} Va)^c ≡ ∪_{a∈A} Va^c
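Since these identities hold for arbitrary index sets, a finite spot-check is only illustrative, but it is easy to run; a small Python example of my own:

```python
# Verify both of DeMorgan's laws on a small finite family of sets.
# X is the universal set; V is an indexed family {V_a : a in A}.
X = set(range(10))
V = {0: {1, 2, 3}, 1: {3, 4, 5}, 2: {5, 6}}
A = list(V)

union = set().union(*(V[a] for a in A))
inter = set.intersection(*(V[a] for a in A))

# (union of the V_a)^c equals the intersection of the complements
assert X - union == set.intersection(*(X - V[a] for a in A))
# (intersection of the V_a)^c equals the union of the complements
assert X - inter == set().union(*(X - V[a] for a in A))
print("both laws hold on this family")
```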

Definition 2.2. Let f : X → Y .

1. The function f is injective if given x, x0 ∈ X, f (x) = f (x0 ) implies x = x0 .

2. The function f is surjective if for all y ∈ Y , there exists an x ∈ X such that f (x) = y.

3. The function f is bijective if it is injective and surjective.

4. If A ⊂ X, denote by f (A) the set of all images of points in A i.e. f (A) ≡ {y ∈ Y :
f (a) = y, a ∈ A}; we call f (A) the image of A.

5. If B ⊂ Y , denote by f −1 (B) the set of all points in X whose images are in B i.e.
f −1 (B) ≡ {x ∈ X : f (x) ∈ B}; we call f −1 (B) the preimage of B.

Do not be confused by the notation f −1 (B). In particular, f −1 is not in general a function
like an inverse function. However, if f is bijective, then the preimage notation can be interpreted as
an inverse function. It is best to think of f −1 as an operator acting on subsets of the range
space.

Proposition 2.2. Let f : X → Y , A, A0 ⊂ X, and B, B 0 ⊂ Y . Also, let A be an arbitrary


collection of subsets of X and B be an arbitrary collection of subsets of Y. Then

1. f −1 satisfies the following:

(a) B ⊂ B′ implies f −1 (B) ⊂ f −1 (B′ ).

(b) f −1 (B − B′ ) = f −1 (B) − f −1 (B′ ).

(c) f −1 (∪_{B∈B} B) = ∪_{B∈B} f −1 (B).

(d) f −1 (∩_{B∈B} B) = ∩_{B∈B} f −1 (B).

2. Also, f satisfies the following:

(a) A ⊂ A′ implies f (A) ⊂ f (A′ ).

(b) f (A − A′ ) ⊃ f (A) − f (A′ ); equality obtains if f is injective.

(c) f (∪_{A∈A} A) = ∪_{A∈A} f (A).

(d) f (∩_{A∈A} A) ⊂ ∩_{A∈A} f (A); equality obtains if f is injective.

3. Finally, f and f −1 satisfy:

(a) A ⊂ f −1 (f (A)); equality holds if f is injective.


(b) B ⊃ f (f −1 (B)); equality holds if f is surjective.

Proof. We will use an element argument to demonstrate that B ⊂ B 0 =⇒ f −1 (B) ⊂


f −1 (B 0 ). Suppose B ⊂ B 0 and let x ∈ f −1 (B) be some arbitrary element. Then f (x) ∈ B,
by definition of f −1 , and so f (x) ∈ B 0 , since B ⊂ B 0 . Thus, x ∈ f −1 (B 0 ), by definition of
f −1 .
The rest of the proof is left as an exercise. See Exercise 2.1.

Exercise 2.1. Prove all the items in Proposition 2.2. For those statements that don’t hold
with equality (items 2b, 2d, 3a, 3b), provide examples to show why equality fails to
hold.
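To see items 2c and 2d concretely, here is a small finite example in Python (my own illustration): a non-injective f commutes with unions, but only half-commutes with intersections.

```python
# Illustrate Proposition 2.2 items 2(c) and 2(d) with a finite,
# non-injective function: f(1) = f(2) = 'a', so f is not injective.
f = {1: 'a', 2: 'a', 3: 'b'}

def image(S):
    """The image f(S) of a subset S of the domain."""
    return {f[x] for x in S}

A1, A2 = {1, 3}, {2, 3}
assert image(A1 | A2) == image(A1) | image(A2)    # 2(c): equality holds
assert image(A1 & A2) <= image(A1) & image(A2)    # 2(d): inclusion holds
assert image(A1 & A2) != image(A1) & image(A2)    # ...but equality fails
print(image(A1 & A2), image(A1) & image(A2))
```

Here A1 ∩ A2 = {3} has image {'b'}, while f(A1) ∩ f(A2) = {'a', 'b'}: non-injectivity is exactly what breaks the equality in 2(d).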

3 Real Vector Spaces


Let V be a nonempty set.

Definition 3.1. A real vector space is a set V together with two binary operators (+ and
.) that satisfy the following axioms (where u, v, w ∈ V and α, β ∈ R):

1. Associativity of vector addition: u + (v + w) = (u + v) + w

2. Commutativity of vector addition: u + v = v + u

3. Identity element of vector addition: There exists an element o ∈ V , the zero vector,
such that v + o = v

4. Inverse element of vector addition: For all v there exists an element w, the additive
inverse, such that v + w = o

5. Distributivity of scalar multiplication over vector addition: α.(u + v) = (α.u) + (α.v)

6. Distributivity of scalar multiplication over field addition: (α + β).u = (α.u) + (β.u)

7. Consistency of scalar multiplication with field multiplication: α.(β.u) = (αβ).u

8. Identity element of scalar multiplication: 1u = u where 1 ∈ R is the identity element
of the field R

Vector spaces are also called linear spaces. The most familiar real vector space is R,
which is also a field. The Euclidean spaces Rn for finite n are also frequently encountered
real vector spaces. A perhaps less familiar real vector space is the space of all real-valued
continuous functions on the interval [0, 1].

Exercise 3.1. Show that C([0, 1]), the set of all real-valued continuous functions on the
interval [0, 1], is a real vector space.

Definition 3.2. A normed vector space is a vector space V together with a function
ν : V → R, called a norm, such that for all u, v ∈ V , α ∈ R:

1. ν(u) ≥ 0 and ν(u) = 0 ⇐⇒ u = o

2. ν(α.u) = |α|ν(u)

3. ν(u + v) ≤ ν(u) + ν(v)

The norm formalizes the notion of a “length” of a vector. Consider the Euclidean space
Rn . The adjective “Euclidean” derives from the use of the Euclidean norm, which for a
vector v ∈ Rn is defined to be √(∑_{i=1}^n v_i²), where v_i is the i-th component of v when written
as a linear combination of the standard orthonormal basis vectors.

Definition 3.3. A real inner product space is a real vector space V together with a
function ⟨·, ·⟩ : V × V → R, called the inner product or dot product, such that for all
u, v, w ∈ V , α ∈ R:

1. Positive-definiteness: ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 ⇐⇒ u = o

2. Symmetry: ⟨u, v⟩ = ⟨v, u⟩

3. Linearity: ⟨αu, v⟩ = α⟨u, v⟩ and ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩

For ease of notation, I will generally denote ⟨u, v⟩ as u · v.


Inner products formalize the notion of “angles” between vectors. For our spaces of choice,
the Euclidean spaces Rn , for n ∈ N, the usual inner product is defined by u · v ≡ ∑_{i=1}^n u_i v_i .
Thus, ν(u) = √(u · u). In fact, if ⟨·, ·⟩ is an inner product, then ν(·) ≡ √⟨·, ·⟩ is a norm.
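The claim that the norm derived from the dot product satisfies the three norm axioms can be spot-checked numerically; the following Python sketch (my illustration, with random sampling that is suggestive rather than a proof) tests them on the Euclidean space R3:

```python
# Check the three norm axioms for the Euclidean norm sqrt(u . u)
# on randomly sampled vectors in R^3 (a spot-check, not a proof).
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

random.seed(0)
for _ in range(100):
    u = [random.uniform(-5, 5) for _ in range(3)]
    v = [random.uniform(-5, 5) for _ in range(3)]
    a = random.uniform(-3, 3)
    assert norm(u) >= 0                                        # non-negativity
    assert abs(norm([a * ui for ui in u]) - abs(a) * norm(u)) < 1e-9
    assert norm([ui + vi for ui, vi in zip(u, v)]) <= norm(u) + norm(v) + 1e-9
print("norm axioms hold on all sampled vectors")
```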

4 Metric Spaces
Let X be a nonempty set.

Definition 4.1. A function ρ : X × X → R is a metric if, for any x, y, z ∈ X,

1. Non-negativity and properness: ρ(x, y) ≥ 0 and ρ(x, y) = 0 ⇐⇒ x = y

2. Symmetry: ρ(x, y) = ρ(y, x)

3. Triangle Inequality: ρ(x, y) ≤ ρ(x, z) + ρ(z, y)

A metric space is a nonempty set X together with a metric ρ, denoted (X, ρ).

Metrics formalize the notion of “distance” between points or elements of a set, and metric
spaces allow for notions of convergence and of continuity, as we shall soon see, though they
are not the most basic way to formalize these notions.
The (finite-dimensional) Euclidean spaces Rn are metric spaces (in fact, they are complete
metric spaces, as we shall soon see), with the metric being derived from the Euclidean norm
as follows: ρ(x, y) ≡ ν(x − y). These spaces are the workhorse for our exploration of classical
and nonlinear programming.

Exercise 4.1. Suppose that (X, ν) is a normed vector space. Show that (X, ρ) is a metric
space when ∀x, y ∈ X, ρ(x, y) ≡ ν(x − y).

The same set could be associated with many different metrics. An example of a different
metric for Rn is the taxicab or Manhattan metric, defined for x, y ∈ Rn by
ρ(x, y) ≡ ∑_{i=1}^n |x_i − y_i |. In general, the metric used with a particular set can alter the mathematical
properties of the associated metric space, but for finite-dimensional Euclidean spaces, the
choice of metric does not affect the continuity properties of functions on the space (for the
set of metrics derived from p-norms).

Definition 4.2. Two metrics ρ1 and ρ2 defined for some set X are strongly equivalent
if there exist positive constants α, β ∈ R such that for all x, y ∈ X,

αρ1 (x, y) ≤ ρ2 (x, y) ≤ βρ1 (x, y)


Exercise 4.2. Show that the standard Euclidean metric ρ(x, y) ≡ √(∑_i (x_i − y_i )²) is strongly
equivalent to the taxicab metric.
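The claim of Exercise 4.2 can be spot-checked numerically before proving it; the constants α = 1 and β = √n work, since ρ_euclid ≤ ρ_taxicab ≤ √n · ρ_euclid (my illustration in Python, not a proof):

```python
# Spot-check the strong equivalence of the Euclidean and taxicab metrics
# on R^n: euclid <= taxicab <= sqrt(n) * euclid (alpha = 1, beta = sqrt(n)).
import math
import random

def euclid(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def taxicab(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

random.seed(1)
n = 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    d_e, d_t = euclid(x, y), taxicab(x, y)
    assert d_e <= d_t + 1e-12 <= math.sqrt(n) * d_e + 1e-9
print("euclid <= taxicab <= sqrt(n) * euclid on all samples")
```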

5 Analysis and Topology of Metric Spaces
5.1 Open Sets and Topology
Let (X, ρ) be a metric space.

Definition 5.1. An ε-ball about x, denoted Bε (x; ρ), is the set {y ∈ X : ρ(x, y) < ε},
where ε is a positive real number.

When understandable by context, the notation for the metric will be suppressed.

Definition 5.2. A set U ⊂ X is open if for all x ∈ U , there exists ε > 0 such that
Bε (x) ⊂ U .

It should be clear that ε-balls are open sets.

Definition 5.3. An open neighborhood of a point x ∈ X is an open set U ⊂ X such that


x ∈ U.

Definition 5.4. A point x ∈ X is an interior point of a set A ⊂ X if there exists ε > 0
such that Bε (x) ⊂ A. The interior of a set A, denoted intA, is the set of all interior points
of A. The point x is a boundary point of the set A if for all ε > 0, Bε (x) ∩ A ≠ ∅ and
Bε (x) ∩ (X − A) ≠ ∅. The boundary of a set A, denoted bdA, is the set of all points x ∈ X
that are boundary points of A.

Exercise 5.1. Show that the interior of a set A ⊂ X is equal to the union of all open
subsets of X that are also subsets of A.

Definition 5.5. A set A ⊂ X is closed if bdA ⊂ A.

Note that a set could be neither open nor closed. Also, a set could be both open and
closed. This is amusingly depicted in the following web comic: http://abstrusegoose.
com/394.
Remark 1. A set A is open if and only if every point is an interior point i.e. A = intA.

Definition 5.6 (Closure). The closure of a set A ⊂ X, denoted clA, is the intersection of
all closed sets containing A. Note that clA is a closed set.

Definition 5.7 (Bounded Set). Let (X, ρ) be a metric space. A set Y ⊂ X is bounded if
there exists x ∈ Y and r ∈ R+ such that Y ⊂ Br (x; ρ).

Definition 5.8 (Totally Bounded Set). Let (X, ρ) be a metric space. A set Y ⊂ X is totally
bounded if for all ε ∈ R+ , there exists a finite subset Z ⊂ Y such that Y ⊂ ∪_{z∈Z} Bε (z; ρ)
i.e. the set Y can be covered by finitely many ε-balls, for any ε > 0.

Exercise 5.2. Show that a totally bounded set must also be bounded. Demonstrate with
an example that the converse is not true.

Corollary 5.1. For the Euclidean spaces Rn , every bounded set is totally bounded. Hence,
the definitions are equivalent for these spaces.

Definition 5.9 (Metric Topology). Given a metric space (X, ρ), the metric topology
induced by ρ is the set τ of all open subsets of X, where the open sets are defined as in
Definition 5.2.

Exercise 5.3. Let (X, ρ) be a metric space with the metric topology τ . Show that

1. the sets X and ∅ are both open and closed

2. an arbitrary union of open sets is open

3. the finite intersection of open sets is open

4. the complement of an open set is closed (and vice-versa)

The first three items in Exercise 5.3 could be taken as the axioms for an arbitrary set
X together with a collection τ of subsets of X to define a topological space (X, τ ). Therefore,
while every metric space has an associated topology, one could study a space with a topology
without a metric or with a topology other than the one induced by the metric.

Definition 5.10 (Topological Space). Given an arbitrary set X, a topology τ on X is a
collection of subsets of X that satisfies the following conditions:

1. X and ∅ are elements of τ

2. τ is closed under arbitrary unions i.e. for any subcollection τ′ ⊂ τ , (∪_{U∈τ′} U) ∈ τ

3. τ is closed under finite intersections i.e. for any finite subcollection τ′ ⊂ τ , (∩_{U∈τ′} U) ∈ τ

Members of τ are called open sets, and complements of open sets are called closed.
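For a finite set, these axioms can be checked by brute force; the following Python sketch (my own hypothetical helper, not part of the notes) relies on the fact that on a finite set, closure under pairwise unions and intersections suffices:

```python
# Check whether a collection tau of subsets of a finite set X is a
# topology on X, by brute-force verification of the three axioms.
from itertools import combinations

def is_topology(X, tau):
    tau = {frozenset(U) for U in tau}
    if frozenset() not in tau or frozenset(X) not in tau:
        return False
    # On a finite set, closure under pairwise unions and pairwise
    # intersections implies closure under arbitrary (finite) ones.
    for U, V in combinations(tau, 2):
        if U | V not in tau or U & V not in tau:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # a nested chain of sets
```

The nested family ∅ ⊂ {1} ⊂ {1, 2} ⊂ X passes the check, while e.g. replacing {1, 2} by {2} would fail it, since {1} ∪ {2} = {1, 2} would be missing.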

Definition 5.11 (Topological Base). Let X be a space. Suppose B is a collection of subsets
of X such that:

1. ∪_{B∈B} B = X

2. For every x ∈ B1 ∩ B2 , where B1 , B2 ∈ B, there exists B3 ∈ B such that x ∈ B3 ⊂ B1 ∩ B2 .

Then B is a base (or basis) for the topology τ , and τ is generated by B as follows: a set
U ⊂ X is open (U ∈ τ ) if for any x ∈ U , there exists B ∈ B such that x ∈ B ⊂ U .

Notice that if one has a topology τ then a collection B is a base if every open set is the
union of base elements. Moreover, every union of base elements is open.

Example 5.1. The intervals (a, b), a < b form a base for the standard topology on R.

Definition 5.12 (Subspace Topology). Let (X, τ ) be a topological space. For Y ⊂ X, the
subspace topology (or relative topology or induced topology) of Y is the collection
τY ≡ {U ∩ Y : U ∈ τ } i.e. the restriction of open sets in X to the set Y .

Definition 5.13 (Box Topology). Let (Xa , τa ), a ∈ A, be a family of topological spaces,
indexed by A. The box topology of the Cartesian product X ≡ ∏_{a∈A} Xa is the topology
generated by the base Bbox ≡ {∏_{a∈A} Ua : Ua ∈ τa } i.e. every open set in X is the union of sets
formed by the cartesian product of sets open in the Xa .

Definition 5.14 (Product Topology). Let (Xa , τa ), a ∈ A, be a family of topological spaces,
indexed by A. The product topology of the Cartesian product X ≡ ∏_{a∈A} Xa is the topology
generated by the base Bproduct , every element of which is a cartesian product ∏_{a∈A} Ya
with Ya′ = Xa′ for all a′ ∈ A′ , where A − A′ is a finite set, and Ya′′ ∈ τa′′ for all
a′′ ∈ A − A′ . Thus, the base consists of the cartesian product of entire spaces except for a
finite number of the indices, for which the entire space is replaced with some set open in that
particular space.

Notice that if the index set A in the above definitions is finite, then the two bases are
the same, and thus the two topologies are equivalent. Generally, when considering cartesian
products of topological spaces, we will assume unless otherwise stated that the topology of
the product space is the product topology.

Definition 5.15 (Projection Mapping). Let (Xa , τa ) be a family of topological spaces indexed
by the set A. The projection mapping associated with index b ∈ A is the function πb :
X → Xb , where X ≡ ∏_{a∈A} Xa , such that πb ((xa )a∈A ) = xb . Thus, the projection mapping of
an index b associates a point in the cartesian product with its b-th coordinate.

Definition 5.16 (Bounded Metric). Let (X, ρ) be a metric space. The metric is bounded
if there exists M such that ρ(x, y) ≤ M for all x, y ∈ X. Thus, we have a bounded metric if
the metric space is itself a bounded set.

A metric need not be bounded, but given any metric space (X, ρ), we can construct a bounded metric ρ′ ≡ ρ/(1 + ρ) ∈ [0, 1). Notice that ρ′ preserves the ordering of distances between points, i.e., ρ(x, y) ≥ ρ(u, v) ⟺ ρ′(x, y) ≥ ρ′(u, v). In fact, there are other bounded metrics one could define that preserve the ordering of ρ.
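A quick numerical sketch of these two claims, using the usual metric on R (purely illustrative; the sample points are chosen arbitrarily):

```python
# Numerical illustration: rho'(x, y) = rho(x, y) / (1 + rho(x, y)) is a
# bounded metric that preserves the ordering of distances of rho.
def rho(x, y):          # the usual metric on R
    return abs(x - y)

def rho_prime(x, y):    # the bounded metric rho / (1 + rho)
    d = rho(x, y)
    return d / (1 + d)

pts = [-10.0, -1.5, 0.0, 2.0, 100.0]
pairs = [(x, y) for x in pts for y in pts]

# Boundedness: rho' always lies in [0, 1), however far apart the points are.
assert all(0.0 <= rho_prime(x, y) < 1.0 for x, y in pairs)

# Ordinal equivalence: rho(x,y) >= rho(u,v)  <=>  rho'(x,y) >= rho'(u,v),
# because t -> t/(1+t) is strictly increasing on [0, infinity).
for (x, y) in pairs:
    for (u, v) in pairs:
        assert (rho(x, y) >= rho(u, v)) == (rho_prime(x, y) >= rho_prime(u, v))
```

The check passes for any sample points, which is exactly what Exercise 5.4 below asks you to prove in general.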

Definition 5.17 (Ordinal Equivalence). Let (X, ρ) be a metric space. A metric ρ′ is ordinally equivalent to ρ if for all x, y, u, v ∈ X, ρ(x, y) ≥ ρ(u, v) ⟺ ρ′(x, y) ≥ ρ′(u, v).

Definition 5.18 (Equivalence of Metrics). Two metrics ρ1 and ρ2 defined on some set X are (topologically) equivalent if they generate the same topology. In particular, the two metrics are equivalent if for all x ∈ X and any ε > 0 there exist ε′ > 0 and ε′′ > 0 such that

Bε′(x; ρ1) ⊂ Bε(x; ρ2) and Bε′′(x; ρ2) ⊂ Bε(x; ρ1)

Exercise 5.4. Suppose ρ is a metric on X. Show that for any strictly increasing, continuous, subadditive function f : R+ → R+ with f(0) = 0, the function ρ′ ≡ f ◦ ρ defines a metric that is equivalent to ρ; a function f is subadditive if for any x, y, f(x + y) ≤ f(x) + f(y). Don't forget to prove that ρ′ satisfies the conditions to be a metric. Conclude that the metric ρ′ ≡ ρ/(1 + ρ) is equivalent to ρ.

Exercise 5.5. Show that ordinally equivalent metrics are equivalent metrics.

Exercise 5.6. Show that strong equivalence of two metrics for a space X implies the metrics are equivalent. Note that the converse is not true for arbitrary metric spaces, because a bounded metric can be equivalent to an unbounded metric, but cannot be strongly equivalent to it, because strong equivalence preserves the boundedness property (can you see why?).

Definition 5.19 (Limit Point). For some topological space (X, τ ), a point x is a limit point
of a set A ⊂ X if every open neighborhood of x intersects A at some point other than x itself.

Notice that a limit point of a set need not be in the set. Closed sets exhibit the property
that they contain all their limit points, which is a corollary to the following exercise.

Exercise 5.7. Show that the closure of a set A ⊂ X is the union of A with the set of limit points of A.

Definition 5.20 (Denseness). Let (X, τ) be a topological space and Z ⊂ X. A subset Y ⊂ Z is dense in Z if the closure of Y contains Z.

Definition 5.21 (Separable). A metric space (X, ρ) is separable if there is a dense subset
Y ⊂ X that is countable.

Remark 2 (Density of Rationals). The space of real numbers R is separable, because the rational numbers are a countable set that is dense in R.

Exercise 5.8. Prove Remark 2.

5.2 Sequences
Definition 5.22. A sequence in X is a function x from N to X. A sequence is usually denoted (xn), where xn ≡ x(n), n ∈ N.

Definition 5.23 (Convergence: Metric). Let (X, ρ) be a metric space. A sequence (xn) in X converges to x if, for every ε > 0, there exists an N ∈ N such that whenever n ≥ N, ρ(xn, x) < ε. We say that such an x is the limit of the sequence, with the notation being lim xn = x. A sequence that does not converge is said to diverge.

This definition of convergence will not work for a topological space without a metric. We
can define convergence of a sequence more generally for such spaces.

Definition 5.24 (Convergence: Topological). Let (X, τ) be a topological space. A sequence (xn) in X converges to x if for every open set U ∋ x, the sequence is eventually contained in the set U, i.e., there exists an N such that xn ∈ U whenever n ≥ N.

Exercise 5.9. Show that the two definitions of convergence of a sequence (xn ) are equivalent
for metric spaces.

Definition 5.25 (Sequentially Closed). Let (X, τ ) be a topological space. A set Y ⊂ X is se-
quentially closed if for every convergent sequence (xn ) contained in Y (where convergence
is relative to the topology of X) the limit of the sequence is in Y .

Theorem 5.2. If (X, ρ) is a metric space, then a set is closed if and only if it is sequentially
closed.

For general topological spaces, it is only true that a closed set is sequentially closed. The
converse does not hold for an arbitrary topological space.

Definition 5.26 (Subsequence). For some space X, let (xn) be a sequence in X and consider an increasing sequence of natural numbers (mi). This increasing sequence (mi) produces a unique subsequence, (xmi), of the original sequence. Note that the generated subsequence is itself a sequence.

Definition 5.27 (Cauchy Sequence). Let (X, ρ) be a metric space. A sequence (xn) is a Cauchy sequence if, for all ε > 0, there exists N ∈ N such that for all m, n ≥ N, ρ(xn, xm) < ε.

Notice that Cauchy sequences can only be defined for metric spaces, and not for topo-
logical spaces in general. Thus, completeness is not a topological property, because two
equivalent metrics could yield different completeness properties.

Definition 5.28 (Completeness). Let (X, ρ) be a metric space. A set Y ⊂ X is complete
if every Cauchy sequence in Y has a limit in Y .

Remark 3. Our favorite space Rn is a complete metric space.

Definition 5.29 (Bounded Sequence). Let (X, ρ) be a metric space. A sequence (xn ) is a
bounded sequence if the set {xn } is a bounded set.

The following are some standard facts about sequences:

1. Every convergent sequence is bounded.

2. Let lim an = a and lim bn = b, where (an ) and (bn ) are sequences in R. Then,

(a) lim can = ca, ∀c ∈ R


(b) lim(an + bn ) = a + b
(c) lim(an bn ) = ab
(d) lim(an /bn ) = a/b, provided b ≠ 0
(e) (an ≥ 0, ∀n) =⇒ a ≥ 0
(f) (an ≤ bn , ∀n) =⇒ a ≤ b
(g) (∃c ∈ R, c ≤ bn , ∀n) =⇒ c ≤ b. A similar statement with the inequalities reversed also holds.

3. Every monotone and bounded sequence converges.

4. Subsequences of a convergent sequence converge to the same limit as the original se-
quence.

Exercise 5.10 (Closed Sets Inherit Completeness). Let (X, ρ) be a complete metric space.
Show that any closed subset Y ⊂ X is also complete.

5.3 Continuity
Definition 5.30 (Continuity at a point: Topological definition). Let (X, τX) and (Y, τY) be topological spaces. A function f : X → Y is continuous at x if for all open neighborhoods V of f(x), there exists an open neighborhood U of x such that f(U) ⊂ V, i.e., the pre-image of every open neighborhood of f(x) contains an open neighborhood of x.

Definition 5.31 (Continuity at a point: Cauchy-Weierstrass definition). Let (X, ρX) and (Y, ρY) be metric spaces. A function f : X → Y is continuous at x if for all ε > 0, there exists δ > 0 such that for all x′ ∈ X, ρX(x, x′) < δ implies ρY(f(x), f(x′)) < ε.

Definition 5.32 (Sequential Continuity at a point: Heine definition). Let (X, τX ) and
(Y, τY ) be topological spaces. A function f : X → Y is sequentially continuous at
x, if for all sequences (xn ) in X that converge to x, the sequence (f (xn )) converges to f (x)
i.e. sequentially continuous functions preserve limits.

Definition 5.33. A function f : X → Y is continuous if it is continuous at every point


x ∈ X. A function f : X → Y is sequentially continuous if it is sequentially continuous
at every point x ∈ X.

For metric spaces, the two definitions of continuity, 5.30 and 5.31, are equivalent. Moreover, for metric spaces continuity and sequential continuity are equivalent. However, in more general topological spaces, sequential continuity does not imply continuity, though the converse is still true.

Exercise 5.11. For arbitrary topological spaces X and Y , show that any continuous function
f : X → Y is sequentially continuous.

Exercise 5.12. Suppose f : Rn → R is continuous under the Euclidean metric. Show that
function f is continuous under any metric on Rn that is equivalent to the Euclidean metric.

Exercise 5.13. Show that the composition of two continuous functions is continuous.

Definition 5.34 (Uniform Continuity). Let (X, ρ) be a metric space. A function f : X → R is uniformly continuous if for all ε > 0 there exists a δ > 0 such that for any x, y ∈ X, if ρ(x, y) < δ, then |f(x) − f(y)| < ε.

Notice the slight change in the order of the quantifiers in the definition of uniform con-
tinuity from the Cauchy-Weierstrass definition of continuity.

Exercise 5.14. Suppose f : Rn → R is uniformly continuous under the Euclidean metric.


Show that function f is uniformly continuous under any metric on Rn that is strongly
equivalent to the Euclidean metric.

Proposition 5.3 (Continuity of Projection Mappings). Let (Xa, τa) be a family of topological spaces indexed by the set A, and define X ≡ ∏a Xa. The projection mappings are continuous in both the product and box topologies.

Proof. Suppose Ub ∈ τb is an open set. Then the preimage of Ub under the projection mapping πb is the set ∏a Ya, where Ya = Xa for all a ≠ b and Yb = Ub. But this set is an element of both the base Bbox and the base Bproduct, and is therefore open in both the box and the product topology. Thus, the preimage of any open set under a projection mapping is open, and so these mappings are continuous.

Theorem 5.4. Let (Xa, τa) be a family of topological spaces indexed by the set A, and define X ≡ ∏a Xa with the product topology. Suppose f : Y → X is defined by f(y) ≡ (fa(y))a∈A, where fa : Y → Xa for every a. Then f is continuous if and only if fa is continuous for all a ∈ A.

The previous theorem is not true for infinite Cartesian products with the box topology. The following provides a simple counterexample.

Example 5.2. Suppose f : R → R∞ is defined by f(t) = (t, t, t, . . .), so that each coordinate function fn : R → R is given by fn(t) = t for all n ∈ N. Each fn is continuous in the standard topology of R. However, f is not continuous when R∞ has the box topology. Consider the set U = (−1, 1) × (−1/2, 1/2) × (−1/3, 1/3) × · · · . It is clear that U is open in R∞ under the box topology, since (−1/n, 1/n) is open in R for any n ∈ N. However, f⁻¹(U) is not open in R. To demonstrate this, suppose to the contrary that f⁻¹(U) were open. Since it contains 0, it would have to contain an interval around 0, say (−ε, ε), which implies that f((−ε, ε)) ⊂ U. Applying the projection mapping to the left side of the previous inclusion yields πn(f((−ε, ε))) = fn((−ε, ε)) = (−ε, ε), and to the right side yields πn(U) = (−1/n, 1/n). Thus, (−ε, ε) ⊂ (−1/n, 1/n) for all n, which yields a contradiction, since no ε > 0 can satisfy this for every n (indeed, the set {0} is not open).

5.4 Compactness
The most general definition, one that works for an arbitrary topological space, involves
the notion of covers.

Definition 5.35 (Open Cover). Let (X, τ) be a topological space, and let F = {Uα ∈ τ : α ∈ A} be an indexed family of open sets, where A is an index set. Then F is an open cover of X if X ⊂ ∪α∈A Uα.

Definition 5.36 (Finite Subcover). Given an open cover F of X, a finite subcover is a finite subcollection of sets from the original open cover F whose union still contains X.

Definition 5.37 (Compact Set: Heine-Borel (Topological) definition). A set S ⊂ X is


compact if every open cover of S has a finite subcover.

The topological definition is quite abstract and at this stage obscure; I include it for the
sake of completeness of exposition. A somewhat more useful but still abstract definition
involves the finite intersection property.

Definition 5.38 (Finite Intersection Property). A collection of sets A has the finite intersection property if every finite subcollection {A1, . . . , Am} has a nonempty intersection, i.e., A1 ∩ · · · ∩ Am ≠ ∅.

Theorem 5.5. A set S ⊂ X is compact if and only if every collection A of closed subsets of S with the finite intersection property has a nonempty intersection, i.e., ∩A∈A A ≠ ∅.

For metric spaces, the following notion, sequential compactness, is equivalent to compactness, and is for us a more useful definition.

Definition 5.39 (Sequential Compactness). For a topological space (X, τ ), a set S ⊂ X


is sequentially compact if every sequence in S has a subsequence that converges to a
limit that is also in S. If X is a metric space, then sequential compactness is equivalent to
compactness.

Theorem 5.6 (Heine-Borel Theorem). A nonempty set S ⊆ Rn (with the Euclidean metric) is compact if and only if it is closed and bounded.

The Heine-Borel Theorem allows us an easy characterization of compact sets in Rn .


Compact sets are useful because they behave as though they are finite sets (hence the word
compact). In economics, we often assume that the sets we are working with are compact,
particularly because of the Weierstrass Theorem. The theorem makes it easy to identify
whether a given set from a Euclidean space is compact or not. There is a generalization of
this theorem to metric spaces that requires some strengthening of the conditions.

Theorem 5.7. Let (X, ρ) be a metric space. A set Y ⊂ X is compact if and only if it is
complete and totally bounded.

Theorem 5.8 (Bolzano-Weierstrass Theorem). Every bounded sequence (xn ) in Rn has a


convergent subsequence. Equivalently, a subset of Rn is sequentially compact (hence compact)
if and only if it is closed and bounded.

For metric spaces, the Bolzano-Weierstrass Theorem is essentially the same as the Heine-Borel Theorem because of the equivalence of compactness and sequential compactness.
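For intuition, here is an illustrative numerical sketch (the particular sequence is chosen for the example): a bounded sequence that diverges, together with a convergent subsequence extracted from it, as Bolzano-Weierstrass guarantees must exist.

```python
# The bounded sequence x_n = (-1)^n * (1 - 1/n) oscillates and diverges,
# but its even-index subsequence x_{2i} = 1 - 1/(2i) converges to 1.
def x(n):
    return (-1) ** n * (1 - 1 / n)

# the full sequence keeps jumping between values near -1 and near +1
assert x(1001) < -0.99 and x(1000) > 0.99

# subsequence chosen by m_i = 2i: x_{m_i} = 1 - 1/(2i) -> 1
subseq = [x(2 * i) for i in range(1, 10001)]
assert abs(subseq[-1] - 1.0) < 1e-3
```

The odd-index subsequence converges to −1, so the sequence has (at least) two subsequential limits and therefore no limit of its own.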
The following is a crucial theorem from which the Weierstrass Extreme Value Theorem
follows quite simply.

Theorem 5.9 (Continuous Mappings Preserve Compactness). Let (X, τX ) and (Y, τY ) be
topological spaces, and suppose f : X → Y is a continuous function. Then for any K ⊂ X
that is compact, f (K) is compact.

Proof. Let AY ≡ {Va : a ∈ A} be an open cover of the image f(K) of a compact set K ⊂ X. Now, since f is continuous, the pre-image of every member of the collection AY is open in X, so we have an open cover AX ≡ {f⁻¹(Va) : a ∈ A} of K. Since K is compact, every open cover has a finite subcover, and so there exists a finite subset A′ ⊂ A such that K ⊂ ∪a∈A′ f⁻¹(Va). Now, since A′ defines a subcover of K relative to the collection AX, it defines a subcover of f(K) relative to the collection AY. But A′ is finite, and so we have a finite subcover for f(K), proving compactness of f(K).

Theorem 5.10 (Uniform Continuity Theorem). Let (X, ρX ) and (Y, ρY ) be metric spaces.
If f : X → Y is a continuous function, and X is a compact space, then f is uniformly
continuous.

The following theorem is a very useful result that says the Cartesian product of compact
spaces is compact.

Theorem 5.11 (Tychonoff Theorem). Suppose (Xa, τa) is a compact space for every a ∈ A, where A is some index set. Then ∏a∈A Xa is compact in the product topology.

Proposition 5.12. The following are some useful results about bounded sets and compact
sets:

1. The union of an arbitrary collection of bounded sets is not necessarily bounded.

2. The union of a finite collection of bounded sets is bounded.

3. The intersection of an arbitrary collection of bounded sets is bounded.

4. The sum of two bounded sets is bounded.

5. The union of an arbitrary collection of compact sets is not necessarily compact.

6. The union of a finite collection of compact sets is compact.

7. The sum of two compact sets is compact.

8. Closed subsets of compact spaces are compact.

Exercise 5.15. Prove item 8 in Proposition 5.12.

5.5 Connectedness
Definition 5.40 (Connectedness). A space (X, τ ) is connected if there do not exist two
nonempty open disjoint sets U and V such that X = U ∪ V . A subset S ⊂ X is connected
in X if it is a connected space under the subspace topology.

Proposition 5.13. A space (X, τ ) is connected if and only if the only sets that are both
open and closed are X and ∅.

Proof. Assume X is connected, and suppose U ⊊ X is nonempty and open. Then U^c is nonempty and closed. If U^c were also open, then U and U^c would be disjoint nonempty open sets with X = U ∪ U^c, contradicting connectedness; hence U^c is not open, and so U is not closed. Since U was an arbitrary nonempty open strict subset of X, no such set is also closed, and thus the only sets that are both open and closed are X and ∅.
The proof of the other direction is trivial.

Proposition 5.14 (Results about Connected Sets). Some results involving connected sets:

1. Continuous maps preserve connectedness i.e. the image of a connected set under a
continuous function is connected.

2. Finite Cartesian products of connected sets are connected. Arbitrary Cartesian products of connected sets are connected under the product topology, but not necessarily under the box topology.

3. The real line R is connected, as are intervals and rays (intervals that are unbounded
on one side).

As you continue your study of economics, you will find fixed point theorems pop up every-
where in microeconomics, because of their usefulness in proving equilibrium existence, from
Walrasian equilibrium in the Arrow-Debreu-McKenzie-Nikaido general equilibrium model to
Nash equilibrium in game theory. Fixed point theorems generally state that for some map-
ping ψ : Y → Y , for some space Y , there exists a solution to the equation ψ(y) = y. You
may not realize this, but you are probably already familiar with a fixed point theorem, just not by
that name. Acemoglu argues, convincingly, that the Intermediate Value Theorem has the
quality of a fixed point theorem. Let us first see a statement of the theorem.

Theorem 5.15 (Intermediate Value Theorem). Let (X, τ) be a connected topological space. Suppose f : X → Y is a continuous function, where Y ⊂ R is endowed with the standard (subspace) topology.⁵ If a, b ∈ X and there exists z ∈ R such that f(a) ≤ z ≤ f(b), then there exists c ∈ X such that f(c) = z.

To see why the Intermediate Value Theorem resembles a fixed point theorem consider a
function f : X → X, where X is a compact, connected subset of R i.e. X = [a, b], a ≤ b.
Then, if f is continuous there exists c ∈ [a, b] such that f (c) = c. This follows quite simply
from an application of the Intermediate Value Theorem.
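The argument can be carried out numerically by bisection on g(x) ≡ f(x) − x, which changes sign on [a, b] whenever f maps [a, b] into itself. A sketch, where the particular f is just an arbitrary continuous self-map of [0, 1] chosen for illustration:

```python
# Finding a fixed point of a continuous f : [0, 1] -> [0, 1] by bisection
# on g(x) = f(x) - x: the Intermediate Value Theorem argument in action.
import math

def f(x):                      # an arbitrary continuous self-map of [0, 1]
    return math.cos(x) / 2 + 0.25

def fixed_point(f, a, b, tol=1e-10):
    g = lambda x: f(x) - x     # g(a) >= 0 and g(b) <= 0, since f maps into [a, b]
    while b - a > tol:
        m = (a + b) / 2
        if g(m) >= 0:
            a = m              # the sign change lies in [m, b]
        else:
            b = m
    return (a + b) / 2

c = fixed_point(f, 0.0, 1.0)
assert abs(f(c) - c) < 1e-8    # c is (approximately) a fixed point
```

Bisection is itself a constructive rendering of the Intermediate Value Theorem: each step halves an interval on which g is known to change sign.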

⁵The theorem could be generalized by taking the space Y to be any ordered space endowed with the order topology.

5.6 Sequences of Functions
Suppose (fn ) is a sequence of functions from some set X to a metric space (Y, ρ).

Definition 5.41 (Pointwise Convergence). The sequence (fn) converges pointwise to a function f if for all x ∈ X, the sequence (fn(x)) converges to f(x). In notation: for all x ∈ X, for all ε > 0 there exists N such that for all n ≥ N, ρ(fn(x), f(x)) < ε.

Definition 5.42 (Uniform Convergence). The sequence (fn) converges uniformly to a function f if for all ε > 0 there exists N such that for any x ∈ X and for all n ≥ N, ρ(fn(x), f(x)) < ε. Equivalently, we have uniform convergence if limn→∞ sup{ρ(fn(x), f(x)) : x ∈ X} = 0.

Notice the change in the order of quantifiers that is similar to the swap for uniform
continuity. Pointwise convergence looks at the convergence of the function at a point, treating
each point as a sequence by itself, whereas uniform convergence ties together the “rate of
convergence” of sequences at each point x by requiring the same threshold N for all points
x.

Theorem 5.16 (Uniform Convergence Theorem). Suppose (fn ) is a sequence of functions


fn : X → Y , where X is a topological space and Y is a metric space. If (fn ) is a sequence
of continuous functions that converges uniformly to a function f , then f is continuous.

Example 5.3. Let X ≡ [0, 1] and Y ≡ R. Suppose fn : X → Y is defined by fn(x) ≡ x^n, where n ∈ N. Notice that (fn) converges pointwise to f, where f(x) = 0 for all x ∈ [0, 1) and f(x) = 1 for x = 1. Thus, a sequence of continuous functions converges to a discontinuous function. However, this sequence of functions does not converge uniformly.

Proof. We shall demonstrate the pointwise convergence of (fn) to f. Choose some x ∈ (0, 1) and define (an) by an ≡ fn(x) = x^n. Let ε ∈ (0, 1). Then, for all n > N ≡ log ε/ log x, we have |x^n − 0| < ε, and thus (an) converges to 0. For x = 0 and x = 1 it is clear that (fn(x)) converges to f(0) and f(1), respectively.
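A numerical sketch of why the convergence fails to be uniform: the supremum distance sup{|fn(x) − f(x)| : x ∈ [0, 1)} does not shrink with n (the grid below only approximates the supremum, which in fact equals 1 for every n):

```python
# f_n(x) = x^n converges pointwise on [0, 1] but not uniformly:
# sup over x in [0, 1) of |x^n - 0| stays close to 1 for every n.
def sup_distance(n, grid_size=10_000):
    # grid approximation of sup_{x in [0,1)} |x^n - f(x)|, where f = 0 there;
    # the largest grid value is (1 - 1/grid_size)^n, while the true sup is 1
    return max((k / grid_size) ** n for k in range(grid_size))

for n in (1, 10, 100, 1000):
    assert sup_distance(n) > 0.9   # no uniform shrinking of the sup distance

# ...even though, for each fixed x < 1, x^n -> 0 pointwise:
assert 0.5 ** 100 < 1e-6
```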

6 Acknowledgements
These notes greatly benefited from the notes of Kim Border at http://www.hss.
caltech.edu/~kcb/Notes.shtml and from Appendix A of Daron Acemoglu’s “Introduc-
tion to Modern Economic Growth”. I have also consulted James Munkres’ “Topology”.

7 References
Simon, Blume: Ch. 12, 29

Acemoglu: Appendix A.1–5

Abbott, Stephen. Understanding Analysis. Springer-Verlag, New York. 2001.

Simon, Carl P., Lawrence Blume. Mathematics for Economists. Norton, New York. 1994.

Solow, Daniel. How to Read and Do Proofs. 3.ed. Wiley, New York. 2002.

Part II

Static Optimization
We will study here techniques that fall under the category of (static) nonlinear programming. While these techniques still apply to the subdomain of linear programming, there exist stronger results for that domain that we will not explore. Good references for linear programming include Dantzig and Intriligator.

8 Statement of the Problem


The general optimization problem (for our purposes) consists of an objective function, as-
sumed to be real-valued, together with a set of inequality constraints and equality constraints.
Given θ ∈ Θ, the problem is to find x ∈ X that solves

max f (x, θ) (8.1)

subject to the inequality constraints

gj (x, θ) ≤ 0, 1≤j≤J (8.2)

and the equality constraints


hk (x, θ) = 0, 1≤k≤K (8.3)

If we define the constraint set as

C(θ) ≡ X ∩ (∩1≤j≤J {x : gj(x, θ) ≤ 0}) ∩ (∩1≤k≤K {x : hk(x, θ) = 0})    (8.4)

then the problem can be more compactly written as

max_{x∈C(θ)} f(x, θ).    (8.5)

A solution to equation (8.5) is a global maximizer.

Definition 8.1. A point x∗ ∈ C(θ) is a global maximizer (or just maximizer) for the maximization problem (8.5) if for all x ∈ C(θ), f(x∗, θ) ≥ f(x, θ). It is a strict global maximizer if for all x ∈ C(θ), x ≠ x∗, f(x∗, θ) > f(x, θ). The definition of (strict) global minimizer has the inequality reversed.

While not necessarily a solution to the maximization problem, local maximizers are in-
teresting candidates for solutions since global maximizers are necessarily local maximizers.

Definition 8.2. A point x∗ ∈ C(θ) is a local maximizer for the maximization problem (8.5) if there exists an open⁶ neighborhood U ⊂ C(θ) of x∗ such that for all x ∈ U, f(x∗, θ) ≥ f(x, θ). It is a strict local maximizer if for all x ∈ U, x ≠ x∗, f(x∗, θ) > f(x, θ). The definition of (strict) local minimizer has the inequality reversed.

Suppose x∗ ∈ C(θ) is a solution of equation (8.5) (where the notation for the dependence of x∗ on θ has been suppressed). Then the value function can be defined as follows: V(θ) ≡ max_{x∈C(θ)} f(x, θ) = f(x∗, θ).
Consider a statement of the following sort: “If x∗ ∈ C(θ) solves (8.5), then condition
A”, where A is a mathematical statement. This type of statement describes a necessary
condition for a maximizer. Suppose instead we have a statement of the following sort: “If
condition A, then x∗ ∈ C(θ) solves (8.5)”. This type of statement describes a sufficient
condition for a maximizer. A necessary condition furnishes a set of potential solutions and
guarantees that any solution is a member of this set. A sufficient condition furnishes a set
of guaranteed solutions but potentially excludes some solutions. Sufficient conditions can be
viewed as existence theorems.
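As a purely illustrative instance of problem (8.5) — the objective f, constraint g, and parameter θ below are invented for the example — one can approximate a maximizer by brute-force search over a discretized constraint set:

```python
# Brute-force approximation of max f(x, theta) s.t. g(x, theta) <= 0,
# with X = [-2, 2] discretized; f, g, and theta are illustrative choices.
theta = 1.5

def f(x, theta):
    return -(x - theta) ** 2      # objective: single peak at x = theta

def g(x, theta):
    return x - 1.0                # inequality constraint: x <= 1

grid = [-2 + 4 * k / 4000 for k in range(4001)]
feasible = [x for x in grid if g(x, theta) <= 0]   # grid version of C(theta)
x_star = max(feasible, key=lambda x: f(x, theta))

# The unconstrained peak theta = 1.5 is infeasible, so the constraint binds
# and the (grid) maximizer sits on the boundary x = 1.
assert abs(x_star - 1.0) < 1e-9
```

The example also previews a theme of this Part: when the unconstrained optimum is infeasible, the solution lies where a constraint binds, which is exactly the situation the Kuhn-Tucker-style conditions are built to describe.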

9 Existence of Optima
Theorem 9.1 (Finite Constraint Set). If the constraint set C(θ) is nonempty and finite,
then the objective function f has both a maximizer and a minimizer in the constraint set.

Theorem 9.2 (Weierstrass Extreme Value Theorem). If the constraint set C(θ) is nonempty
and compact and f is continuous, then f has both a maximizer and a minimizer in the
constraint set.

Proof. Since continuous functions map compact sets to compact sets (see Theorem 5.9),
V ≡ f (C(θ)) ⊂ R is a compact set. By the Heine-Borel Theorem, V is closed and bounded.
But any closed and bounded subset of R contains its least upper bound, and thus has a
maximal value. Then, there exists some x ∈ C(θ) that maps to the maximal member of the
set V , and so x is a maximizer. The proof for the case of the existence of a minimizer is
analogous.

⁶The relevant topology for the problem is the relative topology of C(θ) derived from that of X, both of which can be generated from the metric of the space X.

Corollary 9.3. If X ≡ RN and the constraint set C(θ) is nonempty, closed and bounded,
and if f is continuous, then f has both a maximizer and a minimizer.

Example 9.1. The following are examples illustrating the role of the assumptions of the
Weierstrass Theorem.

1. Suppose C(θ) = R, which is nonempty but not compact (since it is not bounded).
Then the continuous function f (x) = x has no maximizer. Also if C(θ) = (0, 1) which
is bounded but not closed and so not compact, then f has no maximizer. But if
C(θ) = (0, 1], which is also nonempty and not compact, then our previously defined
continuous function has a maximizer.

2. Suppose C(θ) = [−1, 1], which is nonempty and compact. Then the discontinuous function

   f(x) = 1 − |x| if x ≠ 0, and f(x) = 0 if x = 0

   has no maximizer.

3. Suppose we have a discontinuous function f on R such that f (x) = 1 when x is rational,


and f (x) = 0 when x is irrational. Then f has a maximizer when C(θ) ≡ R, which is
a noncompact set.
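The first failure above is easy to see numerically: on the non-compact set (0, 1) the function f(x) = x has supremum 1, but no feasible point attains it (a small sketch):

```python
# On C = (0, 1), f(x) = x has supremum 1, yet no point of (0, 1) attains it:
# every candidate maximizer is beaten by a feasible point closer to 1.
def f(x):
    return x

candidates = [1 - 10 ** (-k) for k in range(1, 10)]   # points of (0,1) nearing 1
values = [f(x) for x in candidates]

assert all(v < 1.0 for v in values)   # the value 1 is never attained in (0, 1)
assert max(values) > 0.999999         # ...yet the supremum 1 is approached

# On the compact set [0, 1], by contrast, the maximizer x = 1 exists:
assert f(1.0) == 1.0
```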

The condition of continuity of the objective function can be weakened to yield a gener-
alization of the Weierstrass Theorem.

Definition 9.1 (Level Sets). Let f : X → R be a function on some space X.


The level set of f at α (also termed the contour or isoquant) is the set I(α; f ) ≡
{x ∈ X : f (x) = α}.
The upper level set of f at α (also termed the upper contour) is the set U (α; f ) ≡
{x ∈ X : f (x) ≥ α}.
The lower level set of f at α (also termed the lower contour) is the set L(α; f ) ≡
{x ∈ X : f (x) ≤ α}.

When clear from context, I will denote the upper and lower level sets of a function without reference to the function.

Definition 9.2 (Semicontinuity). Let (X, ρ) be a metric space. A function f : X → R is


upper semicontinuous if for all α ∈ R, the upper level set U (α) is closed. It is lower
semicontinuous if for all α ∈ R, the lower level set L(α) is closed. A function f is
continuous if and only if it is both upper and lower semicontinuous.

Theorem 9.4 (Generalized Weierstrass Theorem). Suppose the constraint set C(θ) is nonempty and compact. If f is upper semicontinuous, then it has a maximizer in the constraint set. If f is lower semicontinuous, then it has a minimizer.

10 Convex Sets and Functions on Convex Sets


Before we dig deeper into necessary or sufficient conditions for optimizers, we will define
and understand the properties of four special classes of functions, quasiconcave, concave,
quasiconvex, and convex functions.
Suppose V is a vector space, for example RN .

Definition 10.1 (Convex Set). A set S ⊂ V is convex if for all x, y ∈ S, λx + (1 − λ)y ∈ S for all λ ∈ (0, 1). A set is strictly convex if for all x, y ∈ S with x ≠ y, λx + (1 − λ)y ∈ int S for all λ ∈ (0, 1).

The empty set is assumed to be convex.

Proposition 10.1. The intersection of an arbitrary family of convex sets is convex.

Definition 10.2 (Convex Hull). The convex hull of a set S ⊂ V, denoted cvx S, is the smallest (under the set inclusion order) convex set that contains S. Equivalently, it is the intersection of all convex sets that contain S.

Definition 10.3 (Concavity and Convexity). Let f be a real-valued function on a convex


subset S of a vector space V .
The function f is concave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≥ λf (x) + (1 −
λ)f (y) for all λ ∈ (0, 1).
The function f is strictly concave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) >
λf (x) + (1 − λ)f (y) for all λ ∈ (0, 1).
The function f is convex if for all distinct x, y ∈ S, f (λx+(1−λ)y) ≤ λf (x)+(1−λ)f (y)
for all λ ∈ (0, 1).
The function f is strictly convex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) <
λf (x) + (1 − λ)f (y) for all λ ∈ (0, 1).

Definition 10.4 (Quasiconcavity and Quasiconvexity). Let f be a real-valued function on


a convex subset S of a vector space V .
The function f is quasiconcave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≥
min{f (x), f (y)} for all λ ∈ (0, 1).

The function f is strictly quasiconcave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) >
min{f (x), f (y)} for all λ ∈ (0, 1).
The function f is quasiconvex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≤
max{f (x), f (y)} for all λ ∈ (0, 1).
The function f is strictly quasiconvex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) <
max{f (x), f (y)} for all λ ∈ (0, 1).

Concavity and convexity could also be defined in terms of hypographs and epigraphs.

Definition 10.5 (Graph, Hypograph, Epigraph). The graph of a function f : X → R,


where X is a convex set, is the set G(f ) ≡ {(x, α) : f (x) = α} ⊂ X × R. The hypograph is
H(f ) ≡ {(x, α) : f (x) ≥ α} ⊂ X ×R. The epigraph is E(f ) ≡ {(x, α) : f (x) ≤ α} ⊂ X ×R.

Proposition 10.2. Suppose we have a function f : X → R, where X is a convex set.


The function f is concave if its hypograph H(f ) is convex, and is strictly concave if its
hypograph is strictly convex.
The function is convex if its epigraph E(f ) is convex, and is strictly convex if its epigraph
is strictly convex.

Quasiconcavity and quasiconvexity could also be defined in terms of level sets.⁷

Proposition 10.3. A function f is quasiconcave if and only if for all α ∈ R, U (α) is convex.
A function f is strictly quasiconcave if and only if for all α ∈ R, U (α) is strictly convex.
A function f is quasiconvex if and only if for all α ∈ R, L(α) is convex.
A function f is strictly quasiconvex if and only if for all α ∈ R, L(α) is strictly convex.

Notice that the definitions of these properties do not require continuity or differentiability. In fact, the weak versions of these properties do not require a topology on the space. The strict versions (for example, strict quasiconcavity) do require the vector space to have a norm, however, because our definition of a strictly convex set makes reference to the interior of the set, which is a topological concept. If we strengthen the assumptions to include differentiability (of varying degrees), we can obtain alternative conditions that are necessary or sufficient for these properties. We shall see this below.
It is straightforward to show that every concave function is quasiconcave and every convex
function is quasiconvex. The converse is not true. For example, any monotonic function is
both quasiconcave and quasiconvex, but only linear functions are both concave and convex.
Positive monotonic transformations of a concave (convex) function do not preserve concavity
(convexity) necessarily, but they do preserve quasiconcavity (quasiconvexity).
⁷The definition in terms of level sets is the one put forth by Arrow and Enthoven (Econometrica, 1961).

Proposition 10.4. Let f : S → R be a quasiconcave (quasiconvex) function. Then, for any
nondecreasing function g : R → R, g ◦ f is quasiconcave (quasiconvex).

Example 10.1. Suppose f : R+ → R is defined by f(x) = √x. This function is strictly concave. Now, suppose g : R+ → R is defined by g(x) = x⁴, which is a nondecreasing function on R+. Notice that g ◦ f(x) = x², which is a strictly convex function. Thus, concavity is not preserved. However, both f and g ◦ f are quasiconcave.
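The claims in this example can be checked numerically (a sketch; the small tolerances only guard against floating-point error):

```python
# f(x) = sqrt(x) is concave; g(x) = x^4 is nondecreasing on R+;
# g(f(x)) = x^2 is convex, yet still quasiconcave (it is monotone on R+).
import math
import random

random.seed(0)
f = lambda x: math.sqrt(x)
gf = lambda x: x ** 2          # g o f, with g(x) = x^4

for _ in range(1000):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    lam = random.uniform(0, 1)
    m = lam * x + (1 - lam) * y
    # concavity of f: f(m) >= lam f(x) + (1 - lam) f(y)
    assert f(m) >= lam * f(x) + (1 - lam) * f(y) - 1e-12
    # convexity of g o f: (g o f)(m) <= lam (g o f)(x) + (1 - lam)(g o f)(y)
    assert gf(m) <= lam * gf(x) + (1 - lam) * gf(y) + 1e-12
    # quasiconcavity of both: value at the mixture >= min of endpoint values
    assert f(m) >= min(f(x), f(y)) - 1e-12
    assert gf(m) >= min(gf(x), gf(y)) - 1e-12
```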

A natural question to ask is whether every quasiconcave function is just an increasing transformation of some concave function. The answer is no; the following is an example from Arrow and Enthoven (Econometrica, 1961).
Example 10.2. Suppose f(x, y) = (x − 1) + ((x − 1)² + 4(x + y))^{1/2}. The level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/arrowQCexample.gcx).
Another example is originally from Aumann (Econometrica, 1975). Suppose f(x, y) = y + √(x + y²). Again, the level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/aumannQCexample.gcx). Notice that f is strictly concave when restricted to either the first or the second dimension, but linearity of the level sets implies that it is only weakly quasiconcave.
Philip Reny (2010) proves that a continuous quasiconcave function cannot be transformed
by a strictly increasing function into a concave function unless it has parallel level sets (his
result is actually even stronger than this). The two examples above are such continuous
quasiconcave functions.

Afriat’s Theorem states that for any finite set of choices satisfying the Generalized Axiom
of Revealed Preference there exists a continuous strictly increasing concave utility function
that would generate those choices.
For more details on concavifiability of quasiconcave functions, see the extensive discussion
in Connell, Rasmusen (2011).

Proposition 10.5. Here are some useful results about quasiconcave and concave functions:

1. If f is strictly concave and h is strictly increasing, then h ◦ f is strictly quasiconcave.

2. If f is strictly quasiconcave and h is strictly increasing, then h ◦ f is strictly quasiconcave.

3. If f is strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.

4. If f is weakly but not strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.

5. If f is weakly but not strictly quasiconcave and h is strictly increasing, then h ◦ f is NOT necessarily strictly quasiconcave.

Proposition 10.6. Here are some useful results about quasiconvex and convex functions:

1. If each fi is quasiconvex and wi ≥ 0, then f ≡ maxi {wi fi} is quasiconvex.

2. If each fi is convex, then maxi {fi} is convex.

3. If f, g are convex and g is nondecreasing, then g ◦ f is convex.

4. If f is convex, g is concave, and g is nonincreasing, then g ◦ f is concave.
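Item 2 of the proposition above is easy to probe numerically; the sketch below (an illustrative check, with the two convex functions chosen for the example) verifies the defining convexity inequality for the pointwise maximum on a grid.

```python
# f ≡ max{f1, f2} with f1, f2 convex; check f(λx+(1−λ)y) ≤ λf(x)+(1−λ)f(y).
f1 = lambda x: (x - 1) ** 2
f2 = lambda x: abs(x)
f = lambda x: max(f1(x), f2(x))

grid = [i * 0.25 - 3 for i in range(25)]   # points in [-3, 3]
lams = [j * 0.1 for j in range(11)]

max_is_convex = all(
    f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12
    for x in grid for y in grid for lam in lams
)
print(max_is_convex)  # → True
```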

Now, let’s make some assumptions about differentiability.

Theorem 10.7. Let X ⊂ RN , and suppose f : X → R, f ∈ C 1 .

1. f is concave if and only if, for all x, y ∈ X, f (y) − f (x) ≤ Df (x)(y − x).

2. f is strictly concave if and only if, for all x, y ∈ X, y ≠ x, f (y) − f (x) < Df (x)(y − x).

3. f is convex if and only if, for all x, y ∈ X, f (y) − f (x) ≥ Df (x)(y − x).

4. f is strictly convex if and only if, for all x, y ∈ X, y ≠ x, f (y) − f (x) > Df (x)(y − x).

5. f is quasiconcave if and only if, for all x, y ∈ X, f (y) ≥ f (x) implies Df (x)(y−x) ≥ 0.

6. If, for all x, y ∈ X, y ≠ x, f (y) ≥ f (x) implies Df (x)(y − x) > 0, then f is strictly
quasiconcave. The converse is not true, as discussed below.

7. f is quasiconvex if and only if, for all x, y ∈ X, f (y) ≤ f (x) implies Df (x)(y − x) ≤ 0.

8. If, for all x, y ∈ X, y ≠ x, f (y) ≤ f (x) implies Df (x)(y − x) < 0, then f is strictly
quasiconvex. The converse is not true, as discussed below.
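Item 1 of Theorem 10.7 says a concave function lies below its tangent lines. The sketch below (illustrative, with f(x) = log(1 + x) chosen arbitrarily) checks the inequality f(y) − f(x) ≤ Df(x)(y − x) on a grid.

```python
import math

# Concave f: the first-order (tangent-line) overestimate property,
# f(y) - f(x) <= f'(x)(y - x), checked pointwise on a grid.
f = lambda x: math.log(1 + x)
df = lambda x: 1 / (1 + x)

grid = [i * 0.5 for i in range(1, 20)]  # points in (0, 10)
tangent_overestimates = all(
    f(y) - f(x) <= df(x) * (y - x) + 1e-12
    for x in grid for y in grid
)
print(tangent_overestimates)  # → True
```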

Theorem 10.8. Let X ⊂ RN , and suppose f : X → R, f ∈ C 2 .

1. f is concave if and only if for all x, D2 f (x) is negative semidefinite.

2. If, for all x, D2 f (x) is negative definite, then f is strictly concave.

3. f is convex if and only if for all x, D2 f (x) is positive semidefinite.

4. If, for all x, D2 f (x) is positive definite, then f is strictly convex.

5. If f is quasiconcave, then for all x, D²f (x) is negative semidefinite on the nullspace⁸ of Df (x).

6. If, for all x, D2 f (x) is negative definite on the nullspace of Df (x), then f is strictly
quasiconcave.

7. If f is quasiconvex, then for all x, D²f (x) is positive semidefinite on the nullspace
of Df (x).

8. If, for all x, D2 f (x) is positive definite on the nullspace of Df (x), then f is strictly
quasiconvex.

There is a characterization of (semi)definite matrices involving determinants.

Definition 10.6 (Principal Minors). Let A be a real-valued, symmetric N × N matrix. Then
a principal minor of order m of the matrix A is a submatrix of A where all but m rows
and corresponding (by index) columns are deleted. There are N!/(m!(N − m)!) principal minors of
order m.
The leading principal minor of order m is the principal minor of order m with the
last N − m rows and columns deleted.
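The count N!/(m!(N − m)!) is just the number of ways to choose which m indices to keep. A small sketch (hypothetical, using itertools) enumerates all order-m principal submatrices and confirms the count.

```python
import itertools
import math

# Enumerate all order-m principal submatrices of an N x N matrix:
# keep the same index set for rows and for columns.
A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
N, m = 3, 2

minors = []
for keep in itertools.combinations(range(N), m):
    sub = [[A[i][j] for j in keep] for i in keep]
    minors.append(sub)

print(len(minors), math.comb(N, m))  # → 3 3
```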

The following theorems characterize (semi)definiteness of a symmetric matrix.

Theorem 10.9 (Characterization of Definiteness). Suppose A is a real-valued, symmetric


N × N matrix.

1. A is negative definite if and only if the determinant of the leading principal minor of
order m is nonzero and has the sign (−1)^m, for all 1 ≤ m ≤ N .

2. A is positive definite if and only if the determinant of the leading principal minor of
order m is strictly positive, for all 1 ≤ m ≤ N .

3. A is negative semidefinite if and only if the determinant of every principal minor of
order m is zero or has the sign (−1)^m, for all 1 ≤ m ≤ N , i.e. odd-ordered prin-
cipal minors have nonpositive determinants and even-ordered principal minors have
nonnegative determinants.

4. A is positive semidefinite if and only if the determinant of every principal minor of


order m is nonnegative, for all 1 ≤ m ≤ N .
⁸ The nullspace of a vector is the set of all vectors that are orthogonal to it. The nullspace of a matrix
is the set of all vectors that the matrix maps to the zero vector.
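As a sketch of Theorem 10.9 in use (the matrix is a hypothetical example), the code below computes the leading principal minors of a symmetric matrix with a small cofactor-expansion determinant and applies the positive-definiteness test of item 2.

```python
def det(M):
    # Determinant by cofactor expansion along the first row (fine for small M).
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]  # a symmetric matrix known to be positive definite

leading_minors = [det([row[:m] for row in A[:m]]) for m in range(1, len(A) + 1)]
positive_definite = all(d > 0 for d in leading_minors)
print(leading_minors, positive_definite)  # → [2, 3, 4] True
```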

Checking definiteness of a matrix on a subspace requires the Bordered Matrix test,
where the matrix in question is bordered on the top and on the left by the constraints.

Definition 10.7 (Bordered Matrix). Let A be a real-valued symmetric N × N matrix and
bk ∈ RN for k ∈ {1, . . . , K} a set of linearly independent vectors. Let B be the N × K matrix
(b1 · · · bK), and denote by B′ the transpose of B. Then, the bordered matrix of A with
respect to B is

H ≡ ( 0  B′ )
    ( B  A  ).
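The block structure of H can be sketched directly (a hypothetical construction with N = 3 and K = 1):

```python
# Build the bordered matrix H = [[0, B'], [B, A]] for N = 3, K = 1.
A = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 2]]
B = [[1], [1], [1]]      # single constraint vector b1 = (1, 1, 1)
K, N = len(B[0]), len(A)

H = [[0] * K + [B[j][i] for j in range(N)] for i in range(K)]  # rows [0, B']
H += [B[i] + A[i] for i in range(N)]                           # rows [B, A]

for row in H:
    print(row)
# → [0, 1, 1, 1]
#   [1, 2, 0, 0]
#   [1, 0, 2, 0]
#   [1, 0, 0, 2]
```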

Definition 10.8 (Border-Respecting Principal Minors). Let A be a real-valued, symmetric


N ×N matrix, and B be a real-valued N ×K matrix of full rank, and denote by H the bordered
matrix of A with respect to B. Then a border-respecting principal minor of order m
of the bordered matrix H is a submatrix of H where all but m rows and corresponding (by
index) columns are deleted, with the restriction that the index of a deleted row (and column)
be greater than K.
The leading border-respecting principal minor of order m is the principal minor
of order m with the last N + K − m rows and columns deleted.

Theorem 10.10 (Characterization of Definiteness on a Linear Constraint Set). Suppose A
is a real-valued symmetric N × N matrix and bk ∈ RN for k ∈ {1, . . . , K} a set of linearly
independent vectors. Let B be the N × K matrix (b1 · · · bK), and denote by B′ the transpose
of B. Define the bordered matrix

H ≡ ( 0  B′ )
    ( B  A  ).

1. A is negative definite on the subspace {v : B′v = 0} if and only if for each m ∈
{2K + 1, . . . , N + K}, the determinant of the leading border-respecting principal minor
of order m of matrix H is nonzero and has the sign (−1)^(m−K), i.e. the determinant of
H has the sign (−1)^N and the last (largest) N − K leading border-respecting principal
minors have alternating signs.

2. A is positive definite on the subspace {v : B′v = 0} if and only if for each m ∈
{2K + 1, . . . , N + K}, the determinant of the leading border-respecting principal minor
of order m of matrix H is nonzero and has the sign (−1)^K.

The characterization of semidefiniteness on a linear constraint set involves testing every


border-respecting principal minor of every order m ∈ {2K + 1, . . . , N + K}, and not just
the border-respecting leading principal minors, analogous to the characterization of semidef-
initeness of an unconstrained symmetric matrix.

Theorem 10.11 (Characterization of Semidefiniteness on a Linear Constraint Set). Suppose
we have A, bk , and B as in Theorem 10.10.

1. A is negative semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈
{2K + 1, . . . , N + K}, the determinant of every border-respecting principal minor of order
m alternates in sign with m or is equal to zero, with the sign of the determinant of H being
(−1)^N or equal to zero, i.e. every border-respecting principal minor of order m has a
nonpositive determinant if m − K is odd and a nonnegative determinant if m − K
is even.

2. A is positive semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈
{2K + 1, . . . , N + K}, the determinant of every border-respecting principal minor of order
m is nonnegative if K is even and is nonpositive if K is odd.

Therefore, to test for, say, quasiconcavity of a twice continuously differentiable function


f in the neighborhood of a point x, we need to find the Hessian of f evaluated at x, which
is a real-valued symmetric matrix, and check whether this Hessian, when bordered by the
Jacobian of f evaluated at x, passes the test of negative semidefiniteness of a matrix on a
linear subspace described in Theorem 10.11.
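As a concrete sketch of this recipe (the function f(x, y) = xy on the positive orthant is my example, not one from the notes): with N = 2 and K = 1, the test reduces to the sign of a single determinant of order m = N + K = 3, the Hessian of f bordered by its Jacobian, and the required sign for negative definiteness on the nullspace of Df is (−1)^(m−K) = (−1)² > 0.

```python
def det3(M):
    # 3x3 determinant by Sarrus' rule.
    return (M[0][0]*M[1][1]*M[2][2] + M[0][1]*M[1][2]*M[2][0]
            + M[0][2]*M[1][0]*M[2][1] - M[0][2]*M[1][1]*M[2][0]
            - M[0][0]*M[1][2]*M[2][1] - M[0][1]*M[1][0]*M[2][2])

def bordered_hessian_det(x, y):
    # f(x, y) = x*y: Jacobian Df = (y, x), Hessian D2f = [[0, 1], [1, 0]].
    H = [[0, y, x],
         [y, 0, 1],
         [x, 1, 0]]
    return det3(H)

# det H = 2xy > 0 on the positive orthant: the sign condition for
# negative definiteness on the nullspace of Df holds, consistent with
# f being quasiconcave there.
signs_ok = all(bordered_hessian_det(x * 0.5, y * 0.5) > 0
               for x in range(1, 10) for y in range(1, 10))
print(signs_ok)  # → True
```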

10.1 References
Afriat. The Construction of Utility Functions from Expenditure Data. (International
Economic Review 1967)
Arrow, Enthoven. Quasiconcave Programming. (Econometrica 1961)
Aumann. (Econometrica 1975)
Connell, Rasmusen. Concavifying the Quasiconcave. (Working Paper 2011)
Reny. A Simple Proof of the Nonconcavifiability of Functions with Linear Not-All-Parallel
Contour Sets. (Working Paper 2010)

11 Unconstrained Optimization With a Differentiable Objective Function
11.1 Overview
- FONC can be derived using a first-order Taylor expansion, which requires the objective
function to be continuously differentiable

- SONC can be derived using a second-order Taylor expansion, which requires the objec-
tive function to be twice continuously differentiable
- SOSC can be derived using a second-order Taylor expansion, which requires the objec-
tive function to be twice continuously differentiable
Also, see section 12.4 below for more on second-order conditions for unconstrained problems.
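A minimal numerical sketch of these conditions (the quartic objective is my own example): approximate f′ and f″ by central differences at a candidate point and confirm that the FONC holds and that the second-order sign is right.

```python
# f(x) = -(x - 1)^4 + 3 has a maximum at x = 1; check FONC and SONC
# with central finite differences (an approximation to the Taylor logic).
f = lambda x: -(x - 1) ** 4 + 3
h = 1e-4

def d1(x):  # first derivative, O(h^2) central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(x):  # second derivative, central difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

fonc_holds = abs(d1(1.0)) < 1e-8
sonc_holds = d2(1.0) <= 1e-6   # negative semidefinite (here it is ~0)
print(fonc_holds, sonc_holds)  # → True True
```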

11.2 References
See Kim Border’s notes on the calculus of one variable: http://www.hss.caltech.edu/
~kcb/Notes/Max1.pdf

12 Classical Programming: Optimization with Equality Constraints

Let us focus on optimization problems where the domain of the objective and constraint
functions is an open subset X of RN , with only equality constraints, i.e. J = 0 in equation
(8.2).

12.1 Overview
- Introduce auxiliary variables, called multipliers, for each equality constraint, thereby
converting a constrained optimization problem to an unconstrained optimization problem
with a larger set of choice variables
- Show that the necessary conditions for maxima of the Lagrangian problem yield neces-
sary conditions for maxima of the original problem
- Intuition based on the gradients of the objective function and the constraint function
- Interpretation of the multipliers. Nice article on Wikipedia: http://en.wikipedia.
org/wiki/Lagrange_multiplier
- Explanation of the constraint qualification and the failure of the theory to find optima
when the CQ is violated

12.2 First Order Conditions


For illustrative purposes consider the problem with one equality constraint:

max f (x1 , x2 ) subject to h1 (x1 , x2 ) = 0

Figure 1: Graphical Depiction of a Constrained Optimization Problem

A geometric visualization of this problem is given in Figure 1. Note from Figure 1 that at
the point x∗ = (x∗1 , x∗2 ) the level curve of f and the constraint are tangent to each other, i.e.
both have a common slope. We will explore this observation in order to find a characterization
of the solution for this class of problems and its generalization to K constraints.
In order to find the derivative of the level curve at the optimum point x∗ , recall from the
implicit function theorem that for a function G(y1 , y2 ) = c,

dy2/dy1 (ŷ) = − (∂G/∂y2 (ŷ))⁻¹ (∂G/∂y1 (ŷ)),

where ŷ is some point in the domain, provided that, on an open neighborhood of ŷ, G is
continuously differentiable and ∂G/∂y2 (ŷ) is nonzero. In particular, according to Figure 1,
the slope dx2/dx1 (x̂) defined implicitly by f (x̂) ≡ f (x∗ ) and the slope dx2/dx1 (x̂) defined
implicitly by h(x̂) ≡ h(x∗ ) must be the same at x∗ :

− (∂f/∂x1 (x∗ )) / (∂f/∂x2 (x∗ )) = − (∂h/∂x1 (x∗ )) / (∂h/∂x2 (x∗ )),

which after some rearrangement yields

λ∗ ≡ (∂f/∂x1 (x∗ )) / (∂h/∂x1 (x∗ )) = (∂f/∂x2 (x∗ )) / (∂h/∂x2 (x∗ )),    (12.1)

where λ∗ ∈ R is the common value of the two ratios at x∗ . We are assuming that the ratios
above do not have zero denominators, the assurance of which is the motivation for the
constraint qualifications discussed later.
Now, rewrite equation (12.1) as two equations

∂f/∂x1 (x∗ ) − λ∗ ∂h/∂x1 (x∗ ) = 0    (12.2)

and

∂f/∂x2 (x∗ ) − λ∗ ∂h/∂x2 (x∗ ) = 0.    (12.3)
Together with the constraint equation

h(x1 , x2 ) = c, (12.4)

we have a system of three equations (12.2), (12.3), and (12.4) with three unknowns:
(x∗1 , x∗2 , λ∗ ).
This system is equivalent to the first-order conditions for stationary points of the following
function
L(x1 , x2 , λ) ≡ f (x1 , x2 ) − λ(h(x1 , x2 ) − c), (12.5)

which we call the Lagrangian; we also call the term λ the Lagrange multiplier. Thus, for the
case with two choice variables and one constraint, a maximizer (subject to a qualification)
x∗ satisfies ∂L/∂x1 (x∗1 , x∗2 , λ∗ ) = 0, ∂L/∂x2 (x∗1 , x∗2 , λ∗ ) = 0, and ∂L/∂λ (x∗1 , x∗2 , λ∗ ) = 0, for some λ∗ .
The Lagrange method transforms a constrained problem into an unconstrained problem
via the formulation of the Lagrangian. The transformation introduces a Lagrange multiplier
for every constraint. It is important to note that the transformation is valid only if at least
one of ∂h/∂x1 (x∗ ) and ∂h/∂x2 (x∗ ) is nonzero. If not, then there is no way to define a multiplier, as
should be clear from examining equation (12.1). This is called the nondegenerate constraint
qualification. If the constraint is linear, this qualification will automatically be satisfied.
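A worked sketch of the two-variable case (the specific objective and constraint are mine, not from the notes): maximize f(x1, x2) = x1 x2 subject to h(x1, x2) = x1 + x2 − 2 = 0. The stationarity system (12.2)–(12.4) gives x∗ = (1, 1) and λ∗ = 1, which the code verifies, along with a grid search along the constraint.

```python
# max f = x1*x2 subject to h = x1 + x2 - 2 = 0.
# FOCs: x2 - λ = 0, x1 - λ = 0, and the constraint; hence x* = (1, 1), λ* = 1.
x1, x2, lam = 1.0, 1.0, 1.0
focs = (x2 - lam, x1 - lam, x1 + x2 - 2)

# Sanity check: along the constraint x2 = 2 - x1, the value x1*(2 - x1)
# is maximized at x1 = 1.
best = max((t * 0.01) * (2 - t * 0.01) for t in range(0, 201))
print(focs, abs(best - 1.0) < 1e-9)  # → (0.0, 0.0, 0.0) True
```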
We can mimic the steps above for an arbitrary problem with N choice variables and K
constraints, where K < N . Suppose we have a solution to the general constrained maxi-
mization problem with only equality constraints; denote it x∗ ∈ RN . Then it must be the
case that h(x∗ ) = 0, where h is the K-dimensional vector function of constraints. Consider a
linear approximation of this constraint function at x∗ : the derivative (the Jacobian) of this
linear approximation will be Dx h(x∗ ), according to the Taylor Approximation Theorem for
a first-order approximation, assuming h is continuously differentiable at x∗ . This Jacobian
matrix is a K × N matrix, where a generic term is ∂hk/∂xn , and when each row is viewed as

a vector, the K vectors span a subspace of RN . Each row of the Jacobian is a vector (the
transpose of the gradient vector of the associated constraint) that has an associated (N − 1)-
dimensional nullspace (as long as this vector is nonzero), which is the tangent plane
to the associated constraint at x∗ , the linear approximation of the constraint function at x∗ .
Then, the K rows define K such subspaces, and for a vector to satisfy all these constraints,
the vector must be in every one of these subspaces. If the row vectors of the Jacobian are
linearly independent, then the nullspace of the Jacobian is exactly the subspace of vectors
that are in the tangent plane of each constraint, a subspace of (N − K) dimensions.
Now, given the gradient vector of the objective function Dx f (x∗ ), consider the subspace
orthogonal to the gradient. This hyperplane is a linear approximation of the level set of the
objective function at the point x∗ i.e. any local movement from x∗ within this hyperplane
will not change the value of the objective function. So, if x∗ is a maximum, it must be the
case that any local move from x∗ that does not locally violate the constraints i.e. movement
within the nullspace of the full-rank Jacobian Dx h(x∗ ), does not change the value of the
objective, and thus we can conclude that the nullspace of Dx h(x∗ ) must be contained in
the nullspace of Dx f (x∗ ). But this means that the vector Dx f (x∗ ) is orthogonal to the nullspace of
Dx h(x∗ ), and so it can be expressed as a linear combination of the rows of this matrix. Thus,
we can conclude that at a maximum, there exist K constants λ∗k , 1 ≤ k ≤ K, such that

Dx f (x∗ ) = Σk λ∗k Dx hk (x∗ ).

Combined with the constraints h, we have N + K equations that must be satisfied by the
N + K unknowns, the maximizer x∗ and the multiplier vector λ∗ , under the qualification that Dx h(x∗ )
has full rank. Notice, too, that the argument above is exactly the same if x∗ is a minimum.
Thus, these necessary conditions hold for both maxima and minima. The results are formally
stated below.

Definition 12.1 (Nondegenerate Constraint Qualification). The functions hk , 1 ≤ k ≤ K


satisfy the Nondegenerate Constraint Qualification (NDCQ) at x∗ if the rank of the
Jacobian Dx h(x∗ ) is K i.e. the Jacobian has full rank.

Theorem 12.1 (First Order Necessary Conditions). Let f and h be continuously differ-
entiable functions. Suppose x∗ is a constrained maximum or minimum of f , where the
constraint set is defined by {x ∈ X : h(x) = 0}, and suppose that the constraints satisfy the
NDCQ at x∗ . Define L(x, λ) ≡ f (x) − λh(x) to be the associated Lagrangian. Then there

exists λ∗ ∈ RK such that:

∂L/∂xn (x∗ , λ∗ ) = ∂f/∂xn (x∗ ) − Σk λ∗k ∂hk/∂xn (x∗ ) = 0

and

∂L/∂λk (x∗ , λ∗ ) = −hk (x∗ ) = 0
for every n ∈ {1, . . . , N }, k ∈ {1, . . . , K}, which equivalently means that (x∗ , λ∗ ) is a sta-
tionary point of the Lagrangian.

12.3 Meaning of the Multiplier


Theorem 12.2 (Meaning of the Multiplier). Let f , hk be C 1 functions with domain in RN
and let θk ∈ R. Consider the maximization problem: max f (x) subject to hk (x) = θk ,
k ∈ {1, . . . , K}.
Suppose x∗ is a constrained maximizer, and λ∗ the associated vector of Lagrange multipliers, and
suppose x∗ and λ∗ are C 1 functions of θ. Then,

∂f (x∗ (θ))/∂θk = λ∗k (θ).

From Theorem 12.2 we conclude that the Lagrange multiplier can be interpreted as the
change in the maximum achieved by the objective function if we slightly relax (tighten) the
corresponding constraint. Consider for example the case in which f is a utility function with
two arguments x1 and x2 , with h(x1 , x2 ) = I representing the budget constraint given an
income of I. The Lagrange multiplier associated with the budget constraint is equivalent to
the marginal utility of income.
The principle behind this result is the envelope theorem, which states that only the direct
effect of an exogenous parameter on the objective function matters when studying the total
effect of a change on the optimal value. The change in the parameter also induces a change
in the endogenous choice variables, but optimality requires that the first-order effect of a
change in the endogenous variables will have no effect on the value of the objective.
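A numerical sketch of Theorem 12.2 (the problem is chosen by me): for max x1 x2 subject to x1 + x2 = θ, the value function is V(θ) = θ²/4 and the multiplier is λ∗(θ) = θ/2; a finite difference of V recovers λ∗.

```python
# For max x1*x2 s.t. x1 + x2 = θ: solution x1 = x2 = θ/2, so V(θ) = θ²/4
# and λ*(θ) = θ/2.  The envelope theorem says dV/dθ = λ*.
V = lambda theta: theta ** 2 / 4
lam_star = lambda theta: theta / 2

theta, h = 3.0, 1e-6
dV = (V(theta + h) - V(theta - h)) / (2 * h)
print(abs(dV - lam_star(theta)) < 1e-6)  # → True
```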

12.4 Second Order Conditions


12.4.1 Unconstrained Optimization

As discussed in section 10, a concave function f , when twice continuously differentiable,
is characterized by having a negative semidefinite Hessian, i.e. Dx2 f is

negative semidefinite at every point of the domain. There is no such characterization of a strictly
concave function, but a negative definite Hessian on the domain of the function is a sufficient
condition for strict concavity.
We can easily define local versions of these concepts at a point x, by weakening the
requirement that the property of the Hessian hold on the whole domain to holding on some
open neighborhood of the point x. Then, if x is a local maximum, i.e. x is a maximum of f
in an open neighborhood of x, then f (x) ≥ f (x′ ) for all x′ in the neighborhood, and with a
continuously differentiable function f , this yields Df (x)(x′ − x) ≥ f (x′ ) − f (x). But this is
exactly the condition for a continuously differentiable function to be concave, and so local
concavity of f is a necessary condition for a local maximum. If we also know f is twice
continuously differentiable, then we can conclude that the Hessian of f must be negative
semidefinite on the neighborhood, given that f must be locally concave.
Suppose f is locally strictly concave on this open neighborhood, then x is a local strict
maximum.9 Moreover, if f is twice continuously differentiable at x, then strict local concavity
implies Dx2 f (x) is negative definite on this neighborhood, and thus serves as a sufficient con-
dition for a local maximum when combined with some necessary conditions for a maximum,
such as the standard first-order conditions.

Definition 12.2 (Regular Maximum). For some twice continuously differentiable function
f : X → R, we call a local maximizer x∗ regular if Dx2 f (x∗ ) is negative definite on an open
neighborhood of x∗ .

Notice that from the arguments above, a regular maximum is a strict local maximum.
However, the converse need not be true, as should be clear from the following example.

Example 12.1 (A strict local maximizer that is not regular). Suppose f (x) = −x⁴ . Then
f is twice continuously differentiable, with a strict (local) maximum at x = 0. However,
f ′′ (x) = −12x² evaluates to 0 at x = 0, and so f ′′ is not negative definite at 0. Yet f is
strictly concave, and so we see that strict concavity does not imply that the Hessian is negative
definite; as a consequence, a strict local maximum need not be regular.

12.4.2 Constrained Optimization

For constrained maximization problems, the intuition for second order conditions is sim-
ilar to the unconstrained case. However, the constraints imply that local (strict) concavity

⁹ It may seem that the converse should also be true, but notice that the function f (x) = −|x|, which has
a strict maximum at x = 0, is only concave, and not strictly concave, even locally at 0.

at the optimum need only be tested on the tangent space of the constraint set at the opti-
mum. Moreover, the relevant function is no longer the objective function, but the associated
Lagrangian function, which is the function whose stationary points we actually compute.

Theorem 12.3 (Second Order Sufficient Conditions). Suppose the functions f : X → R
and h : X → RK are twice continuously differentiable. Let L ≡ f − λh be the Lagrangian
function. Suppose x∗ ∈ RN and λ∗ ∈ RK are such that x∗ and λ∗ satisfy the first order conditions
of Theorem 12.1 and the constraint h(x∗ ) = 0, and x∗ satisfies the NDCQ.
If Dx2 L(x∗ , λ∗ ) is negative definite on the subspace {v : Dx h(x∗ )v = 0}, then x∗ is a strict
local constrained maximum.
If Dx2 L(x∗ , λ∗ ) is positive definite on the subspace {v : Dx h(x∗ )v = 0}, then x∗ is a strict
local constrained minimum.

Theorem 12.4 (Second Order Necessary Conditions). Suppose the functions f : X → R
and h : X → RK are twice continuously differentiable. Let L ≡ f − λh be the associated
Lagrangian function, and suppose that x∗ ∈ RN satisfies the NDCQ.
If x∗ is a local constrained maximum, then there exists λ∗ ∈ RK such that Dx2 L(x∗ , λ∗ ) is
negative semidefinite on the subspace {v : Dx h(x∗ )v = 0}.
If x∗ is a local constrained minimum, then there exists λ∗ ∈ RK such that Dx2 L(x∗ , λ∗ ) is
positive semidefinite on the subspace {v : Dx h(x∗ )v = 0}.
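As a sketch of Theorem 12.3 (continuing my example max x1 x2 subject to x1 + x2 = 2): D²L = [[0, 1], [1, 0]] is indefinite on R², but on the nullspace of Dh = (1, 1), spanned by v = (1, −1), the quadratic form is negative, which certifies a strict local constrained maximum.

```python
# D2L at (x*, λ*) for L = x1*x2 - λ(x1 + x2 - 2) is [[0, 1], [1, 0]].
D2L = [[0, 1], [1, 0]]

def quad_form(M, v):
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

# D2L is indefinite on all of R^2 ...
indefinite = quad_form(D2L, [1, 1]) > 0 and quad_form(D2L, [1, -1]) < 0
# ... but on the nullspace of Dh = (1, 1), spanned by v = (1, -1),
# the form is negative: the SOSC for a constrained maximum holds.
on_nullspace = quad_form(D2L, [1, -1])
print(indefinite, on_nullspace)  # → True -2
```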

13 Nonlinear Programming: The Karush-Kuhn-Tucker Approach
13.1 Overview
- The KKT method of dealing with inequality constraints: complementary slackness
- Show that the necessary conditions for maxima of the KKT problem yield necessary
conditions for maxima of the original problem
- Intuition based on gradients of the objective function and constraint function, and an
explanation of the constraint qualification, paying attention to the difference in the meaning
of the sign of the gradient of the constraint function, and hence the difference from the case
with equality constraints
- Failure of CQ is problematic

13.2 First Order Conditions
Theorem 13.1 (Karush-Kuhn-Tucker Theorem). Let f and g be continuously differentiable
functions. Suppose x∗ is a constrained maximum of f , where the constraint set
is defined by {x ∈ X : g(x) ≤ 0}. Denote by JB the subset of indices of the constraints that
bind (hold with equality) at x∗ , and suppose that these binding constraints satisfy the NDCQ
at x∗ . Define L(x, λ) ≡ f (x) − λg(x) to be the associated Lagrangian. Then there exists
λ∗ ∈ RJ+ such that:

∂L/∂xn (x∗ , λ∗ ) = ∂f/∂xn (x∗ ) − Σj λ∗j ∂gj/∂xn (x∗ ) = 0

and

∂L/∂λj (x∗ , λ∗ ) = −gj (x∗ ) ≥ 0,  λ∗j ≥ 0,  λ∗j ∂L/∂λj (x∗ , λ∗ ) = 0

for every n ∈ {1, . . . , N }, j ∈ {1, . . . , J}.
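A small sketch of the KKT conditions at work (the problem is my own): maximize f(x) = −(x − 2)² subject to g(x) = x − 1 ≤ 0. The constraint binds, and x∗ = 1, λ∗ = 2 satisfies stationarity, feasibility, and complementary slackness.

```python
# max -(x-2)^2 s.t. x - 1 <= 0.  Candidate: x* = 1 (binding), λ* = 2.
x_star, lam_star = 1.0, 2.0

df = -2 * (x_star - 2)          # f'(x*) = 2
dg = 1.0                        # g'(x*) = 1
stationarity = abs(df - lam_star * dg) < 1e-12
feasible = x_star - 1 <= 0
comp_slack = lam_star * (x_star - 1) == 0
print(stationarity, feasible, comp_slack, lam_star >= 0)  # → True True True True
```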

Suppose we have nonnegativity constraints on the choice variables. We can treat these
nonnegativity constraints differently, as is done in the original Kuhn-Tucker Theorem.

Theorem 13.2 (Original Kuhn-Tucker Theorem). Let f and g be continuously differentiable
functions. Suppose x∗ is a constrained maximum of f , where the constraint set
is defined by {x ∈ X : g(x) ≤ 0, x ≥ 0}. Denote by JB the subset of indices of the constraints
that bind (hold with equality) at x∗ , and suppose that these binding constraints satisfy the
NDCQ at x∗ . Define L(x, λ) ≡ f (x) − λg(x) to be the associated Lagrangian. Then there
exists λ∗ ∈ RJ+ such that:

∂L/∂xn (x∗ , λ∗ ) = ∂f/∂xn (x∗ ) − Σj λ∗j ∂gj/∂xn (x∗ ) ≤ 0,  x∗n ≥ 0,  x∗n ∂L/∂xn (x∗ , λ∗ ) = 0

and

∂L/∂λj (x∗ , λ∗ ) = −gj (x∗ ) ≥ 0,  λ∗j ≥ 0,  λ∗j ∂L/∂λj (x∗ , λ∗ ) = 0

for every n ∈ {1, . . . , N }, j ∈ {1, . . . , J}.

13.3 Second Order Conditions


The second-order theorems for the case of equality constraints in the classical program-
ming framework carry over, with the binding inequality constraints (and any equality constraints
of the general nonlinear problem) treated as the equality constraints of the classical programming
framework, and the nonbinding constraints simply ignored.

13.4 The Fritz John Theorem
The following theorem does not require a constraint qualification, which seems good,
but in some cases introduces many candidates, even when the corresponding KKT theorem
would yield sufficient conditions, such as the case of a concave objective.

Theorem 13.3 (Fritz John). Suppose f and g are continuously differentiable, as in the
KKT Theorem 13.1. Suppose x∗ is a constrained maximizer of f subject to the constraints
g(x) ≤ 0. Then there exist λ∗ ∈ RJ+ and γ ∗ ∈ R+ , with at least one of γ ∗ , λ∗1 , . . . , λ∗J not equal
to 0, such that

γ ∗ Df (x∗ ) − λ∗ Dg(x∗ ) = 0.

For more on this theorem, see Simon and Blume, p. 475.

14 The Saddle Point Theorem


Definition 14.1 (Saddle Point). Let f : X × Y → R. A point (x∗ , y ∗ ) ∈ X × Y is a saddle point
of f if f (x, y ∗ ) ≤ f (x∗ , y ∗ ) ≤ f (x∗ , y) for all x ∈ X , y ∈ Y.

Lemma 14.1 (Interchangeability). Let f : X × Y → R, and let (x1 , y1 ) ∈ X × Y and
(x2 , y2 ) ∈ X × Y be saddle points. Then (x1 , y2 ) and (x2 , y1 ) are also saddle points. Moreover, all
saddle points have the same value.

Proof. We know that

f (x, y1 ) ≤ f (x1 , y1 ) ≤ f (x1 , y),  x ∈ X , y ∈ Y

and

f (x, y2 ) ≤ f (x2 , y2 ) ≤ f (x2 , y),  x ∈ X , y ∈ Y.

Taking y = y2 in the first chain and x = x1 in the second gives f (x1 , y1 ) ≤ f (x1 , y2 ) ≤ f (x2 , y2 ),
and taking y = y1 in the second chain and x = x2 in the first gives f (x2 , y2 ) ≤ f (x2 , y1 ) ≤ f (x1 , y1 ).
Chaining these,

f (x1 , y1 ) ≤ f (x1 , y2 ) ≤ f (x2 , y2 ) ≤ f (x2 , y1 ) ≤ f (x1 , y1 ),

so all four values are equal; in particular, all saddle points have the same value. Then, for any
x ∈ X and y ∈ Y,

f (x, y2 ) ≤ f (x2 , y2 ) = f (x1 , y2 ) = f (x1 , y1 ) ≤ f (x1 , y),

so (x1 , y2 ) is a saddle point.

Similarly, f (x, y1 ) ≤ f (x1 , y1 ) = f (x2 , y1 ) and f (x2 , y1 ) = f (x2 , y2 ) ≤ f (x2 , y) for all
x ∈ X , y ∈ Y, so (x2 , y1 ) is a saddle point as well.

Now suppose we have the Lagrangian

L(x, λ) ≡ f (x) − λg(x),

where g : RN → RJ .

Theorem 14.2 (Saddle Point Theorem). For any X ⊂ RN and any f, gj : X → R, if
(x∗ , λ∗ ) ∈ X × RJ+ is a saddle point of L, then x∗ maximizes f over X subject to gj (x) ≤ 0,
and moreover λ∗j gj (x∗ ) = 0 for all j ∈ {1, . . . , J}.

Proof. Since (x∗ , λ∗ ) is a saddle point, L(x∗ , λ∗ ) ≤ L(x∗ , λ), and so f (x∗ ) − λ∗ g(x∗ ) ≤ f (x∗ ) −
λg(x∗ ), which implies λ∗ g(x∗ ) ≥ λg(x∗ ) for all λ ≥ 0. We claim g(x∗ ) ≤ 0. If not, then
∃j s.t. gj (x∗ ) > 0. Now let λ̃j > max{0, λ∗ g(x∗ )/gj (x∗ )} and λ̃j′ = 0 for all j′ ≠ j; then
λ̃g(x∗ ) = λ̃j gj (x∗ ) > λ∗ g(x∗ ), violating the saddle point condition. Thus x∗ satisfies the
constraints. Also, taking λ = 0 yields λ∗ g(x∗ ) ≥ 0. But λ∗ ≥ 0 and g(x∗ ) ≤ 0 imply
λ∗ g(x∗ ) ≤ 0, so λ∗ g(x∗ ) = 0; in fact, since each term λ∗j gj (x∗ ) is nonpositive, λ∗j gj (x∗ ) = 0
for every j. Next, L(x∗ , λ∗ ) ≥ L(x, λ∗ ), and so f (x∗ ) − λ∗ g(x∗ ) ≥ f (x) − λ∗ g(x). But
λ∗ g(x∗ ) = 0, so f (x∗ ) ≥ f (x) − λ∗ g(x). If x satisfies g(x) ≤ 0, then λ∗ g(x) ≤ 0, and so
f (x) − λ∗ g(x) ≥ f (x). Thus, f (x∗ ) ≥ f (x) for every feasible x.

The converse of the Saddle Point Theorem 14.2 isn’t true in general. But there is a
partial converse result.

Theorem 14.3. Let X be a convex subset of RN . Let f : X → R be quasiconcave and each gj : X →
R be convex. Suppose there exists x̃ ∈ X s.t. gj (x̃) < 0 for all j ∈ {1, . . . , J}, a condition known
as the Slater Constraint Qualification.
If x∗ is a constrained maximizer of f subject to g(x) ≤ 0, then ∃λ∗ ∈ RJ+ s.t. (x∗ , λ∗ ) is a
saddle point of L : X × RJ+ → R, L(x, λ) ≡ f (x) − λg(x).

15 The Hyperplane Theorems and the Farkas Lemma


Theorem 15.1 (Strictly Separating Hyperplane Theorem). Let X ⊂ RN be nonempty, closed,
and convex. Let y ∈ RN \ X . Then ∃a ∈ RN and c ∈ R s.t. ax < c < ay, ∀x ∈ X .

Importance of assumptions:

1. X is closed: otherwise y could be a boundary point and then there are points of X arbitrarily
close to y, yielding a failure of the strict inequality (though of course we can still find
a ∈ RN s.t. ax ≤ ay, ∀x ∈ X )

2. X is convex: otherwise the plane will intersect X for some choice of y ∉ X

Theorem 15.2. Let X , Y ⊂ RN be nonempty, convex, and disjoint. Then ∃a ∈ RN s.t.
∀x ∈ X , y ∈ Y, ax ≤ ay, and ∃x′ ∈ X , y′ ∈ Y s.t. ax′ < ay′.

Theorem 15.3 (Separating Hyperplane Theorem). Let X , Y ⊂ RN be nonempty and convex,
with int X ∩ Y = ∅. Then ∃a ∈ RN and c ∈ R s.t. ax ≤ c ≤ ay, ∀x ∈ X , y ∈ Y.

Theorem 15.4 (Supporting Hyperplane Theorem). Let X ⊂ RN be convex and x′ ∈
X \ int X . Then ∃a ∈ RN , a ≠ 0, s.t. ax ≤ ax′, ∀x ∈ X .
Note: x′ needs to be a boundary point of X , since otherwise ∃x″ ∈ X , x″ ≠ x′, s.t. ax″ > ax′.

Theorem 15.5 (The Farkas Lemma). Let b, a1 , . . . , am be nonzero vectors in RN . Let

A ≡ ( a1 )
    ( ⋮  )
    ( am ).

Then exactly one of the following is true:
1) ∃λ ∈ Rm+ s.t. b = λA;
2) ∃x ∈ RN s.t. bx > 0 and Ax ≤ 0.
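A finite sketch of the alternative (a hypothetical 2 × 2 instance, not from the notes): with a1 = (1, 0), a2 = (0, 1), and b = (1, 1), alternative 1) holds with λ = (1, 1), so alternative 2) must fail; the code confirms that bx ≤ 0 whenever Ax ≤ 0 on a grid.

```python
# Farkas instance: A has rows a1 = (1, 0), a2 = (0, 1); b = (1, 1).
# Alternative 1 holds: b = 1*a1 + 1*a2 with λ = (1, 1) >= 0.
A = [(1, 0), (0, 1)]
b = (1, 1)
lam = (1, 1)
alt1 = tuple(sum(lam[k] * A[k][i] for k in range(2)) for i in range(2)) == b

# So alternative 2 must fail: no x with b·x > 0 and A x <= 0.
grid = [i * 0.5 - 5 for i in range(21)]
alt2_witness = any(
    b[0] * x1 + b[1] * x2 > 0
    and A[0][0] * x1 + A[0][1] * x2 <= 0
    and A[1][0] * x1 + A[1][1] * x2 <= 0
    for x1 in grid for x2 in grid
)
print(alt1, alt2_witness)  # → True False
```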

16 Solving Constrained Optimization Problems


1. Find all points that violate NDCQ (or other constraint qualification). These points
are candidate optima.

2. Determine the KKT first order conditions.

3. Find all points that satisfy the KKT conditions: for every subset of constraints, as-
sume these have nonzero multipliers and try to find any points that satisfy the KKT
conditions. All such points that pass NDCQ are candidate optima.

4. If the objective is concave (convex) and the constraint functions are quasiconvex, then every
solution of the KKT conditions is a global constrained maximum (minimum). However, we need
to ensure that we haven't missed a solution that violates the NDCQ.

5. Evaluate all candidate points to find global optima, or use second order conditions to
discriminate between candidate points.
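The recipe can be sketched end-to-end on a toy problem (my example, not from the notes): maximize −(x1 − 2)² − (x2 − 2)² subject to x1 + x2 ≤ 2. Enumerating binding sets: with the constraint slack, the candidate (2, 2) is infeasible; with it binding, the KKT system gives x∗ = (1, 1) and λ∗ = 2 ≥ 0.

```python
# max -(x1-2)^2 - (x2-2)^2  s.t.  g(x) = x1 + x2 - 2 <= 0.
f = lambda x1, x2: -(x1 - 2) ** 2 - (x2 - 2) ** 2

candidates = []

# Case 1: constraint slack (λ = 0): unconstrained FOC gives (2, 2).
if 2 + 2 - 2 <= 0:            # infeasible, so no candidate from this case
    candidates.append((2.0, 2.0))

# Case 2: constraint binds: -2(x1-2) = λ, -2(x2-2) = λ, x1 + x2 = 2
# => x1 = x2 = 1, λ = 2 >= 0: a valid KKT point.
x1 = x2 = 1.0
lam = -2 * (x1 - 2)
if lam >= 0:
    candidates.append((x1, x2))

best = max(candidates, key=lambda p: f(*p))
print(best, f(*best))  # → (1.0, 1.0) -2.0
```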

17 Summary of Optimization Theorems


17.1 Unconstrained
Let f : X → R be a continuous function, where X is an open subset of RN . Note that
assuming X is open means that any local optimum is an interior optimum. Henceforth, I
will assume the problem is to find maxima. The results can be easily translated for minima.
Also, keep in mind that an open domain implies a maximum may not exist.

1. If f is continuously differentiable, then a necessary condition for a local maximum x∗


is that Df (x∗ ) = 0.

2. If f is twice continuously differentiable, then a necessary condition for a local maximum


x∗ is that D2 f (x∗ ) is negative semidefinite.

3. If f is twice continuously differentiable, then the first order necessary condition is also
a sufficient condition for a local maximum x∗ if D2 f (x∗ ) is negative definite.

4. If f is continuously differentiable and concave, then the first order necessary condition
for a local maximum is also a sufficient condition for a global maximum.

5. If f is continuously differentiable and strictly concave, then x∗ is the unique maximizer


of f if Df (x∗ ) = 0.

6. If f is twice continuously differentiable and concave, then, if x∗ solves Df (x∗ ) = 0 and


D2 f (x∗ ) is negative definite, it is the unique maximizer.

17.2 Constrained
Theorem 17.1 (Local-Global Theorem). Let f : X → R be a continuous function, where
X is an open subset of RN . Suppose C ⊂ X is convex and compact, and f is strictly quasiconcave.
Then every local maximum of f on C is a global maximum.

17.2.1 Classic KKT

Let f : RN+ → R and g : RN+ → RJ be continuous functions, and suppose f, g are
continuously differentiable.

Consider the following problem:
max_{x ∈ R^N_+} f(x)

subject to
g(x) ≤ 0.

Notice that x comes from R^N_+, and so implicitly we have nonnegativity constraints.
The classic KKT conditions are:

Dx f(x∗) − λ∗ Dx g(x∗) ≤ 0,  x∗ ≥ 0,   (17.1)
g(x∗) ≤ 0,  λ∗ ≥ 0,   (17.2)
x∗ · (Dx f(x∗) − λ∗ Dx g(x∗)) = 0,   (17.3)
λ∗ · g(x∗) = 0.   (17.4)

Suppose x∗ is a solution to the maximization problem. Then the KKT conditions are
necessary conditions (and there exists an associated λ∗ ) if any one of the following is true:

1. The Jacobian of the constraints that bind at x∗ has full rank (NDCQ).
2. The constraints are affine.[10]

3. The constraint functions gj are convex, and there exists an interior point of the con-
straint set, i.e., there exists x̃ ∈ R^N_+ such that gj(x̃) < 0 for all j. This is the Slater
condition.

4. The constraint functions gj are quasiconvex, the constraint set has a nonempty interior
(the Slater condition), and for any j such that gj is not convex, Dx gj(x) ≠ 0 for all
x ∈ R^N_+. This weakens the previous item, at the cost of ruling out stationary points
of the non-convex constraint functions.

Suppose (x∗, λ∗) is a solution of the KKT conditions. Then x∗ is a maximizer (that is,
the KKT conditions are sufficient) if gj is quasiconvex for every j and any one of the
following is true:

1. f is concave.

2. f is twice continuously differentiable, quasiconcave, and Dx f(x∗) ≠ 0.


[10] A function F is linear if F(ax + by) = aF(x) + bF(y). An affine function is a linear function with an
added constant. For example, F(x) = 12x is linear (and affine), but F(x) = 12x + 3 is not linear, though
still affine.

3. f is quasiconcave, and one of the following holds:

   (a) ∂f/∂xi(x∗) < 0 for some i

   (b) ∂f/∂xi(x∗) > 0 for some i such that there exists x̃ ∈ R^N_+ with x̃i > 0.
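The classic KKT conditions (17.1)–(17.4) can be checked directly at a hand-derived candidate. A minimal sketch, using a hypothetical problem (not one from the notes): maximize f(x) = √(x1 x2) subject to x1 + 2x2 ≤ 4, with x ≥ 0 implicit. Since f is concave and the constraint is affine (hence quasiconvex), the sufficiency result applies.

```python
# Verify the classic KKT conditions at the hand-derived candidate for
# max sqrt(x1*x2) s.t. g(x) = x1 + 2*x2 - 4 <= 0, x >= 0 implicit.
import math

# Constraint binds and x* is interior, so Df(x*) = lambda* Dg(x*):
x_star = (2.0, 1.0)
lam = 1.0 / (2.0 * math.sqrt(2.0))

df = (0.5 * math.sqrt(x_star[1] / x_star[0]),   # df/dx1
      0.5 * math.sqrt(x_star[0] / x_star[1]))   # df/dx2
dg = (1.0, 2.0)

# (17.1): Df - lambda*Dg <= 0 (here = 0, since x* > 0), and x* >= 0.
stationarity = [df[i] - lam * dg[i] for i in range(2)]
assert all(abs(s) < 1e-12 for s in stationarity)
assert all(xi >= 0 for xi in x_star)

# (17.2): feasibility and lambda >= 0.
g = x_star[0] + 2.0 * x_star[1] - 4.0
assert g <= 0 and lam >= 0

# (17.3)-(17.4): complementary slackness. x* > 0 forces the stationarity
# terms to zero, and g(x*) = 0 pairs with lambda > 0.
assert abs(lam * g) < 1e-12

print(x_star, lam)
```

Because f is concave, sufficiency condition 1 above says this KKT solution is a global constrained maximum.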

17.2.2 Modern KKT

Let f : X → R and g : X → R^J be continuous functions, where X ⊂ R^N is open, and
suppose f, g are continuously differentiable.
Consider the following problem:
max_{x ∈ X} f(x)

subject to
g(x) ≤ 0.

Any nonnegativity constraints should be included in the set of inequality constraints
explicitly. Define the constraint set by C ≡ {x ∈ X : g(x) ≤ 0}.
The modern KKT conditions are:

Dx f(x∗) − λ∗ Dx g(x∗) = 0,   (17.6)
g(x∗) ≤ 0,  λ∗ ≥ 0,   (17.7)
λ∗ · g(x∗) = 0.   (17.8)

The results in the section on Classic KKT apply here as well, with every reference to the
classic KKT conditions replaced by the modern KKT conditions.
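To see the connection to the classic form, the same hypothetical example used there (max √(x1 x2) s.t. x1 + 2x2 ≤ 4) can be recast with the nonnegativity constraints as explicit rows of g, so that stationarity (17.6) holds as an exact equality:

```python
# Modern-form KKT check: nonnegativity constraints are explicit rows of g.
#   g1(x) = x1 + 2*x2 - 4,  g2(x) = -x1,  g3(x) = -x2.
import math

x_star = (2.0, 1.0)
# lambda1 from the binding budget constraint; the nonnegativity
# constraints are slack at x*, so lambda2 = lambda3 = 0.
lam = (1.0 / (2.0 * math.sqrt(2.0)), 0.0, 0.0)

df = (0.5 * math.sqrt(x_star[1] / x_star[0]),
      0.5 * math.sqrt(x_star[0] / x_star[1]))
dg = ((1.0, 2.0), (-1.0, 0.0), (0.0, -1.0))   # rows Dg_j

# (17.6): Dx f(x*) - lambda* Dx g(x*) = 0, now an equality.
for i in range(2):
    resid = df[i] - sum(lam[j] * dg[j][i] for j in range(3))
    assert abs(resid) < 1e-12

# (17.7)-(17.8): feasibility and complementary slackness.
g = (x_star[0] + 2.0 * x_star[1] - 4.0, -x_star[0], -x_star[1])
assert all(gj <= 0 for gj in g) and all(l >= 0 for l in lam)
assert all(abs(lam[j] * g[j]) < 1e-12 for j in range(3))
```

The inequality in classic condition (17.1) has become the equality (17.6) precisely because the slack nonnegativity constraints carry zero multipliers.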

17.2.3 KKT - mixed constraints

Let f : R^N → R, g : R^N → R^J, and h : R^N → R^K be continuous functions, and
suppose f, g, h are continuously differentiable.
Consider the following problem:
max_{x ∈ R^N} f(x)

subject to
gj(x) ≤ 0 for all j,  hk(x) = 0 for all k.

Note also that an equality constraint hk(x) = 0 could be replaced by two inequality con-
straints gj(x) ≤ 0 and gj′(x) ≤ 0, where gj = hk and gj′ = −hk. Thus, the results below are
just appropriate restatements of the results in the classic KKT section.

The mixed KKT conditions are:

Dx f(x∗) − λ∗ Dx g(x∗) − µ∗ Dx h(x∗) = 0,   (17.10)
g(x∗) ≤ 0,  λ∗ ≥ 0,  λ∗ · g(x∗) = 0,   (17.11)
h(x∗) = 0.   (17.12)

Suppose x∗ is a solution to the maximization problem. Then the mixed KKT conditions
are necessary conditions (and there exist associated λ∗ and µ∗) if any one of the following
is true:

1. The Jacobian of the equality constraints and the binding inequality constraints at x∗
has full rank (NDCQ).

2. The constraints are affine.

Suppose (x∗, λ∗, µ∗) is a solution of the mixed KKT conditions. Then x∗ is a maximizer
(that is, the mixed KKT conditions are sufficient) if gj is quasiconvex for every j, hk is
linear for every k, and any one of the following is true:

1. f is concave.

2. f is twice continuously differentiable, quasiconcave, and Dx f(x∗) ≠ 0.

3. f is quasiconcave, and one of the following holds:

   (a) ∂f/∂xi(x∗) < 0 for some i

   (b) ∂f/∂xi(x∗) > 0 for some i such that there exists x̃ ∈ R^N_+ with x̃i > 0.
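The mixed conditions (17.10)–(17.12) can be checked at a hand-derived candidate in the same way as before. A minimal sketch, using a hypothetical problem: maximize f(x) = −(x1² + x2²) subject to h(x) = x1 + x2 − 1 = 0 and g(x) = −x1 ≤ 0. Note that the equality multiplier µ is unrestricted in sign, unlike λ.

```python
# Mixed-constraint KKT check for max -(x1^2 + x2^2)
# s.t. h(x) = x1 + x2 - 1 = 0 and g(x) = -x1 <= 0.
x_star = (0.5, 0.5)
mu = -1.0      # equality multiplier: unrestricted in sign
lam = 0.0      # g is slack at x*, so lambda = 0 by (17.11)

df = (-2.0 * x_star[0], -2.0 * x_star[1])
dg = (-1.0, 0.0)
dh = (1.0, 1.0)

# (17.10): Dx f - lambda*Dx g - mu*Dx h = 0.
for i in range(2):
    assert abs(df[i] - lam * dg[i] - mu * dh[i]) < 1e-12

# (17.11): feasibility, lambda >= 0, complementary slackness.
g = -x_star[0]
assert g <= 0 and lam >= 0 and abs(lam * g) < 1e-12

# (17.12): the equality constraint holds.
h = x_star[0] + x_star[1] - 1.0
assert abs(h) < 1e-12
```

Here f is concave, g is linear (hence quasiconvex), and h is linear, so sufficiency condition 1 above applies and x∗ is a global constrained maximum.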

17.2.4 Saddlepoint Theorem

Let f : X → R and g : X → R^J be functions on an arbitrary set X. Define the
Lagrangian function L(x, λ) ≡ f(x) − λ · g(x), where λ ∈ R^J_+. A saddlepoint of L is a
pair (x∗, λ∗) ∈ X × R^J_+ such that L(x, λ∗) ≤ L(x∗, λ∗) ≤ L(x∗, λ) for all x ∈ X and all
λ ∈ R^J_+.

1. If (x∗ , λ∗ ) is a saddlepoint of L, then x∗ is a constrained maximizer, and λ∗ g(x∗ ) = 0.

2. Suppose X ⊂ R^N is convex, f is concave, and gj is convex for each j. If x∗ is a
constrained maximizer of f subject to g ≤ 0 and there exists x̃ such that g(x̃) < 0 (the
Slater condition), then there exists λ∗ ∈ R^J_+ such that (x∗, λ∗) is a saddlepoint of L.
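The saddlepoint inequalities can be illustrated numerically in one dimension. A minimal sketch, using a hypothetical problem: f(x) = −(x − 2)², one constraint g(x) = x − 1 ≤ 0, so L(x, λ) = −(x − 2)² − λ(x − 1); the constrained maximum is at x∗ = 1 with λ∗ = 2.

```python
# Grid check of the saddlepoint property L(x, l*) <= L(x*, l*) <= L(x*, l)
# for f(x) = -(x - 2)^2 and g(x) = x - 1 <= 0.
def L(x, lam):
    return -(x - 2.0) ** 2 - lam * (x - 1.0)

x_star, lam_star = 1.0, 2.0   # hand-derived: constraint binds, Df = lam*Dg

# L(x, lambda*) <= L(x*, lambda*) for all x ...
assert all(L(x / 100.0, lam_star) <= L(x_star, lam_star) + 1e-12
           for x in range(-500, 500))

# ... and L(x*, lambda*) <= L(x*, lambda) for all lambda >= 0
# (here with equality, since g(x*) = 0).
assert all(L(x_star, lam_star) <= L(x_star, l / 10.0) + 1e-12
           for l in range(0, 100))

# Part 1 of the theorem then gives lambda* * g(x*) = 0, which matches.
assert lam_star * (x_star - 1.0) == 0.0
```

Part 2 of the theorem is what guarantees such a λ∗ exists here: f is concave, g is convex, and x̃ = 0 satisfies g(x̃) < 0 (the Slater condition).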

18 References
George Dantzig. Linear Programming and Extensions. 1963.[11]
Michael Intriligator. Mathematical Optimization and Economic Theory. 1971.

[11] A pdf version is available for free from the RAND Corporation. See http://www.rand.org/pubs/reports/
R366.html
