
ECON 5100

5. A Bunch of Things
In this chapter we will see
I Normed vector spaces and how they induce metric spaces
I The notion of convexity
I Some useful “fixed point theorems”
I Multi-variable calculus
Norm
Recall a vector space (V , +, ·): basically, a bunch of arrows.
I We may sometimes want to think about the “length” of an
arrow
I What should a suitable notion of “length” satisfy?
I The “length” of 0 is 0, and only 0 has “length” 0.
I The “length” of a · v is |a| times the “length” of v.
I The “length” of v + u should be at most the “length” of v
plus the “length” of u (triangle inequality).

Based on these intuitions, we introduce:

Definition. Given a vector space (V, +, ·), the function
‖·‖ : V → R+ is a norm on (V, +, ·) if
1. ‖v‖ = 0 if and only if v = 0
2. ‖a · v‖ = |a|‖v‖ for any v ∈ V and a ∈ R
3. ‖v + u‖ ≤ ‖v‖ + ‖u‖ for any v, u ∈ V.
We say that (V, +, ·, ‖·‖) is a normed vector space.
Example 1. A norm on (Rn, +, ·):

‖(v1, ..., vn)‖ = √(v1² + ... + vn²)

I Called the Euclidean norm


I Length: distance from the origin to the “arrowhead”
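
A quick numerical sanity check of the three norm properties for the Euclidean norm (a minimal sketch in Python; the test vectors and the scalar are arbitrary choices):

```python
import numpy as np

def euclidean_norm(v):
    # ||v|| = sqrt(v1^2 + ... + vn^2)
    return np.sqrt(np.sum(v ** 2))

v = np.array([3.0, 4.0])   # arbitrary test vectors
u = np.array([-1.0, 2.0])
a = -2.5                   # arbitrary scalar

print(euclidean_norm(v))   # 5.0: length of the "arrow" (3, 4)
print(np.isclose(euclidean_norm(a * v), abs(a) * euclidean_norm(v)))   # property 2
print(euclidean_norm(v + u) <= euclidean_norm(v) + euclidean_norm(u))  # property 3
```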

Example 2. A norm on (Fb, +, ·), where Fb is the set of all
bounded functions R → R:

‖f‖ = sup_{x∈R} |f(x)|

I Known as sup(remum) norm or uniform norm


I Often denoted as ‖·‖∞
I “Length”: distance from the horizontal axis to the highest
“ceiling” or lowest “floor”
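
The sup norm can be approximated numerically by maximizing over a fine grid (a sketch; the bounded function and the grid are illustrative choices, and a finite grid only bounds the true supremum from below):

```python
import numpy as np

f = lambda x: np.sin(x) * np.exp(-np.abs(x))  # a bounded function R -> R
grid = np.linspace(-10, 10, 100001)           # finite grid standing in for R

print(np.max(np.abs(f(grid))))                # approximates ||f||_inf = sup |f(x)|
```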
You may have observed certain similarities between the norm
of a normed vector space and the distance function of a metric space.
I Both indicate “length” of some sort
I Both satisfy triangle inequality

That’s no coincidence: we can create a metric space (S, d) from
a normed vector space (V, +, ·, ‖·‖):
1. S ⊂ V
2. d(v, u) = ‖v + (−u)‖, where −u is the additive inverse of u.

To convince ourselves that (S, d) is indeed a metric space, we
need to make sure the three properties of d are satisfied.

(1) d(v, v) = ‖v + (−v)‖ = ‖0‖ = 0; conversely, if d(v, u) = ‖v + (−u)‖ = 0,
then v + (−u) = 0, i.e. v = u.
(2) d(v, u) = ‖v + (−u)‖ = ‖(−1) · (u + (−v))‖ = |−1| · ‖u + (−v)‖ = ‖u + (−v)‖ = d(u, v)
(3) d(v, u) = ‖v + (−w) + w + (−u)‖
≤ ‖v + (−w)‖ + ‖w + (−u)‖ = d(v, w) + d(w, u)

→ Indeed d derived from ‖·‖ is a distance function, so (S, d) is
a metric space.
Example 1. (Rn, dE) is derived from (Rn, +, ·, ‖·‖) because

dE((v1, ..., vn), (u1, ..., un)) = √((v1 − u1)² + ... + (vn − un)²) = ‖(v1 − u1, ..., vn − un)‖
= ‖(v1, ..., vn) + (−(u1, ..., un))‖

Example 2. (Fb, d∞) is derived from (Fb, +, ·, ‖·‖∞), where

d∞(f, g) = sup_{x∈R} |f(x) − g(x)| = ‖f + (−g)‖∞

I We will need to use (Fb, d∞) when discussing convergence
of a sequence of functions in dynamic programming
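
In both cases the derived distance is just the norm of the difference. A small sketch (the vectors, functions, and grid are illustrative; the sup is again approximated on a grid):

```python
import numpy as np

v, u = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(np.linalg.norm(v - u))               # d_E(v, u) = ||v - u|| = 5.0

f, g = np.sin, np.cos                      # two bounded functions
grid = np.linspace(-10, 10, 100001)
print(np.max(np.abs(f(grid) - g(grid))))   # d_inf(f, g) ~ sqrt(2) ~ 1.414
```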
Convexity
The term “convexity” often pops up in economics. However,
there are two different but related notions that go by the
same name:

convex sets vs. convex functions

Here we first talk about convexity of sets.


I Underlying idea: the set has no “holes” or “gaps”.

Definition. Given a vector space (V, +, ·), vector u is said to
be a convex combination of vectors v1, ..., vk if u is a linear
combination of v1, ..., vk with coefficients λ1, ..., λk ∈ [0, 1]
where λ1 + ... + λk = 1.

I Thus, a convex combination is a special kind of linear
combination.
I Geometrically, u lives in the “region” in between v1 , ..., vk .
Definition. Given a vector space (V , +, ·), S ⊂ V is convex if
and only if for any v, u ∈ S, all of their convex combinations
are also in S.

I If two points are in S, then the “line segment” connecting
them is also in S.
I In R: the convex sets are exactly the intervals
What about the convex combination of more than two points?

Proposition 5.1. Given a vector space (V , +, ·), S ⊂ V is


convex if and only if for any v1 , ..., vk ∈ S, all of their convex
combinations are also in S.

(Proof left as exercise. Hint: induction.)


I Hence a convex S is closed under convex combinations.
I The set of all convex combinations of vectors in a set T is
called the convex hull of T, denoted co(T). Thus the proposition is
the same as saying that S is convex if and only if co(T) ⊂ S
for any T ⊂ S.
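
As a numerical illustration that the closed unit disk in R2 is convex, we can sample pairs of points in the disk and check that every random convex combination stays inside (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_disk(p):
    return np.linalg.norm(p) <= 1.0   # the closed unit disk

checked = 0
while checked < 1000:
    v, u = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if not (in_disk(v) and in_disk(u)):
        continue                      # rejection sampling: keep points in the disk
    lam = rng.uniform(0, 1)
    assert in_disk(lam * v + (1 - lam) * u)   # convex combination stays inside
    checked += 1

print("all", checked, "convex combinations stayed in the disk")
```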
Now we focus on the vector space (Rn , +, ·). There are a few
notable convex sets in this space.
I Hyperplane:

P(a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn = c}

where a1, ..., an, c ∈ R are parameters and ai ≠ 0 for some i.

I A hyperplane is a point in R1, a line in R2, a plane in R3.


I Geometrically, a hyperplane is a “straight” divider that cuts
the space into two halves

I Half-space: each side of the space cut by the hyperplane
P(a1, ..., an, c)

H+ (a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn ≥ c}


H− (a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn ≤ c}.

(Exercise: show that hyperplanes and half-spaces are convex.)


The following result is of crucial importance to optimization
problems in economics.

Proposition 5.2 (Separating Hyperplane Theorem)


Given the vector space (Rn , +, ·), if S, T ⊂ Rn are two convex
sets which are disjoint, then there exists a hyperplane
P(a1 , ..., an , c) such that S ⊂ H+ (a1 , ..., an , c) and
T ⊂ H− (a1 , ..., an , c).

(Proof omitted)
I Due to Hermann Minkowski (1864-1909), Einstein’s
professor at ETH Zurich and a contributor to the theory of
relativity
I Geometric interpretation: every two disjoint convex
regions have a “gap” in which you can insert a plane
I Algebraic interpretation: every (v1 , ..., vn ) ∈ S satisfies the
linear inequality a1 v1 + ... + an vn ≥ c and every
(v1 , ..., vn ) ∈ T satisfies another linear inequality
a1 v1 + ... + an vn ≤ c
I It is important that S and T are convex and disjoint.
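
For intuition, here is a sketch that checks a candidate separating hyperplane on two sampled disjoint convex sets (the two disks, the normal vector a = (1, 1), and c = 3 are illustrative assumptions, not outputs of an algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
a, c = np.array([1.0, 1.0]), 3.0   # hyperplane x1 + x2 = 3 (assumed, not computed)

def sample_disk(center, r, n):
    # rejection-sample n points from a disk (a convex set)
    pts = []
    while len(pts) < n:
        p = center + rng.uniform(-r, r, 2)
        if np.linalg.norm(p - center) <= r:
            pts.append(p)
    return np.array(pts)

S = sample_disk(np.array([3.0, 3.0]), 1.0, 500)   # convex, disjoint from T
T = sample_disk(np.array([0.0, 0.0]), 1.0, 500)

print(np.all(S @ a >= c))   # True: S sits in H+(1, 1, 3)
print(np.all(T @ a <= c))   # True: T sits in H-(1, 1, 3)
```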
Now let’s introduce convex and concave functions

Definition. Given a convex subset S of a vector space (V, +, ·), a


function f : S → R is said to be concave if and only if for any
v, u ∈ S and λ ∈ [0, 1],

f (λ · v + (1 − λ) · u) ≥ λf (v) + (1 − λ)f (u).

f is said to be convex if and only if for any v, u ∈ S and λ ∈ [0, 1],

f (λ · v + (1 − λ) · u) ≤ λf (v) + (1 − λ)f (u).

I When the inequalities are always strict, we say that the function
is strictly concave/convex.
I For S ⊂ R, a concave function “curls down”, as the line segment
connecting two points on the curve must lie “under” the curve.
I If f is twice differentiable at x, then f 00 (x) ≤ 0. (Proof as
exercise)
I Similarly, a convex function “curls up”.
I If f is twice differentiable at x, then f 00 (x) ≥ 0.
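
A quick check on f(x) = log(x), which is concave on (0, ∞) (a sketch; the test points, weight, and step size are arbitrary choices):

```python
import numpy as np

f = np.log                  # concave on (0, inf)
v, u, lam = 0.5, 4.0, 0.3   # arbitrary points and weight

# chord test: image of the convex combination vs. combination of the images
print(f(lam * v + (1 - lam) * u) >= lam * f(v) + (1 - lam) * f(u))   # True

# second-derivative test via a central finite difference at x = 2
h, x = 1e-5, 2.0
print((f(x + h) - 2 * f(x) + f(x - h)) / h ** 2)   # about -0.25 <= 0
```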
Concave/convex function: for two vectors, image of convex
combination is above/below convex combination of images
I This actually is true for arbitrarily many vectors:

Proposition 5.3 (Jensen’s Inequality) Suppose S is a convex
set in a vector space (V, +, ·).
1. f : S → R is concave if and only if
f(u) ≥ λ1 f(v1) + ... + λk f(vk)
whenever u ∈ S is a convex combination of v1, ..., vk ∈ S
with coefficients λ1, ..., λk.
2. f : S → R is convex if and only if
f(u) ≤ λ1 f(v1) + ... + λk f(vk)
for every such convex combination.

(Proof left as exercise)
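
A numerical sanity check of part 1 for the concave f(x) = √x (a sketch with randomly drawn points and Dirichlet-distributed weights, which lie in [0, 1] and sum to 1):

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.sqrt                       # concave on [0, inf)

v = rng.uniform(0, 10, size=5)    # v1, ..., vk
lam = rng.dirichlet(np.ones(5))   # lambda_1, ..., lambda_k
u = lam @ v                       # the convex combination

print(f(u) >= lam @ f(v))         # True: f(u) >= sum_i lambda_i f(v_i)
```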


Fixed point theorems

Fixed point theorems are very important in economics,


because much of economic analysis focuses on ideas such as
“equilibrium”, “steady state”, “stable outcome”, etc.
I Walrasian equilibrium
I Nash equilibrium
I Steady state of economic growth

The idea behind all fixed point theorems is that for a
self-mapping f, i.e. a function with identical domain and
co-domain, some point x is “fixed” under f, i.e.
f(x) = x.
Banach Fixed Point Theorem
The Banach Fixed Point Theorem says that a certain kind of
function, called a “contraction mapping”, has a unique fixed point.
I Due to Stefan Banach (1892-1945).

Definition. Given a metric space (S, d), a self-mapping


f : S → S is a contraction mapping if there exists some
q ∈ [0, 1) such that d(f (x), f (y)) ≤ qd(x, y) ∀x, y ∈ S.

I f pulls things closer.


I If S ⊂ R: the slope between any two points on the graph of
f should have an absolute value no greater than q for it to
be a contraction mapping.

Example 1. f (x) = x/2 is a contraction mapping on (R, d E )


I First, it is a self-mapping R → R
I d(f(x), f(y)) = |x/2 − y/2| = |x − y|/2 = ½ d(x, y), so q = ½ works
Proposition 5.4 (Banach Fixed Point Theorem) Given a complete
metric space (S, d) and a contraction mapping f : S → S, there
exists a unique x ∗ ∈ S such that f (x ∗ ) = x ∗ . Moreover, for any
x ∈ S, the sequence x, f (x), f (f (x)), f (f (f (x)))... converges to x ∗ in
(S, d).

I x ∗ is a fixed point that never changes position under f .


I All the other points are “drawn” closer and closer to x ∗ as we
keep applying f .
I If S ⊂ R: the graph of f touches the 45° line once and only once.
I Important to the theory of dynamic programming.

Example 1. f (x) = x/2 in (R, d E )


I Is the metric space complete? Yes. So Banach applies.
I We can solve for x ∗ : x ∗ = f (x ∗ ) = x ∗ /2 ⇒ x ∗ = 0.
I For any x: (x, f (x), f (f (x)), ...) = (x, x/2, x/4, ...) → 0
× f (x) = x/2 does not have a fixed point in ((0, ∞), d E ) because the
space is not complete.
× f (x) = x/2 does not have a fixed point in ([1, 2], d E ) because it is
not a self-mapping when restricted to domain [1, 2].
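
The iteration x, f(x), f(f(x)), ... is easy to run. A minimal sketch for the contraction f(x) = x/2 + 1 on (R, dE), whose fixed point solves x* = x*/2 + 1, i.e. x* = 2:

```python
def f(x):
    return x / 2 + 1    # contraction with q = 1/2; fixed point x* = 2

x = 100.0               # any starting point works, by Banach
for _ in range(50):
    x = f(x)            # each step at least halves the distance to x*

print(x)                # ~2.0
print(abs(f(x) - x))    # ~0: x is (numerically) a fixed point
```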
Proof sketch. First we pick any x0 ∈ S. Let (xi) be the sequence such that
xi = f(f(· · · f(x0) · · ·)), with f applied i times. We wish to show that (xi) is
Cauchy. Since f is a contraction mapping, there is some q ∈ [0, 1) such that
d(xi+1, xi) = d(f(xi), f(xi−1)) ≤ q d(xi, xi−1). In other words, the distance
between neighboring points in the sequence keeps shrinking by a factor of at
least q. Thus d(xi, xi+1) ≤ q^(i−1) d(x1, x2), and for any j > i, by the triangle
inequality we have

d(xi, xj) ≤ d(xi, xi+1) + d(xi+1, xi+2) + ... + d(xj−1, xj)
≤ d(xi, xi+1)(1 + q + q² + ...) = d(xi, xi+1) · 1/(1 − q) ≤ q^(i−1)/(1 − q) · d(x1, x2).

This implies that (xi) is Cauchy (fill in the details yourself), and hence
convergent to some x* because (S, d) is complete.

Then we wish to show that f(x*) = x*, which is equivalent to d(f(x*), x*) < ε
for any ε > 0. Fix ε > 0. Since xi → x*, there is some n such that
d(x*, xi) < ε/2 for any i > n. Observe that d(x*, f(x*)) ≤ d(x*, xn+2)
+ d(f(x*), xn+2) = d(x*, xn+2) + d(f(x*), f(xn+1)) ≤ ε/2 + qε/2 < ε. Hence
indeed f(x*) = x*.

Finally we show that x* is unique. Suppose there is also some y* such that
f(y*) = y*. Then d(x*, y*) = d(f(x*), f(y*)) ≤ q d(x*, y*), implying
d(x*, y*) = 0 and thus x* = y*.
Brouwer Fixed Point Theorem

Proposition 5.5 (Brouwer Fixed Point Theorem) Given S ⊂ Rn which
is compact and convex, and a continuous self-mapping f : S → S,
there exists x* ∈ S such that f(x*) = x*.

Compare with Banach under Rn:

Banach: S is complete, f is a contraction −→ x* exists and is unique
Brouwer: S is compact and convex, f is continuous −→ x* exists
I If S ⊂ R, the graph of f can touch the 45° line multiple times.

Historical notes
I Due to L.E.J. Brouwer (1881-1966) around 1910, generalized by
Shizuo Kakutani for set-valued functions (continuity is replaced
by upper hemi-continuity)
I Used by John Nash in 1951 to prove the existence of Nash
equilibrium
I Thanks to Nash, economists realized its power and used it to
eventually tackle the long-standing problem of the existence of
the Walrasian equilibrium (Arrow-Debreu-McKenzie, 1954)
Proof of the R case: If S ⊂ R is compact and convex, then it must be a closed
interval [a, b]. Since f is continuous, the function g(x) := f(x) − x is also
continuous. Note that g(a) ≥ 0 because f(a) ∈ [a, b] implies f(a) ≥ a. Also
g(b) ≤ 0 because f(b) ∈ [a, b] implies f(b) ≤ b. Therefore
min{g(a), g(b)} ≤ 0 ≤ max{g(a), g(b)}, and by the Intermediate Value Theorem
there is some x* ∈ [a, b] such that g(x*) = 0, which implies that f(x*) = x*.
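
This proof is constructive enough to implement: bisect on g(x) = f(x) − x to locate the sign change. A sketch with an illustrative continuous self-map of [0, 1]:

```python
import math

def f(x):
    return math.cos(x) ** 2    # a continuous self-map of [0, 1] (illustrative)

g = lambda x: f(x) - x         # g(0) >= 0 and g(1) <= 0
a, b = 0.0, 1.0
for _ in range(60):            # bisection keeps the sign change bracketed
    m = (a + b) / 2
    if g(m) >= 0:
        a = m
    else:
        b = m

print(a, f(a))                 # a fixed point: f(x*) ~ x* ~ 0.6417
```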
Multivariable Calculus
Consider f : Rn → R.
I We already know what it means for f to have a limit or to be
continuous at a point.
I How about differentiability?

We start by thinking about f(x), where x = (x1, ..., xn) ∈ Rn, as a
function of xi only, with all the xj, j ≠ i (usually jointly denoted
x−i) kept constant (i.e. treated as parameters, not variables).
I f “becomes” a single-variable function of xi alone, and we know
how to define the derivative for it.

Definition. Consider f : Rn → R. If

lim_{h→0} [f(xi + h, x−i) − f(x)] / h

exists at x ∈ Rn, then we say:
1. The partial derivative of f with respect to the ith argument
exists at x; it is equal to this limit and is denoted fi(x).
2. f is partially differentiable with respect to the ith argument
at x.


I We sometimes use ∂f/∂xi (x) to denote fi(x)
Example: f(x1, x2) = x1|x2|.
I f is partially differentiable with respect to the first argument at
every point: f1(x1, x2) = |x2|.
I f is partially differentiable with respect to the second argument
at every point except those where x2 = 0:
f2(x1, x2) = x1 if x2 > 0, and f2(x1, x2) = −x1 if x2 < 0
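
These partials are easy to check by finite differences (a sketch; the test point and step size are arbitrary choices):

```python
f = lambda x1, x2: x1 * abs(x2)
h = 1e-6

f1 = lambda x1, x2: (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)   # partial in x1
f2 = lambda x1, x2: (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)   # partial in x2

print(f1(3.0, -2.0))   # ~2.0 = |x2|
print(f2(3.0, -2.0))   # ~-3.0 = -x1, since x2 < 0
# at x2 = 0 (with x1 != 0) the one-sided difference quotients disagree:
print((f(3.0, h) - f(3.0, 0.0)) / h, (f(3.0, -h) - f(3.0, 0.0)) / (-h))  # ~3 vs ~-3
```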

When f is partially differentiable with respect to all arguments at
x = (x1, ..., xn), we can write the partial derivatives together as a vector
(f1(x), ..., fn(x)).
I Called the gradient of f at x and denoted ∇f(x)
If f is partially differentiable with respect to the ith argument at every
point in an open ball Bε(x), then fi is a function on that open ball, and
we can talk about the existence and value of the partial derivative of fi
with respect to the jth argument at x.
I If that partial derivative exists, it is denoted fij(x)
I fij(x) is called a second-order partial derivative of f at x

Example continued: f(x1, x2) = x1|x2|. Recall: f1(x1, x2) = |x2|.
I f1 is partially differentiable with respect to the first argument at
every point: f11(x1, x2) = 0.
I f1 is partially differentiable with respect to the second argument
at every point except where x2 = 0:
f12(x1, x2) = 1 if x2 > 0, and f12(x1, x2) = −1 if x2 < 0

If fij exists at x for all i, j = 1, ..., n, then we can write them together as
a matrix [fij (x)]n×n .
I This matrix is called the Hessian matrix and denoted as Hf (x).
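
A sketch that assembles the Hessian by finite differences for the smooth function f(x1, x2) = x1²x2 + x2³ (an illustrative choice), previewing the symmetry fij = fji discussed below:

```python
import numpy as np

def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3   # f11 = 2*x2, f12 = f21 = 2*x1, f22 = 6*x2

def hessian_fd(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * np.eye(n)[i], h * np.eye(n)[j]
            # central finite difference for f_ij
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h ** 2)
    return H

print(hessian_fd(f, np.array([1.0, 2.0])))   # ~[[4, 2], [2, 12]], symmetric
```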
Partial derivative fi (x) tells us the rate of change of f around x
if we move “along the ith axis”, i.e. in the direction of ei .
I Keeping x−i constant and only changing xi means moving
along the ith axis.

But there’s no reason to focus only on the directions of the


axes. We can think about the rate of change of f in any
direction, i.e. given by an arbitrary vector v = (v1 , ..., vn ).

Definition. Given a nonzero vector v ∈ Rn, if

lim_{h→0} [f(x + h · v) − f(x)] / (h‖v‖)

exists, then we say the directional derivative of f at x
along the vector v exists, which is equal to this limit.

I Dividing by h‖v‖ rather than h normalizes v, so the directional
derivative depends only on the direction of v, not its length;
for a unit vector the two agree.
I We denote this directional derivative as ∇v f(x).
I Clearly, fi(x) = ∇ei f(x), so ∇f(x) = (∇e1 f(x), ..., ∇en f(x))
Even if f is partially differentiable in all arguments at x, it is not always the
case that the directional derivative of f along every vector exists at x.
I Because partial derivatives only account for n directions, whereas there
are infinitely many directions.
I f can still be “kinky” at x even if all partial derivatives exist.

For f to be “smooth” at x in a way similar to how a single-variable function


looks when differentiable at a point, we need:

Definition. f is (totally) differentiable at x if there exists a vector d ∈ Rn
such that

lim_{h→0n} [f(x + h) − f(x) − dᵀh] / ‖h‖ = 0,

where h ranges over vectors in Rn and 0n is the zero vector. We call d the
(total) derivative of f at x and denote it as f′(x).

(Note: dᵀh = d1h1 + ... + dnhn)
I If f′ exists in an open ball around x, it is a function (in fact, a vector of n
functions) in that ball, and we can talk about the derivative of f′ at x if it
exists, in which case we say f is twice differentiable at x and call the
derivative of f′ at x the second derivative of f at x, denoted f″(x).
I Thus f″(x) is an n × n matrix.

I If f′ exists near x and is continuous at x, then we say f is continuously
differentiable at x.
Proposition 5.6
1. If f is differentiable at x, then the directional derivative
of f along any vector exists at x and hence ∇f(x) exists.
Moreover,
a. f′(x) = ∇f(x)
b. ∇v f(x) = (∇f(x))ᵀ v / ‖v‖
c. If, in addition, f″(x) also exists, then f″(x) = Hf(x), and
f″(x) is symmetric.
2. If ∇f(x) exists at x ∈ Rn and moreover fi is continuous
at x for every i, then f is differentiable at x.

I When f is twice-differentiable at x:
I Gradient is the same as derivative
I Hessian is the same as second-order derivative, and
moreover fij = fji
I We can compute any directional derivative from gradient
I Continuity of the partial derivatives implies differentiability
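
A numerical check of Proposition 5.6 part 1(b) for a smooth, hence differentiable, function (a sketch; the function, point, and direction are arbitrary choices):

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]          # smooth, so differentiable
x, v = np.array([1.0, 2.0]), np.array([3.0, 4.0])  # ||v|| = 5
h = 1e-6

grad = np.array([2 * x[0] + 3 * x[1], 3 * x[0]])   # analytic gradient: (8, 3)

# directional derivative per the (normalized) definition
fd = (f(x + h * v) - f(x)) / (h * np.linalg.norm(v))

print(fd, grad @ v / np.linalg.norm(v))   # both ~7.2 = (8*3 + 3*4)/5
```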
The Geometry of Multi-variable Functions
Certain geometric intuitions are helpful for understanding
multi-variable functions. We will focus on f : R2 → R, as the geometry is
most easily illustrated there.

f : R2 → R looks like a terrain with peaks, valleys, etc.

Let’s think about the equation f (x) = k where k is a constant.


I All points that satisfy this equation constitute a level set
I On R2 , these points describe a contour curve at “height” k
I This contour curve can be interpreted as the graph of a
“function” x2 = g(x1 ), although we don’t know what g looks like.
I This g is called an implicit function, formally defined by

the equation f (x1 , g(x1 )) = k.


I The implicit function treats x1 as an independent variable
and x2 as dependent. It describes the following relation:
given x1, what must x2 be so that (x1, x2) stays on the
contour curve of height k?
I For any point (x1, x2) on the contour curve of height k, the
slope of the line tangent to the curve at (x1, x2) (if it exists) tells
us g′(x1). We can actually compute g′(x1) without knowing
what g is.
Proposition 5.7 (Implicit Function Theorem for R2) If f : R2 → R is
continuously differentiable at (x1, x2) where f(x1, x2) = k and
f2(x1, x2) ≠ 0, then there exists a function g on an open ball Bε(x1) such
that
1. f(x̂1, g(x̂1)) = k for all x̂1 ∈ Bε(x1)
2. g(x1) = x2
3. g′(x1) = −f1(x1, x2) / f2(x1, x2).
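
A sketch verifying the slope formula on the contour x1 · x2 = 2, where the implicit function happens to be known explicitly: g(x1) = 2/x1, so g′(x1) = −2/x1²:

```python
f1 = lambda x1, x2: x2   # partials of f(x1, x2) = x1 * x2
f2 = lambda x1, x2: x1

x1, x2 = 1.0, 2.0        # a point on the contour of height k = 2

print(-f1(x1, x2) / f2(x1, x2))   # IFT slope: -2.0
print(-2.0 / x1 ** 2)             # explicit slope g'(x1): -2.0
```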

Another related question is: at x, in which direction is f the “steepest”,
i.e. in which direction is the directional derivative greatest?
I Intuitively, it should be the direction orthogonal to the contour curve.
I Since the slope of the contour curve is −f1 (x)/f2 (x), the “slope” of that
steepest direction should be f2 (x)/f1 (x).
I Exactly the “slope” of the gradient vector ∇f(x) = (f1(x), f2(x))

Proposition 5.8 If f : R2 → R is differentiable at x and ∇f(x) ≠ 0, then

∇∇f(x) f(x) ≥ ∇v f(x) for all nonzero v ∈ R2.

(Proof left as exercise)
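
A numerical sanity check of Proposition 5.8 (a sketch; the function and point are illustrative, and the comparison directions are randomly sampled):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
x = np.array([1.0, 2.0])
grad = np.array([2 * x[0] + 3 * x[1], 3 * x[0]])   # gradient (8, 3)

def dir_deriv(v, h=1e-6):
    # normalized directional derivative of f at x along v
    return (f(x + h * v) - f(x)) / (h * np.linalg.norm(v))

steepest = dir_deriv(grad)                               # ~||grad|| ~ 8.544
others = [dir_deriv(rng.normal(size=2)) for _ in range(1000)]
print(steepest, max(others))   # the gradient direction is never beaten
```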
