
ECON 5100

5. A Bunch of Things
In this chapter we will see
I Normed vector spaces and how they induce metric spaces
I The notion of convexity
I Some useful “fixed point theorems”
I Multi-variable calculus
Norm
Recall a vector space (V , +, ·): basically, a bunch of arrows.
I We may sometimes want to think about the “length” of an
arrow
I What should a suitable notion of “length” satisfy?
I The “length” of 0 is 0, and only 0 has “length” 0.
I The “length” of a · v is |a| times the “length” of v.
I The “length” of v + u should be at most the “length” of v
plus the “length” of u (triangle inequality).

Based on these intuitions, we introduce:

Definition. Given a vector space (V, +, ·), the function
‖·‖ : V → R+ is a norm on (V, +, ·) if
1. ‖v‖ = 0 if and only if v = 0
2. ‖a · v‖ = |a|‖v‖ for any v ∈ V and a ∈ R
3. ‖v + u‖ ≤ ‖v‖ + ‖u‖ for any v, u ∈ V.
We say that (V, +, ·, ‖·‖) is a normed vector space.
Example 1. A norm on (Rn, +, ·):

‖(v1, ..., vn)‖ = √(v1² + ... + vn²)

I Called the Euclidean norm


I Length: distance from the origin to the “arrowhead”
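
A quick numerical sanity check of the three norm properties for the Euclidean norm (a minimal sketch in Python; the test vectors and the scalar are arbitrary choices):

```python
import numpy as np

def euclidean_norm(v):
    # ||v|| = sqrt(v1^2 + ... + vn^2)
    return np.sqrt(np.sum(v ** 2))

v = np.array([3.0, 4.0])   # arbitrary test vectors
u = np.array([-1.0, 2.0])
a = -2.5                   # arbitrary scalar

print(euclidean_norm(v))   # 5.0: length of the "arrow" (3, 4)
print(np.isclose(euclidean_norm(a * v), abs(a) * euclidean_norm(v)))   # property 2
print(euclidean_norm(v + u) <= euclidean_norm(v) + euclidean_norm(u))  # property 3
```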

Example 2. A norm on (Fb, +, ·), where Fb is the set of all
bounded functions R → R:

‖f‖ = sup_{x∈R} |f(x)|

I Known as sup(remum) norm or uniform norm


I Often denoted as ‖·‖∞
I “Length”: distance from the horizontal axis to the highest
“ceiling” or lowest “floor”
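
The sup norm can be approximated numerically by maximizing over a fine grid (a sketch; the bounded function and the grid are illustrative choices, and a finite grid only bounds the true supremum from below):

```python
import numpy as np

f = lambda x: np.sin(x) * np.exp(-np.abs(x))  # a bounded function R -> R
grid = np.linspace(-10, 10, 100001)           # finite grid standing in for R

print(np.max(np.abs(f(grid))))                # approximates ||f||_inf = sup |f(x)|
```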
You may have observed certain similarities between the norm
of a normed vector space and the distance function of a metric space.
I Both indicate “length” of some sort
I Both satisfy triangle inequality

That’s no coincidence: we can create a metric space (S, d) from
a normed vector space (V, +, ·, ‖·‖):
1. S ⊂ V
2. d(v, u) = ‖v + (−u)‖, where −u is the additive inverse of u.

To convince ourselves that (S, d) is indeed a metric space, we
need to make sure the three properties of d are satisfied.

(1) d(v, v) = ‖v + (−v)‖ = ‖0‖ = 0; conversely, if d(v, u) = ‖v + (−u)‖ = 0,
then v + (−u) = 0, i.e. v = u.
(2) d(v, u) = ‖v + (−u)‖ = ‖(−1) · (u + (−v))‖ = |−1| · ‖u + (−v)‖ = ‖u + (−v)‖ = d(u, v)
(3) d(v, u) = ‖v + (−w) + w + (−u)‖
≤ ‖v + (−w)‖ + ‖w + (−u)‖ = d(v, w) + d(w, u)

→ Indeed d derived from ‖·‖ is a distance function, so (S, d) is
a metric space.
Example 1. (Rn, dE) is derived from (Rn, +, ·, ‖·‖) because

dE((v1, ..., vn), (u1, ..., un)) = √((v1 − u1)² + ... + (vn − un)²) = ‖(v1 − u1, ..., vn − un)‖
= ‖(v1, ..., vn) + (−(u1, ..., un))‖

Example 2. (Fb, d∞) is derived from (Fb, +, ·, ‖·‖∞), where

d∞(f, g) = sup_{x∈R} |f(x) − g(x)| = ‖f + (−g)‖∞

I We will need to use (Fb, d∞) when discussing convergence
of a sequence of functions in dynamic programming
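
In both cases the derived distance is just the norm of the difference. A small sketch (the vectors, functions, and grid are illustrative; the sup is again approximated on a grid):

```python
import numpy as np

v, u = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(np.linalg.norm(v - u))               # d_E(v, u) = ||v - u|| = 5.0

f, g = np.sin, np.cos                      # two bounded functions
grid = np.linspace(-10, 10, 100001)
print(np.max(np.abs(f(grid) - g(grid))))   # d_inf(f, g) ~ sqrt(2) ~ 1.414
```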
Convexity
The term “convexity” often pops up in economics. However,
there are two different but related notions that go by the
same name:

convex sets vs. convex functions

Here we first talk about convexity of sets.


I Underlying idea: the set has no “holes” or “gaps”.

Definition. Given a vector space (V, +, ·), vector u is said to
be a convex combination of vectors v1, ..., vk if u is a linear
combination of v1, ..., vk with coefficients λ1, ..., λk ∈ [0, 1]
where λ1 + ... + λk = 1.

I Thus, a convex combination is a special kind of linear
combination.
I Geometrically, u lives in the “region” in between v1 , ..., vk .
Definition. Given a vector space (V , +, ·), S ⊂ V is convex if
and only if for any v, u ∈ S, all of their convex combinations
are also in S.

I If two points are in S, then the “line segment” connecting
them is also in S.
I In R: the convex sets are exactly the intervals
What about the convex combination of more than two points?

Proposition 5.1. Given a vector space (V , +, ·), S ⊂ V is


convex if and only if for any v1 , ..., vk ∈ S, all of their convex
combinations are also in S.

(Proof left as exercise. Hint: induction.)


I Hence a convex S is closed under convex combinations.
I The set of all convex combinations of vectors in a set T is
called the convex hull of T, denoted co(T). Thus the proposition is
the same as saying that S is convex if and only if co(T) ⊂ S
for any T ⊂ S.
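
As a numerical illustration that the closed unit disk in R2 is convex, we can sample pairs of points in the disk and check that every random convex combination stays inside (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_disk(p):
    return np.linalg.norm(p) <= 1.0   # the closed unit disk

checked = 0
while checked < 1000:
    v, u = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if not (in_disk(v) and in_disk(u)):
        continue                      # rejection sampling: keep points in the disk
    lam = rng.uniform(0, 1)
    assert in_disk(lam * v + (1 - lam) * u)   # convex combination stays inside
    checked += 1

print("all", checked, "convex combinations stayed in the disk")
```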
Now we focus on the vector space (Rn , +, ·). There are a few
notable convex sets in this space.
I Hyperplane:

P(a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn = c}

where a1, ..., an, c ∈ R are parameters and ai ≠ 0 for some i.

I A hyperplane is a point in R1, a line in R2, a plane in R3.


I Geometrically, a hyperplane is a “straight” divider that cuts
the space into two halves

I Half-space: each side of the space cut by the hyperplane
P(a1, ..., an, c)

H+ (a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn ≥ c}


H− (a1 , ..., an , c) := {(x1 , ..., xn ) : a1 x1 + ... + an xn ≤ c}.

(Exercise: show that hyperplanes and half-spaces are convex.)


The following result is of crucial importance to optimization
problems in economics.

Proposition 5.2 (Separating Hyperplane Theorem)


Given the vector space (Rn , +, ·), if S, T ⊂ Rn are two convex
sets which are disjoint, then there exists a hyperplane
P(a1 , ..., an , c) such that S ⊂ H+ (a1 , ..., an , c) and
T ⊂ H− (a1 , ..., an , c).

(Proof omitted)
I Due to Hermann Minkowski (1864-1909), Einstein’s
professor at ETH Zurich and a contributor to the theory of
relativity
I Geometric interpretation: every two disjoint convex
regions have a “gap” in which you can insert a plane
I Algebraic interpretation: every (v1 , ..., vn ) ∈ S satisfies the
linear inequality a1 v1 + ... + an vn ≥ c and every
(v1 , ..., vn ) ∈ T satisfies another linear inequality
a1 v1 + ... + an vn ≤ c
I It is important that S and T are convex and disjoint.
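
For intuition, here is a sketch that checks a candidate separating hyperplane on two sampled disjoint convex sets (the two disks, the normal vector a = (1, 1), and c = 3 are illustrative assumptions, not outputs of an algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
a, c = np.array([1.0, 1.0]), 3.0   # hyperplane x1 + x2 = 3 (assumed, not computed)

def sample_disk(center, r, n):
    # rejection-sample n points from a disk (a convex set)
    pts = []
    while len(pts) < n:
        p = center + rng.uniform(-r, r, 2)
        if np.linalg.norm(p - center) <= r:
            pts.append(p)
    return np.array(pts)

S = sample_disk(np.array([3.0, 3.0]), 1.0, 500)   # convex, disjoint from T
T = sample_disk(np.array([0.0, 0.0]), 1.0, 500)

print(np.all(S @ a >= c))   # True: S sits in H+(1, 1, 3)
print(np.all(T @ a <= c))   # True: T sits in H-(1, 1, 3)
```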
Now let’s introduce convex and concave functions

Definition. Given a convex subset S of a vector space (V, +, ·), a


function f : S → R is said to be concave if and only if for any
v, u ∈ S and λ ∈ [0, 1],

f (λ · v + (1 − λ) · u) ≥ λf (v) + (1 − λ)f (u).

f is said to be convex if and only if for any v, u ∈ S and λ ∈ [0, 1],

f (λ · v + (1 − λ) · u) ≤ λf (v) + (1 − λ)f (u).

I When the inequalities are always strict, we say that the function
is strictly concave/convex.
I For S ⊂ R, a concave function “curls down”, as the line segment
connecting two points on the curve must lie “under” the curve.
I If f is twice differentiable at x, then f 00 (x) ≤ 0. (Proof as
exercise)
I Similarly, a convex function “curls up”.
I If f is twice differentiable at x, then f 00 (x) ≥ 0.
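
A quick check on f(x) = log(x), which is concave on (0, ∞) (a sketch; the test points, weight, and step size are arbitrary choices):

```python
import numpy as np

f = np.log                  # concave on (0, inf)
v, u, lam = 0.5, 4.0, 0.3   # arbitrary points and weight

# chord test: image of the convex combination vs. combination of the images
print(f(lam * v + (1 - lam) * u) >= lam * f(v) + (1 - lam) * f(u))   # True

# second-derivative test via a central finite difference at x = 2
h, x = 1e-5, 2.0
print((f(x + h) - 2 * f(x) + f(x - h)) / h ** 2)   # about -0.25 <= 0
```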
Concave/convex function: for two vectors, image of convex
combination is above/below convex combination of images
I This actually is true for arbitrarily many vectors:

Proposition 5.3 (Jensen’s Inequality) Suppose S is a convex
set in a vector space (V, +, ·).
1. f : S → R is concave if and only if
f(u) ≥ λ1 f(v1) + ... + λk f(vk)
whenever u ∈ S is a convex combination of v1, ..., vk ∈ S
with coefficients λ1, ..., λk.
2. f : S → R is convex if and only if
f(u) ≤ λ1 f(v1) + ... + λk f(vk)
for every such convex combination.

(Proof left as exercise)
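
A numerical sanity check of part 1 for the concave f(x) = √x (a sketch with randomly drawn points and Dirichlet-distributed weights, which lie in [0, 1] and sum to 1):

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.sqrt                       # concave on [0, inf)

v = rng.uniform(0, 10, size=5)    # v1, ..., vk
lam = rng.dirichlet(np.ones(5))   # lambda_1, ..., lambda_k
u = lam @ v                       # the convex combination

print(f(u) >= lam @ f(v))         # True: f(u) >= sum_i lambda_i f(v_i)
```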


Fixed point theorems

Fixed point theorems are very important in economics,


because much of economic analysis focuses on ideas such as
“equilibrium”, “steady state”, “stable outcome”, etc.
I Walrasian equilibrium
I Nash equilibrium
I Steady state of economic growth

The idea behind all fixed point theorems is that for a
self-mapping f, i.e. a function with identical domain and
co-domain, some point x is “fixed” under f, i.e.
f(x) = x.
Banach Fixed Point Theorem
The Banach Fixed Point Theorem says that a certain kind of
function, called a “contraction mapping”, has a unique fixed point.
I Due to Stefan Banach (1892-1945).

Definition. Given a metric space (S, d), a self-mapping


f : S → S is a contraction mapping if there exists some
q ∈ [0, 1) such that d(f (x), f (y)) ≤ qd(x, y) ∀x, y ∈ S.

I f pulls things closer.


I If S ⊂ R: the slope between any two points on the graph of
f should have an absolute value no greater than q for it to
be a contraction mapping.

Example 1. f (x) = x/2 is a contraction mapping on (R, d E )


I First, it is a self-mapping R → R
I d(f(x), f(y)) = |x/2 − y/2| = |x − y|/2 = ½ d(x, y), so q = ½ works
Proposition 5.4 (Banach Fixed Point Theorem) Given a complete
metric space (S, d) and a contraction mapping f : S → S, there
exists a unique x ∗ ∈ S such that f (x ∗ ) = x ∗ . Moreover, for any
x ∈ S, the sequence x, f (x), f (f (x)), f (f (f (x)))... converges to x ∗ in
(S, d).

I x ∗ is a fixed point that never changes position under f .


I All the other points are “drawn” closer and closer to x ∗ as we
keep applying f .
I If S ⊂ R: the graph of f touches the 45° line once and only once.
I Important to the theory of dynamic programming.

Example 1. f (x) = x/2 in (R, d E )


I Is the metric space complete? Yes. So Banach applies.
I We can solve for x ∗ : x ∗ = f (x ∗ ) = x ∗ /2 ⇒ x ∗ = 0.
I For any x: (x, f (x), f (f (x)), ...) = (x, x/2, x/4, ...) → 0
× f (x) = x/2 does not have a fixed point in ((0, ∞), d E ) because the
space is not complete.
× f (x) = x/2 does not have a fixed point in ([1, 2], d E ) because it is
not a self-mapping when restricted to domain [1, 2].
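
The iteration x, f(x), f(f(x)), ... is easy to run. A minimal sketch for the contraction f(x) = x/2 + 1 on (R, dE), whose fixed point solves x* = x*/2 + 1, i.e. x* = 2:

```python
def f(x):
    return x / 2 + 1    # contraction with q = 1/2; fixed point x* = 2

x = 100.0               # any starting point works, by Banach
for _ in range(50):
    x = f(x)            # each step at least halves the distance to x*

print(x)                # ~2.0
print(abs(f(x) - x))    # ~0: x is (numerically) a fixed point
```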
Proof sketch. First we pick any x0 ∈ S. Let (xi) be the sequence such that
xi = f(f(· · · f(x0) · · ·)), with f applied i times. We wish to show that (xi) is
Cauchy. Since f is a contraction mapping, there is some q ∈ [0, 1) such that
d(xi+1, xi) = d(f(xi), f(xi−1)) ≤ q d(xi, xi−1). In other words, the distance
between neighboring points in the sequence keeps shrinking by a factor of at
least q. Thus d(xi, xi+1) ≤ q^(i−1) d(x1, x2), and for any j > i, by the triangle
inequality we have

d(xi, xj) ≤ d(xi, xi+1) + d(xi+1, xi+2) + ... + d(xj−1, xj)
≤ d(xi, xi+1)(1 + q + q² + ...) = d(xi, xi+1) · 1/(1 − q) ≤ q^(i−1)/(1 − q) · d(x1, x2).

This implies that (xi) is Cauchy (fill in the details yourself), and hence
convergent to some x* because (S, d) is complete.

Then we wish to show that f(x*) = x*, which is equivalent to d(f(x*), x*) < ε
for any ε > 0. Fix ε > 0. Since xi → x*, there is some n such that
d(x*, xi) < ε/2 for any i > n. Observe that d(x*, f(x*)) ≤ d(x*, xn+2)
+ d(f(x*), xn+2) = d(x*, xn+2) + d(f(x*), f(xn+1)) ≤ ε/2 + qε/2 < ε. Hence
indeed f(x*) = x*.

Finally we show that x* is unique. Suppose there is also some y* such that
f(y*) = y*. Then d(x*, y*) = d(f(x*), f(y*)) ≤ q d(x*, y*), implying
d(x*, y*) = 0 and thus x* = y*.
Brouwer Fixed Point Theorem

Proposition 5.5 (Brouwer Fixed Point Theorem) Given S ⊂ Rn which
is compact and convex, and a continuous self-mapping f : S → S,
there exists x* ∈ S such that f(x*) = x*.

Compare with Banach under Rn:

Banach: S is complete, f is a contraction −→ x* exists and is unique
Brouwer: S is compact and convex, f is continuous −→ x* exists
I If S ⊂ R, the graph of f can touch the 45° line multiple times.

Historical notes
I Due to L.E.J. Brouwer (1881-1966) around 1910, generalized by
Shizuo Kakutani for set-valued functions (continuity is replaced
by upper hemi-continuity)
I Used by John Nash in 1951 to prove the existence of Nash
equilibrium
I Thanks to Nash, economists realized its power and used it to
eventually tackle the long-standing problem of the existence of
the Walrasian equilibrium (Arrow-Debreu-McKenzie, 1954)
Proof of the R case: If S ⊂ R is compact and convex, then it must be a closed
interval [a, b]. Since f is continuous, the function g(x) := f(x) − x is also
continuous. Note that g(a) ≥ 0 because f(a) ∈ [a, b] implies f(a) ≥ a. Also
g(b) ≤ 0 because f(b) ∈ [a, b] implies f(b) ≤ b. Therefore
min{g(a), g(b)} ≤ 0 ≤ max{g(a), g(b)}, and by the Intermediate Value Theorem
there is some x* ∈ [a, b] such that g(x*) = 0, which implies that f(x*) = x*.
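
This proof is constructive enough to implement: bisect on g(x) = f(x) − x to locate the sign change. A sketch with an illustrative continuous self-map of [0, 1]:

```python
import math

def f(x):
    return math.cos(x) ** 2    # a continuous self-map of [0, 1] (illustrative)

g = lambda x: f(x) - x         # g(0) >= 0 and g(1) <= 0
a, b = 0.0, 1.0
for _ in range(60):            # bisection keeps the sign change bracketed
    m = (a + b) / 2
    if g(m) >= 0:
        a = m
    else:
        b = m

print(a, f(a))                 # a fixed point: f(x*) ~ x* ~ 0.6417
```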
Multivariable Calculus
Consider f : Rn → R.
I We already know what it means for f to have a limit or to be
continuous at a point.
I How about differentiability?

We start by thinking about f(x), where x = (x1, ..., xn) ∈ Rn, as a
function of xi only, with all the xj, j ≠ i (usually jointly denoted
x−i) kept constant (i.e. treated as parameters, not variables).
I f “becomes” a single-variable function of xi alone, and we know
how to define the derivative for it.

Definition. Consider f : Rn → R. If

lim_{h→0} [f(xi + h, x−i) − f(x)] / h

exists at x ∈ Rn, then we say:
1. The partial derivative of f with respect to the ith argument
exists at x; it is equal to this limit and is denoted fi(x).
2. f is partially differentiable with respect to the ith argument
at x.


I We sometimes use ∂f/∂xi (x) to denote fi(x)
Example: f(x1, x2) = x1|x2|.
I f is partially differentiable with respect to the first argument at
every point: f1(x1, x2) = |x2|.
I f is partially differentiable with respect to the second argument
at every point except those where x2 = 0:
f2(x1, x2) = x1 if x2 > 0, and f2(x1, x2) = −x1 if x2 < 0
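
These partials are easy to check by finite differences (a sketch; the test point and step size are arbitrary choices):

```python
f = lambda x1, x2: x1 * abs(x2)
h = 1e-6

f1 = lambda x1, x2: (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)   # partial in x1
f2 = lambda x1, x2: (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)   # partial in x2

print(f1(3.0, -2.0))   # ~2.0 = |x2|
print(f2(3.0, -2.0))   # ~-3.0 = -x1, since x2 < 0
# at x2 = 0 (with x1 != 0) the one-sided difference quotients disagree:
print((f(3.0, h) - f(3.0, 0.0)) / h, (f(3.0, -h) - f(3.0, 0.0)) / (-h))  # ~3 vs ~-3
```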

When f is partially differentiable with respect to all arguments at
x = (x1, ..., xn), we can write the partial derivatives together as a vector
(f1(x), ..., fn(x)).
I Called the gradient of f at x and denoted ∇f(x)
If f is partially differentiable with respect to the ith argument at every
point in an open ball Bε(x), then fi is a function on that open ball, and
we can talk about the existence and value of the partial derivative of fi
with respect to the jth argument at x.
I If that partial derivative exists, it is denoted fij(x)
I fij(x) is called a second-order partial derivative of f at x

Example continued: f(x1, x2) = x1|x2|. Recall: f1(x1, x2) = |x2|.
I f1 is partially differentiable with respect to the first argument at
every point: f11(x1, x2) = 0.
I f1 is partially differentiable with respect to the second argument
at every point except where x2 = 0:
f12(x1, x2) = 1 if x2 > 0, and f12(x1, x2) = −1 if x2 < 0

If fij exists at x for all i, j = 1, ..., n, then we can write them together as
a matrix [fij (x)]n×n .
I This matrix is called the Hessian matrix and denoted as Hf (x).
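
A sketch that assembles the Hessian by finite differences for the smooth function f(x1, x2) = x1²x2 + x2³ (an illustrative choice), previewing the symmetry fij = fji discussed below:

```python
import numpy as np

def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3   # f11 = 2*x2, f12 = f21 = 2*x1, f22 = 6*x2

def hessian_fd(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * np.eye(n)[i], h * np.eye(n)[j]
            # central finite difference for f_ij
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h ** 2)
    return H

print(hessian_fd(f, np.array([1.0, 2.0])))   # ~[[4, 2], [2, 12]], symmetric
```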
Partial derivative fi (x) tells us the rate of change of f around x
if we move “along the ith axis”, i.e. in the direction of ei .
I Keeping x−i constant and only changing xi means moving
along the ith axis.

But there’s no reason to focus only on the directions of the


axes. We can think about the rate of change of f in any
direction, i.e. given by an arbitrary vector v = (v1 , ..., vn ).

Definition. Given a nonzero vector v ∈ Rn, if

lim_{h→0} [f(x + h · v) − f(x)] / (h‖v‖)

exists, then we say the directional derivative of f at x
along the vector v exists, which is equal to this limit.

I Dividing by h‖v‖ rather than h normalizes v, so the directional
derivative depends only on the direction of v, not its length;
for a unit vector the two agree.
I We denote this directional derivative as ∇v f(x).
I Clearly, fi(x) = ∇ei f(x), so ∇f(x) = (∇e1 f(x), ..., ∇en f(x))
Even if f is partially differentiable in all arguments at x, it is not always the
case that the directional derivative of f along every vector exists at x.
I Because partial derivatives only account for n directions, whereas there
are infinitely many directions.
I f can still be “kinky” at x even if all partial derivatives exist.

For f to be “smooth” at x in a way similar to how a single-variable function


looks when differentiable at a point, we need:

Definition. f is (totally) differentiable at x if there exists a vector d ∈ Rn
such that

lim_{h→0n} [f(x + h) − f(x) − dᵀh] / ‖h‖ = 0,

where h ranges over vectors in Rn and 0n is the zero vector. We call d the
(total) derivative of f at x and denote it as f′(x).

(Note: dᵀh = d1h1 + ... + dnhn)
I If f′ exists in an open ball around x, it is a function (in fact, a vector of n
functions) in that ball, and we can talk about the derivative of f′ at x if it
exists, in which case we say f is twice differentiable at x and call the
derivative of f′ at x the second derivative of f at x, denoted f″(x).
I Thus f″(x) is an n × n matrix.

I If f′ exists near x and is continuous at x, then we say f is continuously
differentiable at x.
Proposition 5.6
1. If f is differentiable at x, then the directional derivative
of f along any vector exists at x and hence ∇f(x) exists.
Moreover,
a. f′(x) = ∇f(x)
b. ∇v f(x) = (∇f(x))ᵀ v / ‖v‖
c. If, in addition, f″(x) also exists, then f″(x) = Hf(x), and
f″(x) is symmetric.
2. If ∇f(x) exists at x ∈ Rn and moreover fi is continuous
at x for every i, then f is differentiable at x.

I When f is twice-differentiable at x:
I Gradient is the same as derivative
I Hessian is the same as second-order derivative, and
moreover fij = fji
I We can compute any directional derivative from gradient
I Continuity of the partial derivatives implies differentiability
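
A numerical check of Proposition 5.6 part 1(b) for a smooth, hence differentiable, function (a sketch; the function, point, and direction are arbitrary choices):

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]          # smooth, so differentiable
x, v = np.array([1.0, 2.0]), np.array([3.0, 4.0])  # ||v|| = 5
h = 1e-6

grad = np.array([2 * x[0] + 3 * x[1], 3 * x[0]])   # analytic gradient: (8, 3)

# directional derivative per the (normalized) definition
fd = (f(x + h * v) - f(x)) / (h * np.linalg.norm(v))

print(fd, grad @ v / np.linalg.norm(v))   # both ~7.2 = (8*3 + 3*4)/5
```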
The Geometry of Multi-variable Functions
Certain geometric intuitions are helpful for understanding
multi-variable functions. We will focus on f : R2 → R, as the geometry is
most easily illustrated there.

f : R2 → R looks like a terrain with peaks, valleys, etc.

Let’s think about the equation f (x) = k where k is a constant.


I All points that satisfy this equation constitute a level set
I On R2 , these points describe a contour curve at “height” k
I This contour curve can be interpreted as the graph of a
“function” x2 = g(x1 ), although we don’t know what g looks like.
I This g is called an implicit function, formally defined by

the equation f (x1 , g(x1 )) = k.


I The implicit function treats x1 as an independent variable
and x2 as dependent. It describes the following relation:
given x1, what must x2 be so that (x1, x2) stays on the
contour curve of height k?
I For any point (x1, x2) on the contour curve of height k, the
slope of the line tangent to the curve at (x1, x2) (if it exists) tells
us g′(x1). We can actually compute g′(x1) without knowing
what g is.
Proposition 5.7 (Implicit Function Theorem for R2) If f : R2 → R is
continuously differentiable at (x1, x2) where f(x1, x2) = k and
f2(x1, x2) ≠ 0, then there exists a function g on an open ball Bε(x1) such
that
1. f(x̂1, g(x̂1)) = k for all x̂1 ∈ Bε(x1)
2. g(x1) = x2
3. g′(x1) = −f1(x1, x2) / f2(x1, x2).
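
A sketch verifying the slope formula on the contour x1 · x2 = 2, where the implicit function happens to be known explicitly: g(x1) = 2/x1, so g′(x1) = −2/x1²:

```python
f1 = lambda x1, x2: x2   # partials of f(x1, x2) = x1 * x2
f2 = lambda x1, x2: x1

x1, x2 = 1.0, 2.0        # a point on the contour of height k = 2

print(-f1(x1, x2) / f2(x1, x2))   # IFT slope: -2.0
print(-2.0 / x1 ** 2)             # explicit slope g'(x1): -2.0
```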

Another related question is: at x, in which direction is f the “steepest”,
i.e. in which direction is the directional derivative greatest?
I Intuitively, it should be the direction orthogonal to the contour curve.
I Since the slope of the contour curve is −f1 (x)/f2 (x), the “slope” of that
steepest direction should be f2 (x)/f1 (x).
I Exactly the “slope” of the gradient vector ∇f(x) = (f1(x), f2(x))

Proposition 5.8 If f : R2 → R is differentiable at x and ∇f(x) ≠ 0, then

∇∇f(x) f(x) ≥ ∇v f(x) for all nonzero v ∈ R2.

(Proof left as exercise)
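
A numerical sanity check of Proposition 5.8 (a sketch; the function and point are illustrative, and the comparison directions are randomly sampled):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
x = np.array([1.0, 2.0])
grad = np.array([2 * x[0] + 3 * x[1], 3 * x[0]])   # gradient (8, 3)

def dir_deriv(v, h=1e-6):
    # normalized directional derivative of f at x along v
    return (f(x + h * v) - f(x)) / (h * np.linalg.norm(v))

steepest = dir_deriv(grad)                               # ~||grad|| ~ 8.544
others = [dir_deriv(rng.normal(size=2)) for _ in range(1000)]
print(steepest, max(others))   # the gradient direction is never beaten
```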
