N. L. Carothers
Preface
These are notes for a six week summer course offered to graduate students at Bowling Green
State University. The audience consisted primarily of first and second year students in pure
mathematics, but also included a few students of applied mathematics and statistics.
I had no course outline or plan of attack other than to begin with the most widely
known classical inequalities and see where that led us. For the most part I simply followed
my nose and steered the discussion toward whatever held my interest at the moment. I
should quickly add, however, that the presentation consisted of more than lectures: From
the outset, the students were expected to present (and discuss) their solutions to the
problem sets in class, which added immensely to the experience. On several occasions we
devoted more than half the class period to discussing the problems. This material would
easily lend itself to a student seminar, a course on problem solving, or even a course taught
using the so-called Moore method.
I assumed very little by way of prerequisites and tried to make the course accessible to
a wide audience, which may help to explain the selection of topics. There are a few topics
that I would like to have covered (topics that required a knowledge of Lebesgue integration,
for example, or rudimentary functional analysis) that I feared would be beyond the grasp
of some of the students. Still, I think I managed to demonstrate a variety of techniques and
methods of proof in this brief survey, although my personal preferences, which lean toward
applications of convexity (and, more generally, toward an analytic rather than algebraic
perspective) are plainly reflected in these notes.
Preliminaries
Obviously, the comparison of quantities is an essential tool in mathematics. In this course
we'll be concerned with all manner of inequalities and, in particular, with their systematic
study and the tools used in their creation.
The art of inequalities is found in the clever, often subtle methods used to generate
and verify them. The science of inequalities lies in their careful interpretation and in the
knowledge of their scope and limitations.
We will take great pains to distinguish between strict inequalities (<) and weak inequalities (≤). In the case of weak inequalities, we will carefully examine the case for
equality. While we will freely use techniques from calculus, we will be sparing with limits,
lest strict inequalities turn into weak inequalities.
Although the study of inequalities is arguably a subfield of real analysis, we will
occasionally find recourse to tools from complex analysis, linear algebra, geometry, and, of
course, algebra. Moreover, we will uncover applications to all of these fields (and more!).
As a starting point, we'll begin with two simple axioms:
(1) A given real number a satisfies precisely one of the following: a < 0, a = 0, a > 0.
(2) If a > 0 and b > 0, then ab > 0 and a + b > 0.
There are obvious variations and extensions of these axioms; for example:
(1′) Given a, b ∈ R, precisely one of the following holds: a < b, a = b, a > b.
and:
Theorem. If a > b > 0 and c > d > 0, then ac > bd.

Proof. Given a > b, we have a − b > 0 and, hence, (a − b)c > 0. That is, ac > bc. Similarly,
c − d > 0 leads to b(c − d) > 0 and, hence, bc > bd. Combining these two observations
yields ac > bd.
Please consult the exercises for more direct algebra proofs.
As you can well imagine, we will also have frequent use of mathematical induction.
Bernoulli's Inequality. If x ≥ −1, then (1 + x)^n ≥ 1 + nx for all n = 1, 2, 3, ….

Proof. We induct on n; the case n = 1 is clear. If (1 + x)^n ≥ 1 + nx, then

    (1 + x)^{n+1} = (1 + x)^n (1 + x) ≥ (1 + nx)(1 + x)   (because 1 + x ≥ 0)   (1)
                  = 1 + (n + 1)x + nx^2 ≥ 1 + (n + 1)x.   (2)

Thus, the inequality holds for all n = 1, 2, 3, …. Note that equality in (2) forces x = 0.
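Though the proof is purely algebraic, a quick numerical spot-check is reassuring. The sketch below is only an illustration (the helper name bernoulli_gap is mine, not part of the notes):

```python
import math

def bernoulli_gap(x, n):
    """(1 + x)**n - (1 + n*x); Bernoulli's inequality says this is >= 0 for x >= -1."""
    return (1 + x) ** n - (1 + n * x)

# nonnegative for a range of x >= -1 and n = 1, 2, 3, ...
for x in [-1.0, -0.5, 0.0, 0.25, 2.0]:
    for n in range(1, 8):
        assert bernoulli_gap(x, n) >= 0.0

# equality in (2) forces x = 0; otherwise the gap is strictly positive for n >= 2
assert bernoulli_gap(0.0, 5) == 0.0
assert bernoulli_gap(0.5, 2) > 0.0
```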
Refinement. If x ≥ −1, x ≠ 0, then (1 + x)^n > 1 + nx for all n = 2, 3, 4, ….
Corollary. Given x > 0 and y ≥ 0, we have (x + y)^n ≥ x^n + n x^{n−1} y for all n = 2, 3, 4, ….
Equality can only occur if y = 0.
Application. (1 + 1/n)^n increases with n.

Proof. It suffices to show that the ratio of successive terms is at least 1. Now

    (1 + 1/(n+1))^{n+1} / (1 + 1/n)^n = (1 + 1/n) [ (1 + 1/(n+1)) / (1 + 1/n) ]^{n+1},

and the bracketed fraction simplifies:

    (1 + 1/(n+1)) / (1 + 1/n) = ((n+2)/(n+1)) (n/(n+1)) = (n^2 + 2n)/(n+1)^2 = 1 − 1/(n+1)^2.

Thus, by Bernoulli's inequality,

    (1 + 1/(n+1))^{n+1} / (1 + 1/n)^n = (1 + 1/n) (1 − 1/(n+1)^2)^{n+1}
        > (1 + 1/n)(1 − 1/(n+1)) = ((n+1)/n)(n/(n+1)) = 1   (by Bernoulli).
Example. n! < (n/2)^n for n ≥ 6.

Proof. It's not hard to check directly that 6! < 3^6. For the inductive step, suppose that
n! < (n/2)^n. Since (1 + 1/n)^n ≥ 2, we have

    ((n+1)/2)^{n+1} = ((n+1)/2) ((n+1)/n)^n (n/2)^n ≥ ((n+1)/2) · 2 · (n/2)^n
                    > ((n+1)/2) · 2 · n! = (n + 1)!.
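The claim (and the fact that n = 6 is the right starting point) is easy to test numerically; this check is my illustration, not part of the notes:

```python
import math

# n! < (n/2)**n holds from n = 6 on, but fails at n = 5
assert all(math.factorial(n) < (n / 2) ** n for n in range(6, 40))
assert math.factorial(5) >= (5 / 2) ** 5
```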
Bernoulli's inequality is deceptively powerful. As evidence of this, we'll use it to prove
a classical inequality due to Cauchy.
The Arithmetic-Geometric Mean Inequality. Given n ∈ N and positive numbers
a_1, a_2, …, a_n, we have

    (a_1 a_2 ⋯ a_n)^{1/n} ≤ (a_1 + a_2 + ⋯ + a_n)/n,   (AGM)

with equality if and only if a_1 = a_2 = ⋯ = a_n.

Proof. We induct on n; the case n = 1 is trivial. Suppose the result holds for some n, and
given positive a_1, …, a_{n+1}, set

    G_k = (a_1 a_2 ⋯ a_k)^{1/k} and A_k = (a_1 + a_2 + ⋯ + a_k)/k.

Then

    A_{n+1} = (n A_n + a_{n+1})/(n + 1) = A_n + (a_{n+1} − A_n)/(n + 1),

and so, by Bernoulli's inequality,

    A_{n+1}^{n+1} = A_n^{n+1} (1 + (a_{n+1} − A_n)/((n+1) A_n))^{n+1}
                  ≥ A_n^{n+1} (1 + (a_{n+1} − A_n)/A_n)   (3)
                  = a_{n+1} A_n^n ≥ a_{n+1} G_n^n = G_{n+1}^{n+1}.   (4)

Note that equality in (3) would force a_{n+1} = A_n, while equality in (4) would force a_1 = ⋯ =
a_n (by hypothesis). Hence, equality throughout would yield a_1 = a_2 = ⋯ = a_{n+1}.
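Here is a small numerical illustration of (AGM) and its equality case (the helper name means is mine):

```python
import math

def means(a):
    """Return (geometric mean, arithmetic mean) of a list of positive numbers."""
    n = len(a)
    return math.prod(a) ** (1.0 / n), sum(a) / n

g, m = means([1.0, 4.0, 2.0, 8.0])
assert g <= m                      # (AGM)
g, m = means([3.0, 3.0, 3.0])
assert abs(g - m) < 1e-12          # equality exactly when all a_i agree
```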
Application. Among all rectangles with a fixed perimeter P, the square has the largest
area. Indeed, if the sides are a and b, then 2a + 2b = P and, by the AGM,

    √A = √(ab) ≤ (a + b)/2 = P/4   (a constant),

with equality if and only if a = b.

Application. A cylindrical can of fixed volume V is to be made using the least possible
amount of material. The surface area of a (closed) can of radius r and height h is
S = 2πr^2 + 2πrh, while its volume is V = πr^2 h; thus h = V/(πr^2) and

    S = 2πr^2 + 2V/r = 2πr^2 + V/r + V/r   (5)
      ≥ 3 (2πr^2 · (V/r) · (V/r))^{1/3} = 3 (2πV^2)^{1/3},   (6)

where rewriting in (5) facilitates the application of the AGM in (6). Now equality occurs
when 2πr^2 = V/r; i.e., when r = (V/2π)^{1/3}. For this value of r we have

    h = V/(πr^2) = (V/π)(2π/V)^{2/3} = 2 (V/2π)^{1/3} = 2r.

In other words, the can should be as tall as it is wide. Not very realistic, but easy to solve.
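The conclusion is easy to confirm numerically. The script below (an illustration only; V is chosen arbitrarily) checks the optimal radius, the "h = 2r" conclusion, and the AGM lower bound (6):

```python
import math

V = 1000.0  # an arbitrary fixed volume

def surface_area(r):
    h = V / (math.pi * r * r)              # height forced by the volume constraint
    return 2 * math.pi * r * r + 2 * math.pi * r * h

r_star = (V / (2 * math.pi)) ** (1 / 3)
h_star = V / (math.pi * r_star ** 2)
assert abs(h_star - 2 * r_star) < 1e-9     # as tall as it is wide

lower_bound = 3 * (2 * math.pi * V ** 2) ** (1 / 3)
assert abs(surface_area(r_star) - lower_bound) < 1e-9
for r in [0.5 * r_star, 0.9 * r_star, 1.1 * r_star, 2 * r_star]:
    assert surface_area(r) > lower_bound
```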
Cauchy's original proof of the AGM used a technique called backward induction: Suppose that P(n) is a proposition about the natural number n and suppose that
P(n) holds for infinitely many n, and
P(n − 1) holds whenever P(n) holds.
Then P(n) holds for all n.
Cauchy's proof of the AGM. We first establish (AGM) for n = 2^m, m = 1, 2, …. To begin,
note that

    a_1 a_2 = ((a_1 + a_2)/2)^2 − ((a_1 − a_2)/2)^2 ≤ ((a_1 + a_2)/2)^2,

with strict inequality unless a_1 = a_2. Applying this twice,

    a_1 a_2 a_3 a_4 ≤ ((a_1 + a_2)/2)^2 ((a_3 + a_4)/2)^2   (7)
                    ≤ [ ((a_1 + a_2)/2 + (a_3 + a_4)/2)/2 ]^4   (8)
                    = ((a_1 + a_2 + a_3 + a_4)/4)^4.
Continuing leads to

    a_1 a_2 ⋯ a_{2^m} ≤ ((a_1 + ⋯ + a_{2^m})/2^m)^{2^m},

with strict inequality unless a_1 = ⋯ = a_{2^m}. Now to see how the case of general n follows,
consider this: Given n < 2^m and a_1, …, a_n > 0, set b_1 = a_1, …, b_n = a_n, b_{n+1} = ⋯ = b_{2^m} = A_n. Then

    a_1 ⋯ a_n A_n^{2^m − n} = b_1 ⋯ b_{2^m} ≤ ((b_1 + ⋯ + b_{2^m})/2^m)^{2^m}
        = ((n A_n + (2^m − n) A_n)/2^m)^{2^m} = A_n^{2^m}.   (9)

Dividing both sides by A_n^{2^m − n} yields a_1 ⋯ a_n ≤ A_n^n; that is, (AGM) holds for all n.

In particular, given x, y > 0 and integers 0 < m < n, taking a_1 = ⋯ = a_m = x and
a_{m+1} = ⋯ = a_n = y in (AGM) yields

    x^{m/n} y^{1 − (m/n)} ≤ (mx + (n − m)y)/n.

In other words, for rational 0 < r < 1 we have

    x^r y^{1−r} ≤ rx + (1 − r)y.
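The weighted two-variable inequality x^r y^{1−r} ≤ rx + (1 − r)y is easy to probe numerically (a sketch only, with randomly sampled inputs):

```python
import random

rng = random.Random(0)
for _ in range(1000):
    x, y = rng.uniform(0.01, 10.0), rng.uniform(0.01, 10.0)
    r = rng.random()
    # weighted AGM for two positive numbers; small slack for rounding
    assert x ** r * y ** (1 - r) <= r * x + (1 - r) * y + 1e-9
```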
We conclude this section with a few stray facts from algebra and a few tools from
calculus that will prove helpful in the sequel.
Recall the binomial theorem:

    (1 + x)^n = Σ_{k=0}^n C(n, k) x^k, where C(n, k) = n!/(k!(n − k)!).

Application. (1 + 1/n)^n ≤ Σ_{k=0}^n 1/k! for n = 1, 2, ….

Proof.

    (1 + 1/n)^n = 1 + Σ_{k=1}^n C(n, k)/n^k = 1 + Σ_{k=1}^n (n(n − 1) ⋯ (n − k + 1)/n^k) (1/k!) ≤ Σ_{k=0}^n 1/k!,

since each factor n(n − 1) ⋯ (n − k + 1)/n^k is at most 1.
As it happens,

    Σ_{k=0}^n 1/k! < (1 + 1/n)^{n+1},

but this is harder to prove.

Application. Σ_{k=0}^n r^k = (1 − r^{n+1})/(1 − r) for r ≠ 1. Thus, for |r| < 1, we have
Σ_{k=0}^∞ r^k = 1/(1 − r). (See the exercises for a proof that r^n → 0 for |r| < 1.)

Application. Σ_{k=0}^n 1/k! < 3 for all n = 1, 2, ….

Proof. Since k! ≥ 2^{k−1} for all k ≥ 1,

    Σ_{k=0}^n 1/k! = 1 + 1 + Σ_{k=2}^n 1/k! ≤ 2 + Σ_{k=2}^n 1/2^{k−1}
                   = 1 + Σ_{k=0}^{n−1} 1/2^k < 1 + 1/(1 − (1/2)) = 3.
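Numerically, the partial sums stay safely below 3; in fact they increase to e = 2.718…, which the bound anticipates (illustration only):

```python
import math

partial = 0.0
for k in range(60):
    partial += 1.0 / math.factorial(k)
    assert partial < 3.0
# the partial sums converge to e < 3
assert abs(partial - math.e) < 1e-12
```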
Here is one more route to the same inequality, this time via the mean value theorem:
given x > y > 0 and 0 < r < 1, we have x^r − y^r = r c^{r−1}(x − y) for some y < c < x,
and c^{r−1} < y^{r−1} (because r − 1 < 0)

    ⟹ x^r < r y^{r−1}(x − y) + y^r
    ⟹ x^r y^{1−r} < r(x − y) + y = rx + (1 − r)y.

Look familiar?
Problem Set 1
1. If a > b > 0 and if p and q are positive integers, prove that a^{p/q} > b^{p/q}. In particular,
given a, b > 0, conclude that a > b if and only if a^{p/q} > b^{p/q}.
2. If a ≥ b > 0, p is a nonnegative integer, and q is a positive integer, show that
a^{p/q} ≥ b^{p/q}, with equality occurring if and only if either (i) a = b or (ii) p = 0.
3. If a_1 ≥ b_1 > 0, a_2 ≥ b_2 > 0, …, a_n ≥ b_n > 0, show that a_1 a_2 ⋯ a_n ≥ b_1 b_2 ⋯ b_n.
Moreover, equality holds if and only if a_1 = b_1, a_2 = b_2, …, a_n = b_n.
4. If m and n are positive integers, show that √2 lies between m/n and (m + 2n)/(m + n).
7. Amongst all rectangles having a fixed area A, prove that the square (with sides of
length √A) has the least perimeter.
8. Show that 2/((1/a) + (1/b)) ≤ √(ab) for all positive a, b. Under what circumstances
does equality hold?
9. Show that a + (1/a) ≥ 2 for all positive values of a. When does equality occur?
10. If 0 < c < 1, use Bernoulli's inequality to show that c^n → 0. [Hint: Write c = 1/(1 + x),
where x > 0.]
11. If c > 0, use Bernoulli's inequality to show that c^{1/n} → 1. [Hint: If c > 1, write
c^{1/n} = 1 + x_n, where x_n > 0, and estimate x_n.]
12. Given x > 0, use Bernoulli's inequality to show that (1 + (x/n))^n increases while
(1 + (x/n))^{n+1} decreases as n increases.
13. Given a > 0, show that (1 + a)^r > 1 + ra for any rational r > 1. [Hint: Write r = p/q,
where p > q are positive integers, and note that (1 + (x/p))^p ≥ (1 + (x/q))^q for x > 0.]
14. Prove by induction that (n/3)^n < n! < (n/2)^n for n ≥ 6.
15. Use the fact that k! ≥ 2^{k−1}, for all k = 1, 2, 3, …, to prove that Σ_{k=0}^n 1/k! < 3 for all
n = 0, 1, 2, ….
16. Suppose that 0 ≤ x_i ≤ 1 for i = 1, 2, …. Prove that
(1 − x_1) ⋯ (1 − x_n) ≤ 1 − (x_1 + ⋯ + x_n) + (x_1 x_2 + ⋯ + x_{n−1} x_n)
for all n = 2, 3, ….
17. For n = 1, 2, 3, …, show that 2(√(n+1) − √n) < 1/√n < 2(√n − √(n−1)), and use
this to prove that, for n = 2, 3, …,
2√n − 2 < Σ_{k=1}^n 1/√k < 2√n − 1.
18. Show that
1/√(4n + 1) < (1/2)(3/4) ⋯ ((2n − 1)/(2n)) < 1/√(3n + 1)
for n = 2, 3, 4, ….
Convex Functions
A function f : I → R, defined on a nontrivial interval I, is said to be convex if

    f(λx + μy) ≤ λ f(x) + μ f(y)   (1)

whenever x, y ∈ I and λ, μ ≥ 0 satisfy λ + μ = 1.

The Three Chords Lemma. f : I → R is convex if and only if, whenever x < z < y in I,

    (f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x) ≤ (f(y) − f(z))/(y − z).   (2)

Proof. First suppose that f is convex and take x < z < y in I. Then

    z = ((y − z)/(y − x)) x + ((z − x)/(y − x)) y

(where the coefficients of x and y are nonnegative and add to 1). Thus,

    f(z) ≤ ((y − z)/(y − x)) f(x) + ((z − x)/(y − x)) f(y).

Subtracting f(x) from both sides gives

    f(z) − f(x) ≤ ((z − x)/(y − x)) (f(y) − f(x)),
    that is, (f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x),

which is the first half of (2). The other half is entirely similar.
Now suppose that (2) holds. Take x < y in I and λ, μ ≥ 0 with λ + μ = 1. Then z = λx + μy
satisfies

    z − x = μ(y − x) and y − z = λ(y − x).

That is, z = ((y − z)/(y − x)) x + ((z − x)/(y − x)) y. Thus, from (2),

    f(z) − f(x) ≤ ((z − x)/(y − x)) (f(y) − f(x)) = μ (f(y) − f(x)),

and hence

    f(z) ≤ f(x) + μ (f(y) − f(x)) = λ f(x) + μ f(y).
Corollary. If f : (a, b) → R is differentiable and convex, then

    f′(x) ≤ (f(y) − f(x))/(y − x) ≤ f′(y)

whenever x < y in (a, b). In particular, f′ is increasing.

Theorem. Let f : (a, b) → R be differentiable. Then the following are equivalent:
(i) f is convex;
(ii) f′ is increasing;
(iii) f(y) ≥ f(x) + f′(x)(y − x) for all x, y; that is, the graph of f lies above its tangent lines.

Proof. (i) implies (ii) is clear. (ii) implies (iii) follows from the mean value theorem.
Indeed,

    f(y) = f(x) + f′(c)(y − x)

for some c between x and y, and in either case (y > x or y < x) the monotonicity of f′
gives f′(c)(y − x) ≥ f′(x)(y − x). Finally, suppose that (iii) holds and take x < z < y.
Comparing f(x) and f(y) to the tangent line at z yields

    (f(z) − f(x))/(z − x) ≤ f′(z) ≤ (f(y) − f(z))/(y − z).

As in the proof of the three chords lemma, writing z = λx + μy leads to λ(f(z) − f(x)) ≤
μ(f(y) − f(z)), or f(z) ≤ λ f(x) + μ f(y).
Claim. If f : (a, b) → R is twice-differentiable, then f is convex if and only if f″ ≥ 0
(that is, f is concave up). If f″ > 0, then f is strictly convex. (See the exercises.)
Examples.
(1) x^p is convex on (0, ∞) for p ≥ 1, and strictly convex for p > 1.
(2) e^x is strictly convex on R.
(3) log x is strictly concave on (0, ∞).
(4) x log x is strictly convex on (0, ∞).
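A midpoint spot-check distinguishes the convex examples from the concave one. The helper below is my illustration (midpoint convexity on random samples, not a proof):

```python
import math
import random

def is_midpoint_convex_on_sample(f, lo, hi, trials=2000):
    """Spot-check f((x+y)/2) <= (f(x)+f(y))/2 on random pairs from (lo, hi)."""
    rng = random.Random(0)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if f((x + y) / 2) > (f(x) + f(y)) / 2 + 1e-9:
            return False
    return True

assert is_midpoint_convex_on_sample(lambda x: x ** 1.5, 0.01, 10)     # x^p, p >= 1
assert is_midpoint_convex_on_sample(math.exp, -5, 5)                  # e^x
assert is_midpoint_convex_on_sample(lambda x: x * math.log(x), 0.01, 10)
assert not is_midpoint_convex_on_sample(math.log, 0.01, 10)           # log is concave
```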
In particular, for x_1, …, x_k > 0 and weights λ_1, …, λ_k > 0 with λ_1 + ⋯ + λ_k = 1, the
concavity of log x yields

    log(x_1^{λ_1} x_2^{λ_2} ⋯ x_k^{λ_k}) = λ_1 log x_1 + λ_2 log x_2 + ⋯ + λ_k log x_k
        ≤ log(λ_1 x_1 + λ_2 x_2 + ⋯ + λ_k x_k),

with equality if and only if x_1 = ⋯ = x_k. And, because log x (or e^x, if you prefer) is
strictly increasing, the result follows:

    x_1^{λ_1} x_2^{λ_2} ⋯ x_k^{λ_k} ≤ λ_1 x_1 + λ_2 x_2 + ⋯ + λ_k x_k.

The case λ_1 = ⋯ = λ_k = 1/k is the familiar AGM. Another special case is also
familiar and will lead us to two new inequalities.
Young's Inequality. If p, q > 1 satisfy 1/p + 1/q = 1, then

    xy ≤ (1/p) x^p + (1/q) y^q

for all x, y > 0, with equality if and only if x^p = y^q. (Take λ_1 = 1/p, λ_2 = 1/q, x_1 = x^p,
and x_2 = y^q in the weighted AGM above.)
Hölder's Inequality. Let x_1, …, x_n, y_1, …, y_n > 0 and let p, q > 1 satisfy 1/p + 1/q = 1.
Then

    Σ_{i=1}^n x_i y_i ≤ (Σ_{i=1}^n x_i^p)^{1/p} (Σ_{i=1}^n y_i^q)^{1/q}.   (3)

Equality occurs if and only if the sequences (x_i^p) and (y_i^q) are proportional; that is, if and
only if, for some α, β ≥ 0, not both zero, we have α x_i^p = β y_i^q for all i = 1, …, n. The case
p = q = 2 is usually referred to as the Cauchy-Schwarz inequality.
Proof. We may assume that u = (Σ_{i=1}^n x_i^p)^{1/p} ≠ 0 and v = (Σ_{i=1}^n y_i^q)^{1/q} ≠ 0, in which
case we have, from Young's inequality, that

    (x_i/u)(y_i/v) ≤ (1/p)(x_i/u)^p + (1/q)(y_i/v)^q   (4)

for each i, with equality if and only if (x_i/u)^p = (y_i/v)^q. Summing over i, we get

    (1/(uv)) Σ_{i=1}^n x_i y_i ≤ (1/(p u^p)) Σ_{i=1}^n x_i^p + (1/(q v^q)) Σ_{i=1}^n y_i^q = 1/p + 1/q = 1.

Thus, Σ_{i=1}^n x_i y_i ≤ uv. Please note that equality in (3) would force equality in (4) for all
i and, hence, (x_i/u)^p = (y_i/v)^q for all i; that is, (x_i^p) and (y_i^q) would be proportional.
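Hölder's inequality and its equality case are easy to check numerically. In this sketch (names mine), choosing y_i = x_i^{p−1} makes (x_i^p) and (y_i^q) proportional:

```python
import random

def holder_holds(x, y, p):
    q = p / (p - 1)
    lhs = sum(a * b for a, b in zip(x, y))
    rhs = sum(a ** p for a in x) ** (1 / p) * sum(b ** q for b in y) ** (1 / q)
    return lhs <= rhs + 1e-9

rng = random.Random(0)
for _ in range(200):
    x = [rng.uniform(0.1, 5) for _ in range(6)]
    y = [rng.uniform(0.1, 5) for _ in range(6)]
    assert holder_holds(x, y, 2.0)   # Cauchy-Schwarz
    assert holder_holds(x, y, 3.0)

# equality when (x_i^p) and (y_i^q) are proportional, e.g. y_i = x_i**(p-1)
p = 3.0
q = p / (p - 1)
x = [1.0, 2.0, 3.0]
y = [a ** (p - 1) for a in x]
lhs = sum(a * b for a, b in zip(x, y))
rhs = sum(a ** p for a in x) ** (1 / p) * sum(b ** q for b in y) ** (1 / q)
assert abs(lhs - rhs) < 1e-9
```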
Corollary. Let x_1, …, x_n, y_1, …, y_n > 0, let 0 < p < 1, and let q satisfy 1/p + 1/q = 1
(so that q < 0). Then

    Σ_{i=1}^n x_i y_i ≥ (Σ_{i=1}^n x_i^p)^{1/p} (Σ_{i=1}^n y_i^q)^{1/q}.
This version actually follows from our previous version after a bit of judicious
rewriting:

    Σ_{i=1}^n x_i^p = Σ_{i=1}^n (x_i y_i)^p y_i^{−p}.   (5)

Now we apply Hölder's inequality using the conjugate pair of exponents

    p′ = 1/p > 1 and q′ = −q/p = 1 − q > 1.

Note that

    1/p′ + 1/q′ = p + (−p/q) = p (1 − 1/q) = p (1/p) = 1.

Applying Hölder's inequality with this pair of exponents to (5) yields

    Σ_{i=1}^n x_i^p ≤ (Σ_{i=1}^n ((x_i y_i)^p)^{p′})^{1/p′} (Σ_{i=1}^n (y_i^{−p})^{q′})^{1/q′}
                   = (Σ_{i=1}^n x_i y_i)^p (Σ_{i=1}^n y_i^q)^{−p/q}.

Taking p-th roots and rearranging the terms finishes the proof.
We will see a few more variations and extensions of Hölder's inequality. For now we'll
settle for the following:

Theorem. Let a = (a_i), b = (b_i), …, h = (h_i) denote elements of R^n in which all entries
are positive, and let α, β, …, δ > 0 with α + β + ⋯ + δ = 1. Then

    Σ_{i=1}^n a_i^α b_i^β ⋯ h_i^δ ≤ (Σ_{i=1}^n a_i)^α (Σ_{i=1}^n b_i)^β ⋯ (Σ_{i=1}^n h_i)^δ.
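A quick numeric check of this weighted version (sketch; the function name gen_holder and the sample data are mine):

```python
import math

def gen_holder(vectors, weights):
    """Compare sum_i prod_j v_j[i]**w_j against prod_j (sum_i v_j[i])**w_j,
    for positive vectors and weights summing to 1."""
    n = len(vectors[0])
    lhs = sum(math.prod(v[i] ** w for v, w in zip(vectors, weights)) for i in range(n))
    rhs = math.prod(sum(v) ** w for v, w in zip(vectors, weights))
    return lhs, rhs

lhs, rhs = gen_holder([[1.0, 2.0, 3.0], [4.0, 1.0, 2.0], [2.0, 2.0, 5.0]],
                      [0.5, 0.25, 0.25])
assert lhs <= rhs + 1e-9
```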
Problem Set 2
1. If f : I → R is convex, show that f(λ_1 x_1 + ⋯ + λ_k x_k) ≤ λ_1 f(x_1) + ⋯ + λ_k f(x_k), for
any x_1, …, x_k ∈ I and any λ_1, …, λ_k > 0 with λ_1 + ⋯ + λ_k = 1. [This is sometimes
called Jensen's inequality, and follows easily by induction on k.] If f is strictly convex,
prove that equality occurs in Jensen's inequality (if and) only if all the x_i are equal.
2. Let f, g : I → R be convex functions and let α ∈ R. Determine which of the following
are convex functions: αf, f + g, f g, |f|, f ∘ g, max{f, g}, min{f, g}.
3. Let f, g : R → R be convex functions with g increasing. Show that g ∘ f is convex.
Deduce that if h : R → R is a positive function and if log h is convex, then h is convex.
Give an example of a positive convex function on R whose logarithm is not convex.
4. Show that the reciprocal of a positive concave function is convex. Is the reciprocal of
a positive convex function always concave?
5. Prove that f : (0, ∞) → R is convex if and only if g(x) = x f(1/x) is convex on (0, ∞).
6. If f is convex on (0, ∞), and if x_1, …, x_m, y_1, …, y_m > 0, show that
(x_1 + ⋯ + x_m) f((y_1 + ⋯ + y_m)/(x_1 + ⋯ + x_m)) ≤ x_1 f(y_1/x_1) + ⋯ + x_m f(y_m/x_m).
Show that f(x) = (1 + x^p)^{1/p} is convex on (0, ∞) for p ≥ 1, and deduce from the first
part of the exercise that
((x_1 + ⋯ + x_m)^p + (y_1 + ⋯ + y_m)^p)^{1/p} ≤ (x_1^p + y_1^p)^{1/p} + ⋯ + (x_m^p + y_m^p)^{1/p}.
13. If f is convex on I, and if a < b are interior points of I, show that the one-sided
derivatives of f satisfy
f′₊(a) ≤ (f(b) − f(a))/(b − a) ≤ f′₋(b).
14. Prove that a convex function defined on a bounded interval is bounded below.
15. If f : R → R is convex and bounded above, prove that f is constant.
16. Given f : I → R, the epigraph of f is the set epi f = {(x, y) : x ∈ I, y ≥ f(x)} in R².
Show that f is convex if and only if epi f is a convex subset of R².
17. We say that f : I → R is lower semicontinuous (l.s.c.) if, for each real α, the set
{x ∈ I : f(x) ≤ α} is closed. Prove that a convex function f : I → R is lower
semicontinuous if and only if epi f is closed. For this reason, l.s.c. convex functions are
often referred to as closed convex functions.
f(x) ≤ ((t − x)/(t − s)) f(s) + ((x − s)/(t − s)) f(t) = f(s) + ((f(t) − f(s))/(t − s))(x − s) = L(x).

Each of the sets above is closed and bounded, so s₀ and t₀ exist and satisfy s₀ < x < t₀
and f(s₀) = L(s₀), f(t₀) = L(t₀). (The continuity of both f and L has come into play
here.) Moreover, f(y) > L(y) for all s₀ < y < t₀. (We can't have f(y) = L(y), and the
intermediate value theorem tells us that we can't have f(y) < L(y).) All that remains
is to note that the line through (s₀, f(s₀)) and (t₀, f(t₀)) coincides with the line through
(s, f(s)) and (t, f(t)); that is, the chord joining (s₀, f(s₀)) and (t₀, f(t₀)) lies on the chord
joining (s, f(s)) and (t, f(t)). (See the figure on the back.)
What makes this problem especially interesting is that there's a companion result (see
Theorem 125 in Hardy, Littlewood, and Pólya).

10′. Let f : (a, b) → R be continuous. Then f is convex if and only if, for all a < s < t < b,
we have

    (1/(t − s)) ∫_s^t f(x) dx ≥ f((s + t)/2).
The forward implication is easy: If f is convex and if a < s < t < b, we can consider
a supporting line at (s + t)/2; that is, we can find an m such that

    f(x) ≥ f((s + t)/2) + m (x − (s + t)/2)

for all x. Integration over [s, t] then yields

    ∫_s^t f(x) dx ≥ (t − s) f((s + t)/2),

because

    ∫_s^t (x − (s + t)/2) dx = 0.

I'm not sure how to prove the backward implication. Anyone interested?
Elementary Means
Before we pursue more inequalities, let's introduce some notation. Given a vector x = (x_i)
in R^n with positive entries and a real number p ≠ 0, we write

    S_p(x) = (Σ_{i=1}^n x_i^p)^{1/p} and M_p(x) = ((1/n) Σ_{i=1}^n x_i^p)^{1/p}.

The expression S_p(x) is sometimes written as ‖x‖_p. The expression M_p(x) is called the
(simple) mean of x of order p. Of course, we're somewhat familiar with such expressions
when p > 0, but we will have occasion to explore the full range of values, even p = ±∞!
Given a set of positive weights λ_1, …, λ_n > 0 with λ_1 + ⋯ + λ_n = 1, we write

    M_p(x, λ) = (Σ_{i=1}^n λ_i x_i^p)^{1/p}.
Facts.
1. M_{−p}(x, λ) = 1/M_p(1/x, λ), where 1/x = (1/x_i).
2. M_0(x, λ) ≡ lim_{p→0} M_p(x, λ) = x_1^{λ_1} x_2^{λ_2} ⋯ x_n^{λ_n}. Thus, in particular, we also have
M_0(x) = (x_1 x_2 ⋯ x_n)^{1/n}.

Proof. Note that

    log M_p(x, λ) = (1/p) log(Σ_{i=1}^n λ_i x_i^p).

Next, we appeal to l'Hôpital's rule to find

    lim_{p→0} (log Σ_{i=1}^n λ_i x_i^p)/p = lim_{p→0} (Σ_{i=1}^n λ_i x_i^p log x_i)/(Σ_{i=1}^n λ_i x_i^p)
        = Σ_{i=1}^n λ_i log x_i = log(x_1^{λ_1} x_2^{λ_2} ⋯ x_n^{λ_n}).
3. M_∞(x) ≡ lim_{p→+∞} M_p(x, λ) = max{x_1, …, x_n} and
M_{−∞}(x) ≡ lim_{p→−∞} M_p(x, λ) = min{x_1, …, x_n}.

Proof. For each k we have λ_k^{1/p} x_k ≤ M_p(x, λ) ≤ max{x_1, …, x_n}, and λ_k^{1/p} → 1 as
p → +∞. The other case is similar but, again, we could just appeal to Fact 1.
4. If s < t, then M_s(x, λ) ≤ M_t(x, λ), with equality if and only if x is constant.

Proof. First consider the case s = 1 < t, and let t′ satisfy 1/t + 1/t′ = 1. Then, from Hölder's
inequality,

    Σ_{i=1}^n λ_i x_i = Σ_{i=1}^n (λ_i^{1/t} x_i)(λ_i^{1/t′}) ≤ (Σ_{i=1}^n λ_i x_i^t)^{1/t} (Σ_{i=1}^n λ_i)^{1/t′} = (Σ_{i=1}^n λ_i x_i^t)^{1/t},

because Σ_{i=1}^n λ_i = 1. Equality forces (λ_i x_i^t) to be proportional to (λ_i); i.e., if and only if
(x_i^t) is proportional to (1, 1, …); i.e., if and only if x is constant.
Now consider the case 0 < s < t. In this case, t/s > 1 and, hence, by the previous case
applied to the numbers x_i^s,

    (Σ_{i=1}^n λ_i x_i^s)^{1/s} ≤ ((Σ_{i=1}^n λ_i x_i^{s(t/s)})^{s/t})^{1/s} = (Σ_{i=1}^n λ_i x_i^t)^{1/t}.
The remaining cases follow from Facts 1 and 2. For example, applying the above to
1/x = (1/x_i) yields

    M_{−1}(x, λ) = (Σ_{i=1}^n λ_i x_i^{−1})^{−1} ≤ x_1^{λ_1} x_2^{λ_2} ⋯ x_n^{λ_n} = M_0(x, λ).
Next is Minkowski's inequality: (i) for p ≥ 1 we have M_p(x + y, λ) ≤ M_p(x, λ) + M_p(y, λ),
while (ii) for p ≤ 1 the inequality reverses. Of course, equality always occurs (in both (i)
and (ii)) when p = 1, so we may suppose that p ≠ 1.
First suppose that 1 < p < ∞ and let p′ satisfy 1/p + 1/p′ = 1. Then (p − 1)p′ = p.
Next, by Hölder's inequality,

    M_p(x + y, λ)^p = Σ_{i=1}^n λ_i (x_i + y_i)^p
        = Σ_{i=1}^n λ_i x_i (x_i + y_i)^{p−1} + Σ_{i=1}^n λ_i y_i (x_i + y_i)^{p−1}
        ≤ [ (Σ_{i=1}^n λ_i x_i^p)^{1/p} + (Σ_{i=1}^n λ_i y_i^p)^{1/p} ] (Σ_{i=1}^n λ_i (x_i + y_i)^{(p−1)p′})^{1/p′}   (1)
        = [ M_p(x, λ) + M_p(y, λ) ] M_p(x + y, λ)^{p/p′},

and dividing both sides by M_p(x + y, λ)^{p/p′} yields (i).
For the geometric mean M_0, the weighted AGM does the trick:

    M_0(x, λ)/M_0(x + y, λ) + M_0(y, λ)/M_0(x + y, λ)
        = ∏_{i=1}^n (x_i/(x_i + y_i))^{λ_i} + ∏_{i=1}^n (y_i/(x_i + y_i))^{λ_i}
        ≤ [ λ_1 x_1/(x_1 + y_1) + ⋯ + λ_n x_n/(x_n + y_n) ]
          + [ λ_1 y_1/(x_1 + y_1) + ⋯ + λ_n y_n/(x_n + y_n) ] = 1!

That is, M_0(x, λ) + M_0(y, λ) ≤ M_0(x + y, λ). Equality can only occur if each of the
sequences (x_i/(x_i + y_i)) and (y_i/(x_i + y_i)) is constant, from which you'll quickly deduce
that x and y must be proportional.
Minkowski's inequality also holds in the cases p = ±∞, but the case for equality is a
bit different. For example,

    M_∞(x + y) = max{x_1 + y_1, …, x_n + y_n} ≤ M_∞(x) + M_∞(y).

Equality can only occur if x and y attain their maximum values at the same coordinate;
i.e., if and only if, for some k, we have x_k = M_∞(x) and y_k = M_∞(y).
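Both directions of Minkowski's inequality show up clearly in a random test over vectors with positive entries (illustration only; p_norm is my helper):

```python
import random

def p_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1 / p)

rng = random.Random(0)
for _ in range(500):
    x = [rng.uniform(0.1, 5) for _ in range(5)]
    y = [rng.uniform(0.1, 5) for _ in range(5)]
    s = [a + b for a, b in zip(x, y)]
    for p in [1.0, 1.5, 2.0, 4.0]:
        # triangle inequality for p >= 1
        assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-9
    # for 0 < p < 1 (positive entries) the inequality reverses
    assert p_norm(s, 0.5) >= p_norm(x, 0.5) + p_norm(y, 0.5) - 1e-9
```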
The Passage to Infinite Series
It's natural to ask whether our work on finite sums (and elementary means) extends to
infinite sums. For the most part, everything we've done has a satisfactory analogue in the
infinite series case; the proofs, however, can be surprisingly difficult, since simply passing
to a limit is often not enough.
For now, we'll settle for stating a few of these extensions (with the occasional proof).
As we'll see, there are more convenient paths to all of this (and more) through integrals.
Given a sequence of positive weights λ = (λ_i) satisfying Σ_{i=1}^∞ λ_i = 1 and a real number
0 < p < ∞, we define

    M_p(x, λ) = (Σ_{i=1}^∞ λ_i x_i^p)^{1/p}

for x = (x_i) nonnegative. We'll forego the case p < 0, but we will consider

    M_0(x, λ) = ∏_{i=1}^∞ x_i^{λ_i} = exp(Σ_{i=1}^∞ λ_i log x_i)

and

    M_∞(x) = sup_{i≥1} x_i = l.u.b.{x_i : i ≥ 1}.
Facts.
1. If M_s is finite for some 0 < s < ∞, then M_r is finite for all 0 < r < s and, in fact,
M_r ≤ M_s with equality if and only if x is constant. Moreover, M_0 is finite (or zero)
in this case (i.e., M_0 does not diverge to +∞) and, in fact, M_0 = lim_{r→0+} M_r.
2. M_0 ≤ M_1 (AGM) with equality if and only if x is constant.

Proof. From the inequality log x ≤ x − 1 we have log x_k − log M_1 ≤ (x_k/M_1) − 1. Thus,

    Σ_{k=1}^∞ λ_k log x_k − log M_1 ≤ Σ_{k=1}^∞ λ_k (x_k/M_1) − 1   (2)

or, log M_0 − log M_1 ≤ 1 − 1 = 0. Equality here would force equality in (2), which would
mean that x_k = M_1 for all k.
3. If x = (x_k) is bounded, then lim_{r→+∞} M_r = M_∞.
Virtually all of our main results hold in the infinite series case with, at worst, minor
modifications.
Assuming, as we may, that u = (∫ f^p)^{1/p} ≠ 0 and v = (∫ g^q)^{1/q} ≠ 0, Young's inequality
gives

    (f/u)(g/v) ≤ (1/p)(f/u)^p + (1/q)(g/v)^q,

with equality (everywhere) if and only if (f/u)^p = (g/v)^q. Now we integrate both sides:

    (1/(uv)) ∫ fg ≤ (1/(p u^p)) ∫ f^p + (1/(q v^q)) ∫ g^q = 1/p + 1/q = 1.
A particular case is worth repeating:

Corollary. Let 1 < p < ∞ and let 1/p + 1/q = 1. Then

    ∫ fg ≤ (∫ f^p)^{1/p} (∫ g^q)^{1/q}.

Equality can only occur if A f^p = B g^q for some constants A and B, not both zero.
As you can imagine, there is also an integral version of Minkowski's inequality (and, if
p < 1, p ≠ 0, a reversed version). For p ≥ 1:

    (∫ |f + g|^p)^{1/p} ≤ (∫ |f|^p)^{1/p} + (∫ |g|^p)^{1/p}.   (3)

If p = 1, equality can only occur if fg ≥ 0. If p > 1, equality can only occur if fg ≥ 0 and
Af = Bg for some nonnegative constants A and B, not both zero.
Proof. If p = 1, then

    ∫ |f + g| ≤ ∫ (|f| + |g|) = ∫ |f| + ∫ |g|,   (4)

with equality if and only if |f + g| = |f| + |g|, i.e., fg ≥ 0. If p > 1, then, by Hölder's
inequality (with 1/p + 1/q = 1),

    ∫ |f + g|^p ≤ ∫ |f| |f + g|^{p−1} + ∫ |g| |f + g|^{p−1}   (5)
               ≤ [ (∫ |f|^p)^{1/p} + (∫ |g|^p)^{1/p} ] (∫ |f + g|^p)^{1/q},   (6)

and (3) follows upon dividing by (∫ |f + g|^p)^{1/q}. Note that equality in (3) forces equality
in both (5) and (6). Thus we have fg ≥ 0 and

    |f|^p ∝ |g|^p ∝ |f + g|^p

(where ∝ means "is proportional to"), which forces A′f = B′g for some nonnegative
constants A′ and B′, not both zero.
We next, all too briefly, consider integral means. Given a positive, integrable weight
function λ(x) with ∫_I λ(x) dx = 1, we define

    M_p(f, λ) = (∫_I f(x)^p λ(x) dx)^{1/p}

for 0 < p < ∞ and f ≥ 0 (and measurable, say, or even continuous). We also define

    M_0(f, λ) = exp(∫_I λ(x) log f(x) dx)

and

    M_∞(f) = sup_{x∈I} f(x) ( = ess.sup_{x∈I} f(x)).

We could then develop the theory of the means M_p(f, λ) in complete analogy to our
previous cases. We will, however, settle for a single observation:
If M_s(f, λ) < ∞ for some 0 < s < ∞, then M_r(f, λ) < ∞ for all 0 < r < s and
M_r(f, λ) ≤ M_s(f, λ), with equality if and only if f is constant.

Proof. As before, the proof follows from Hölder's inequality, applied with p = s/r > 1:

    (∫ f^r λ)^{1/r} = (∫ f^r λ^{r/s} λ^{1−(r/s)})^{1/r} ≤ ((∫ f^s λ)^{r/s} (∫ λ)^{1−(r/s)})^{1/r} = (∫ f^s λ)^{1/s}.
This result should be compared with the non-weighted setting, the so-called L^p-norms

    ‖f‖_p = (∫_I f(x)^p dx)^{1/p}   (0 < p < ∞),

which are incomparable if I has infinite length and which otherwise satisfy

    ‖f‖_r ≤ (b − a)^{(1/r)−(1/s)} ‖f‖_s

if r < s and I = [a, b] (to see this, just set λ ≡ 1 in our previous calculation).
Problem Set 3A
A few calculus problems
1. Let 1 < p < ∞ and let q satisfy 1/p + 1/q = 1; in other words, (p − 1)(q − 1) = 1. By
examining the graph of y = x^{p−1} (a.k.a., x = y^{q−1}), argue that

    ab ≤ ∫_0^a x^{p−1} dx + ∫_0^b y^{q−1} dy = a^p/p + b^q/q.
5. Show that (a b^n)^{1/(n+1)} ≤ (a + nb)/(n + 1) for all n = 1, 2, …, with equality if and
only if a = b.
6. (i) Show that 1/(x − 1) + 1/x + 1/(x + 1) > 3/x for x > 1.
(ii) Use (i) to show that the harmonic series Σ_{n=1}^∞ 1/n diverges.
7. Show that (Σ_{i=1}^n a_i)(Σ_{i=1}^n 1/a_i) ≥ n², with equality if and only if a_1 = ⋯ = a_n.
8. Show that (1 + 1/x)(1 + 1/y)(1 + 1/z) ≥ 64 whenever x, y, z > 0 satisfy x + y + z = 1.
[Hint: f(x) = log(1 + 1/x) is convex on (0, ∞), because f″(x) = 1/x² − 1/(1 + x)² > 0.]
Problem Set 3B
Notation: Fix n ∈ N and λ = (λ_1, …, λ_n), where λ_i > 0 for all i and λ_1 + ⋯ + λ_n = 1.
For each nonzero real number t and each x = (x_1, …, x_n) with x_i > 0, i = 1, …, n, we
define the weighted mean M_t(x, λ) of order t by M_t(x, λ) = (λ_1 x_1^t + ⋯ + λ_n x_n^t)^{1/t}. In the
remaining cases, t = 0, ±∞, we define M_0(x, λ) = x_1^{λ_1} ⋯ x_n^{λ_n}, M_∞(x, λ) = M_∞(x) =
max{x_1, …, x_n}, and M_{−∞}(x, λ) = M_{−∞}(x) = min{x_1, …, x_n}.
10. Show that the sums S_t satisfy Minkowski's inequality: For t > 1 we have S_t(x + y) ≤
S_t(x) + S_t(y), where x + y denotes the sequence (x_1 + y_1, …, x_n + y_n). For 0 < t < 1,
we have S_t(x + y) ≥ S_t(x) + S_t(y). In every case, equality can only occur if x and y
are proportional. (Of course, equality always holds if t = 1.)
11. Show that S_∞(x + y) ≤ S_∞(x) + S_∞(y). When does equality occur in this case?
For 1 ≤ p ≤ ∞, the expression ‖x‖_p = S_p(x) defines a norm on R^n. In other words, ‖x‖_p
satisfies: (i) ‖x‖_p ≥ 0 for all x ∈ R^n, and ‖x‖_p = 0 if and only if x = 0; (ii) ‖ax‖_p = |a| ‖x‖_p
for any scalar a ∈ R; and (iii) ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p for any x, y ∈ R^n (which is typically
called the triangle inequality in this context). For 0 < p < 1, (iii) reverses, thus ‖x‖_p
is not a norm in this case. However, as we'll see directly, for 0 < p < 1, the expression
d(x, y) = ‖x − y‖_p^p defines a translation invariant metric on R^n.
12. If p ≥ 1, show that Σ_{i=1}^n (x_i + y_i)^p ≥ Σ_{i=1}^n x_i^p + Σ_{i=1}^n y_i^p. If 0 < p ≤ 1, the inequality
reverses. If p ≠ 1, equality can only occur if x_i y_i = 0 for all i (in other words, x_i and
y_i cannot both be nonzero).
13. For 0 < p < 1, show that the expression d(x, y) = ‖x − y‖_p^p satisfies: (i) d(x, y) ≥ 0
for all x, y ∈ R^n and d(x, y) = 0 if and only if x = y; (ii) d(x, y) = d(y, x) for all
x, y ∈ R^n; (iii) d(x, y) = d(x − y, 0) = d(x + z, y + z) for all x, y, z ∈ R^n; and
(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ R^n.
Stirling's Formula
Recall (from Problem 14, Problem Set 1) that (n/3)^n < n! < (n/2)^n for n ≥ 6. Thus, it's
reasonable to ask whether n! ≈ (n/a)^n for some constant a (where ≈ means that the ratio
n!/(n/a)^n → 1 as n → ∞). If you guessed that a = e, you'd be right, but the precise
order of magnitude (including error estimates) takes a bit more work.
We begin with a clever summation formula, due essentially to Euler.
Euler-Maclaurin Summation. If f has a continuous derivative on [1, n], then

    Σ_{k=1}^n f(k) = ∫_1^n f(x) dx + ∫_1^n (x − [x]) f′(x) dx + f(1),

where [x] denotes the greatest integer in x.
Proof. (If you happen to know Stieltjes integration, there's a very short proof! For
simplicity, we'll settle for a slightly longer but still elementary proof.) We begin with the
error:

    Σ_{k=1}^{n−1} f(k) − ∫_1^n f(x) dx = Σ_{k=1}^{n−1} ∫_k^{k+1} (f(k) − f(x)) dx.

(Please note that the upper limit on the sum is now n − 1.) We are going to integrate by
parts on the right-hand side, using the variables u = f(k) − f(x) and v = x − k − 1 (!).
With this choice we'll have u(k) = 0 = v(k + 1), which will simplify things considerably.
Watch closely!

    Σ_{k=1}^{n−1} ∫_k^{k+1} (f(k) − f(x)) dx = Σ_{k=1}^{n−1} ∫_k^{k+1} (x − k − 1) f′(x) dx
        = Σ_{k=1}^{n−1} ∫_k^{k+1} (x − [x] − 1) f′(x) dx
        = ∫_1^n (x − [x]) f′(x) dx + f(1) − f(n).
(Figure: the sawtooth function y = x − [x], together with the line y = 1/2.)
Corollary. Σ_{k=1}^n f(k) = ∫_1^n f(x) dx + ∫_1^n (x − [x] − 1/2) f′(x) dx + [f(1) + f(n)]/2.
Corollary. log(n!) = (n + 1/2) log n − n + 1 + ∫_1^n ((x − [x] − 1/2)/x) dx.   (1)

Proof.

    log(n!) = Σ_{k=1}^n log k = ∫_1^n log x dx + ∫_1^n ((x − [x] − 1/2)/x) dx + (1/2) log n
            = (n + 1/2) log n − n + 1 + ∫_1^n ((x − [x] − 1/2)/x) dx,

since ∫_1^n log x dx = n log n − n + 1.
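This corollary is exact and can be verified to machine precision, since on each interval [k, k+1] the integrand integrates in closed form: ∫_k^{k+1} (x − [x] − 1/2)/x dx = 1 − (k + 1/2) log((k+1)/k). The code below is my illustration, not part of the notes:

```python
import math

def log_factorial_via_euler(n):
    """(n + 1/2) log n - n + 1 + integral_1^n ((x - [x] - 1/2)/x) dx,
    with the integral computed interval by interval in closed form."""
    integral = sum(1 - (k + 0.5) * math.log((k + 1) / k) for k in range(1, n))
    return (n + 0.5) * math.log(n) - n + 1 + integral

for n in [2, 5, 10, 50]:
    assert abs(log_factorial_via_euler(n) - math.log(math.factorial(n))) < 1e-9
```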
In other words, setting a_n = log(n! e^n / n^{n+1/2}), equation (1) says that
a_n = 1 + ∫_1^n ((x − [x] − 1/2)/x) dx. We next show that (a_n) converges using a standard
technique from advanced calculus. Since

    ∫_k^{k+1} (x − [x] − 1/2) dx = 0 for each k, while |∫_k^t (x − [x] − 1/2) dx| ≤ 1/8

for k ≤ t ≤ k + 1, an integration by parts shows that the improper integral

    ∫_1^∞ ((x − [x] − 1/2)/x) dx exists. This proves that (a_n) converges to a finite limit and,

hence, that b_n = e^{a_n} converges to a positive, finite limit C; that is, we've shown that

    C = lim_{n→∞} (n! e^n)/n^{n+1/2}

exists.
This result is essentially due to DeMoivre in 1730. Finding the precise value of C, together
with some quantitative error estimates, will take a bit more work.
The Gamma Function
We begin with a continuous extension of n!, initiated by an observation of Euler's:

    n! = ∫_0^1 (−log x)^n dx = ∫_0^∞ t^n e^{−t} dt.

In light of this, we define

    Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt,   x > 0.

It's not hard to see that Γ(x) < ∞ for all x > 0. Indeed, given x > 0, we can find
t_0 = t_0(x) sufficiently large so that t^{x−1} e^{−t} ≤ e^{−t/2} for all t ≥ t_0. Thus,

    Γ(x) ≤ ∫_0^{t_0} t^{x−1} dt + ∫_{t_0}^∞ e^{−t/2} dt = t_0^x/x + 2e^{−t_0/2} < ∞.
We next show that Γ(x) is continuous, by showing that it's convex!

Theorem. (i) Γ(1) = 1.
(ii) Γ(x + 1) = xΓ(x); hence, Γ(n + 1) = n!.
(iii) Γ is log-convex; that is, log Γ(x) is convex.
(iv) Γ is convex, hence continuous.

Proof. (i) is clear: Γ(1) = ∫_0^∞ e^{−t} dt = 1. For (ii), integrate by parts:

    Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_{t=0}^∞ + ∫_0^∞ e^{−t} x t^{x−1} dt
             = x ∫_0^∞ t^{x−1} e^{−t} dt = xΓ(x).
For (iii), take x, y > 0 and λ, μ ≥ 0 with λ + μ = 1. By Hölder's inequality (with p = 1/λ
and q = 1/μ),

    Γ(λx + μy) = ∫_0^∞ (t^{x−1} e^{−t})^λ (t^{y−1} e^{−t})^μ dt ≤ Γ(x)^λ Γ(y)^μ,

and it follows that log Γ is convex. Finally, (iv) is a general principle: Every log-convex
function is also convex. This follows from Young's inequality:

    Γ(λx + μy) ≤ Γ(x)^λ Γ(y)^μ ≤ λΓ(x) + μΓ(y).
Corollary. As x → 0+ we have xΓ(x) → 1 and, hence, Γ(x) → +∞.

Proof. xΓ(x) = Γ(x + 1) → Γ(1) = 1 as x → 0+, by continuity. It follows that
Γ(x) = (xΓ(x))/x → +∞ as x → 0+.
We next prove a remarkable characterization due to Artin in 1964.
Theorem. Suppose that f : (0, ∞) → (0, ∞) satisfies (i) f(1) = 1; (ii) f(x + 1) = xf(x);
and (iii) f is log-convex. Then f = Γ.

Proof. Of course, by (i) and (ii), f(n + 1) = n! for n = 0, 1, 2, …. Next, given 0 < x ≤ 1,
we use (ii) and (iii) to estimate f(n + 1 + x). Watch closely!

    f(n + 1 + x) = f((1 − x)(n + 1) + x(n + 2))
                 ≤ f(n + 1)^{1−x} f(n + 2)^x
                 = f(n + 1)^{1−x} ((n + 1) f(n + 1))^x
                 = (n + 1)^x f(n + 1) = (n + 1)^x n!.
Also,

    n! = f(n + 1) = f(x(n + x) + (1 − x)(n + 1 + x))
       ≤ f(n + x)^x f(n + 1 + x)^{1−x}
       = (n + x)^{−x} f(n + 1 + x)^x f(n + 1 + x)^{1−x}
       = (n + x)^{−x} f(n + 1 + x).

That is, f(n + 1 + x) ≥ (n + x)^x n!.
Now we use (ii) to write these inequalities in terms of f(x):

    f(n + 1 + x) = (n + x) f(n + x) = (n + x)(n − 1 + x) f(n − 1 + x)
                 = ⋯ = (n + x)(n − 1 + x) ⋯ x f(x).

Thus,

    (n + x)^x n! ≤ (n + x)(n − 1 + x) ⋯ x f(x) ≤ (n + 1)^x n!

or, after dividing by n^x n!,

    (1 + x/n)^x ≤ ((n + x)(n − 1 + x) ⋯ x f(x))/(n^x n!) ≤ (1 + 1/n)^x.

Since both bounds tend to 1, we conclude that

    f(x) = lim_{n→∞} (n^x n!)/((n + x)(n − 1 + x) ⋯ x)   (2)

for 0 < x ≤ 1. We next show that (2) holds for x > 1 as well.
Given x > 1, there is a positive integer m with 0 < x − m ≤ 1. Then

    f(x) = (x − 1)(x − 2) ⋯ (x − m) f(x − m)
         = (x − 1)(x − 2) ⋯ (x − m) lim_{n→∞} (n^{x−m} n!)/((n + x − m)(n − 1 + x − m) ⋯ (x − m))
         = lim_{n→∞} (n^{x−m} n!)/((n + x − m)(n − 1 + x − m) ⋯ x)
         = lim_{n→∞} (n^x n!)/((n + x)(n − 1 + x) ⋯ x) · ((n + x)(n − 1 + x) ⋯ (n + x − (m − 1)))/n^m   (3)
         = lim_{n→∞} (n^x n!)/((n + x)(n − 1 + x) ⋯ x),

because the second factor in (3) consists of m terms each, top and bottom, and m and x
are fixed; thus, the limit of this factor is 1 as n → ∞. Thus, (2) holds for all x > 0. In
other words, there can be only one function satisfying the hypotheses of the theorem.
In particular,

    Γ(x) = lim_{n→∞} (n^x n!)/((n + x)(n − 1 + x) ⋯ x).

For example, Γ(1/2) = lim_{n→∞} (n^{1/2} n!)/((n + 1/2)(n − 1/2) ⋯ (1/2)). But we can also find this
value directly: the substitution t = u^2 gives

    Γ(1/2) = ∫_0^∞ t^{−1/2} e^{−t} dt = 2 ∫_0^∞ e^{−u^2} du = √π.
Now recall that b_n = (n! e^n)/n^{n+1/2} → b > 0. Thus,

    b_n^2/b_{2n} = ((n!)^2 e^{2n})/n^{2n+1} · (2n)^{2n+1/2}/((2n)! e^{2n}) → b^2/b = b.

But

    (2^{2n} n!)/(2n)! = (2^{2n} n!)/((2n)(2n − 1)(2n − 2) ⋯ 3 · 2 · 1)
                      = n!/(n(n − 1/2)(n − 1) ⋯ (3/2) · 1 · (1/2))
                      = 1/((n − 1/2)(n − 3/2) ⋯ (3/2)(1/2)).

So,

    b_n^2/b_{2n} = √2 · ((n + 1/2)/n) · (n^{1/2} n!)/((n + 1/2)(n − 1/2) ⋯ (3/2)(1/2)) → √2 Γ(1/2) = √(2π).
Thus, we've finally arrived at Stirling's Formula:

    lim_{n→∞} (n! e^n)/n^{n+1/2} = √(2π); that is, n! ≈ √(2π) n^{n+1/2} e^{−n}.
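Numerically the convergence is visible already for modest n. In the sketch below (my illustration), lgamma is used to compute log n! without overflow:

```python
import math

def stirling_ratio(n):
    # n! e^n / n^(n + 1/2), computed in logs to avoid overflow
    return math.exp(math.lgamma(n + 1) + n - (n + 0.5) * math.log(n))

target = math.sqrt(2 * math.pi)
assert abs(stirling_ratio(10) - target) < 0.03
assert abs(stirling_ratio(1000) - target) < 1e-3
# the ratio decreases toward sqrt(2*pi)
assert stirling_ratio(10) > stirling_ratio(100) > target
```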
Stirling's Series
Finally, let's talk about error estimates. In addition to the qualitative statement that

    (n! e^n)/(√(2π) n^{n+1/2}) → 1 as n → ∞,

we can say more about the error term r_n = log[(n! e^n)/(√(2π) n^{n+1/2})]. It can be shown that

    r_n ≈ A/n − B/n^3 + C/n^5 − ⋯

for certain constants A, B, etc., where ≈ means that r_n lies between successive partial
sums of the (divergent) alternating series (called Stirling's series). For example,

    A/n − B/n^3 < r_n < A/n   (n = 1, 2, 3, …).

We first show that (r_n) is strictly decreasing:

    r_{n+1} − r_n = log[((n + 1)! e^{n+1})/(√(2π)(n + 1)^{n+3/2})] − log[(n! e^n)/(√(2π) n^{n+1/2})]
                  = 1 − (n + 1/2) log(1 + 1/n) < 0
for all n. For this, it suffices to show that

    log(1 + 1/n) − 1/(n + 1/2) > 0

for all n. To this end, notice that the function g(x) = log(1 + 1/x) − 1/(x + 1/2) tends to
0 as x → ∞. If we can show that g is decreasing, we'll know that g(x) > 0. But

    g′(x) = −1/(x(x + 1)) + 1/(x + 1/2)^2 = 1/(x^2 + x + 1/4) − 1/(x^2 + x) < 0

for x > 0. Thus we've shown that (r_n) decreases to 0 and, hence, that r_n > 0 for all n.
We next find a constant A > 0 so that r_n < A/n for all n. For this, it suffices to show
that a_n = r_n − (A/n) < 0 for all n. But a_n → 0, so it suffices to find A such that (a_n) is
increasing. That is, we want

    a_{n+1} − a_n = 1 − (n + 1/2) log(1 + 1/n) + A (1/n − 1/(n + 1)) > 0

for all n. Equivalently, setting

    g(x) = 1 − (x + 1/2) log(1 + 1/x) + A/(x(x + 1)),

we want g(x) > 0 for all x > 0. Again, we know that g(x) → 0 as x → ∞, so it would suffice
to find A so that g′(x) < 0 for all x > 0. Hang on!

    g′(x) = −log((x + 1)/x) + (2x + 1)/(2x(x + 1)) − A (2x + 1)/(x^2(x + 1)^2),

so g′(x) < 0 precisely when

    A > (x^2(x + 1)^2/(2x + 1)) [ (2x + 1)/(2x(x + 1)) − log((x + 1)/x) ] = h(x),   x > 0.   (4)

A somewhat tedious computation shows that h′(x) > 0 for x > 0; that is, h(x) strictly
increases to 1/12 as x → ∞, so A = 1/12 is the smallest possible upper bound in (4).
Because this is such a roundabout proof, let's summarize what
we've done: A = 1/12 ⟹ g′ < 0 ⟹ g > 0 ⟹ (a_n) increasing ⟹ a_n < 0 ⟹
r_n < 1/12n. Phew!!
Conclusion. For all n = 1, 2, 3, …, we have

    1 < (n! e^n)/(√(2π) n^{n+1/2}) < e^{1/12n}.
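These bounds are tight enough to test directly (illustration only):

```python
import math

for n in range(1, 200):
    ratio = math.exp(math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)) / math.sqrt(2 * math.pi)
    # 1 < n! e^n / (sqrt(2 pi) n^(n+1/2)) < e^(1/12n)
    assert 1.0 < ratio < math.exp(1.0 / (12 * n))
```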
Problem Set 4
Notation: A positive function f : I → (0, ∞) defined on an interval I is said to be log-convex if log f is convex. Equivalently, f is log-convex if f(λx + μy) ≤ f(x)^λ f(y)^μ
whenever x, y ∈ I, λ, μ ≥ 0, λ + μ = 1. From Young's inequality, a log-convex function is
also convex; thus, the log-convex functions form a subset of the convex functions.
1.
for
1
n+1
n
( 2 (n + 2))
n = 1, 2, 3, . . ..
5. Evaluate ∫_0^∞ t^{1/2} e^{−t} dt.
6. Show that Γ(n + 1/2) = ((2n)!/(n! 4^n)) √π for n = 0, 1, 2, ….
7. Show that (2n choose n) ≈ 2^{2n}/√(πn). [Recall that a_n ≈ b_n means that a_n/b_n → 1.]
n
22 44
(2n)(2n)
= lim
.
n
2
13 35
(2n 1)(2n + 1)
[Hint: /2 = (1/2)((1/2))2 .]
9. Prove Legendre's duplication formula:

    Γ(x/2) Γ((x + 1)/2) = (√π/2^{x−1}) Γ(x).

[Hint: Show that f(x) = (2^{x−1}/√π) Γ(x/2) Γ((x + 1)/2) satisfies the hypotheses of
Artin's theorem.]
10. Consider f(x) = ∫_0^∞ (t^{x−1}/(1 + t)) dt for 0 < x < 1. [Euler showed that f(x) = π/sin(πx).]
(a) Use the fact that (1 + t)^{−1} = ∫_0^∞ e^{−(1+t)s} ds to show that f(x) = Γ(x)Γ(1 − x).
Recall that the standard inner products

    ⟨x, y⟩ = Σ_{k=1}^n x_k ȳ_k on C^n and ⟨f, g⟩ = ∫_a^b f(x) ḡ(x) dx on C[a, b]

give rise to the norms

    ‖x‖ = (Σ_{k=1}^n |x_k|^2)^{1/2} on C^n

and

    ‖f‖ = (∫_a^b |f(x)|^2 dx)^{1/2} or ‖f‖ = (∫_a^b |f(x)|^2 w(x) dx)^{1/2} on C[a, b].   (1)

A key step in verifying that equation (1) defines a norm is given by:
The Cauchy-Schwarz Inequality. In any inner product space, |⟨x, y⟩| ≤ ‖x‖ ‖y‖, with
equality if and only if x and y are linearly dependent (that is, parallel).

In terms of our initial examples,

    |Σ_{k=1}^n x_k ȳ_k| ≤ (Σ_{k=1}^n |x_k|^2)^{1/2} (Σ_{k=1}^n |y_k|^2)^{1/2}

and

    |∫_a^b f(x) ḡ(x) dx| ≤ (∫_a^b |f(x)|^2)^{1/2} (∫_a^b |g(x)|^2)^{1/2}.
Proof. By our earlier remarks, we may assume that x, y ≠ 0. Now, given λ ∈ C, note
that

0 ≤ ‖x − λy‖² = ⟨x − λy, x − λy⟩
  = ‖x‖² − λ̄⟨x, y⟩ − λ⟨y, x⟩ + |λ|²‖y‖²
  = ‖x‖² − 2 Re(λ̄⟨x, y⟩) + |λ|²‖y‖².

In particular, setting λ = ⟨x, y⟩/⟨y, y⟩ = ⟨x, y⟩/‖y‖² leads to

0 ≤ ‖x‖² − 2 |⟨x, y⟩|²/‖y‖² + (|⟨x, y⟩|²/‖y‖⁴) ‖y‖² = ‖x‖² − |⟨x, y⟩|²/‖y‖².
We leave the case for equality as an exercise (just examine the proof closely).
It follows from the Cauchy-Schwarz inequality that

|Re⟨x, y⟩| / (‖x‖ ‖y‖) ≤ |⟨x, y⟩| / (‖x‖ ‖y‖) ≤ 1. Thus

we can (and will!) define the angle θ between (nonzero) vectors x and y by declaring

cos θ = Re⟨x, y⟩ / (‖x‖ ‖y‖).   (2)
It also follows that ‖x‖ = ⟨x, x⟩^{1/2} defines a norm on X.
At the risk of a bit of repetition, let's take another look at the proof of the Cauchy-Schwarz inequality.
Lemma. Let x, y ∈ X with y ≠ 0 and let α = ⟨x, y⟩/⟨y, y⟩. Then:
(i) (x − αy) ⊥ y and, of course, x = (x − αy) + αy. Thus, x can be written as the sum
of a vector orthogonal to y and a vector parallel to y.
(ii) ‖x‖² = ‖x − αy‖² + |α|²‖y‖². In particular, note that ‖x‖ ≥ ‖x − αy‖.
(iii) ‖x − αy‖ < ‖x − βy‖ for all β ≠ α. Thus, y* = αy is the unique point in span{y}
nearest to x; it is characterized by the requirement that (x − y*) ⊥ span{y}.
Proof. (i) follows by design; that is, α is chosen to satisfy ⟨x − αy, y⟩ = 0. (ii) follows easily
from (i) and the Pythagorean theorem. Likewise, (iii) follows from (i) and the Pythagorean
theorem. Indeed, from (i) and the linearity of the inner product, it follows that x − αy is
perpendicular to every multiple of y; thus, given β ∈ C, we have

‖x − βy‖² = ‖(x − αy) + (α − β)y‖²
          = ‖x − αy‖² + |α − β|²‖y‖²
          > ‖x − αy‖²

unless β = α.
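The Lemma is easy to check numerically. The following sketch (mine, not the notes') verifies the orthogonality in (i) and the Pythagorean identity in (ii) for a pair of complex vectors; the vectors themselves are arbitrary choices.

```python
def inner(u, v):
    # complex inner product <u, v> = sum u_k * conj(v_k)
    return sum(a * b.conjugate() for a, b in zip(u, v))

x = [1 + 2j, 0.5, -1j]
y = [2, 1j, 1 + 1j]

alpha = inner(x, y) / inner(y, y)
residual = [a - alpha * b for a, b in zip(x, y)]   # x - alpha*y

# (i): x - alpha*y is orthogonal to y
assert abs(inner(residual, y)) < 1e-9

# (ii): ||x||^2 = ||x - alpha*y||^2 + |alpha|^2 ||y||^2
lhs = inner(x, x).real
rhs = inner(residual, residual).real + abs(alpha) ** 2 * inner(y, y).real
assert abs(lhs - rhs) < 1e-9
```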
Lemma. Let e_1, . . . , e_n be nonzero, pairwise orthogonal vectors in X and let E =
span{e_1, . . . , e_n}. Then:
(i) each z ∈ E can be written (uniquely) as z = Σ_{i=1}^n (⟨z, e_i⟩/⟨e_i, e_i⟩) e_i;
(ii) y = x − Σ_{i=1}^n (⟨x, e_i⟩/⟨e_i, e_i⟩) e_i is orthogonal to E;
(iii) e* = Σ_{i=1}^n (⟨x, e_i⟩/⟨e_i, e_i⟩) e_i is the unique nearest point to x in E;
(iv) ‖e*‖² = Σ_{j=1}^n |⟨x, e_j⟩|²/⟨e_j, e_j⟩ ≤ ‖x‖².

Proof. For (i), write z = Σ_{i=1}^n λ_i e_i; then

⟨z, e_k⟩ = Σ_{i=1}^n λ_i ⟨e_i, e_k⟩ = λ_k ⟨e_k, e_k⟩,

and hence λ_k = ⟨z, e_k⟩/⟨e_k, e_k⟩. For (ii), set y = x − Σ_{i=1}^n (⟨x, e_i⟩/⟨e_i, e_i⟩) e_i. A similar calculation
then yields

⟨y, e_k⟩ = ⟨x, e_k⟩ − ⟨x, e_k⟩ = 0,

and it follows that y is orthogonal to every vector in E. Parts (iii) and (iv) are now almost
immediate. Indeed, from (ii), x − e* is orthogonal to E. Thus, given any vector e ∈ E we
have e − e* ∈ E and, hence,

‖x − e‖² = ‖(x − e*) + (e* − e)‖² = ‖x − e*‖² + ‖e − e*‖² > ‖x − e*‖²,

unless e = e*.
For example, the functions e^{inx}, n ∈ Z, are orthogonal in C[0, 2π]:

∫_0^{2π} e^{i(m−n)x} dx = { 0, if m ≠ n;  2π, if m = n }.

In the case of real scalars, real-valued functions, and the inner product ⟨f, g⟩ =
(1/π) ∫_0^{2π} f(x) g(x) dx, the vectors {1/√2, cos x, sin x, . . . , cos nx, sin nx} are orthonormal.
We next show how the ideas developed in our first two lemmas can be used to construct
orthogonal sequences.
The Gram-Schmidt Process. Let E be a subspace of X with basis {x1 , . . . , xn }. Then
we can find an orthogonal basis {e1 , . . . , en } for E satisfying
(i) span{e1 , . . . , ek } = span{x1 , . . . , xk } for k = 1, . . . , n, and
(ii) 0 < kek k kxk k for k = 1, . . . , n.
Proof. To begin, set e_1 = x_1 ≠ 0 and let e_2 = x_2 − (⟨x_2, e_1⟩/⟨e_1, e_1⟩) e_1. Because x_2 ∉
span{e_1} = span{x_1}, we have e_2 ≠ 0; moreover, e_2 is orthogonal to e_1.
In particular, e_1 and e_2 are linearly independent and it follows that
span{e_1, e_2} = span{x_1, x_2}. To see that e_2 satisfies (ii), notice that x_2 = e_2 + αe_1 and,
hence, ‖x_2‖² = ‖e_2‖² + |α|²‖e_1‖² ≥ ‖e_2‖².
Continue by induction: Assuming that {e_1, . . . , e_k} have been chosen, set e_{k+1} =
x_{k+1} − Σ_{j=1}^k (⟨x_{k+1}, e_j⟩/⟨e_j, e_j⟩) e_j. Because x_{k+1} ∉ span{x_1, . . . , x_k} = span{e_1, . . . , e_k}, we have
e_{k+1} ≠ 0. Also, span{e_1, . . . , e_k, e_{k+1}} ⊂ span{x_1, . . . , x_k, x_{k+1}}. But, as before, e_{k+1}
is orthogonal to span{e_1, . . . , e_k} and, hence, {e_1, . . . , e_k, e_{k+1}} are linearly independent.
Thus we must have span{e_1, . . . , e_k, e_{k+1}} = span{x_1, . . . , x_k, x_{k+1}}. Finally, ‖x_{k+1}‖² =
‖e_{k+1}‖² + Σ_{j=1}^k |α_j|²‖e_j‖² ≥ ‖e_{k+1}‖².
Clearly, once we have an orthogonal basis, we simply normalize to arrive at an orthonormal basis (alternatively, by slightly altering the process outlined above, we could
construct the vectors ek to have norm one).
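The process just described translates almost line for line into code. Here is a minimal sketch (mine, not the notes') for real vectors; the helper names `dot` and `gram_schmidt` are of course my own.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize a linearly independent list of real vectors,
    following the recursion e_{k+1} = x_{k+1} - sum_j <x_{k+1}, e_j>/<e_j, e_j> e_j."""
    basis = []
    for x in vectors:
        e = list(x)
        for b in basis:
            coeff = dot(x, b) / dot(b, b)
            e = [ei - coeff * bi for ei, bi in zip(e, b)]
        basis.append(e)
    return basis

E = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])

# pairwise orthogonality
assert abs(dot(E[0], E[1])) < 1e-9
assert abs(dot(E[0], E[2])) < 1e-9
assert abs(dot(E[1], E[2])) < 1e-9
# property (ii): ||e_k|| <= ||x_k||
assert dot(E[1], E[1]) <= dot([1, 0, 1], [1, 0, 1]) + 1e-9
```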
Corollary. If E is a finite-dimensional subspace of X, then E has an orthonormal basis
{e1 , . . . , en }.
All of these ideas make very short work of:
The Projection Theorem. Let E be a subspace of X with orthonormal basis {e_1, . . . , e_n}.
Define P : X → X by P(x) = Σ_{k=1}^n ⟨x, e_k⟩ e_k for x ∈ X. Then
(i) P(x) ∈ E and (x − P(x)) ⊥ E; hence, ‖x‖² = ‖x − P(x)‖² + ‖P(x)‖².
(ii) P(x) is the unique nearest point to x in E.
(iii) P is a projection; that is, P² = P.
(iv) P is linear, continuous, and satisfies ‖P(x)‖ ≤ ‖x‖ for all x ∈ X.
P is called the orthogonal projection onto E or, sometimes, the nearest point map on
E. It follows from (ii) that P is actually independent of the choice of basis.
Hadamard's Inequality
As an application of the ideas in this section, we next present a classical matrix inequality
due to Hadamard in 1893.
Theorem. Let A = [a_ij] be a real or complex n × n matrix and let A_j = (a_ij)_{i=1}^n denote
the j-th column of A. Then

|det A| ≤ Π_{j=1}^n ‖A_j‖ = Π_{j=1}^n (Σ_{i=1}^n |a_ij|²)^{1/2}.

Equality can only occur if one of the columns A_j = 0 or if the columns are orthogonal;
i.e., ⟨A_j, A_k⟩ = 0 for j ≠ k.

It is well known that |det A| represents the volume of the parallelepiped with edges
(A_j). Thus, Hadamard's inequality states that the volume is maximal when the edges are
orthogonal.
Proof. We may certainly suppose that det A ≠ 0, in which case the columns (A_j) are
linearly independent. Applying the Gram-Schmidt process, set b_1 = A_1 and

b_j = A_j − Σ_{i=1}^{j−1} (⟨A_j, b_i⟩/⟨b_i, b_i⟩) b_i   (j ≥ 2),   (3)

so that ‖b_j‖ ≤ ‖A_j‖ for all j. In particular, the matrix B = [b_1 · · · b_n] with columns (b_j) can
be obtained from A by elementary column operations. It follows that det B = det A.
But the columns of B are orthogonal, so

B*B = diag(‖b_1‖², . . . , ‖b_n‖²)

and, hence, det(B*B) = Π_{j=1}^n ‖b_j‖² = |det B|². Consequently,

|det A| = |det B| = det(B*B)^{1/2} = Π_{j=1}^n ‖b_j‖ ≤ Π_{j=1}^n ‖A_j‖.

Equality can only occur if ‖b_j‖ = ‖A_j‖ for all j, which can only occur if b_j = A_j for all j;
that is, if and only if the (A_j) were already orthogonal.
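Hadamard's bound is pleasant to verify by direct computation. The sketch below (mine, not the notes') uses a naive Laplace expansion for the determinant, which is perfectly adequate for a small test matrix; the matrix itself is an arbitrary example.

```python
def det(M):
    # Laplace expansion along the first row (fine for small matrices)
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def col_norms(M):
    n = len(M)
    return [sum(M[i][j] ** 2 for i in range(n)) ** 0.5 for j in range(n)]

A = [[2.0, -1.0, 0.5],
     [1.0, 3.0, -2.0],
     [0.0, 1.0, 4.0]]

bound = 1.0
for c in col_norms(A):
    bound *= c

# Hadamard: |det A| <= product of the column norms
assert abs(det(A)) <= bound
```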
Equality would mean that the A_j are orthogonal and it would also mean that ‖A_j‖ = √n;
in other words, A would be an n × n matrix with entries ±1 and pairwise orthogonal
columns, what is now called a Hadamard matrix. The simplest example is

A = ( 1   1
      1  −1 ),

for which A*A = 2I. Larger examples can be generated recursively; for instance, the block matrix

( A   A
  A  −A )

formed from A and copies of A with itself is a matrix B of twice the size with the same property.
Problem Set 5
Throughout, X denotes a finite-dimensional inner product space over C with inner product
⟨·, ·⟩ and with associated norm ‖x‖ = ⟨x, x⟩^{1/2}.
1. Show that |⟨x, y⟩| = ‖x‖ ‖y‖ if and only if x and y are linearly dependent.
2. Show that ‖x + y‖ = ‖x‖ + ‖y‖ if and only if one of x or y is a nonnegative multiple
of the other.
3. Show that the parallelogram law: ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²) holds for any
x, y ∈ X.
4. Show that the polarization identity: 4⟨x, y⟩ = ‖x+y‖² − ‖x−y‖² + i‖x+iy‖² − i‖x−iy‖²
holds for any x, y ∈ X.
5. Show that a linear map T : X → Y between inner product spaces X and Y is an
isometry (into) if and only if it preserves inner products. That is, ‖Tx‖ = ‖x‖ for
all x ∈ X if and only if ⟨Tx, Ty⟩ = ⟨x, y⟩ for all x, y ∈ X. In short, a linear map
preserves distances if and only if it preserves angles.
6. Let E be a subspace of X and let x_0 ∈ X. Show that the nearest point to x_0 in E
can be characterized as the (unique) point y_0 ∈ E satisfying Re⟨x_0 − y_0, y − y_0⟩ ≤ 0
for all y ∈ E.
7. Consider C[−1, 1] with the inner product ⟨f, g⟩ = ∫_{−1}^1 f(x) g(x) dx (where we consider
real-valued functions and real scalars).
(i) Apply the Gram-Schmidt process to the linearly independent set {1, x, x²} to find
the first three Legendre polynomials P_1(x) = 1, P_2(x) = x, and P_3(x) = x² − 1/3.
(ii) Compute min{ ∫_{−1}^1 (e^x − (ax² + bx + c))² dx : a, b, c real }.
8. Given a subset A of X, let A⊥ = {x ∈ X : ⟨x, a⟩ = 0 for all a ∈ A}. The set A⊥ is
called the orthogonal complement of A. Verify the following for subsets A, B of X.
(a) A⊥ is a subspace of X and A ∩ A⊥ ⊂ {0}.
(b) A ⊂ B ⟹ B⊥ ⊂ A⊥, and (A ∪ B)⊥ = A⊥ ∩ B⊥.
(c) A ⊂ A⊥⊥ and, hence, span(A) ⊂ A⊥⊥. (In fact, for X finite-dimensional, we
actually have span(A) = A⊥⊥.)
9. Let E be a subspace of X and let P : X → X be the orthogonal projection onto E.
Show that I − P is the orthogonal projection onto E⊥ (where I denotes the identity
map on X).
Hilbert's Inequality

For square-summable sequences (a_m) and (b_n) of nonnegative numbers, we have

Σ_{m=1}^∞ Σ_{n=1}^∞ a_m b_n/(m + n) < π (Σ_{m=1}^∞ a_m²)^{1/2} (Σ_{n=1}^∞ b_n²)^{1/2}   (1)

unless one of (a_n) or (b_m) is identically zero. The constant π is best possible.
It's unusual to see a strict inequality with a sharp bound! Hilbert proved this inequality, with constant 2π, in his famous series of lectures on integral equations (roughly,
1904-1911); his result was first published by Weyl in 1908. The best constant π, together
with various generalizations, was provided by Schur in 1911. We will present two proofs,
along with a generalization to integral inequalities.
Our first approach uses the Cauchy-Schwarz inequality in the form:

Σ_{k∈I} x_k y_k ≤ (Σ_{k∈I} x_k²)^{1/2} (Σ_{k∈I} y_k²)^{1/2},   (2)

where (x_k) and (y_k) are real sequences and where I is a countable set; in this case,
I = {(m, n) : m, n = 1, 2, 3, . . .}.
This is a relatively straightforward generalization of the R^n version because

Σ_{k∈I} x_k² = sup { Σ_{k∈J} x_k² : J ⊂ I, J finite }.

That is, (2) follows from the finite-sum version of Cauchy-Schwarz by showing that all
(finite length) partial sums satisfy (2).
The idea behind our first proof of Hilbert's theorem is to take advantage of the symmetry of the summand, relative to m and n. We begin by rewriting:

a_m b_n/(m + n) = [a_m/(m+n)^{1/2} · (m/n)^λ] · [b_n/(m+n)^{1/2} · (n/m)^λ],

where, if possible, λ will be chosen so that each of the factors on the right is square-summable (over the index set I). Applying Cauchy-Schwarz we get:
(Σ_{m,n} a_m b_n/(m+n))² ≤ (Σ_{m,n} (a_m²/(m+n)) (m/n)^{2λ}) (Σ_{m,n} (b_n²/(m+n)) (n/m)^{2λ}).   (3)
But

Σ_{m,n} (a_m²/(m+n)) (m/n)^{2λ} = Σ_{m=1}^∞ a_m² Σ_{n=1}^∞ (1/(m+n)) (m/n)^{2λ}

and, similarly,

Σ_{m,n} (b_n²/(m+n)) (n/m)^{2λ} = Σ_{n=1}^∞ b_n² Σ_{m=1}^∞ (1/(m+n)) (n/m)^{2λ}.
Thus, we'll have a proof of Hilbert's inequality (with some constant) if we can find a finite
bound B_λ such that

Σ_{n=1}^∞ (1/(m+n)) (m/n)^{2λ} ≤ B_λ

for all m; that is, B_λ may depend on λ, but not on m.
Now for λ > 0 (and any m), the sequence (1/(m+n)) (m/n)^{2λ} decreases as n increases, so we
may compare it with an integral:

∫_1^∞ f(x) dx < Σ_{n=1}^∞ f(n) < ∫_0^∞ f(x) dx

for f positive and decreasing. In particular,

Σ_{n=1}^∞ (1/(m+n)) (m/n)^{2λ} < ∫_0^∞ (1/(m+x)) (m/x)^{2λ} dx = ∫_0^∞ 1/((1 + y) y^{2λ}) dy,

by the substitution x = my.
As we've seen (in a slightly different form), this integral exists provided that 0 < 2λ < 1
and, in fact, equals Γ(2λ)Γ(1 − 2λ); that is, we can take B_λ = Γ(2λ)Γ(1 − 2λ). In other
words, we've just shown that

Σ_{m,n} a_m b_n/(m + n) < B_λ (Σ_{m=1}^∞ a_m²)^{1/2} (Σ_{n=1}^∞ b_n²)^{1/2}.

But we've also seen that the minimum value of B_λ occurs when λ = 1/4, in which case
B_{1/4} = (Γ(1/2))² = π. This proves Hilbert's theorem, save the claim that π is best possible.
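A brute-force numerical check makes the theorem concrete. The sketch below (mine, not part of the notes) evaluates a truncated double sum for the particular choice a_m = b_m = 1/m and confirms it stays below the Cauchy-Schwarz product with constant π.

```python
import math

N = 200
a = [1.0 / (m + 1) for m in range(N)]   # a_m = 1/m for m = 1..N
b = a[:]

double_sum = sum(a[m] * b[n] / ((m + 1) + (n + 1))
                 for m in range(N) for n in range(N))
norm = (math.sqrt(sum(x * x for x in a))
        * math.sqrt(sum(x * x for x in b)))

# Hilbert's inequality with constant pi
assert double_sum < math.pi * norm
```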
Corollary. Let (a_m) and (b_n) be nonnegative, let 1 < p < ∞, and let q satisfy 1/p + 1/q = 1.
Then

Σ_{m=1}^∞ Σ_{n=1}^∞ a_m b_n/(m + n) < Γ(1/p)Γ(1/q) (Σ_{m=1}^∞ a_m^p)^{1/p} (Σ_{n=1}^∞ b_n^q)^{1/q}.
For the proof, write

a_m b_n/(m + n) = a_m ((1/(m+n)) (m/n)^{1/q})^{1/p} · b_n ((1/(m+n)) (n/m)^{1/p})^{1/q}

and apply Hölder's inequality, which will lead to the bounds
∫_0^∞ 1/((1 + y) y^{1/p}) dy = Γ(1/p)Γ(1/q)  and  ∫_0^∞ 1/((1 + y) y^{1/q}) dy = Γ(1/q)Γ(1/p).

As it happens, Γ(x)Γ(1 − x) = π/sin(πx) for 0 < x < 1 and so the bound in the
Corollary can be written as π/sin(π/p), which is actually best possible.
The integral version of Hilbert's inequality can be proved in essentially the same way
as the discrete version and, in fact, will yield the discrete version as a corollary.

Theorem. Let f, g ≥ 0, let 1 < p < ∞, and let 1/p + 1/q = 1. Then

∫_0^∞ ∫_0^∞ (f(x) g(y)/(x + y)) dx dy < (π/sin(π/p)) (∫_0^∞ f(x)^p dx)^{1/p} (∫_0^∞ g(y)^q dy)^{1/q}

unless f or g is identically zero. For the proof, we could write

1/(x + y) = ((1/(x+y)) (x/y)^{1/q})^{1/p} ((1/(x+y)) (y/x)^{1/p})^{1/q}

and proceed as before. Instead of completing this calculation, let's opt for a slightly more
general theorem:
Theorem. Let K(x, y) ≥ 0 be homogeneous of degree −1; that is, K(tx, ty) = t^{−1} K(x, y)
for t > 0, and suppose that

C = ∫_0^∞ K(1, u) u^{−1/q} du = ∫_0^∞ K(u, 1) u^{−1/p} du   (5)

is finite, where 1 < p < ∞ and 1/p + 1/q = 1. Then, for f, g ≥ 0,

∫_0^∞ ∫_0^∞ K(x, y) f(x) g(y) dx dy < C (∫_0^∞ f(x)^p dx)^{1/p} (∫_0^∞ g(y)^q dy)^{1/q}   (4)

unless f or g is identically zero.
Proof. By the substitution y = ux (for each fixed x) and the homogeneity of K,

∫∫ K(x, y) f(x) g(y) dx dy = ∫_0^∞ f(x) (∫_0^∞ K(x, y) g(y) dy) dx
= ∫_0^∞ f(x) (∫_0^∞ x K(x, ux) g(ux) du) dx
= ∫_0^∞ f(x) (∫_0^∞ K(1, u) g(ux) du) dx
= ∫_0^∞ K(1, u) (∫_0^∞ f(x) g(ux) dx) du.
We now apply Hölder's inequality to the inner integral, using the same change of variable
y = ux:

∫_0^∞ f(x) g(ux) dx ≤ (∫_0^∞ |f(x)|^p dx)^{1/p} (∫_0^∞ |g(ux)|^q dx)^{1/q}
= u^{−1/q} (∫_0^∞ |f(x)|^p dx)^{1/p} (∫_0^∞ |g(y)|^q dy)^{1/q}.

Thus,

I ≤ (∫_0^∞ K(1, u) u^{−1/q} du) (∫_0^∞ |f(x)|^p dx)^{1/p} (∫_0^∞ |g(y)|^q dy)^{1/q}.
The case for strict inequality in (4) follows from a careful examination of the case for
equality in Hölder's inequality (which we will forego). The fact that the two integrals in
(5) are equal is left as an exercise.
Now the integral version of Hilbert's inequality can be used to deduce the discrete
version; indeed, because we have

∫_{m−1}^m ∫_{n−1}^n 1/(x + y) dy dx ≥ 1/(m + n),

applying the integral inequality to the step functions built from (a_m) and (b_n) yields

Σ_{m=0}^∞ Σ_{n=0}^∞ a_m b_n/(m + n + 1) < π (Σ_{m=0}^∞ a_m²)^{1/2} (Σ_{n=0}^∞ b_n²)^{1/2}

unless one of (a_n) or (b_m) is identically zero.
We now apply our Corollary to the moment sequence of a square-integrable function.
For this we'll find it helpful to have a converse to Hölder's inequality, a result that's useful
in its own right.

Lemma. Let 1 < p < ∞, let 1/p + 1/q = 1, and suppose that (a_n) is a nonnegative sequence
satisfying

Σ_{n=1}^∞ a_n b_n ≤ C (Σ_{n=1}^∞ |b_n|^q)^{1/q}

whenever (b_n) is a q-th power summable sequence. Then (a_n) is p-th power summable
and, moreover, Σ_{n=1}^∞ |a_n|^p ≤ C^p.
Proof. It suffices to show that Σ_{n=1}^N |a_n|^p ≤ C^p for each N. To this end, fix N and apply
the hypothesis to the sequence b_n = a_n^{p−1} for n ≤ N, b_n = 0 for n > N:

Σ_{n=1}^N |a_n|^p = Σ_{n=1}^N a_n b_n ≤ C (Σ_{n=1}^N |a_n|^{(p−1)q})^{1/q} = C (Σ_{n=1}^N |a_n|^p)^{1/q}.

Dividing by (Σ_{n=1}^N |a_n|^p)^{1/q} leaves (Σ_{n=1}^N |a_n|^p)^{1/p} ≤ C.
Corollary. Let f : [0, 1] → R be square-integrable and nonzero. For each n = 0, 1, 2, . . .,
define a_n = ∫_0^1 x^n f(x) dx. Then

Σ_{n=0}^∞ a_n² < π ∫_0^1 f(x)² dx.

Proof. Given scalars b_0, . . . , b_N, we have

Σ_{n=0}^N a_n b_n = ∫_0^1 f(x) (Σ_{n=0}^N b_n x^n) dx ≤ (∫_0^1 f(x)² dx)^{1/2} (∫_0^1 (Σ_{n=0}^N b_n x^n)² dx)^{1/2}.

But

∫_0^1 (Σ_{n=0}^N b_n x^n)² dx = Σ_{m=0}^N Σ_{n=0}^N b_m b_n/(m + n + 1) < π Σ_{n=0}^N b_n²,

and hence

Σ_{n=0}^N a_n b_n < π^{1/2} (∫_0^1 f(x)² dx)^{1/2} (Σ_{n=0}^N b_n²)^{1/2}.

An appeal to the Lemma (with p = q = 2 and C = √π (∫_0^1 f(x)² dx)^{1/2}) completes the proof. That
π is the best constant follows from considering the function f(x) = (1 − x)^{ε−1/2} (and
letting ε → 0), a calculation we will omit.
Our second proof of Hilbert's inequality has the advantage of being entirely elementary,
but has the disadvantage of giving a slightly weaker result (given our present state of
knowledge). The proof, due to Toeplitz in 1910, begins with a simple observation.
Lemma. (i/2π) ∫_0^{2π} (t − π) e^{int} dt = 1/n for n ∈ Z, n ≠ 0.
Now, given nonnegative numbers a_1, . . . , a_N and b_1, . . . , b_N, consider the trigonometric
polynomials f(t) = Σ_{m=1}^N a_m e^{imt} and g(t) = Σ_{n=1}^N b_n e^{int}.
Finally, we apply the Cauchy-Schwarz inequality (and the fact that the functions e^{int}
are orthonormal on [0, 2π]) to arrive at:

Σ_{m=1}^N Σ_{n=1}^N a_m b_n/(m + n) = (i/2π) ∫_0^{2π} (t − π) f(t) g(t) dt
≤ (1/2) ∫_0^{2π} |f(t)| |g(t)| dt
≤ π ((1/2π) ∫_0^{2π} |f(t)|² dt)^{1/2} ((1/2π) ∫_0^{2π} |g(t)|² dt)^{1/2}
= π (Σ_{m=1}^N a_m²)^{1/2} (Σ_{n=1}^N b_n²)^{1/2}.
Because of the special nature of the representation in our previous Lemma, it's not so
clear that this proof will yield the infinite sum version (unless we assume that (a_m) and
(b_n) are nonnegative). Moreover, we would be hard pressed to prove strict inequality by
this method alone. Nevertheless, Toeplitz's proof is not only simple, but has the benefit
of amply demonstrating the interplay between "discrete" and "continuous" inner product
spaces.
Our final result in this section is Schur's generalization of Hilbert's inequality from
1911. Surprisingly, the proof we will give is short, elementary, and does not appeal in any
way to Hilbert's inequality.
Lemma. Let c_1, . . . , c_n be complex numbers and let 0 < λ < 1. Then

|Σ_{k=1}^n c_k/k^λ| ≤ (1/(2 sin(λπ/2))) ∫_0^{2π} |Σ_{k=1}^n c_k e^{ikx}| dx.
Theorem. Let a_m and b_n be complex numbers and let 0 < λ < 1. Then

|Σ_{m=1}^N Σ_{n=1}^N a_m b_n/(m + n)^λ| ≤ (π/sin(λπ/2)) (Σ_{m=1}^N |a_m|²)^{1/2} (Σ_{n=1}^N |b_n|²)^{1/2}.
Proof. This is a simple matter of applying the Cauchy-Schwarz inequality and appealing
to our previous Lemma. First, we essentially repeat a calculation from Toeplitz's proof:

(1/2π) ∫_0^{2π} |Σ_{m=1}^N a_m e^{imx}| |Σ_{n=1}^N b_n e^{inx}| dx ≤ (Σ_{m=1}^N |a_m|²)^{1/2} (Σ_{n=1}^N |b_n|²)^{1/2}.   (6)

Next,

|Σ_{m=1}^N Σ_{n=1}^N a_m b_n/(m + n)^λ| ≤ (1/(2 sin(λπ/2))) ∫_0^{2π} |Σ_{m=1}^N a_m e^{imx}| |Σ_{n=1}^N b_n e^{inx}| dx   (7)

(by taking c_k = a_m b_n and k = m + n in our previous Lemma). Combining (6) and (7)
completes the proof.
Hardy's Inequality
In his search for a new proof of Hilbert's double series theorem, Hardy discovered the
following inequalities (in 1920, but without the best constant in Theorem 1; this was later
rectified by Landau in 1926).
Theorem 1. Let 1 < p < ∞, let (a_n) be nonnegative and p-th power summable, and let
A_n = a_1 + a_2 + · · · + a_n. Then

Σ_{n=1}^∞ (A_n/n)^p < (p/(p−1))^p Σ_{n=1}^∞ a_n^p   (1)

unless (a_n) is identically zero. The constant is best possible.

Theorem 2. Let 1 < p < ∞, let f : (0, ∞) → [0, ∞) be p-th power integrable, and let
F(x) = ∫_0^x f(t) dt. Then

∫_0^∞ (F(x)/x)^p dx < (p/(p−1))^p ∫_0^∞ f(x)^p dx   (2)

unless f is identically zero. The constant is best possible.
Our proof of Theorem 1 is due to Elliot (1926). Our proof of Theorem 2 is (essentially)
Hardy's original proof.
It may help to outline the heuristics that led Hardy to Theorem 1. Hardy's approach
to the double series theorem was to treat the sums above and below the diagonal as
essentially identical; that is, he reasoned that it would suffice to examine
Σ_{n=1}^∞ Σ_{m=1}^n a_m b_n/(m + n) ≤ Σ_{n=1}^∞ (Σ_{m=1}^n a_m) b_n/n = Σ_{n=1}^∞ (A_n/n) b_n.   (3)
Now the conclusion of Hilbert's theorem is, essentially, that the sum on the left converges
whenever Σ_m a_m^p and Σ_n b_n^q converge (this by way of an application of Hölder's inequality).
But, Hardy reasoned, perhaps the convergence of Σ_m a_m^p already implies the convergence
of Σ_n (A_n/n)^p. If so, then Hölder's inequality applied to the right-hand side
of (3) would lead to a more direct proof of Hilbert's theorem; whence Theorem 1.
Proof of Theorem 1. The proof begins with an observation that will help us establish
strict inequality: We may suppose that a1 > 0. Indeed, if the theorem has been proved in
that case, then it will also hold for a sequence (bn ) with b1 = 0 for, in this case, setting
an = bn+1 , we would have
(b_2/2)^p + (b_3/3)^p + · · · = (a_1/2)^p + ((a_1 + a_2)/3)^p + · · ·
≤ (a_1/1)^p + ((a_1 + a_2)/2)^p + · · ·
< (p/(p−1))^p Σ_n a_n^p = (p/(p−1))^p Σ_n b_n^p.
We now set xn = An /n and estimate:
x_n^p − (p/(p−1)) x_n^{p−1} a_n = x_n^p − (p/(p−1)) x_n^{p−1} {n x_n − (n−1) x_{n−1}}
= (1 − np/(p−1)) x_n^p + ((n−1)p/(p−1)) x_n^{p−1} x_{n−1}
≤ (1 − np/(p−1)) x_n^p + ((n−1)p/(p−1)) (((p−1)/p) x_n^p + (1/p) x_{n−1}^p)
= (1/(p−1)) {(n−1) x_{n−1}^p − n x_n^p},   (4)
where we've used Young's inequality, x_n^{p−1} x_{n−1} ≤ ((p−1)/p) x_n^p + (1/p) x_{n−1}^p, in (4). Next we sum over n = 1, . . . , N, noting that
the terms on the right, above, will telescope, and conclude that
Σ_{n=1}^N x_n^p − (p/(p−1)) Σ_{n=1}^N x_n^{p−1} a_n ≤ −(N/(p−1)) x_N^p ≤ 0.

That is,

Σ_{n=1}^N x_n^p ≤ (p/(p−1)) Σ_{n=1}^N x_n^{p−1} a_n.
Applying Hölder's inequality then yields

Σ_{n=1}^N x_n^p ≤ (p/(p−1)) (Σ_{n=1}^N x_n^p)^{1−1/p} (Σ_{n=1}^N a_n^p)^{1/p}.   (5)
Collecting terms and raising both sides to the p-th power gives us the finite version of our
result (but with a weak inequality):

Σ_{n=1}^N x_n^p ≤ (p/(p−1))^p Σ_{n=1}^N a_n^p.

It follows that

Σ_{n=1}^∞ x_n^p ≤ (p/(p−1))^p Σ_{n=1}^∞ a_n^p,   (6)

where both sides of the inequality are finite. Finally, to see that we actually have strict
inequality, note that equality in (5) would force (x_n^p) and (a_n^p) to be proportional. But
we took a_1 = x_1 > 0, so the constant of proportionality would have to be 1. That is, we
would have a_n = A_n/n for all n, which can only occur if (a_n) is constant. (And this is
consistent with equality in (4), which would force (x_n) to be constant.) This is obviously
inconsistent with the fact that (a_n) is p-th power summable; thus, (6) (and (1)) actually
holds with strict inequality.
To see that {p/(p−1)}^p is the best constant, one approach is to consider the sequence
a_n = n^{−(1+ε)/p} and let ε → 0. It's a straightforward estimate, but we'll skip the details.
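The finite version of (1) is simple to check by machine. Here is a short sketch (mine, not the notes'); the particular sequence a_n = n^{-1.1} is an arbitrary p-th power summable choice, taken with p = 2.

```python
p = 2.0
N = 1000
a = [1.0 / (n + 1) ** 1.1 for n in range(N)]   # a sample p-th power summable sequence

partial = 0.0
lhs = 0.0
for n, an in enumerate(a, start=1):
    partial += an                 # A_n = a_1 + ... + a_n
    lhs += (partial / n) ** p     # sum of (A_n / n)^p
rhs = (p / (p - 1)) ** p * sum(an ** p for an in a)

# Hardy's inequality (finite version)
assert lhs < rhs
```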
We next give (what is essentially) Hardy's proof of Theorem 2. It is very similar in
spirit (if not in detail) to the proof we've just given for Theorem 1.
Proof of Theorem 2. As in the proof of Theorem 1, we will first establish a slightly weaker
version of (2). In particular, we will first show that
∫_0^T (F(x)/x)^p dx ≤ (p/(p−1))^p ∫_0^T f(x)^p dx   (2′)

for all T sufficiently large. To this end, note that if f is not identically zero, then ∫_0^∞ f^p > 0
and, hence, ∫_0^T f^p > 0 for all T sufficiently large. (This expression is analogous to the sum
Σ_{n=1}^N a_n^p, and we will need to divide by it later in the proof, much as we did in the proof
of Theorem 1.)
Next, note that if f is p-th power integrable, then f is integrable over [0, x] for any
x > 0 and, moreover,

F(x) = ∫_0^x f(t) dt ≤ x^{(p−1)/p} (∫_0^x f(t)^p dt)^{1/p},

by Hölder's inequality; in particular, x^{1−p} F(x)^p → 0 as x → 0+. Now, integrating by parts,

∫_0^T (F(x)/x)^p dx = (p/(p−1)) ∫_0^T (F(x)/x)^{p−1} f(x) dx − (1/(p−1)) T^{1−p} F(T)^p   (7)
≤ (p/(p−1)) ∫_0^T (F(x)/x)^{p−1} f(x) dx   (8)
≤ (p/(p−1)) (∫_0^T f(x)^p dx)^{1/p} (∫_0^T (F(x)/x)^p dx)^{1−1/p},   (9)

where (in (7)) we've used the fact that x^{1−p} F(x)^p → 0 as x → 0+ and (in (8)) the fact that
F ≥ 0. (Note that if (F(x)/x)^p is integrable over [0, ∞), as the theorem suggests, then we
would expect to have x (F(x)/x)^p → 0 as x → ∞. Thus, dropping the term T^{1−p} F(T)^p is
unlikely to cause any problems.)
We've been here before! Just as in the proof of Theorem 1, the integral ∫_0^T (F(x)/x)^p dx
appears on both sides of (9), raised to different powers. Divide, then raise both sides to the p-th power to arrive at (2′):

∫_0^T (F(x)/x)^p dx ≤ (p/(p−1))^p ∫_0^T f(x)^p dx.
Because this inequality holds for all T sufficiently large, we've proved that

∫_0^∞ (F(x)/x)^p dx ≤ (p/(p−1))^p ∫_0^∞ f(x)^p dx,   (10)
which is very nearly the conclusion we were hoping for. All that's missing is strict inequality, which follows from a closer examination of our application of Hölder's inequality
in (9). Indeed, equality in (10) would force equality in (9), which in turn would force
(F(x)/x)^p and f(x)^p to be proportional. But this would force f to be a power of x, which
is inconsistent with the assumption that f is p-th power integrable over [0, ∞). (If we
assume for the moment that f is continuous, then the equation x f(x) = C ∫_0^x f would
imply that f is differentiable and satisfies the differential equation x f′(x) = (C − 1) f(x),
which has solution f(x) = B x^{C−1}.)
As with Theorem 1, the proof that the constant {p/(p−1)}^p is best would follow from
considering functions of the form f(x) = x^{−(1+ε)/p} (suitably truncated) and letting ε → 0.
Σ_{n=1}^∞ ((a_1^{1/p} + a_2^{1/p} + · · · + a_n^{1/p})/n)^p ≤ (p/(p−1))^p Σ_{n=1}^∞ a_n.

Note that the summand on the left can be written as M_{1/p}^{(n)}(a), where the superscript (n) is a reminder that we're summing only n terms. Recall, too, that M_0^{(n)}(a) =
(a_1 a_2 · · · a_n)^{1/n} ≤ M_{1/p}^{(n)}(a); in fact, if we let p → ∞ (with n still fixed), M_{1/p}^{(n)}(a) decreases
to M_0^{(n)}(a) while the constant on the right tends to e. Thus, we have another "name"
inequality, due to Carleman in 1923 (but without a proof of strict inequality):
Carleman's Inequality. For (a_n) nonnegative,

Σ_{n=1}^∞ (a_1 a_2 · · · a_n)^{1/n} < e Σ_{n=1}^∞ a_n   (1)

provided that (a_n) is not identically zero. The constant e is best possible.
An elementary yet elegant proof of Carleman's inequality, due to Pólya in 1926, is
outlined in the exercises. We'll give a second proof shortly. The proof that the constant e
is best possible will be left for another day.
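Before moving on, Carleman's inequality can be sanity-checked numerically. In the sketch below (mine, not part of the notes) the geometric means are accumulated through logarithms to avoid underflow; the sequence a_n = 1/n² is just a convenient summable example.

```python
import math

N = 500
a = [1.0 / (n + 1) ** 2 for n in range(N)]   # a_n = 1/n^2, a summable sequence

lhs = 0.0
log_prod = 0.0
for n, an in enumerate(a, start=1):
    log_prod += math.log(an)
    lhs += math.exp(log_prod / n)   # (a_1 a_2 ... a_n)^(1/n)

# Carleman: sum of geometric means < e * sum of the sequence
assert lhs < math.e * sum(a)
```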
There is an integral analogue of Carleman's inequality. To find it, we first rewrite (1)
as

Σ_{n=1}^∞ exp((1/n) Σ_{k=1}^n log a_k) < e Σ_{n=1}^∞ a_n,

which suggests:
Knopp's Inequality. If f : [0, ∞) → (0, ∞) is integrable, then

∫_0^∞ exp((1/x) ∫_0^x log f(t) dt) dx < e ∫_0^∞ f(x) dx

unless f is identically zero (and we interpret exp(log(0)) = 0).
Curiously, Knopp's original inequality (from 1928) was stated with 1 as the lower
limit of integration (in all instances), and that inequality is actually false! It fails for the
function f(x) = 1/x², which can be verified by direct calculation. Nevertheless, Knopp's
original proof can be used to prove the corrected inequality.
We'll give two proofs of Knopp's inequality. First, though, we'll take a short detour
to prove the integral version of Jensen's inequality, which is not only of interest in its own
right, but which is also needed for Knopp's proof.
Jensen's Inequality. Let f, w : I → R be integrable functions with w(x) ≥ 0 and
∫_I w(x) dx = 1. Let φ : J → R be a convex function defined on an interval J containing
the range of f. Then

φ(∫_I f(x) w(x) dx) ≤ ∫_I φ(f(x)) w(x) dx.

Proof. Let μ = ∫_I f(x) w(x) dx and let T(x) = φ(μ) + m(x − μ) be the line supporting
the graph of φ at μ. (Recall that we have T(x) ≤ φ(x) for all x and T(μ) = φ(μ).) Then
φ(μ) − φ(f(x)) ≤ m(μ − f(x)) for all x ∈ I. Multiplying by w and integrating over I then
leads to: φ(μ) − ∫_I φ(f(x)) w(x) dx ≤ m(μ − μ) = 0.
If we set φ(x) = e^x, we have the following integral analogue of the arithmetic-geometric mean inequality:

Corollary. Let f, w : I → (0, ∞) with ∫_I w(x) dx = 1. Then

exp(∫_I log(f(x)) w(x) dx) ≤ ∫_I f(x) w(x) dx.

In particular, taking w = 1/(b − a) on I = [a, b],

exp((1/(b−a)) ∫_a^b log f(t) dt) ≤ (1/(b−a)) ∫_a^b f(t) dt.

As a second application of Jensen's inequality, we note that

∫_m^{m+1} ∫_n^{n+1} 1/(x + y) dy dx > 1/(m + n + 1).
To see how this follows from Jensen's inequality, again note that the function h(x) =
1/(c + x) is strictly convex for x > −c. Thus,

∫_n^{n+1} dy/(x + y) > 1/(x + ∫_n^{n+1} y dy) = 1/(x + n + 1/2).

Similarly,

∫_m^{m+1} dx/(x + n + 1/2) > 1/(n + 1/2 + ∫_m^{m+1} x dx) = 1/(m + n + 1).
One more easy calculation and we'll be ready to prove Knopp's inequality.

Lemma. Let f : [0, ∞) → (0, ∞) be integrable and define g(x) = (1/x²) ∫_0^x y f(y) dy. Then

∫_0^∞ f(x) dx = ∫_0^∞ g(x) dx.
Proof of Knopp's inequality. Recall that an antiderivative for log x is x log x − x (which
vanishes at 0). Thus,

exp((1/x) ∫_0^x log f(y) dy) = exp((1/x) ∫_0^x log(y f(y)) dy − (1/x) ∫_0^x log y dy)
= (e/x) exp((1/x) ∫_0^x log(y f(y)) dy)
≤ (e/x) (1/x) ∫_0^x y f(y) dy   (2)
= (e/x²) ∫_0^x y f(y) dy,
where, in (2), we've applied Jensen's inequality. An appeal to our previous Lemma yields
∫_0^∞ exp((1/x) ∫_0^x log f(t) dt) dx ≤ e ∫_0^∞ f(x) dx.   (3)
Now equality in (3) would force equality in (2) for all x, which would in turn force log(yf (y))
to be constant. If f is not identically zero, this would mean that f (y) = c/y for some
positive constant c. This is clearly inconsistent with the fact that f is integrable over
(0, ). Thus, we have strict inequality in (3) unless f = 0.
We next present an inequality due to Lennart Carleson (1954) which is at least partly
related to Jensen's inequality, through its style of proof, and which is at least partly related
to Hardy's inequality, in a way that will become apparent later. Moreover, it will yield
both Carleman's inequality and Knopp's inequality as corollaries.
Carleson's Inequality. Let φ : [0, ∞) → R be a differentiable convex function with
φ(0) = 0. Then, for any −1 < λ < ∞, we have

∫_0^∞ x^λ exp(−φ(x)/x) dx ≤ e^{λ+1} ∫_0^∞ x^λ exp(−φ′(x)) dx.   (4)
Proof. Fix p > 1. Because φ is convex and differentiable, its graph lies above its tangent lines:

φ(x) ≥ φ(y) + (x − y) φ′(y),   (5)

which (taking x = py) we will write as: φ(py) ≥ φ(y) + (p − 1) y φ′(y). Now we estimate the left-hand
side of (4) using (5) and Hölder's inequality with q = p/(p−1), the conjugate to p.

I_A = ∫_0^A x^λ exp(−φ(x)/x) dx = p^{λ+1} ∫_0^{A/p} y^λ exp(−φ(py)/(py)) dy
≤ p^{λ+1} ∫_0^A y^λ exp(−(φ(y) + (p−1) y φ′(y))/(py)) dy
= p^{λ+1} ∫_0^A (y^λ exp(−φ(y)/y))^{1/p} (y^λ exp(−φ′(y)))^{1/q} dy
≤ p^{λ+1} I_A^{1/p} (∫_0^A y^λ exp(−φ′(y)) dy)^{1/q}.
We now divide by I_A^{1/p} and raise both sides to the q-th power to arrive at

∫_0^A x^λ exp(−φ(x)/x) dx ≤ p^{(λ+1)p/(p−1)} ∫_0^A x^λ exp(−φ′(x)) dx.
To finish the proof, we first let A → ∞ and then let p → 1+ or, equivalently, let q → ∞.
Written in terms of q we have

p^q = (q/(q−1))^q = (1 + 1/(q−1))^q → e   as q → ∞,

and hence p^{(λ+1)p/(p−1)} = (p^q)^{λ+1} → e^{λ+1}.
Next, let's see how Carleson's inequality (with λ = 0) yields Carleman's inequality. Suppose first
that (a_k) is a decreasing, summable sequence, and let φ be the piecewise linear function
on [0, ∞) with φ(0) = 0 and φ′(x) = log(1/a_n) for n − 1 < x < n. Since (a_k) decreases, φ′
increases and φ is convex. Moreover, since φ(x)/x increases,

φ(x)/x ≤ φ(n)/n = (1/n) Σ_{k=1}^n log(1/a_k)   for 0 < x ≤ n.   (6)
Equality in (6) for all n and x would force (a_k) to be constant, which is plainly inconsistent
with the summability of (a_k). Thus, we must have strict inequality in (6), at least for
certain n and x (which, by the continuity of φ, will be good enough for our purposes).
Finally, notice that

(Π_{k=1}^n a_k)^{1/n} = exp(−φ(n)/n) ≤ ∫_{n−1}^n exp(−φ(x)/x) dx,

where the inequality is strict, at least for certain values of n, per our discussion of (6).
Summing this inequality and applying Carleson's inequality leads to

Σ_{n=1}^∞ (Π_{k=1}^n a_k)^{1/n} < ∫_0^∞ exp(−φ(x)/x) dx ≤ e ∫_0^∞ exp(−φ′(x)) dx = e Σ_{n=1}^∞ a_n,
where, again, strict inequality follows from our remarks concerning (6). That is, we have
Carleman's inequality for decreasing sequences (a_n). But that hardly matters:
Claim: If (a_k) is decreasing and if (b_k) is any rearrangement of (a_k), then b_1 b_2 · · · b_n ≤
a_1 a_2 · · · a_n for all n.

Indeed, if n is fixed, we may suppose that b_1, b_2, . . . , b_n have been arranged in decreasing order, in which case it's clear that b_1 ≤ a_1, b_2 ≤ a_2, and so on. Thus, b_1 b_2 · · · b_n ≤
a_1 a_2 · · · a_n. It follows that
Σ_{n=1}^∞ (Π_{k=1}^n b_k)^{1/n} ≤ Σ_{n=1}^∞ (Π_{k=1}^n a_k)^{1/n} < e Σ_{n=1}^∞ a_n = e Σ_{n=1}^∞ b_n.
In light of our discussion of decreasing sequences and our knowledge of Knopp's result,
this Corollary raises a natural question: If f is decreasing, and if g is a rearrangement
of f (whatever that might mean), does it follow that

∫_0^∞ exp((1/x) ∫_0^x log g(t) dt) dx ≤ ∫_0^∞ exp((1/x) ∫_0^x log f(t) dt) dx ?
Problem Set 6
1. (a) Given positive weights w_1, . . . , w_n, show that

(Σ_{k=1}^n a_k)² ≤ (Σ_{k=1}^n 1/w_k)(Σ_{k=1}^n a_k² w_k).
(b) If we set w_k = w_k(t) = t + k²/t for t > 0, show that the first factor on the right
is bounded above by π/2.
(c) Show that: min{ Σ_{k=1}^n a_k² w_k(t) : t > 0 } = 2 (Σ_{k=1}^n a_k²)^{1/2} (Σ_{k=1}^n k² a_k²)^{1/2},
and conclude that

(Σ_{k=1}^n a_k)^4 ≤ π² (Σ_{k=1}^n a_k²)(Σ_{k=1}^n k² a_k²). This is known as
Carlson's inequality.

2. Show that

Σ_{m=1}^∞ Σ_{n=1}^∞ a_m b_n/max{m, n} < 4 (Σ_{m=1}^∞ a_m²)^{1/2} (Σ_{n=1}^∞ b_n²)^{1/2}.

[Hint: Mimic either of the proofs we gave for Hilbert's inequality (with p = 2). In the
case of the homogeneous kernel approach, take K(x, y) = [max{x, y}]^{−1} and note
that ∫_0^∞ [√u max{1, u}]^{−1} du = 4.]
3. Let (a_n) be a nonnegative sequence and let (b_n) denote a rearrangement of (a_n). If (b_n)
is decreasing, show that Σ_{n=1}^∞ (A_n/n)^p ≤ Σ_{n=1}^∞ (B_n/n)^p, where A_n = a_1 + a_2 + · · · + a_n
and B_n = b_1 + b_2 + · · · + b_n. [Hint: Argue, for example, that the left-hand side increases
if a pair of out of order terms a_i < a_j, where i < j, are swapped.] Conclude that it
suffices to prove Hardy's inequality (for series) in the case where (a_n) is decreasing.
4. Let (a_n) be nonnegative and decreasing, and define f(x) = a_n for n − 1 ≤ x < n.
Then Σ_{n=1}^∞ a_n^p = ∫_0^∞ f(t)^p dt. As usual, we define F(x) = ∫_0^x f(t) dt.
(i) Show that F(x)/x decreases from A_n/n to A_{n+1}/(n+1) over n ≤ x ≤ n + 1 and
conclude that ∫_0^∞ (F(x)/x)^p dx ≥ Σ_{n=1}^∞ (A_n/n)^p.
(ii) Deduce that Hardy's series inequality for (a_n) follows from his integral inequality
for f. Thus, from 3, the integral inequality implies the series inequality for any
nonnegative (not necessarily decreasing) sequence.
5. (Pólya's proof of Carleman's inequality.) Let c_1, c_2, c_3, . . . be positive numbers.
(a) Using the AM-GM inequality, show that

Σ_{n=1}^∞ (a_1 a_2 · · · a_n)^{1/n} = Σ_{n=1}^∞ (c_1a_1 · c_2a_2 · · · c_na_n)^{1/n} / (c_1 c_2 · · · c_n)^{1/n}
≤ Σ_{n=1}^∞ (c_1 c_2 · · · c_n)^{−1/n} (1/n) Σ_{m=1}^n c_m a_m
= Σ_{m=1}^∞ a_m c_m Σ_{n=m}^∞ (1/n) (c_1 c_2 · · · c_n)^{−1/n}.

(b) If c_m = (m + 1)^m/m^{m−1}, show that (c_1 c_2 · · · c_n)^{−1/n} = 1/(n + 1) and conclude
that

Σ_{n=m}^∞ (1/n) (c_1 c_2 · · · c_n)^{−1/n} = Σ_{n=m}^∞ 1/(n(n + 1)) = 1/m.

(c) Finally, conclude that

Σ_{n=1}^∞ (a_1 a_2 · · · a_n)^{1/n} ≤ Σ_{m=1}^∞ (a_m c_m)/m = Σ_{m=1}^∞ a_m (1 + 1/m)^m < e Σ_{m=1}^∞ a_m.
7. Show that the constant e^{λ+1} in Carleson's inequality is best possible. [Hint: Let a > 0
and define φ(x) = (a + 1) x log x for x > 1, φ(x) = 0 otherwise.]
1. (a) Write a_k = (1/w_k^{1/2}) · (a_k w_k^{1/2}) and apply the Cauchy-Schwarz inequality:

(Σ_{k=1}^n a_k)² ≤ (Σ_{k=1}^n 1/w_k)(Σ_{k=1}^n a_k² w_k).

(b) With w_k(t) = t + k²/t we have 1/w_k = t/(t² + k²) and, since t/(t² + x²) decreases in x,

Σ_{k=1}^n 1/w_k = Σ_{k=1}^n t/(t² + k²) < ∫_0^∞ t/(t² + x²) dx = ∫_0^∞ 1/(1 + x²) dx = π/2.

(c) Writing A = Σ_{k=1}^n a_k² and B = Σ_{k=1}^n k² a_k², we have

Σ_{k=1}^n a_k² w_k(t) = t Σ_{k=1}^n a_k² + (1/t) Σ_{k=1}^n k² a_k² = At + B/t,

which is minimized when t = (B/A)^{1/2}, the minimum value being 2(AB)^{1/2}. Combining
these facts, (Σ_{k=1}^n a_k)² ≤ (π/2) Σ_{k=1}^n a_k² w_k(t) for every t > 0, and minimizing over t gives

(Σ_{k=1}^n a_k)² ≤ π (Σ_{k=1}^n a_k²)^{1/2} (Σ_{k=1}^n k² a_k²)^{1/2}.

Squaring both sides completes the solution (except for the claim that the constant π² is
best possible).
The Rearrangement Inequality. Let x_1 ≥ x_2 ≥ · · · ≥ x_n and y_1 ≥ y_2 ≥ · · · ≥ y_n be
real numbers. Then, for any permutation σ of {1, . . . , n},

Σ_{i=1}^n x_i y_{n+1−i} ≤ Σ_{i=1}^n x_i y_{σ(i)} ≤ Σ_{i=1}^n x_i y_i.

Proof. We only need to prove the second inequality; the first then follows by considering
x and −y. To begin, we may suppose that one of the sequences, let's say x, is already in
decreasing order but y is not. Thus, there are a pair of indices j < k such that y_j < y_k
while x_j ≥ x_k. Now consider:

x_j y_k + x_k y_j − (x_j y_j + x_k y_k) = (x_k − x_j)(y_j − y_k) ≥ 0.

It follows that the sum Σ_{i=1}^n x_i y_i can only increase if we exchange two out-of-order
terms; after finitely many such exchanges, y is in decreasing order.
x_1 ≤ y_1
x_1 + x_2 ≤ y_1 + y_2
⋮                                              (1)
x_1 + · · · + x_{n−1} ≤ y_1 + · · · + y_{n−1}
x_1 + x_2 + · · · + x_n = y_1 + y_2 + · · · + y_n
Proof. Because f is symmetric, it follows that f is Schur convex on (R^n)+ if and only if
f is Schur convex on D = {x : x_1 ≥ x_2 ≥ · · · ≥ x_n ≥ 0}.
Given x ∈ (R^n)+, let's agree to write x̃ = (x̃_1, . . . , x̃_n) for the vector with entries
x̃_k = x_1 + · · · + x_k, 1 ≤ k ≤ n. With this notation, the majorization x ≺ y is equivalent to
the coordinate-wise comparison x̃_j ≤ ỹ_j for all 1 ≤ j < n (along with x̃_n = ỹ_n). Thus,
f is Schur convex on D precisely when f̃(x̃) = f(x) is increasing in each of the coordinates
x̃_1, . . . , x̃_{n−1}; that is, when

∂f̃(x̃)/∂x̃_j ≥ 0   for all 1 ≤ j < n.

But f̃(x̃) = f(x̃_1, x̃_2 − x̃_1, . . . , x̃_n − x̃_{n−1}) so, by the chain rule,

∂f̃(x̃)/∂x̃_j = (∂f(x)/∂x_j)(∂x_j/∂x̃_j) + (∂f(x)/∂x_{j+1})(∂x_{j+1}/∂x̃_j) = ∂f(x)/∂x_j − ∂f(x)/∂x_{j+1}.   (2)

In other words, f is Schur convex on (R^n)+ if and only if

∂f(x)/∂x_j ≥ ∂f(x)/∂x_k   for 1 ≤ j < k ≤ n and x ∈ D,

or, equivalently,

(x_j − x_k)(∂f(x)/∂x_j − ∂f(x)/∂x_k) ≥ 0   for x ∈ (R^n)+,

which is just a way of saying that the variables must be in the same order as the
corresponding partial derivatives.
By way of an example, let's check that f(x) = x_1 x_2 · · · x_n is Schur concave on (R^n)+.
In this case, f_{x_j} = Π_{i≠j} x_i, and so

(x_j − x_k)(f_{x_j} − f_{x_k}) = −(x_j − x_k)² Π_{i≠j,k} x_i ≤ 0.
example are, in fact, listed in increasing order of p-norm. In this case, 4^{1/p} ≤ (2 + 2^p)^{1/p} ≤
(2^p + 2^p)^{1/p} ≤ (1 + 3^p)^{1/p} ≤ 4.
Rather than catalogue more examples, let's turn our attention to a better understanding of the relation x ≺ y. This will lead us to another test for Schur convexity.
What's missing is some justification for the "convexity" in Schur convexity and, ultimately, its connection with majorization. Filling in the details will take some time, and
will take us on something of a side trip, but the journey will have its rewards.
A Bit of Matrix Theory
We begin with the notion of a permutation matrix. Given a permutation σ of {1, . . . , n},
the matrix T_σ associated with the transformation T_σ(x) = x_σ is called a permutation matrix.
Note that T_σ permutes the basis vectors according to σ; i.e., T_σ(e_j) = e_{σ(j)}, and, hence, its
matrix representation is the corresponding permutation of the columns of I, the identity
matrix. Said in other words, a permutation matrix is a matrix with entries 0 and 1 in
which every row and every column has precisely one nonzero entry. The simplest example
of a permutation matrix is a transposition matrix, which corresponds to a simple exchange,
or transposition, of two coordinates.
For example, in R⁴, the transposition that swaps e_2 and e_4 has the matrix

( 1 0 0 0
  0 0 0 1
  0 0 1 0
  0 1 0 0 ),

obtained from the identity matrix by exchanging its second and fourth columns.
It won't surprise you to learn that the set of permutation matrices on R^n forms a group
(under multiplication) that is isomorphic to S_n, the symmetric group on n letters (that is,
the group of all permutations on {1, . . . , n}). In particular, note that every permutation
matrix is invertible and its inverse is again a permutation matrix; indeed, (T_σ)^{−1} = T_{σ^{−1}}.
We will also need the notion of a transfer matrix, defined by T = (1 )I + T ,
where 0 1 and where T is a transposition matrix. Symbolically, if T exchanges ei
and $e_j$, then

$$T = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1-\lambda & \cdots & \lambda & \\
& & \vdots & & \vdots & \\
& & \lambda & \cdots & 1-\lambda & \\
& & & & & \ddots
\end{pmatrix},$$

with the entries $1 - \lambda$ in positions $(i,i)$ and $(j,j)$ and the entries $\lambda$ in positions $(i,j)$ and $(j,i)$,
where all other diagonal entries are 1 and all other off-diagonal entries are 0. To understand the action of $T$, suppose (for sake of argument) that we have $x_i > x_j$; then $T$ subtracts a bit of the excess (namely, $\lambda(x_i - x_j)$) from $x_i$ and transfers it to $x_j$; that is,

$$(T(x))_i = x_i - \lambda(x_i - x_j), \qquad (T(x))_j = x_j + \lambda(x_i - x_j), \qquad (T(x))_k = x_k, \quad k \ne i, j, \tag{3}$$

where $(T(x))_k$ denotes the $k$-th coordinate of $T(x)$. Also notice that we have $(T(x))_i + (T(x))_j = x_i + x_j$; thus, $x$ and $T(x)$ are comparable and, in fact, $T(x) \prec x$ because $T(x)$ is more evenly distributed than $x$ (but it also follows from direct calculation, as we'll see shortly). Finally, notice that every transposition matrix (including $I$) is also a transfer matrix (by taking $\lambda = 1$ or $\lambda = 0$).
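A transfer matrix is easy to build and test. Here is a Python sketch (all names are ours) that constructs $T = (1-\lambda)I + \lambda T_\tau$, applies it, and confirms $T(x) \prec x$:

```python
def transfer(n, i, j, lam):
    # T = (1 - lam) I + lam T_tau, where T_tau swaps coordinates i and j
    T = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    T[i][i] = T[j][j] = 1.0 - lam
    T[i][j] = T[j][i] = lam
    return T

def apply(M, x):
    return [sum(row[c] * x[c] for c in range(len(x))) for row in M]

def majorizes(y, x, tol=1e-12):
    # True if x ≺ y: equal sums, and the partial sums of the decreasing
    # rearrangement of x never exceed those of y
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    px = py = 0.0
    for a, b in zip(xs, ys):
        px += a
        py += b
        if px > py + tol:
            return False
    return True

x = [7.0, 4.0, 2.0, 1.0]
Tx = apply(transfer(4, 0, 2, 0.5), x)  # move half the excess from x_0 to x_2
print(Tx)                # [4.5, 4.0, 4.5, 1.0]
print(majorizes(x, Tx))  # True: T(x) ≺ x
```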
More generally, suppose that we're given a convex combination of several permutation matrices:

$$D = \lambda_1 T_{\sigma_1} + \cdots + \lambda_k T_{\sigma_k}, \qquad \lambda_i \ge 0, \qquad \lambda_1 + \cdots + \lambda_k = 1.$$

Now the permutation matrices $T_{\sigma_i}$ each have a single 1 in any given row or column, so their convex combination $D$ must satisfy

$$\sum_{i=1}^{n} d_{i,j} = \sum_{m=1}^{k} \lambda_m = 1 \qquad\text{and}\qquad \sum_{j=1}^{n} d_{i,j} = \sum_{m=1}^{k} \lambda_m = 1$$
for all $i$ and $j$; that is, every row and every column of $D$ sums to 1. Such a matrix is said to be doubly stochastic. In other words, an $n \times n$ matrix $D$ is doubly stochastic if $d_{i,j} \ge 0$, $\sum_{i=1}^n d_{i,j} = 1$, and $\sum_{j=1}^n d_{i,j} = 1$ for all $i$ and $j$. (It follows that $d_{i,j} \le 1$ for all $i$, $j$.) As it happens, every doubly stochastic matrix can be written as a convex combination of permutation matrices (Birkhoff's theorem), so our initial supposition is actually the general case.
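The row- and column-sum computation above can be verified numerically. A Python sketch (names are ours) that averages random permutation matrices and checks that the result is doubly stochastic:

```python
import itertools
import random

def perm_matrix(sigma):
    # T_sigma e_j = e_{sigma(j)}: the single 1 in column j sits in row sigma(j)
    n = len(sigma)
    return [[1.0 if sigma[c] == r else 0.0 for c in range(n)] for r in range(n)]

def doubly_stochastic(D, tol=1e-12):
    n = len(D)
    rows_ok = all(abs(sum(D[r]) - 1.0) < tol for r in range(n))
    cols_ok = all(abs(sum(D[r][c] for r in range(n)) - 1.0) < tol for c in range(n))
    nonneg = all(D[r][c] >= 0.0 for r in range(n) for c in range(n))
    return rows_ok and cols_ok and nonneg

rng = random.Random(0)
n, k = 4, 5
all_perms = list(itertools.permutations(range(n)))

def random_convex_combination():
    perms = [rng.choice(all_perms) for _ in range(k)]
    w = [rng.random() for _ in range(k)]
    lam = [t / sum(w) for t in w]
    return [[sum(lam[m] * perm_matrix(perms[m])[r][c] for m in range(k))
             for c in range(n)] for r in range(n)]

print(all(doubly_stochastic(random_convex_combination()) for _ in range(50)))  # True
```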
Now might be a good time to say a couple of words about permutations and transpositions. From the proof of our first Proposition, it's not hard to see that any permutation can be written as the product of finitely many transpositions. To see this, let's first establish a useful claim:

Claim. Given $y \in \mathbb{R}^n$, we can write $y^\downarrow = T_k T_{k-1} \cdots T_1\, y$ for some (finite) sequence of transposition matrices.

We've sketched the proof already: If $i_1$ is the index of the largest coordinate of $y$ and if $\tau_1$ is the transposition that exchanges $i_1$ with 1, then $T_1 = T_{\tau_1}$ moves the largest entry of $y$ into the first coordinate; that is, $T_1 y$ now has its largest entry in the first coordinate. If $i_2 \ne 1$ is the index of the second largest coordinate in $T_1 y$ and if $\tau_2$ exchanges $i_2$ and 2, then $T_2 = T_{\tau_2}$ moves the second largest entry of $T_1 y$ into the second coordinate. That is, $T_2 T_1 y$ agrees with $y^\downarrow$ in its first two coordinates. And so on. After at most $n - 1$ such exchanges, we'll arrive at $y^\downarrow$.
Now this argument has very little to do with decreasing rearrangements and everything to do with matching a specific permutation of the coordinates of $y$, so we've actually established:

Claim. Given $y \in \mathbb{R}^n$ and $\sigma \in S_n$, we can write $y_\sigma = T_k T_{k-1} \cdots T_1\, y$ for some (finite) sequence of transposition matrices. In other words, $T_\sigma = T_k T_{k-1} \cdots T_1$ or, in still other words, $\sigma = \tau_k \tau_{k-1} \cdots \tau_1$. That is, every permutation can be written as the product of finitely many transpositions.

We're now ready to put some of the pieces together.
Theorem. For $x, y \in \mathbb{R}^n$, the following are equivalent:

(i) $\sum_{i=1}^n f(x_i) \le \sum_{i=1}^n f(y_i)$ for every convex function $f$;

(ii) $x = Dy$ for some doubly stochastic matrix $D$;

(iii) $x \prec y$;

(iv) $x = T_1 T_2 \cdots T_m\, y$ for some finite sequence of transfer matrices $T_1, \ldots, T_m$.

Proof. To see that (ii) implies (iii), suppose that $x = Dy$, where $D$ is doubly stochastic; we may assume that the entries of $y$ are arranged in decreasing order. Each coordinate of $x$ is then $x_i = \sum_{j=1}^n d_{i,j}\, y_j$, so for $1 \le k \le n$,

$$\sum_{i=1}^{k} x_i = \sum_{i=1}^{k} \sum_{j=1}^{n} d_{i,j}\, y_j = \sum_{j=1}^{n} c_j y_j, \qquad \text{where } c_j = \sum_{i=1}^{k} d_{i,j},$$

and the fact that $D$ is doubly stochastic implies that $0 \le c_j \le 1$ for all $j$ and that

$$\sum_{j=1}^{n} c_j = \sum_{i=1}^{k} \sum_{j=1}^{n} d_{i,j} = k.$$

In particular, for $k = n$ every $c_j = 1$, and hence $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. For $k < n$,

$$\sum_{i=1}^{k} x_i - \sum_{i=1}^{k} y_i = \sum_{j=1}^{n} c_j y_j - \sum_{i=1}^{k} y_i + y_k \Bigl( k - \sum_{j=1}^{n} c_j \Bigr) = \sum_{i=1}^{k} (y_k - y_i)(1 - c_i) + \sum_{i=k+1}^{n} (y_i - y_k)\, c_i \le 0,$$

since $y_i \ge y_k$ for $i \le k$, $y_i \le y_k$ for $i > k$, and $0 \le c_i \le 1$. Hence $x \prec y$.

Next we show that (iii) implies (iv), by induction on the number $N$ of coordinates in which $x$ and $y$ differ. If $N = 2$, say $x$ and $y$ differ only in coordinates $j < k$, then $x_j + x_k = y_j + y_k$ and, since $x \prec y$, we must have $y_j > x_j \ge x_k > y_k$. Taking $\lambda = (y_j - x_j)/(y_j - y_k)$ and $\tilde y = T_\lambda y$, where $T_\lambda = (1-\lambda)I + \lambda T_{(j,k)}$, we get
$$\tilde y_j = y_j - \lambda(y_j - y_k) = x_j \qquad\text{and}\qquad \tilde y_k = y_k + \lambda(y_j - y_k) = x_k,$$

so that $x = T_\lambda y$ (and because $T_\lambda$ fixes all other entries of $y$). This proves the case $N = 2$.
Suppose now that "(iii) implies (iv)" holds whenever $x \prec y$ and $x$ and $y$ differ in $N - 1$ or fewer coordinates, and that we're given a pair of (decreasing) sequences with $x \prec y$ where $x$ and $y$ differ in $N$ coordinates. Then, as in the case $N = 2$, the first place where $x$ and $y$ differ will have to satisfy $x_j < y_j$ and, for some $k > j$, we'll have $x_k > y_k$ to compensate. But, in fact, we must have some pair of coordinates $j < k$ which satisfy:

$$x_j < y_j, \qquad x_k > y_k, \qquad\text{and}\qquad x_i = y_i \ \text{ for } j < i < k.$$

(We might have $k = j + 1$, of course; what matters here is that $x$ and $y$ do not differ in any intermediate coordinates.)
Because we've assumed that $x$ and $y$ are in decreasing order, this again means that $y_j > x_j \ge x_k > y_k$. In particular, $y_j - y_k > x_j - x_k$, and we'll transfer a bit of $y_j$ to $y_k$ to arrive at $x$. However, because $x$ and $y$ might differ in several coordinates, we can no longer be assured that $y_j - x_j = x_k - y_k$; instead, we'll transfer (a portion of) the smaller of the two amounts. Specifically, set $T_\lambda = (1 - \lambda)I + \lambda T_{(j,k)}$, where $T_{(j,k)}$ is the transposition matrix that exchanges $e_j$ and $e_k$, and where $\lambda = \min\{y_j - x_j,\, x_k - y_k\}/(y_j - y_k)$. If we let $\tilde y = T_\lambda y$, then, as above, $T_\lambda$ acts on only two coordinates of $y$; in this case,

$$\tilde y_j = y_j - \lambda(y_j - y_k) = y_j - \min\{y_j - x_j,\, x_k - y_k\} \qquad\text{and}\qquad \tilde y_k = y_k + \lambda(y_j - y_k) = y_k + \min\{y_j - x_j,\, x_k - y_k\},$$
and $\tilde y_i = y_i$ for $i \ne j, k$. It's reasonably clear from the definition of $\tilde y$ that we have

$$y_j \ge \tilde y_j \ge x_j \ge x_k \ge \tilde y_k \ge y_k.$$

Moreover, because we have either $\lambda = (y_j - x_j)/(y_j - y_k)$ or $\lambda = (x_k - y_k)/(y_j - y_k)$, it follows that we have either $\tilde y_j = x_j$ (in the first case) or $\tilde y_k = x_k$ (in the second case).
Claim. $x \prec \tilde y = T_\lambda y \prec y$.

That $\tilde y = T_\lambda y \prec y$ follows from our proof that (ii) implies (iii). What remains is to show that $x \prec \tilde y$. The key to this is the fact that $\tilde y_j + \tilde y_k = y_j + y_k$. Indeed, we have

$$x_1 + \cdots + x_i \le y_1 + \cdots + y_i = \tilde y_1 + \cdots + \tilde y_i \qquad \text{if } 1 \le i < j \text{ or } k \le i \le n; \tag{4}$$

$$x_1 + \cdots + x_i \le \tilde y_1 + \cdots + \tilde y_i \le y_1 + \cdots + y_i \qquad \text{if } j \le i < k. \tag{5}$$

That (4) holds is immediate; that (5) holds follows from the facts that $x_j \le \tilde y_j$ and $\tilde y_i = y_i$ for $j < i < k$.
In summary, we've shown that there is a transfer matrix $T_\lambda$ such that $x \prec \tilde y = T_\lambda y \prec y$ and such that $x$ and $\tilde y$ differ in at most $N - 1$ coordinates. By our induction hypothesis, there exist finitely many transfer matrices $T_1, \ldots, T_k$ such that $x = T_1 \cdots T_k(\tilde y) = T_1 \cdots T_k(T_\lambda y) = (T_1 \cdots T_k T_\lambda)(y)$. Thus, we've shown that (iii) implies (iv).

Finally, the proof that (iv) implies (i) follows by direct calculation and the observation that if $T_\lambda = (1-\lambda)I + \lambda T_\tau$ is a transfer matrix, then $T_\lambda(y) = (1 - \lambda)y + \lambda y_\tau$ is a convex combination of permutations of $y$. (In other words, if we write $H$ for the set of all convex combinations of the vectors $y_\sigma$, $\sigma \in S_n$, then for any transfer matrix $T$ we have $T(H) \subset H$.)
Corollary. : (Rn )+ R is Schur convex if and only if (T (x)) (x) for every
transfer matrix T if and only if (D(x)) (x) for every doubly stochastic matrix D.
Schur's Majorization Inequality. If $f : I \to \mathbb{R}$ is convex, then the function $\varphi : I^n \to \mathbb{R}$ defined by

$$\varphi(x_1, \ldots, x_n) = \sum_{i=1}^{n} f(x_i)$$

is Schur convex; that is,

$$\sum_{i=1}^{n} f(x_i) \le \sum_{i=1}^{n} f(y_i)$$

whenever $x \prec y$.

Proof. The proof is virtually immediate from the preceding Corollary and (3). Indeed, if $T_\lambda$ is a transfer matrix that acts only on the $j$-th and $k$-th coordinates, and if we write $\tilde x = T_\lambda(x)$, then, from (3), we have $\tilde x_j = (1 - \lambda)x_j + \lambda x_k$, $\tilde x_k = \lambda x_j + (1 - \lambda)x_k$, and $\tilde x_i = x_i$ for $i \ne j, k$. It's then immediate from the convexity of $f$ that $\varphi(T_\lambda(x)) \le \varphi(x)$.
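Schur's majorization inequality lends itself to a numerical spot-check. A Python sketch (names are ours) that builds pairs $x \prec y$ by applying random transfer steps to $y$, then compares $\sum f(x_i)$ with $\sum f(y_i)$ for several convex $f$:

```python
import math
import random

def majorize_pair(n, rng, steps=10):
    # y is arbitrary; x is produced from y by random averaging (transfer)
    # steps, so x = T_1 ... T_m y and hence x ≺ y
    y = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    x = list(y)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()
        xi, xj = x[i], x[j]
        x[i] = (1 - lam) * xi + lam * xj
        x[j] = lam * xi + (1 - lam) * xj
    return x, y

rng = random.Random(3)
convex_fns = [lambda t: t * t, math.exp, lambda t: abs(t) ** 3]
ok = True
for _ in range(50):
    x, y = majorize_pair(6, rng)
    for f in convex_fns:
        ok = ok and sum(map(f, x)) <= sum(map(f, y)) + 1e-9
print(ok)  # True
```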
It's clear that if $\varphi(x)$ is Schur convex and if $h(x)$ is increasing, then $h(\varphi(x))$ is again Schur convex. This explains why $f(x) = \|x\|_p$ is Schur convex for $1 \le p < \infty$: the function $\varphi(x) = \sum_{i=1}^n x_i^p$ is Schur convex (from the previous Corollary) and $h(x) = x^{1/p}$ is increasing.
By way of another example, we present a classical inequality, due to Muirhead (1903),
who introduced the transfer method.
Muirhead's Theorem. Let $a_1, \ldots, a_n$ be positive. For $x \in \mathbb{R}^n$ with $x \ge 0$ define

$$\varphi(x) = \sum_{\sigma \in S_n} a_{\sigma(1)}^{x_1} a_{\sigma(2)}^{x_2} \cdots a_{\sigma(n)}^{x_n}.$$

Then $\varphi(x) \le \varphi(y)$ whenever $x \prec y$.
Proof. By the Corollary, it suffices to show that $\varphi(T_\lambda(x)) \le \varphi(x)$ for every transfer matrix $T_\lambda$. Suppose that $T_\lambda$ acts on the $i$-th and $j$-th coordinates of $x$, and write

$$x_i = m + d, \qquad x_j = m - d, \qquad (T_\lambda(x))_i = m + \bar d, \qquad (T_\lambda(x))_j = m - \bar d,$$

where $m = (x_i + x_j)/2$, $d = (x_i - x_j)/2$, and $\bar d = (1 - 2\lambda)d$, so that $|\bar d| \le |d|$. Now

$$2\varphi(x) = \sum_{\sigma \in S_n} \Bigl( \prod_{k \ne i,j} a_{\sigma(k)}^{x_k} \Bigr) \Bigl[\, a_{\sigma(i)}^{m+d}\, a_{\sigma(j)}^{m-d} + a_{\sigma(i)}^{m-d}\, a_{\sigma(j)}^{m+d} \,\Bigr].$$

Similarly,

$$2\varphi(T_\lambda(x)) = \sum_{\sigma \in S_n} \Bigl( \prod_{k \ne i,j} a_{\sigma(k)}^{x_k} \Bigr) \Bigl[\, a_{\sigma(i)}^{m+\bar d}\, a_{\sigma(j)}^{m-\bar d} + a_{\sigma(i)}^{m-\bar d}\, a_{\sigma(j)}^{m+\bar d} \,\Bigr].$$

Consequently,

$$2\varphi(x) - 2\varphi(T_\lambda(x)) = \sum_{\sigma \in S_n} \Bigl( \prod_{k \ne i,j} a_{\sigma(k)}^{x_k} \Bigr) \Delta(\sigma),$$

where

$$\Delta(\sigma) = a_{\sigma(i)}^{m} a_{\sigma(j)}^{m} \Bigl[\, a_{\sigma(i)}^{d} a_{\sigma(j)}^{-d} + a_{\sigma(i)}^{-d} a_{\sigma(j)}^{d} - a_{\sigma(i)}^{\bar d} a_{\sigma(j)}^{-\bar d} - a_{\sigma(i)}^{-\bar d} a_{\sigma(j)}^{\bar d} \,\Bigr] = a_{\sigma(i)}^{m} a_{\sigma(j)}^{m} \bigl[ (y^d + y^{-d}) - (y^{\bar d} + y^{-\bar d}) \bigr],$$

and where $y = a_{\sigma(i)}/a_{\sigma(j)} > 0$. But for $y > 0$, the function $f(s) = y^s + y^{-s}$ is even and increasing on $[\,0, \infty)$; thus $\Delta(\sigma) \ge 0$ because $f(d) \ge f(\bar d)$. Thus, $\varphi(T_\lambda(x)) \le \varphi(x)$.
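Muirhead's comparison is easy to test for small $n$. A Python sketch (names are ours; we use a normalized version of $\varphi$, which changes nothing in the comparison) over a chain of majorized exponent vectors:

```python
import itertools
import math

def phi(a, x):
    # normalized Muirhead sum: average over all sigma in S_n of
    # prod_k a_{sigma(k)} ** x_k
    n = len(a)
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        term = 1.0
        for k in range(n):
            term *= a[sigma[k]] ** x[k]
        total += term
    return total / math.factorial(n)

a = [0.5, 1.3, 2.0, 3.7]
# (1,1,1,0) ≺ (2,1,0,0) ≺ (3,0,0,0), so phi should be nondecreasing along the chain
exponents = [(1, 1, 1, 0), (2, 1, 0, 0), (3, 0, 0, 0)]
values = [phi(a, x) for x in exponents]
print(values[0] <= values[1] <= values[2])  # True
```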
Now if we insist that $\sum_{i=1}^n x_i = 1$, then

$$\varphi(a, x) = \frac{1}{n!} \sum_{\sigma \in S_n} a_{\sigma(1)}^{x_1} a_{\sigma(2)}^{x_2} \cdots a_{\sigma(n)}^{x_n}$$

defines a mean (of $a$); it's an average of geometric-like means over the $n!$ members of $S_n$. With this slight change in notation, we get an interesting extension of the arithmetic-geometric mean inequality:

$$M_0(a) \le \varphi(a, x) \le M_1(a),$$

because $b = (1/n, 1/n, \ldots, 1/n) \prec x \prec (1, 0, \ldots, 0) = c$ and because

$$\varphi(a, b) = \frac{1}{n!} \sum_{\sigma \in S_n} (a_{\sigma(1)} a_{\sigma(2)} \cdots a_{\sigma(n)})^{1/n} = (a_1 a_2 \cdots a_n)^{1/n}$$

and

$$\varphi(a, c) = \frac{1}{n!} \sum_{\sigma \in S_n} a_{\sigma(1)} = \frac{1}{n} \sum_{i=1}^{n} a_i.$$
We complete this section with a proof of Garrett Birkhoff's theorem on doubly stochastic matrices (1946), a pivotal result in the theory of majorization. Curiously, this is yet another example of a famous theorem of uncertain provenance. Some sources claim that it was first proved by König in 1936 (in an article on graph theory). The theorem was discovered independently by John von Neumann in 1953 (in an article on game theory). Many contemporary sources refer to it as the Birkhoff-von Neumann theorem.
Theorem. Every doubly stochastic matrix can be written as a convex combination of
permutation matrices.
Proof. We'll prove the theorem by induction on the number of nonzero entries. It's clearly true for doubly stochastic matrices with precisely $n$ nonzero entries, as any such matrix is necessarily a permutation matrix. So we suppose that the theorem holds for doubly stochastic matrices with fewer than $k$ nonzero entries, where $n + 1 \le k \le n^2$, and that we're given a doubly stochastic matrix $D$ with precisely $k$ nonzero entries.

In particular, $D$ has some entry $d_{i_1,j_1}$ with $0 < d_{i_1,j_1} < 1$. But the sum of the entries in the $i_1$-th row is 1, so there must be some $j_2 \ne j_1$ such that $0 < d_{i_1,j_2} < 1$. But the sum of the entries in the $j_2$-th column is 1, so there is some index $i_2 \ne i_1$ such that $0 < d_{i_2,j_2} < 1$, and so on. This process can be iterated until some pair $(i, j)$ is repeated. Thus we may suppose that we've chosen a circuit that begins and ends at $(i_1, j_1)$ and, moreover, that we've chosen the shortest such circuit, whose entries will then satisfy

$$0 < d_{i_s, j_s} < 1 \qquad\text{and}\qquad 0 < d_{i_s, j_{s+1}} < 1, \qquad 1 \le s \le m,$$

where $i_1, \ldots, i_m$ are distinct, $j_1, \ldots, j_m$ are distinct, and where $j_{m+1} = j_1$. (If the circuit produced three consecutive pairs in the same row or column, we could delete one of those pairs, arriving at a shorter circuit with the property we need. Consequently, a minimal circuit will have an even number of "corners," with no more than two corners in any one row or column.)
We now define an auxiliary $n \times n$ matrix $A$ by setting $a_{i_s, j_s} = 1$ and $a_{i_s, j_{s+1}} = -1$, for $1 \le s \le m$, and $a_{i,j} = 0$ otherwise. Note that by our construction every row and every column of $A$ sums to zero. Finally, set

$$\alpha = \min_{1 \le s \le m} d_{i_s, j_s} > 0 \qquad\text{and}\qquad \beta = \min_{1 \le s \le m} d_{i_s, j_{s+1}} > 0.$$

Then each of the matrices $D - \alpha A$ and $D + \beta A$ has nonnegative entries and each has row and column sums equal to 1. Thus, each is doubly stochastic and, moreover, each has fewer than $k$ nonzero entries. By induction, each can be written as a convex combination of permutation matrices. It follows that the same will be true of $D$ because

$$D = \frac{\beta}{\alpha + \beta}\,(D - \alpha A) + \frac{\alpha}{\alpha + \beta}\,(D + \beta A).$$
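Birkhoff's theorem can be illustrated constructively for small $n$. The sketch below (names are ours) uses a greedy variant rather than the circuit argument from the proof: it repeatedly finds a permutation with a positive diagonal and peels off the largest possible multiple of the corresponding permutation matrix; brute-force search over permutations keeps it simple.

```python
import itertools

def birkhoff_decompose(D, tol=1e-9):
    # Greedy sketch for small n: while mass remains, find a permutation
    # sigma with R[i][sigma(i)] > 0 for every i (one exists, since the
    # remainder is a positive multiple of a doubly stochastic matrix),
    # and subtract the largest possible multiple of that permutation.
    n = len(D)
    R = [row[:] for row in D]
    weight, pieces = 1.0, []
    while weight > tol:
        sigma = next(
            s for s in itertools.permutations(range(n))
            if all(R[i][s[i]] > tol for i in range(n))
        )
        lam = min(R[i][sigma[i]] for i in range(n))
        pieces.append((lam, sigma))
        for i in range(n):
            R[i][sigma[i]] -= lam
        weight -= lam
    return pieces  # list of (weight, permutation) pairs

D = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
pieces = birkhoff_decompose(D)
print(pieces)
print(sum(lam for lam, _ in pieces))  # the weights sum to 1
```

Summing each weight times its permutation matrix reconstructs $D$, which is exactly the content of the theorem.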
MATH 6820
Problem Set 7
August 1, 2011
Statisticians use the sample variance

$$s(x) = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar x)^2, \qquad \bar x = (x_1 + \cdots + x_n)/n,$$

to measure the dispersion of sample data $x = (x_1, \ldots, x_n)$. Information theorists, on the other hand, use the entropy

$$h(p) = -\sum_{k=1}^{n} p_k \log p_k.$$
[Problem statement garbled in the source; it involves the quantities $1 - x$, $1 - y$, $1 - z$, and $1 - \tfrac12(x + y + z)$.]
4. Show that every doubly stochastic matrix has a positive diagonal. [Hint: If $D$ is doubly stochastic, then $D = \lambda_1 P_1 + \cdots + \lambda_k P_k$, where each $P_j$ is a permutation matrix, and where some $\lambda_j > 0$.]
5. Let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of a Hermitian matrix $A \in M_n(\mathbb{C})$, arranged in decreasing order. Show that for any orthonormal sequence $v_1, \ldots, v_k \in \mathbb{C}^n$ we have

$$\sum_{j=1}^{k} \lambda_{n-j+1} \le \sum_{j=1}^{k} \langle A v_j, v_j \rangle \le \sum_{j=1}^{k} \lambda_j.$$

Conclude that

$$\lambda_n = \inf\{\langle Av, v \rangle : \|v\| = 1\} \le \sup\{\langle Av, v \rangle : \|v\| = 1\} = \lambda_1.$$

[Hint: For $x \in \mathbb{R}^n$ and $1 \le k \le n$, the function $g(x) = \sum_{i=1}^{k} x_i^{\downarrow}$ is Schur convex.]
References
Books
Beckenbach, E., and Bellman, R., An Introduction to Inequalities, The Mathematical
Association of America, 1961 (Volume 3 in the New Mathematical Library).
A brief introduction to a handful of classical inequalities. Easy to read and includes full solutions to
the exercises.
Horn, Roger A., and Johnson, Charles R., Matrix Analysis, Cambridge University
Press, 1985; reprinted 1990.
Advanced and extremely thorough; the first of two volumes.
Marcus, M. and Minc, H., A Survey of Matrix Theory and Matrix Inequalities, Dover
Publications, 1992 (originally published in 1969 by Prindle, Weber & Schmidt).
The emphasis here is on survey: Somewhat telegraphic, and hard to read in spots, but packed with
information.
Marshall, A. W., and Olkin, I., Inequalities: Theory of Majorization and its Applications, Academic Press, 1979.
An excellent reference for applications of majorization to probability and statistics.
Articles
Apostol, T. M., An elementary view of Euler's summation formula, The American Mathematical Monthly, 106 (1999), 409–418.

Carleson, L., A proof of an inequality of Carleman, Proceedings of the American Mathematical Society, 5 (1954), 932–933.

Duncan, J., and McGregor, C. M., Carleman's inequality, The American Mathematical Monthly, 110 (2003), 424–431.

Hurlbert, G., A short proof of the Birkhoff-von Neumann theorem, preprint (unpublished), 2008.

Impens, C., Stirling's series made easy, The American Mathematical Monthly, 110 (2003), 730–735.

Redheffer, R., and Volkmann, P., Schur's generalization of Hilbert's inequality, Proceedings of the American Mathematical Society, 87 (1983), 567–568.

Romik, D., Stirling's approximation for n!: the ultimate short proof, The American Mathematical Monthly, 107 (2000), 556–557.