
Convexity

Po-Shen Loh
June 2013

1 Warm-up
1. Prove that there is an integer N such that no matter how N points are placed in the plane, with no 3
collinear, some 10 of them form the vertices of a convex polygon.
2. Let 9 points P1, P2, . . . , P9 be given on a line. Determine all points X which minimize the sum of
distances
P1X + P2X + · · · + P9X.

2 Tools
Recall that a set S (say in the plane) is convex if for any x, y ∈ S, the line segment with endpoints x and y
is completely contained in S. It turns out that this concept is very useful in the theory of functions.
Definition. We say that a function f (x) is convex on the interval I when the set {(x, y) : x ∈ I, y ≥ f (x)}
is convex. On the other hand, if the set {(x, y) : x ∈ I, y ≤ f (x)} is convex, then we say that f is concave.
Note that it is possible for f to be neither convex nor concave. We say that the convexity/concavity is strict
if the graph of f (x) over the interval I contains no straight line segments.
Remark. Plugging in the definition of set-theoretic convexity, we find the following equivalent definition.
The function f is convex on the interval I iff for every a, b ∈ I, the line segment between the points (a, f (a))
and (b, f (b)) is always above or on the curve f . Analogously, f is concave iff the line segment always lies
below or on the curve. This definition is illustrated in Figure 1.

Figure 1: The function in (i) is convex, (ii) is concave, and (iii) is neither. In each diagram, the dotted line segments represent
a sample line segment as in the definition of convexity. However, note that a function that fails to be globally convex/concave
can be convex/concave on parts of its domain. For example, the function in (iv) is convex on the part where it is solid and
concave on the part where it is dotted.

After the above remark, the following famous and useful inequality should be quite believable and easy
to remember.

Theorem. (Basic version of Jensen’s Inequality) Let f (x) be convex on the interval I. Then for any
a, b ∈ I,

f((a + b)/2) ≤ (f(a) + f(b))/2.

On the other hand, if f (x) is concave on I, then we have the reverse inequality for all a, b ∈ I:

f((a + b)/2) ≥ (f(a) + f(b))/2.

Remark. This can be interpreted as “for convex f , the average of f is at least f of the average,” which
motivates the general form of Jensen’s inequality:

Theorem. (Jensen’s Inequality) Let f (x) be convex on the interval I. Then for any x1 , . . . , xt ∈ I,

f (average of {xi }) ≤ average of {f (xi )}.

If f were concave instead, then the inequality would be reversed.

Remark. We wrote the above inequality without explicitly stating what “average” meant. This is to allow
weighted averages, subject to the condition that the same weight pattern is used on both the LHS and RHS.
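
For readers who like to check such statements numerically, here is a small sketch (not part of the original handout) that spot-checks the weighted form of Jensen's inequality for the convex function f(x) = e^x; the points and weights below are arbitrary choices.

```python
import math
import random

# Sketch: spot-check weighted Jensen for the convex function f(x) = e^x.
f = math.exp

for _ in range(1000):
    xs = [random.uniform(-3, 3) for _ in range(5)]
    ws = [random.random() for _ in range(5)]
    total = sum(ws)
    ws = [w / total for w in ws]                  # normalize weights to sum to 1

    lhs = f(sum(w * x for w, x in zip(ws, xs)))   # f(weighted average)
    rhs = sum(w * f(x) for w, x in zip(ws, xs))   # weighted average of f
    assert lhs <= rhs + 1e-12, (xs, ws)

print("Jensen's inequality held in every random trial.")
```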

2.1 That’s great, but how do I prove that a function is convex?


1. If you know calculus, take the second derivative. It is a well-known fact that if the second derivative
f′′(x) is ≥ 0 for all x in an interval I, then f is convex on I. On the other hand, if f′′(x) ≤ 0 for all
x ∈ I, then f is concave on I. (A quick symbolic check of this test appears in the sketch after this list.)
2. By the above test or by inspection, here are some basic functions that you should safely be able to
claim are convex/concave.

• Constant functions f (x) = c are both convex and concave.


• Powers of x: f (x) = x^r with r ≥ 1 are convex on the interval 0 < x < ∞, and with 0 < r ≤ 1
are concave on that same interval. (Note that f (x) = x is both convex and concave!)
• Reciprocal powers: f (x) = 1/x^r are convex on the interval 0 < x < ∞ for all powers r > 0. For
negative odd integers r, f (x) is concave on the interval −∞ < x < 0, and for negative even
integers r, f (x) is convex on the interval −∞ < x < 0.
• The logarithm f (x) = log x is concave on the interval 0 < x < ∞, and the exponential f (x) = e^x
is convex everywhere.

3. f (x) is convex iff −f (x) is concave.

4. You can combine basic convex functions to build more complicated convex functions.

• If f (x) is convex, then g(x) = c · f (x) is also convex for any positive constant multiplier c.
• If f (x) is convex, then g(x) = f (ax + b) is also convex for any constants a, b ∈ R. But the interval
of convexity will change: for example, if f (x) were convex on 0 < x < 1 and we had a = 1/5, b = −2/5,
so that g(x) = f ((x − 2)/5), then g(x) would be convex on 2 < x < 7.
• If f (x) and g(x) are convex, then their sum h(x) = f (x) + g(x) is convex.

5. If you are brave, you can invoke the oft-used claim “by observation,” e.g.,

f (x) = |x| is convex by observation.

The above statement might actually pass, depending on the grader, but the following desperate statement
definitely will not:

f (x) = (x + 1)²/(2x² + (3 − x)²) is concave by observation.

(It is false anyway, so that would ruin your credibility. . . )
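
Here is the sketch promised in item 1, assuming the sympy library is available: it spot-checks the sign of f′′ for a few of the basic functions from item 2 on the interval 0 < x < ∞. It is an illustration, not a proof (and the verdict below is only meaningful for these particular functions, whose second derivatives keep a constant sign).

```python
import sympy as sp

x = sp.symbols('x', positive=True)   # work on the interval 0 < x < oo

# Sketch: check the sign of f'' for some of the functions claimed above.
claims = [
    (x**3,        'convex'),    # power with r >= 1
    (sp.sqrt(x),  'concave'),   # power with 0 < r <= 1
    (1 / x**2,    'convex'),    # reciprocal power with r > 0
    (sp.log(x),   'concave'),
    (sp.exp(x),   'convex'),
]

for f, expected in claims:
    second = sp.simplify(sp.diff(f, x, 2))
    verdict = 'convex' if second.is_nonnegative else 'concave'
    print(f"{f}:  f'' = {second}  =>  {verdict}  (claimed {expected})")
```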

Now prove the following results.


1. f (x) = 1/(1 − x) is convex on −∞ < x < 1.
2. For any constant c ∈ R, f (x) = x²/(c − x) is convex on −∞ < x < c.
Solution:

x²/(c − x) = −x²/(x − c) = −(x + c + c²/(x − c)) = −x − c − c²/(x − c)

Each summand is convex.


3. f (x) = x(x − 1)/2 is convex.
Solution: Expand: x²/2 − x/2. This is the sum of two convex functions.
4. f (x) = x(x − 1) · · · (x − r + 1)/r! is convex on r − 1 < x < ∞.
Solution: Forget about the r! factor. Think of f (x) as a product of r linear terms. Then by the
product rule, f′ is the sum of r products, each consisting of r − 1 linear terms. For example, one of
the terms would be x(x − 1)(x − 3)(x − 4) · · · (x − r + 1); this corresponds to the x − 2 factor missing.
But then the second derivative f′′ is a sum of even more products, but each product consists of r − 2
linear terms (with two factors missing). But for x > r − 1, every factor is > 0, and we are taking a sum
of products of them, so f′′ > 0. (A rough numerical spot-check of this appears after these exercises.)
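
As promised in exercise 4, here is a rough numerical spot-check (an illustration only; r = 5 and the step size are arbitrary choices): second differences of f(x) = x(x − 1) · · · (x − r + 1)/r! sampled to the right of r − 1 should all be nonnegative.

```python
from math import factorial

def falling_binomial(x: float, r: int) -> float:
    """x(x-1)...(x-r+1)/r!, viewed as a polynomial in a real variable x."""
    prod = 1.0
    for k in range(r):
        prod *= (x - k)
    return prod / factorial(r)

# Sketch: second differences f(x-h) - 2 f(x) + f(x+h) are >= 0 where f is convex.
r, h = 5, 1e-3
for i in range(1, 2000):
    x = (r - 1) + 0.01 * i          # sample points to the right of r - 1
    d2 = (falling_binomial(x - h, r) - 2 * falling_binomial(x, r)
          + falling_binomial(x + h, r))
    assert d2 >= -1e-9, (x, d2)
print("Nonnegative second differences at every sampled point.")
```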

2.2 Endpoints of convex functions


• If f (x) is convex on the interval a ≤ x ≤ b, then f (x) attains a maximum, and that value is either
f (a) or f (b).
• If f (x) is concave on the interval a ≤ x ≤ b, then f (x) attains a minimum, and that value is either
f (a) or f (b).
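
A tiny numerical illustration of the first bullet (a sketch; the convex function f(x) = (x − 0.3)² + 1 and the interval [−1, 2] are arbitrary choices, not from the original): no interior sample beats the larger endpoint value.

```python
# Sketch: for a convex f on a closed interval, the maximum sits at an endpoint.
f = lambda x: (x - 0.3) ** 2 + 1
a, b = -1.0, 2.0

endpoint_max = max(f(a), f(b))
interior_max = max(f(a + (b - a) * i / 1000) for i in range(1, 1000))
assert interior_max <= endpoint_max
print(f"max over interior samples = {interior_max:.4f} <= max(f(a), f(b)) = {endpoint_max:.4f}")
```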

2.3 Smoothing and unsmoothing


Theorem. Let f (x) be convex on the interval I. Suppose a < b are both in I, and suppose ε > 0 is a real
number for which a + ε ≤ b − ε. Then f (a) + f (b) ≥ f (a + ε) + f (b − ε). If f is strictly convex, then the
inequality is strict.

This means that:

• For convex functions f , we can decrease the sum f (a) + f (b) by “smoothing” a and b together, and
increase the sum by “unsmoothing” a and b apart.
• For concave functions f , we can increase the sum f (a) + f (b) by “smoothing” a and b together, and
decrease the sum by “unsmoothing” a and b apart.
• In all of the above statements, if the convexity/concavity is strict, then the increasing/decreasing is
strict as well.

This “smoothing principle” gives another way to draw conclusions about the assignments to the variables
which bring the LHS and RHS closest together (i.e., sharpening the inequality). Hopefully, this process will
give us a simpler inequality to prove.

Warning. Be careful to ensure that your smoothing process terminates. For example, if we are trying
to prove that (a + b + c)/3 ≥ ∛(abc) for a, b, c ≥ 0, we can observe that if we smooth any pair of variables
together, then the LHS remains constant while the RHS increases. Therefore, a naïve smoothing procedure
would be:
As long as there are two unequal variables, smooth them both together into their arithmetic mean.
At the end, we will have a = b = c, and the RHS will be exactly equal to the LHS, so we are done.
Unfortunately, “the end” may take infinitely long to occur, if the initial values of a, b, c are unfavorable!
Instead, one could use a smoothing argument for which each iteration increases the number of a, b, c equal
to their arithmetic mean (a + b + c)/3.
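
The terminating variant suggested in the last sentence is easy to phrase as a procedure. The sketch below (an illustration for the AM-GM example above, not a proof) pins one more variable to the arithmetic mean at every step, keeps the sum fixed, and never decreases the product, so it stops after at most two steps for three variables.

```python
def smooth_to_mean(values):
    """Terminating smoothing pass: repeatedly replace a below-mean value and an
    above-mean value by (mean, their sum minus mean).  The sum is unchanged, the
    product never decreases, and one more entry equals the mean after each step."""
    vals = list(values)
    m = sum(vals) / len(vals)
    while any(abs(v - m) > 1e-12 for v in vals):
        lo = min(range(len(vals)), key=lambda i: vals[i])   # index of a value below m
        hi = max(range(len(vals)), key=lambda i: vals[i])   # index of a value above m
        vals[lo], vals[hi] = m, vals[lo] + vals[hi] - m
        print("after one step:", [round(v, 6) for v in vals])
    return vals

a, b, c = 0.2, 0.3, 2.5
print("start:", [a, b, c], "product =", a * b * c)
final = smooth_to_mean([a, b, c])
print("end:  ", final, "product =", final[0] * final[1] * final[2])
```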

2.4 Compactness
Definition. Let f (x1 , . . . , xt ) be a function with domain D. We say that the function attains a maximum
on D at some assignment (xi ) = (ci ) if f (c1 , . . . , ct ) is greater than or equal to every other value f (x1 , . . . , xt )
with (x1 , . . . , xt ) ∈ D.
Remark. Not every function has a maximum! Consider, for example, the function 1/x on the domain
0 < x < ∞, or even the function x on the domain 0 < x < 1.
Definition. Let D be a subset of Rn .
• If D “includes its boundary”, then we say that D is closed.
• If there is some finite radius r for which D is contained within the ball of radius r around the origin,
then we say that D is bounded.
• If D is both closed and bounded, then we say D is compact.¹
Theorem. Let f be a continuous function defined over a domain D which is compact. Then f attains a
maximum on D, and also attains a minimum on D.

3 Problems
1. (India 1995, from Kiran) Let x1 , . . . , xn be positive numbers summing to 1. Prove that
x1/√(1 − x1) + · · · + xn/√(1 − xn) ≥ √(n/(n − 1)).

Solution: Done immediately by Jensen, just need to prove that x/√(1 − x) is convex on the interval
0 < x < 1. Use the substitution t = 1 − x, and prove convexity (on the same interval) of (1 − t)/√t =
1/√t + (−√t). But this is the sum of two convex functions, hence convex!
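
A numerical sanity check of this solution (a sketch; n = 6 is an arbitrary choice): random positive x1, . . . , xn summing to 1 never dip below √(n/(n − 1)).

```python
import math
import random

def lhs(xs):
    return sum(x / math.sqrt(1 - x) for x in xs)

# Sketch: random positive x_i summing to 1 should never go below sqrt(n/(n-1)).
n = 6
bound = math.sqrt(n / (n - 1))
worst = float('inf')
for _ in range(20000):
    raw = [random.random() for _ in range(n)]
    s = sum(raw)
    xs = [r / s for r in raw]
    worst = min(worst, lhs(xs))

assert worst >= bound - 1e-12
print(f"smallest LHS found = {worst:.6f} >= bound = {bound:.6f}")
```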
2. Let a, b, c be positive real numbers with a + b + c ≥ 1. Prove that
a²/(b + c) + b²/(c + a) + c²/(a + b) ≥ 1/2.
Solution: Let S = a + b + c. Then the fractions are of the form x²/(S − x), which is convex by the exercise
in the previous section. Hence LHS is ≥ 3 · (S/3)²/((2/3)S) = S/2, and we assumed that S ≥ 1.
¹ Strictly speaking, this is not the proper definition of “compactness,” but rather is a consequence of the Heine-Borel Theorem.

3. Let m ≥ n be positive integers. Prove that every graph with m edges and n vertices has at least m²/n
“V-shapes,” which are defined to be (x, {y, z}), where {y, z} is an unordered pair of distinct vertices,
both of which are adjacent to x.
Solution: The number is exactly Σi (di choose 2), where the di are the degrees. The average degree is 2m/n,
and (x choose 2) = x(x − 1)/2 is convex, so Jensen implies that this is ≥ n · (2m/n choose 2) =
n(2m/n)(2m/n − 1)/2 = 2m²/n − m ≥ m²/n, where the last inequality uses m ≥ n.
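
The counting in this solution is easy to test on a random graph; the sketch below (arbitrary parameters, illustration only) tallies the V-shapes via the degree sum and compares with m²/n.

```python
import random
from itertools import combinations
from math import comb

# Sketch: the number of V-shapes is the sum over vertices of C(d_i, 2),
# and whenever m >= n this is at least m^2 / n.
n, p = 30, 0.3
edges = [(u, v) for u, v in combinations(range(n), 2) if random.random() < p]
m = len(edges)

deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

v_shapes = sum(comb(d, 2) for d in deg)
print(f"n = {n}, m = {m}, V-shapes = {v_shapes}, m^2/n = {m * m / n:.1f}")
assert m < n or v_shapes >= m * m / n
```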
4. (Application of Kövári-Sós-Turán.) Given n points in the plane, prove that the number of pairs of
points which are distance 1 apart is at most n^{3/2}.
Extra credit: Improve the bound to 100n^{4/3}.
Mega bonus: Improve the bound to 100^{100} · n^{1.3333333333333333333333333333333333333333333333333333333}.
Solution: Zarankiewicz counting for K2,3. Assume we have average degree at least 2n^{1/2} and no
K2,3. For each pair of vertices u, v, let the codegree d(u, v) be their number of common neighbors. By
assumption, all of these are at most 2, so their sum is at most (n choose 2) · 2 ≤ n².
Now consider an arbitrary vertex x. It contributes to exactly (d(x) choose 2) codegrees. Since (t choose 2)
is quadratic and convex, the total contribution to codegrees is

Σx (d(x) choose 2) ≥ n · (d choose 2),

where d denotes the average degree.

Since we assumed d ≥ 2n^{1/2}, we get a total contribution of at least roughly 2n², contradiction.
5. (Bulgaria, 1995) Let n ≥ 2 and 0 ≤ xi ≤ 1 for all i = 1, 2, . . . , n. Show that
(x1 + x2 + · · · + xn) − (x1x2 + x2x3 + · · · + xnx1) ≤ ⌊n/2⌋.
Bonus: determine when there is equality.
Solution: This is actually linear in each variable separately. Therefore, we can iteratively unsmooth
each variable to an endpoint {0, 1}. Rewrite LHS as
x1(1 − x2) + x2(1 − x3) + · · · + xn(1 − x1).
With {0, 1}-assignments, each term is zero unless xi = 1 and xi+1 = 0, in which case the term is 1.
This is a “1, 0” pattern in consecutive variables when reading the xi cyclically. Clearly, the maximum
possible number of “1, 0” patterns is the claimed RHS.
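
Since the solution reduces the problem to {0, 1} assignments, a brute-force check over the corners is quick to script; this sketch confirms the bound ⌊n/2⌋ for small n (illustration only).

```python
from itertools import product

def value(xs):
    n = len(xs)
    return sum(xs) - sum(xs[i] * xs[(i + 1) % n] for i in range(n))

# Sketch: over all {0,1}-assignments, the maximum should equal floor(n/2).
for n in range(2, 10):
    best = max(value(xs) for xs in product((0, 1), repeat=n))
    assert best == n // 2, (n, best)
    print(f"n = {n}: max over corners = {best} = floor(n/2)")
```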
6. (USAMO 1980/5) Show that for all real numbers 0 ≤ a, b, c ≤ 1,
a/(b + c + 1) + b/(c + a + 1) + c/(a + b + 1) + (1 − a)(1 − b)(1 − c) ≤ 1.
Solution: Consider b, c as constant, and see what happens if we perturb a. Then the LHS is a
convex function of a, because the first term is linear in a, the 2nd/3rd terms are of the form
(const)/(a + const) and hence convex in a, and the last term is again a linear function of a. Hence the
LHS is maximized at one of the extreme values of a (either 0 or 1), so we may assume it is one of these.
Similarly, b, c ∈ {0, 1}. This gives 8 assignments of (a, b, c) to check, and they all work.
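
The last step, checking the 8 corner assignments, can be done by hand in a minute; here is an equivalent sketch in code.

```python
from itertools import product

def expr(a, b, c):
    return (a / (b + c + 1) + b / (c + a + 1) + c / (a + b + 1)
            + (1 - a) * (1 - b) * (1 - c))

# Sketch: the eight extreme assignments from the solution all satisfy the bound.
for a, b, c in product((0, 1), repeat=3):
    val = expr(a, b, c)
    print(f"(a, b, c) = ({a}, {b}, {c}):  LHS = {val:.4f}")
    assert val <= 1 + 1e-12
```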
7. (Zvezda 1998) Prove for all reals a, b, c ≥ 0:
(a + b + c)²/3 ≥ a√(bc) + b√(ca) + c√(ab).
Solution: If we smooth, say, a and b together, then the LHS is invariant, and the RHS is of the form
c√(ab) + √c · √(ab)(√a + √b). But √(ab) will grow, as will √a + √b. Smooth until each element hits the
arithmetic mean. Therefore, the RHS will grow, so we may assume that all are equal. This corresponds
to (3t)²/3 ≥ 3t², which is indeed true.

8. (T. Mildorf’s Inequalities, problem 3) Let a1 , . . . , an ≥ 0 be real numbers summing to 1. Prove that
a1a2 + a2a3 + · · · + an−1an ≤ 1/4.

Solution: Observe that if any of the ai are zero, then we could increase the LHS by deleting
the zero entries (thus reducing the value of n). For example, if n = 4 and the sequence of ai ’s was
(0.5, 0, 0.4, 0.1), then the corresponding LHS with n = 3 and a sequence of (0.5, 0.4, 0.1) would be
higher. Therefore, we may assume that the sequence of ai has no zeros.
Next, consider unsmoothing the pair corresponding to a1 and a3 . Observe that a1 is only multiplied
by a2 , but a3 is multiplied by a2 + a4 > a2 if n ≥ 4, because we just showed all ai > 0. Therefore,
we may increase the LHS by pushing a1 all the way to zero, and giving its mass to a3 . Repeat this
process (at most n < ∞ times) until n ≤ 3.
Finally, since we only have a1 , a2 , and a3 left, the LHS is simply equal to a2 (a1 + a3 ) = a2 (1 − a2 )
because they sum to 1. Clearly, this has maximum value 1/4. (There are also two other cases which
correspond to ending up at n = 2 or n = 1, but those are trivial.)
9. (Hong Kong 2000) Let a1 ≤ · · · ≤ an be real numbers such that a1 + · · · + an = 0. Show that

a1² + · · · + an² + n·a1·an ≤ 0.

Solution: If there are at least two intermediate ai, neither of which is equal to a1 or an, then
we can unsmooth them and increase the LHS. So we may assume that there are x of the ai equal to
a = a1, at most one of the ai equal to some intermediate value b, and n − x or n − 1 − x of the ai equal to c = an.
Case 1 (when there is no b): using xa + (n − x)c = 0, we solve and get x = nc/(c − a). Then
xa² + (n − x)c² = −nac, exactly what we needed.
Case 2 (when there is one b): using xa + b + (n − 1 − x)c = 0, we solve and get x = [b + (n − 1)c]/(c − a).
Then xa² + b² + (n − 1 − x)c² = −(n − 1)ac + b(−a − c) + b². It suffices to show that b(−a − c) + b² ≤ −ac. If
b ≥ 0, then use b ≤ c to get b² ≤ bc, hence b(−a − c) + b² ≤ −ab ≤ −ac, final inequality using that a ≤ 0
and b ≤ c. On the other hand, if b ≤ 0, then use b ≥ a to get b² ≤ ab, hence b(−a − c) + b² ≤ −bc ≤ −ac,
final inequality using that c ≥ 0 and b ≥ a.

10. (IMO 1984) For x, y, z > 0 and x + y + z = 1, prove that xy + yz + xz − 2xyz ≤ 7/27.
Solution: Write the expression as x(y + z) + yz(1 − 2x). Now, if x ≤ 1/2, then we can push y and z
together: yz grows while its coefficient 1 − 2x stays nonnegative, so the expression does not decrease. The
mushing algorithm is as follows: first, if one of the variables is greater than 1/2, let one of the other two
(which is less than 1/2) play the role of x and mush the remaining two together until all three are at most
1/2. Next we are allowed to mush with any variable taking the place of x. Pick the middle value to be x;
then, unless all three are already equal, the other two values lie on opposite sides of 1/3. Hence we can
mush to make one of them equal to 1/3. Finally, use that 1/3 for x and mush the other two into 1/3.
Plugging in, we get 7/27.
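
As a sanity check of the conclusion (a sketch with arbitrary sampling parameters, not a proof): random points of the simplex x + y + z = 1 stay below 7/27, and the best value found sits near x = y = z = 1/3.

```python
import random

def g(x, y, z):
    return x * y + y * z + z * x - 2 * x * y * z

# Sketch: sample the simplex x + y + z = 1 uniformly and compare with 7/27.
best, best_pt = -1.0, None
for _ in range(200000):
    a, b = sorted((random.random(), random.random()))
    x, y, z = a, b - a, 1 - b                     # uniform point of the simplex
    val = g(x, y, z)
    assert val <= 7 / 27 + 1e-12
    if val > best:
        best, best_pt = val, (x, y, z)

print(f"best value found = {best:.6f}  (7/27 = {7/27:.6f})  at {best_pt}")
```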
11. (MOP 2008/6) Prove for real numbers x ≥ y ≥ 1:
x/√(x + y) + y/√(y + 1) + 1/√(x + 1) ≥ y/√(x + y) + x/√(x + 1) + 1/√(y + 1).
12. (MOP Team Contest 2008) Let x1, x2, . . . , xn be positive real numbers with x1x2 · · · xn = 1. Prove:

1/(n − 1 + x1) + 1/(n − 1 + x2) + · · · + 1/(n − 1 + xn) ≤ 1.

13. (MOP 1998/5/5) Let a1 ≥ · · · ≥ an ≥ an+1 = 0 be a sequence of real numbers. Prove that:
√(a_1 + a_2 + · · · + a_n) ≤ Σ_{k=1}^{n} √k · (√(a_k) − √(a_{k+1})).

Solution: Since the inequality is homogeneous, we can normalize the ak so that a1 = 1. (If they are
all zero, it is trivial anyway.) Now define the random variable X such that P(X ≥ k) = √(ak). Then it
suffices to show that

√(E[min{X1, X2}]) ≤ E[√X],

where X, X1, X2 are i.i.d. Prove this by induction on n. The base case n = 1 is trivial. Now if you go
to n + 1 by shifting an amount q of probability from P(X = n) to P(X = n + 1), the RHS will increase
by exactly q(√(n + 1) − √n). Yet the LHS increases by exactly q² under the square root. Now since the
probability was shifted from P(X = n), the quantity under the square root was originally at least q²n.
In the worst case, the LHS increases by √(q²n + q²) − √(q²n), which equals the RHS increase.
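
A quick numerical sketch of the original inequality on random nonincreasing sequences (illustration only, not a proof):

```python
import math
import random

# Sketch: test  sqrt(sum a_k)  <=  sum sqrt(k) (sqrt(a_k) - sqrt(a_{k+1}))
# on random nonincreasing sequences with a_{n+1} = 0.
for _ in range(5000):
    n = random.randint(1, 12)
    a = sorted((random.random() for _ in range(n)), reverse=True) + [0.0]
    lhs = math.sqrt(sum(a[:n]))
    rhs = sum(math.sqrt(k + 1) * (math.sqrt(a[k]) - math.sqrt(a[k + 1]))
              for k in range(n))
    assert lhs <= rhs + 1e-9, (a, lhs, rhs)

print("Inequality held for every random sequence tested.")
```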
14. (Putnam 1957/B3.) Let f : [0, 1] → R+ be a monotone decreasing continuous function. Show that
∫₀¹ f(x) dx · ∫₀¹ x f(x)² dx ≤ ∫₀¹ x f(x) dx · ∫₀¹ f(x)² dx.

Solution: Apply the next problem, changing the measure to dy = f (x)dx, normalized to be a
probability measure. Then x and f (x) are anticorrelated, and Chebyshev applies with respect to that
measure.
15. (Chebyshev.) Let a1 ≤ a2 ≤ · · · ≤ an and b1 ≥ b2 ≥ · · · ≥ bn be two sequences of nonnegative real
numbers. Show that
(a1b1 + · · · + anbn)/n ≤ ((a1 + · · · + an)/n) · ((b1 + · · · + bn)/n),
i.e., that the expected value of the product of two negatively correlated random variables is at most
the product of their expected values.
Solution: Smoothing: push the ai closer together; the LHS increases while the RHS is maintained, and
at the end (all ai equal) the two sides coincide. Also, this can be shown by considering
∫₀¹ ∫₀¹ (f(x) − f(y))(g(x) − g(y)) dx dy, as this is always
≤ 0 due to the anticorrelation, and we use Fubini on the terms which mix x and y.
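
Finally, a numerical sketch of Chebyshev's sum inequality itself, on random oppositely sorted sequences (illustration only):

```python
import random

# Sketch: for a ascending and b descending, the average of a_i * b_i
# is at most the product of the two averages.
for _ in range(5000):
    n = random.randint(1, 20)
    a = sorted(random.random() for _ in range(n))
    b = sorted((random.random() for _ in range(n)), reverse=True)
    lhs = sum(x * y for x, y in zip(a, b)) / n
    rhs = (sum(a) / n) * (sum(b) / n)
    assert lhs <= rhs + 1e-12, (a, b)

print("Chebyshev's sum inequality held in every trial.")
```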

4 Real problems
1. (L., Lubetzky.) Let g0 (s), g1 (s), g2 (s), . . . be an infinite family of functions defined recursively by:
g0(s) = e^{−s},
gt+1(s) = (1/(2α)) [ gt(αs)² − gt(αs + 1/2) gt(αs) + gt(αs + 1/2) + gt(1/2) gt(αs) ],  where α = (1/2)[1 + gt(1/2)].

Prove that the sequence gt (1/2) converges to 1 as t → ∞.


2. (L., Pikhurko, Sudakov.) Fix an integer q ≥ 2 and a real parameter γ. Consider the following objective
and constraint functions:
obj(α) := Σ_{A≠∅} αA log |A| ;   v(α) := Σ_{A≠∅} αA ,   e(α) := Σ_{A∩B=∅} αA αB .

The vector α has 2^q − 1 coordinates αA ∈ R indexed by the nonempty subsets A ⊂ [q], and the sum
in e(α) runs over unordered pairs of disjoint nonempty sets {A, B}. Let Feas(γ) be the feasible set of
vectors defined by the constraints α ≥ 0, v(α) = 1, and e(α) ≥ γ. In terms of q and γ, determine the
maximum possible value obj(α) can take for any α ∈ Feas(γ).
