Probability Inequalities
Inequalities are useful for bounding quantities that might otherwise be hard to compute.
They will also be used in the theory of convergence.
Theorem 1 (The Gaussian Tail Inequality) Let $X \sim N(0,1)$. Then
$$P(|X| > \epsilon) \leq \frac{2 e^{-\epsilon^2/2}}{\epsilon}.$$
If $X_1, \ldots, X_n \sim N(0,1)$ then
$$P(|\overline{X}_n| > \epsilon) \leq \frac{2}{\sqrt{n}\,\epsilon}\, e^{-n\epsilon^2/2}.$$

Proof. The density of $X$ is $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$. Hence, for $\epsilon > 0$,
$$P(X > \epsilon) = \int_\epsilon^\infty \phi(s)\,ds \leq \frac{1}{\epsilon}\int_\epsilon^\infty s\,\phi(s)\,ds = \frac{\phi(\epsilon)}{\epsilon} \leq \frac{e^{-\epsilon^2/2}}{\epsilon}.$$
By symmetry,
$$P(|X| > \epsilon) \leq \frac{2 e^{-\epsilon^2/2}}{\epsilon}.$$
Now let $X_1, \ldots, X_n \sim N(0,1)$. Then $\overline{X}_n = n^{-1}\sum_{i=1}^n X_i \sim N(0, 1/n)$. Thus $\overline{X}_n \stackrel{d}{=} n^{-1/2} Z$ where $Z \sim N(0,1)$ and
$$P(|\overline{X}_n| > \epsilon) = P(|Z| > \sqrt{n}\,\epsilon) \leq \frac{2}{\sqrt{n}\,\epsilon}\, e^{-n\epsilon^2/2}.$$
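As a quick numerical illustration (my own addition, not part of the notes: a minimal Python sketch using NumPy and SciPy, with arbitrary values of $\epsilon$), we can compare the exact two-sided tail with the bound from Theorem 1:

```python
import numpy as np
from scipy.stats import norm

# Compare the exact Gaussian tail P(|X| > eps) for X ~ N(0,1)
# with the bound (2/eps) * exp(-eps^2/2) from Theorem 1.
for eps in [0.5, 1.0, 2.0, 3.0]:
    exact = 2 * norm.sf(eps)                 # exact two-sided tail
    bound = 2 * np.exp(-eps**2 / 2) / eps    # Gaussian tail bound
    print(f"eps={eps:.1f}  exact={exact:.5f}  bound={bound:.5f}")
```

The bound exceeds 1 for small $\epsilon$ (where it is vacuous) but is sharp for large $\epsilon$.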
Theorem 2 (Markov's inequality) Let $X$ be a non-negative random variable and suppose that $E(X)$ exists. For any $t > 0$,
$$P(X > t) \leq \frac{E(X)}{t}. \qquad (1)$$

Proof. Since $X \geq 0$,
$$E(X) = \int_0^\infty x\,p(x)\,dx = \int_0^t x\,p(x)\,dx + \int_t^\infty x\,p(x)\,dx \geq \int_t^\infty x\,p(x)\,dx \geq t\int_t^\infty p(x)\,dx = t\,P(X > t).$$
Theorem 3 (Chebyshev's inequality) Let $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$. Then,
$$P(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2} \quad\text{and}\quad P(|Z| \geq k) \leq \frac{1}{k^2} \qquad (2)$$
where $Z = (X - \mu)/\sigma$. In particular, $P(|Z| > 2) \leq 1/4$ and $P(|Z| > 3) \leq 1/9$.
Proof. We use Markov's inequality to conclude that
$$P(|X - \mu| \geq t) = P(|X - \mu|^2 \geq t^2) \leq \frac{E(X - \mu)^2}{t^2} = \frac{\sigma^2}{t^2}.$$
The second part follows by setting $t = k\sigma$.
Example 4 Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$ and let $\overline{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then $\mathrm{Var}(\overline{X}_n) = p(1-p)/n$ and
$$P(|\overline{X}_n - p| > \epsilon) \leq \frac{\mathrm{Var}(\overline{X}_n)}{\epsilon^2} = \frac{p(1-p)}{n\epsilon^2} \leq \frac{1}{4n\epsilon^2}$$
since $p(1-p) \leq \frac{1}{4}$ for all $p$.
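The bound is easy to check by simulation. Here is a minimal sketch (my own addition; the values of $n$, $p$, and $\epsilon$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps, reps = 500, 0.3, 0.05, 20_000

# Empirical frequency of |Xbar_n - p| > eps over many repetitions,
# compared with the Chebyshev bound 1/(4 n eps^2) from Example 4.
xbar = rng.binomial(n, p, size=reps) / n
empirical = np.mean(np.abs(xbar - p) > eps)
bound = 1 / (4 * n * eps**2)
print(f"empirical={empirical:.4f}  Chebyshev bound={bound:.4f}")
```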
Hoeffding's Inequality

Lemma 5 (Hoeffding's Lemma) Let $X$ be a random variable with $E(X) = 0$ and $a \leq X \leq b$. Then, for any $t > 0$,
$$E(e^{tX}) \leq e^{t^2(b-a)^2/8}.$$
Recall that a function $g$ is convex if for each $x, y$ and each $\alpha \in [0,1]$,
$$g(\alpha x + (1-\alpha)y) \leq \alpha g(x) + (1-\alpha) g(y).$$
Proof. Since $a \leq X \leq b$, we can write $X$ as a convex combination of $a$ and $b$, namely, $X = \alpha b + (1-\alpha)a$ where $\alpha = (X-a)/(b-a)$. By the convexity of the function $y \mapsto e^{ty}$ we have
$$e^{tX} \leq \alpha e^{tb} + (1-\alpha)e^{ta} = \frac{X-a}{b-a}\, e^{tb} + \frac{b-X}{b-a}\, e^{ta}.$$
Take expectations of both sides and use the fact that $E(X) = 0$ to get
$$E\, e^{tX} \leq \frac{-a}{b-a}\, e^{tb} + \frac{b}{b-a}\, e^{ta} = e^{g(u)} \qquad (3)$$
where $u = t(b-a)$, $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -a/(b-a)$. Note that $g(0) = g'(0) = 0$. Also, $g''(u) \leq 1/4$ for all $u > 0$. By Taylor's theorem, there is a $\xi \in (0, u)$ such that
$$g(u) = g(0) + u\, g'(0) + \frac{u^2}{2}\, g''(\xi) = \frac{u^2}{2}\, g''(\xi) \leq \frac{u^2}{8} = \frac{t^2(b-a)^2}{8}.$$
Hence, $E\, e^{tX} \leq e^{g(u)} \leq e^{t^2(b-a)^2/8}$.
Theorem 6 (Hoeffding's Inequality) Let $Y_1, \ldots, Y_n$ be iid random variables with $E(Y_i) = \mu$ and $a \leq Y_i \leq b$. Then, for any $\epsilon > 0$,
$$P(|\overline{Y}_n - \mu| \geq \epsilon) \leq 2 e^{-2n\epsilon^2/c}$$
where $c = (b-a)^2$.
Proof. Without loss of generality, we assume that $\mu = 0$. First we have
$$P(|\overline{Y}_n| \geq \epsilon) = P(\overline{Y}_n \geq \epsilon) + P(\overline{Y}_n \leq -\epsilon) = P(\overline{Y}_n \geq \epsilon) + P(-\overline{Y}_n \geq \epsilon).$$
Next we use Chernoff's method. For any $t > 0$, we have, from Markov's inequality, that
$$P(\overline{Y}_n \geq \epsilon) = P\left(\sum_{i=1}^n Y_i \geq n\epsilon\right) = P\left(e^{\sum_{i=1}^n Y_i} \geq e^{n\epsilon}\right) = P\left(e^{t\sum_{i=1}^n Y_i} \geq e^{tn\epsilon}\right) \leq e^{-tn\epsilon}\, E\left(e^{t\sum_{i=1}^n Y_i}\right) = e^{-tn\epsilon} \prod_{i=1}^n E(e^{tY_i}).$$
From Hoeffding's Lemma, $E(e^{tY_i}) \leq e^{t^2(b-a)^2/8}$. So
$$P(\overline{Y}_n \geq \epsilon) \leq e^{-tn\epsilon}\, e^{t^2 n (b-a)^2/8}.$$
The right-hand side is minimized at $t = 4\epsilon/(b-a)^2$, which gives $P(\overline{Y}_n \geq \epsilon) \leq e^{-2n\epsilon^2/(b-a)^2}$. The same bound applies to $P(-\overline{Y}_n \geq \epsilon)$, and adding the two completes the proof.
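Hoeffding's bound can likewise be checked by simulation; a sketch (my own addition, with Uniform(0,1) observations and arbitrary $n$ and $\epsilon$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, reps = 200, 0.1, 50_000
a, b = 0.0, 1.0            # Y_i ~ Uniform(0,1), so mu = 0.5

# Empirical frequency of |Ybar_n - mu| >= eps, compared with the
# Hoeffding bound 2 * exp(-2 n eps^2 / (b - a)^2).
ybar = rng.uniform(a, b, size=(reps, n)).mean(axis=1)
empirical = np.mean(np.abs(ybar - 0.5) >= eps)
bound = 2 * np.exp(-2 * n * eps**2 / (b - a)**2)
print(f"empirical={empirical:.5f}  Hoeffding bound={bound:.5f}")
```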
McDiarmid's Inequality

So far we have focused on sums of random variables. The following result extends Hoeffding's inequality to more general functions $g(x_1, \ldots, x_n)$. Here we consider McDiarmid's inequality, also known as the Bounded Difference inequality.
Theorem 9 (McDiarmid's Inequality) Let $X_1, \ldots, X_n$ be independent random variables. Suppose that
$$\sup_{x_1, \ldots, x_n, x_i'} \left| g(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i$$
for $i = 1, \ldots, n$. Then
$$P\Big(g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) \geq \epsilon\Big) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\right). \qquad (7)$$
Proof. Let $V_i = E(g \mid X_1, \ldots, X_i) - E(g \mid X_1, \ldots, X_{i-1})$. Then $g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) = \sum_{i=1}^n V_i$ and $E(V_i \mid X_1, \ldots, X_{i-1}) = 0$. Using a similar argument as in Hoeffding's Lemma we have,
$$E(e^{tV_i} \mid X_1, \ldots, X_{i-1}) \leq e^{t^2 c_i^2/8}. \qquad (8)$$
Now, for any t > 0,
$$
\begin{aligned}
P\Big(g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) \geq \epsilon\Big)
&= P\left(\sum_{i=1}^n V_i \geq \epsilon\right) = P\left(e^{t\sum_{i=1}^n V_i} \geq e^{t\epsilon}\right) \leq e^{-t\epsilon}\, E\left(e^{t\sum_{i=1}^n V_i}\right) \\
&= e^{-t\epsilon}\, E\left(e^{t\sum_{i=1}^{n-1} V_i}\, E\left(e^{tV_n} \,\Big|\, X_1, \ldots, X_{n-1}\right)\right) \\
&\leq e^{-t\epsilon}\, e^{t^2 c_n^2/8}\, E\left(e^{t\sum_{i=1}^{n-1} V_i}\right) \\
&\;\;\vdots \\
&\leq e^{-t\epsilon}\, e^{t^2 \sum_{i=1}^n c_i^2/8}.
\end{aligned}
$$
The result follows by taking $t = 4\epsilon/\sum_{i=1}^n c_i^2$.
Example 10 If we take $g(x_1, \ldots, x_n) = n^{-1}\sum_{i=1}^n x_i$ with $a \leq x_i \leq b$, then changing one coordinate changes $g$ by at most $c_i = (b-a)/n$, and we get back Hoeffding's inequality.
Example 11 Suppose we throw $m$ balls into $n$ bins. What fraction of bins are empty? Let $Z$ be the number of empty bins and let $F = Z/n$ be the fraction of empty bins. We can write $Z = \sum_{i=1}^n Z_i$ where $Z_i = 1$ if bin $i$ is empty and $Z_i = 0$ otherwise. Then
$$\mu = E(Z) = \sum_{i=1}^n E(Z_i) = n\left(1 - \frac{1}{n}\right)^m \approx n e^{-m/n}$$
and $\theta = E(F) = \mu/n \approx e^{-m/n}$. How close is $Z$ to $\mu$? Note that the $Z_i$'s are not independent so we cannot just apply Hoeffding. Instead, we proceed as follows. Let $X_i$ denote the bin into which ball $i$ falls, so that $Z = g(X_1, \ldots, X_m)$. Moving a single ball to a different bin changes $Z$ by at most 1, so (7) holds with $c_i = 1$, and McDiarmid's inequality gives
$$P(|Z - \mu| > t) \leq 2 e^{-2t^2/m}.$$
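The approximation $\theta \approx e^{-m/n}$ is easy to see in simulation (a sketch I am adding; $m$, $n$, and the number of repetitions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, reps = 1000, 500, 2000

# Throw m balls into n bins, record the fraction of empty bins,
# and compare its average to the approximation exp(-m/n).
fractions = np.empty(reps)
for r in range(reps):
    bins = rng.integers(0, n, size=m)    # bin index of each ball
    empty = n - np.unique(bins).size     # bins that received no ball
    fractions[r] = empty / n
print(f"mean fraction empty={fractions.mean():.4f}  exp(-m/n)={np.exp(-m/n):.4f}")
```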
Jensen's Inequality

Recall that a function $g$ is convex if for each $x, y$ and each $\alpha \in [0,1]$,
$$g(\alpha x + (1-\alpha)y) \leq \alpha g(x) + (1-\alpha) g(y).$$
If $g$ is twice differentiable and $g''(x) \geq 0$ for all $x$, then $g$ is convex. It can be shown that if $g$ is convex, then $g$ lies above any line that touches $g$ at some point, called a tangent line. A function $g$ is concave if $-g$ is convex. Examples of convex functions are $g(x) = x^2$ and $g(x) = e^x$. Examples of concave functions are $g(x) = -x^2$ and $g(x) = \log x$.
Theorem 13 (Jensen's inequality) If $g$ is convex, then
$$E\, g(X) \geq g(E X). \qquad (10)$$
If $g$ is concave, then
$$E\, g(X) \leq g(E X). \qquad (11)$$
Proof. Let $L(x) = a + bx$ be a line, tangent to $g(x)$ at the point $E(X)$. Since $g$ is convex, it lies above the line $L(x)$. So,
$$E\, g(X) \geq E\, L(X) = E(a + bX) = a + b\, E(X) = L(E(X)) = g(E X).$$
Example Let $a_1, \ldots, a_n$ be positive numbers and define the arithmetic, geometric, and harmonic means
$$a_A = \frac{1}{n}\sum_{i=1}^n a_i, \qquad a_G = \left(\prod_{i=1}^n a_i\right)^{1/n}, \qquad a_H = \left(\frac{1}{n}\sum_{i=1}^n \frac{1}{a_i}\right)^{-1}.$$
Then $a_H \leq a_G \leq a_A$.
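A quick numerical check of the three means (my own addition; the numbers are arbitrary positive values):

```python
import numpy as np

a = np.array([1.0, 2.0, 4.0, 8.0])

# Arithmetic, geometric, and harmonic means; Jensen's inequality
# implies a_H <= a_G <= a_A for positive numbers.
a_A = a.mean()
a_G = np.exp(np.log(a).mean())
a_H = 1 / (1 / a).mean()
print(f"a_H={a_H:.4f} <= a_G={a_G:.4f} <= a_A={a_A:.4f}")
```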
Suppose we have an exponential bound on $P(X_n > \epsilon)$. In that case we can bound $E(X_n)$ as follows.
Theorem 17 Suppose that $X_n \geq 0$ and that for every $\epsilon > 0$,
$$P(X_n > \epsilon) \leq c_1 e^{-c_2 n \epsilon^2} \qquad (12)$$
for some $c_2 > 0$ and $c_1 \geq 1$. Then
$$E(X_n) \leq \sqrt{\frac{C}{n}} \qquad (13)$$
where $C = (1 + \log c_1)/c_2$.
Proof. Recall that for any non-negative random variable $Y$, $E(Y) = \int_0^\infty P(Y \geq t)\,dt$. Equation (12) implies that $P(X_n > \sqrt{t}) \leq c_1 e^{-c_2 n t}$. Hence, for any $a > 0$,
$$E(X_n^2) = \int_0^\infty P(X_n^2 \geq t)\,dt \leq a + \int_a^\infty P(X_n^2 \geq t)\,dt = a + \int_a^\infty P(X_n \geq \sqrt{t})\,dt \leq a + c_1 \int_a^\infty e^{-c_2 n t}\,dt = a + \frac{c_1 e^{-c_2 n a}}{c_2 n}.$$
Setting $a = \log(c_1)/(n c_2)$ gives
$$E(X_n^2) \leq \frac{\log(c_1)}{n c_2} + \frac{1}{n c_2} = \frac{1 + \log(c_1)}{n c_2}.$$
Finally, we have
$$E(X_n) \leq \sqrt{E(X_n^2)} \leq \sqrt{\frac{1 + \log(c_1)}{n c_2}}.$$
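For instance, for coin flips Hoeffding's inequality gives $P(|\widehat{p}_n - p| > \epsilon) \leq 2e^{-2n\epsilon^2}$, so Theorem 17 applies to $X_n = |\widehat{p}_n - p|$ with $c_1 = 2$ and $c_2 = 2$. A simulation sketch (my own addition; $n$, $p$, and the repetition count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 400, 0.5, 50_000

# For X_n = |phat_n - p|, Hoeffding gives c1 = 2 and c2 = 2, so
# Theorem 17 bounds E(X_n) by sqrt((1 + log 2) / (2n)).
phat = rng.binomial(n, p, size=reps) / n
mean_dev = np.abs(phat - p).mean()
bound = np.sqrt((1 + np.log(2)) / (2 * n))
print(f"E|phat - p| ~ {mean_dev:.5f}  bound={bound:.5f}")
```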
Now we consider bounding the maximum of a set of random variables.
Theorem 18 Let $X_1, \ldots, X_n$ be random variables. Suppose there exists $\sigma > 0$ such that $E(e^{tX_i}) \leq e^{t^2\sigma^2/2}$ for all $t > 0$. Then
$$E\left(\max_{1 \leq i \leq n} X_i\right) \leq \sigma\sqrt{2 \log n}. \qquad (14)$$
Proof. By Jensen's inequality, for any $t > 0$,
$$\exp\left\{t\, E\left(\max_{1 \leq i \leq n} X_i\right)\right\} \leq E\left(\exp\left\{t \max_{1 \leq i \leq n} X_i\right\}\right) = E\left(\max_{1 \leq i \leq n} \exp\{t X_i\}\right) \leq \sum_{i=1}^n E\left(\exp\{t X_i\}\right) \leq n\, e^{t^2\sigma^2/2}.$$
Thus,
$$E\left(\max_{1 \leq i \leq n} X_i\right) \leq \frac{\log n}{t} + \frac{t\sigma^2}{2}.$$
The result follows by setting $t = \sqrt{2 \log n}/\sigma$.
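For Gaussian variables the moment generating function condition holds with equality, so the bound is easy to probe by simulation (a sketch I am adding, with arbitrary values of $n$):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, reps = 1.0, 10_000

# Estimate E(max_i X_i) for independent X_i ~ N(0, sigma^2) and
# compare with the bound sigma * sqrt(2 log n) from Theorem 18.
for n in [10, 100, 1000]:
    max_mean = rng.normal(0, sigma, size=(reps, n)).max(axis=1).mean()
    bound = sigma * np.sqrt(2 * np.log(n))
    print(f"n={n:5d}  E[max]~{max_mean:.3f}  bound={bound:.3f}")
```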
$O_P$ and $o_P$

We write $a_n \asymp b_n$ if both $a_n/b_n$ and $b_n/a_n$ are eventually bounded. In computer science this is written $a_n = \Theta(b_n)$, but we prefer using $a_n \asymp b_n$ since, in statistics, $\Theta$ often denotes a parameter space.
Now we move on to the probabilistic versions. Say that $Y_n = o_P(1)$ if, for every $\epsilon > 0$,
$$P(|Y_n| > \epsilon) \to 0.$$
Say that $Y_n = o_P(a_n)$ if $Y_n/a_n = o_P(1)$. Say that $Y_n = O_P(1)$ if, for every $\epsilon > 0$, there is a $C > 0$ such that
$$P(|Y_n| > C) \leq \epsilon.$$
Say that $Y_n = O_P(a_n)$ if $Y_n/a_n = O_P(1)$.
Let's use Hoeffding's inequality to show that the sample proportion is within $O_P(1/\sqrt{n})$ of the true mean. Let $Y_1, \ldots, Y_n$ be coin flips, i.e., $Y_i \in \{0, 1\}$. Let $p = P(Y_i = 1)$ and let
$$\widehat{p}_n = \frac{1}{n}\sum_{i=1}^n Y_i.$$
Then, for any $C > 0$,
$$P(\sqrt{n}\,|\widehat{p}_n - p| > C) = P\left(|\widehat{p}_n - p| > \frac{C}{\sqrt{n}}\right) \leq 2 e^{-2C^2} < \epsilon$$
if we pick $C$ large enough. Hence, $\sqrt{n}(\widehat{p}_n - p) = O_P(1)$ and so
$$\widehat{p}_n - p = O_P\left(\frac{1}{\sqrt{n}}\right).$$
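The same calculation can be seen numerically (a closing sketch of my own; $p$, $C$, and the sample sizes are arbitrary): the deviation probability stays below the same constant bound for every $n$, which is exactly what $\sqrt{n}(\widehat{p}_n - p) = O_P(1)$ means.

```python
import numpy as np

rng = np.random.default_rng(5)
p, C, reps = 0.4, 1.5, 20_000

# P(sqrt(n)|phat_n - p| > C) is bounded by 2*exp(-2 C^2) uniformly
# in n, illustrating sqrt(n)(phat_n - p) = O_P(1).
for n in [50, 500, 5000]:
    phat = rng.binomial(n, p, size=reps) / n
    freq = np.mean(np.sqrt(n) * np.abs(phat - p) > C)
    print(f"n={n:5d}  empirical={freq:.4f}  bound={2 * np.exp(-2 * C**2):.4f}")
```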