Appendix

A.1 Inequalities
$$\mathrm{P}^*\Bigl(\max_{k\le n}\|S_k\| > \lambda+\mu\Bigr) \;\le\; \frac{\mathrm{P}^*\bigl(\|S_n\| > \lambda\bigr)}{1-\max_{k\le n}\mathrm{P}^*\bigl(\|S_n-S_k\| > \mu\bigr)}.$$
Proof. Let $A_k$ be the event that $\|S_k\|^*$ is the first $\|S_j\|^*$ that is strictly greater than $\lambda + \mu$. The event on the left is the disjoint union of $A_1, \ldots, A_n$.
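For real-valued variables, Ottaviani's inequality can be checked exactly on a small example by enumerating all sample paths of a symmetric random walk. The walk, and the choices of $n$, $\lambda$, $\mu$ below, are illustrative only and not part of the text.

```python
from itertools import product

# Exact check of Ottaviani's inequality for a symmetric +-1 random walk.
# The walk and the choices of n, lam, mu are illustrative only.
n, lam, mu = 6, 2, 1

paths = list(product([-1, 1], repeat=n))  # all 2^n equally likely paths
prob = 1.0 / len(paths)

def partial_sums(path):
    out, s = [], 0
    for x in path:
        s += x
        out.append(s)
    return out

# P(max_{k<=n} |S_k| > lam + mu)
lhs = sum(prob for p in paths
          if max(abs(s) for s in partial_sums(p)) > lam + mu)

# P(|S_n| > lam)
p_final = sum(prob for p in paths if abs(partial_sums(p)[-1]) > lam)

# max_{k<=n} P(|S_n - S_k| > mu)
worst = max(
    sum(prob for p in paths
        if abs(partial_sums(p)[-1] - partial_sums(p)[k]) > mu)
    for k in range(n)
)

assert worst < 1
bound = p_final / (1 - worst)
assert lhs <= bound + 1e-12
print(lhs, "<=", bound)
```

Because the walk is enumerated exhaustively, both sides are computed as exact probabilities rather than Monte Carlo estimates.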
† This is the case, for instance, if each $(\Omega_i, \mathcal{U}_i, \mathrm{P}_i)$ is a product $(\mathcal{X}_i, \mathcal{A}_i, P_i)^2$ of two identical probability spaces and $X_i(x_1, x_2) = Z_i(x_1) - Z_i(x_2)$ for some stochastic process $Z_i$.
Proof. Let $A_k$ be the event that $\|S_k\|^*$ is the first $\|S_j\|^*$ that is strictly greater than $\lambda$. The event on the left in the first inequality is the disjoint union of $A_1, \ldots, A_n$. Write $T_n$ for the sum of the sequence $X_1, \ldots, X_k, -X_{k+1}, \ldots, -X_n$. By the triangle inequality, $2\|S_k\|^* \le \|S_n\|^* + \|T_n\|^*$. It follows that
Proof. The implications (i) ⇒ (ii) ⇒ (iii) are true for general random sequences. We must prove the implications in the converse direction.
(iii) ⇒ (ii). Since any Cauchy sequence is convergent, it suffices to show that $S_{n_{k+1}} - S_{n_k}$ converges in outer probability to zero as $k \to \infty$ for any sequence $n_1 < n_2 < \cdots$. The sequence $(S_{n_{k+1}}, S_{n_k})$ is asymptotically tight and asymptotically measurable. By Prohorov's theorem, every subsequence
$$\mathrm{E}e^{isY_k(t)} = \frac{\mathrm{E}e^{isS_{n_{k+1}}(t)}}{\mathrm{E}e^{isS_{n_k}(t)}} \to 1.$$
Thus the characteristic function of Y (t) is 1 in a neighborhood of zero.
Conclude that Y = 0 almost surely.
(ii) ⇒ (i). Write $S$ for the limit in probability of $S_n$. First assume that the processes $X_j$ are symmetric. There exists a subsequence $n_1 < n_2 < \cdots$ such that $\mathrm{P}^*\bigl(\|S_{n_k} - S\| > 2^{-k}\bigr) < 2^{-k}$ for every $k$. By the Borel-Cantelli lemma, $S_{n_k} \to S$ outer almost surely as $k \to \infty$. By a Lévy inequality,
$$\mathrm{P}^*\Bigl(\max_{n_k < n \le n_{k+1}} \|S_n - S_{n_k}\| > 2^{-k+1}\Bigr) \le 2\,\mathrm{P}^*\bigl(\|S_{n_{k+1}} - S_{n_k}\| > 2^{-k+1}\bigr).$$
The right side is smaller than a multiple of $2^{-k}$. Hence $\max_{n_k < n \le n_{k+1}} \|S_n - S_{n_k}\|^*$ converges almost surely to zero by the Borel-Cantelli lemma. This concludes the proof that $S_n$ converges outer almost surely for symmetric $X_i$.
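The Lévy inequality invoked in this step states, for sums of independent symmetric variables, that $\mathrm{P}\bigl(\max_{k\le n}|S_k| > \lambda\bigr) \le 2\,\mathrm{P}\bigl(|S_n| > \lambda\bigr)$. It can be verified exactly for a small $\pm1$ walk, an illustrative real-valued special case rather than the processes of the theorem.

```python
from itertools import product

# Exact enumeration check of Levy's inequality
# P(max_{k<=n} |S_k| > lam) <= 2 P(|S_n| > lam)
# for partial sums of independent symmetric +-1 variables (illustrative).
n = 7
paths = list(product([-1, 1], repeat=n))
prob = 1.0 / len(paths)

def partial_sums(path):
    out, s = [], 0
    for x in path:
        s += x
        out.append(s)
    return out

results = []
for lam in range(n + 1):
    lhs = sum(prob for p in paths
              if max(abs(s) for s in partial_sums(p)) > lam)
    rhs = 2 * sum(prob for p in paths if abs(partial_sums(p)[-1]) > lam)
    results.append((lhs, rhs))

assert all(l <= r + 1e-12 for l, r in results)
print("verified for all integer thresholds up to", n)
```

Only integer thresholds matter here since the walk takes integer values.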
Given general processes, construct an independent copy $Y_1, Y_2, \ldots$ defined on a copy of the original probability space $(\Omega, \mathcal{U}, \mathrm{P})$, and let $T_n$ be the corresponding partial sums. Then the elements of the sequence $S_n - T_n$ are the partial sums of the symmetric variables $X_i - Y_i$ and converge in outer probability. (It is formally defined on $(\Omega, \mathcal{U}, \mathrm{P}) \times (\Omega, \mathcal{U}, \mathrm{P})$.) By the preceding paragraph, $S_n - T_n$ converges outer almost surely. By Fubini's theorem there exists $\omega$ such that $S_n - T_n(\omega)$ converges outer almost surely. Then it converges also in distribution. Since $S_n$ converges in distribution as well, it follows that the sequence $T_n(\omega)$ is convergent.
Proof. Let $A_k$ be the event that $\|S_k\|^*$ is the first $\|S_j\|^*$ that is strictly greater than $\lambda$. The (disjoint) union of $A_1, \ldots, A_n$ is the event that $\max_{k\le n}\|S_k\|^*$ is greater than $\lambda$. By the triangle inequality, for $j > k$,
$$\max_{j>k}\|S_j\|^* \le \|S_{k-1}\|^* + \|X_k\|^* + \max_{j>k}\|S_j - S_k\|^* \le \lambda + \|X_k\|^* + \max_{j>k}\|S_j - S_k\|^* \quad\text{on } A_k,$$
since $\|S_{k-1}\|^* \le \lambda$ on $A_k$. On $A_k$ this remains true if the maximum on the left is taken over all $\|S_j\|^*$.
Since the processes are independent, we obtain for every $k$
$$\mathrm{P}\Bigl(A_k,\ \max_{k\le n}\|S_k\|^* > 3\lambda + \eta\Bigr) \le \mathrm{P}\Bigl(A_k,\ \max_{k\le n}\|X_k\|^* > \eta\Bigr) + \mathrm{P}(A_k)\,\mathrm{P}\Bigl(\max_{j>k}\|S_j - S_k\|^* > 2\lambda\Bigr).$$
In the probability on the far right the variable $\max_{j>k}\|S_j - S_k\|^*$ can be bounded by $2\max_{k\le n}\|S_k\|^*$. Next sum over $k$ to obtain the first inequality of the proposition.
To prove the second inequality, first establish by the same method that
$$\mathrm{P}\bigl(A_k,\ \|S_n\|^* > 2\lambda + \eta\bigr) \le \mathrm{P}\Bigl(A_k,\ \max_{k\le n}\|X_k\|^* > \eta\Bigr) + \mathrm{P}(A_k)\,\mathrm{P}\bigl(\|S_n - S_k\|^* > \lambda\bigr).$$
Summing over $k$ yields
$$\mathrm{P}\bigl(\|S_n\|^* > 2\lambda + \eta\bigr) \le \mathrm{P}\Bigl(\max_{k\le n}\|X_k\|^* > \eta\Bigr) + \mathrm{P}\Bigl(\max_{k\le n}\|S_k\|^* > \lambda\Bigr)\,\mathrm{P}\Bigl(\max_{k\le n}\|S_n - S_k\|^* > \lambda\Bigr).$$
The processes $S_k$ and $S_n - S_k$ are the partial sums of the symmetric processes $X_1, \ldots, X_n$ and $X_n, \ldots, X_2$, respectively. Apply Lévy's inequality to both probabilities on the far right to conclude the proof.
where $G^{-1}$ is the quantile function of the random variable $\|S_n\|^*$. For $p \ge 1$, the last inequality is also valid for mean-zero processes (with different constants).
rearranging terms, yield the claimed inequality. The second inequality can be proved in a similar manner, this time using the second inequality of the preceding proposition.
The inequality for mean-zero processes follows from the inequality for symmetric processes by symmetrization and desymmetrization: by Jensen's inequality, $\mathrm{E}^*\|S_n\|^p$ is bounded by $\mathrm{E}^*\|S_n - T_n\|^p$ if $T_n$ is the sum of $n$ independent copies of $X_1, \ldots, X_n$.
Here $1/p + 1/q = 1$, and $K$ and $K_p$ are a universal constant and a constant depending on $p$ only, respectively.
Proof. The first part with the inferior constant $24 \cdot 2^{1/p} \cdot 3^p$ follows from the preceding Hoffmann-Jørgensen inequality by noting that $(1 - v)G^{-1}(v) \le \int_v^1 G^{-1}(s)\,ds \le \mathrm{E}^*\|S_n\|$ for every $v$, and then substitution of $v_p = 1 - 3^{-p}/8$ and $K_p = 2 \cdot 3^p$. The proofs of the first part with the improved constant $p/\log p$ and of the second and third parts are long and rely on the isoperimetric methods developed by Talagrand (1989). See Ledoux and Talagrand (1991), pages 172–175.
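The quantile inequality used in this proof, $(1-v)G^{-1}(v) \le \int_v^1 G^{-1}(s)\,ds \le \mathrm{E}^*\|S_n\|$, involves only the quantile function of a nonnegative variable and its mean. A numerical sanity check for a standard exponential variable standing in for $\|S_n\|^*$ (an illustrative choice, not from the text):

```python
import math

# Check (1 - v) G^{-1}(v) <= \int_v^1 G^{-1}(s) ds <= mean, for the
# standard exponential distribution with quantile G^{-1}(v) = -log(1 - v)
# and mean 1 (an illustrative stand-in for the variable in the proof).

def quantile(v):
    return -math.log(1.0 - v)

def tail_integral(v, steps=100000):
    # midpoint rule for \int_v^1 G^{-1}(s) ds
    h = (1.0 - v) / steps
    return h * sum(quantile(v + (i + 0.5) * h) for i in range(steps))

checks = []
for v in [0.1, 0.5, 0.9, 0.99]:
    lower = (1.0 - v) * quantile(v)
    mid = tail_integral(v)
    checks.append((lower, mid))
    assert lower <= mid <= 1.0 + 1e-6  # 1 is the exponential mean
print(checks)
```

The lower bound is the rectangle sitting under the increasing quantile function, which is why it can never exceed the tail integral.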
if and only if
$$\mathrm{E}^*\sup_{n\ge1}\|X_n\|^p < \infty.$$
A.1.8 Corollary. Let $0 < p < \infty$ and let $\{a_n\}$ be a sequence of positive numbers that increases to infinity. Let $X_1, X_2, \ldots$ be independent stochastic processes indexed by an arbitrary set $T$. If $\sup_{n\ge1}\|S_n\|^*/a_n < \infty$ almost surely, then
$$\mathrm{E}^*\sup_{n\ge1}\frac{\|S_n\|^p}{a_n^p} < \infty$$
if and only if
$$\mathrm{E}^*\sup_{n\ge1}\frac{\|X_n\|^p}{a_n^p} < \infty.$$
Proof. See Hoeffding (1963) and Marshall and Olkin (1979), Corollary
A.2.e, page 339.
Proof. Since the $\gamma_i$ can be taken out one at a time, it suffices to show that
$$\mathrm{E}^*\|\xi\gamma X + Y\| \le \mathrm{E}^*\|\xi X + Y\|$$
whenever $\xi$ has zero mean and is independent of $(\gamma, X, Y)$. By the triangle inequality, the left side is bounded by
$$\mathrm{E}^*\bigl\|(\xi X + Y)\gamma\bigr\| + \mathrm{E}^*\bigl\|Y(1 - \gamma)\bigr\| = \mathrm{E}^*\|\xi X + Y\|\,\gamma + \mathrm{E}^*\|Y\|\,(1 - \gamma).$$
By Jensen's inequality and Fubini's theorem, the second term on the right is bounded by $\mathrm{E}^*\|Y + \xi X\|\,(1 - \gamma)$. The result follows since $\gamma$ is measurable.

A.1 Problems and Complements 599
It can be shown that this pair of inequalities defines a unique number $M(X)$ (see the proof of the following theorem). Furthermore, we define
This implies that the function $f(z) = \max_i |Az|_i$ is Lipschitz of norm bounded by $\sup_t \sigma(X_t) = \sigma(X)$. Apply Lemma A.2.2 to both $f$ and $-f$, and combine the results to obtain the theorem for the case where the index set is finite.
Since $X$ is separable by assumption, the supremum $\|X\|$ is the almost-sure, monotone limit of a sequence of finite suprema. The supremum $M$ of the sequence of medians can be seen to be a median (the smallest) of $\|X\|$. Approximate $\|X\| - M(X)$ by a sequence of similar objects for finite suprema to obtain the first inequality of the theorem with $M = M(X)$. The proof of this inequality is complete if it is shown that $M$ is the only median of $\|X\|$.
Since the median of $|X_t|$ is bounded above by the median of $\|X\|$, it follows that $\mathrm{P}\bigl(|X_t| \le M(X)\bigr) \ge \frac{1}{2}$ for every $t$. Taking into account the normal distribution of $X_t$, we obtain that $\sigma(X_t)$ is bounded above by $M(X)/\Phi^{-1}(3/4)$. Therefore, $\sigma(X)$ is finite and the exponential inequality obtained previously is nontrivial. The argument actually shows that both the left-tail probability $\mathrm{P}\bigl(\|X\| \le M - \lambda\bigr)$ and the right-tail probability $\mathrm{P}\bigl(\|X\| \ge M + \lambda\bigr)$ are bounded by $1/2$ times the exponential upper bound. This means that these probabilities are strictly less than $1/2$ for $\lambda > 0$, whence $M$ is a unique median of $\|X\|$.
To obtain the second inequality of the theorem, we note first that $\mathrm{E}\|X\|$ is finite in view of the exponential tail bound for $\|X\| - M(X)$. Next we use the following lemma and take limits along finite subsets as before.
The third inequality is trivially satisfied if $0 \le \lambda < 2\mathrm{E}\|X\|$, because in that case the exponential is larger than $\exp(-\frac{1}{2}) \ge 0.6$. For $\lambda > 2\mathrm{E}\|X\|$,
the probability is bounded by $\mathrm{P}\bigl(\|X\| > \mathrm{E}\|X\| + \lambda/2\bigr)$, which can be further bounded by the second inequality.
Proof. For the first inequality, consider the set $A = \{z\colon f(z) \le \operatorname{med} f(Z)\}$. Since $f$ has Lipschitz norm bounded by 1, the set $A^\lambda$ of points at distance at most $\lambda$ from $A$ is contained in the set $\{z\colon f(z) \le \operatorname{med} f(Z) + \lambda\}$. It follows that
$$\mathrm{P}\bigl(f(Z) \le \operatorname{med} f(Z) + \lambda\bigr) \ge \mathrm{P}\bigl(Z \in A^\lambda\bigr).$$
By definition of the median, the set $A$ has probability at least $1/2$ under the standard normal distribution. According to the isoperimetric inequality for the normal distribution, Corollary A.2.9, a half-space $H$ with boundary at the origin is an extreme set in the sense that $\mathrm{P}(Z \in A^\lambda) \ge \mathrm{P}(Z \in H^\lambda)$ for every $\lambda > 0$, for any other measurable set $A$ with probability at least $1/2$. The proof of the first inequality of the lemma is complete upon noting that $\mathrm{P}(Z \in H^\lambda) = \Phi(\lambda)$ and $1 - \Phi(\lambda) \le \frac{1}{2}\exp(-\frac{1}{2}\lambda^2)$.
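The final bound $1 - \Phi(\lambda) \le \frac{1}{2}\exp(-\frac{1}{2}\lambda^2)$ can be confirmed numerically with the standard-normal distribution from the Python standard library, a quick sanity check rather than part of the proof:

```python
import math
from statistics import NormalDist

# Numerical check of the Gaussian tail bound 1 - Phi(lam) <= 0.5 exp(-lam^2/2).
Phi = NormalDist().cdf

pairs = []
for lam in [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]:
    tail = 1.0 - Phi(lam)
    cap = 0.5 * math.exp(-0.5 * lam * lam)
    pairs.append((tail, cap))
    assert tail <= cap + 1e-15
print(pairs)
```

At $\lambda = 0$ the two sides agree at $1/2$, and the gap widens as $\lambda$ grows.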
For the proof of the second inequality, assume without loss of generality that $\mathrm{E}f(Z) = 0$. An arbitrary Lipschitz function $f$ can be approximated by arbitrarily smooth functions with compact support and of no larger Lipschitz norm. Therefore, it is no loss of generality to assume that $f$ is sufficiently regular. In particular, assume without loss of generality that $f$ is differentiable with gradient $\nabla f$ uniformly norm bounded by 1.
Let $Z_t$ be normally distributed with mean zero and covariance matrix $(1 - e^{-2t})I$. For $t \ge 0$, define functions $P_t f$ by
$$P_t f(x) = \mathrm{E}f\bigl(e^{-t}x + Z_t\bigr).$$
Since the functions $\log G$ and $\frac{1}{2}r^2 e^{-2t}$ are both zero at $\infty$, and hence equal there, it follows that $\log G$ is smaller than $\frac{1}{2}r^2 e^{0} = \frac{1}{2}r^2$ at zero. This concludes the proof that $\mathrm{E}\exp\bigl(rf(Z)\bigr) = G(0) \le \exp(\frac{1}{2}r^2)$.
By Markov's inequality, it follows that $\mathrm{P}\bigl(f(Z) \ge \lambda\bigr) \le \exp(\frac{1}{2}r^2 - r\lambda)$; minimizing over $r$, at $r = \lambda$, gives the bound $\exp(-\frac{1}{2}\lambda^2)$.
The process $X$ having bounded sample paths is the same as $\|X\|$ being a finite random variable. In that case the median $M(X)$ is certainly finite and $\sigma(X)$ is finite by the argument in the proof of the preceding proposition. Next, the inequalities in the preceding proposition show that $\|X\|$ has moments of all orders. In fact, we have the following proposition.
for any separable Gaussian process $X$ for which $\|X\|$ is finite almost surely.
Proof. For the probability bound, see Ledoux and Talagrand (1991), Corollary 3.12. This implies the bound $\mathrm{E}\sup_{t\in T} X_t \le 2\mathrm{E}\sup_{t\in T} Y_t$, as shown in Corollary 3.14 of the same reference. The improved inequality without the factor 2 and the comparison of the moduli require more elaborate arguments, which can be found in the original papers.
If $X_t = 0$ for some $t \in T$, then $\sup_t |X_t| \le \sup_{s,t}(X_s - X_t)$ and hence $\mathrm{E}\sup_t |X_t| \le 2\mathrm{E}\sup_t X_t$, by the symmetry of the normal distribution. If the variances of the processes $X$ and $Y$ are equal (or at least $\mathrm{E}X_t^2 \le \mathrm{E}Y_t^2$), then we can add zero variables $X_\infty$ and $Y_\infty$ to the processes, giving processes $\bar X$ and $\bar Y$ with extended index set $\bar T = T \cup \{\infty\}$ such that $\mathrm{E}(\bar X_s - \bar X_t)^2 \le \mathrm{E}(\bar Y_s - \bar Y_t)^2$ for every $s, t \in \bar T$. Then $\mathrm{E}\sup_t |X_t| \le 2\mathrm{E}\sup_t \bar X_t \le 2\mathrm{E}\sup_t \bar Y_t$, by the first assertion.
$$\rho(s, t) = \sigma(X_s - X_t), \qquad s, t \in T.$$
Proof. Ehrhard (1993) proved the inequality when both A and B are
convex. The generalization given here is due to Latala (1996).
Proof. The sets $C(t) = \{z\colon f(z) \le t\}$ are convex and, by convexity of $f$, satisfy $C\bigl(\mu s + (1 - \mu)t\bigr) \supset \mu C(s) + (1 - \mu)C(t)$, for any $s, t \in \mathbb{R}$ and $\mu \in [0, 1]$. The given function is $h(t) = \Phi^{-1}\bigl(\mathrm{P}(Z \in C(t))\bigr)$, and hence $h\bigl(\mu s + (1 - \mu)t\bigr) \ge \Phi^{-1}\bigl(\mathrm{P}(Z \in \mu C(s) + (1 - \mu)C(t))\bigr)$. This is lower bounded by $\mu h(s) + (1 - \mu)h(t)$, by Proposition A.2.7, applied with the sets $A = C(s)$ and $B = C(t)$.
The median $m$ of $f(Z)$ satisfies $\mathrm{P}(f(Z) < m) \le 1/2 \le \mathrm{P}(f(Z) \le m)$, which implies that $h(m-) \le 0 \le h(m)$. Since the function $h$ is continuous in view of its concavity, it must satisfy $h(m) = 0$. Because $h$ is increasing
As $\mu \uparrow 1$, the sets $\mu A + \lambda B$ increase to $A + \lambda B = A^\lambda$, and hence the left side increases to $\Phi^{-1}\bigl(\mathrm{P}(Z \in A^\lambda)\bigr)$. The first term on the right clearly increases to $\Phi^{-1}\bigl(\mathrm{P}(Z \in A)\bigr)$, while the second term on the right is $(1 - \mu)\Phi^{-1}\bigl(\mathrm{P}(\|Z\| \le \lambda/(1 - \mu))\bigr)$ and tends to $\lambda$ as $\mu \uparrow 1$, as follows from the fact that $\Phi^{-1}(v) \sim F^{-1}(v) \sim \bigl(2\log_+(1/(1 - v))\bigr)^{1/2}$ as $v \uparrow 1$, for $F$ the distribution function of $\|Z\|$, which is the root of a chi-square variable, or by repeated application of l'Hôpital's rule. Thus, letting $\mu \uparrow 1$ in the preceding display, and applying $\Phi$ left and right of the limits, we conclude that $\mathrm{P}(Z \in A^\lambda) \ge \Phi\bigl(\Phi^{-1}(\mathrm{P}(Z \in A)) + \lambda\bigr)$.
By the rotation symmetry of the standard normal distribution it is not a loss of generality that the half-space in the assumption takes the form $H = \{z \in \mathbb{R}^d\colon z_1 \le r\}$. In that case $\mathrm{P}(Z \in H) = \Phi(r)$, and $H^\lambda = \{z \in \mathbb{R}^d\colon z_1 \le r + \lambda\}$, so that $\mathrm{P}(Z \in H^\lambda) = \Phi(r + \lambda)$. If $\mathrm{P}(Z \in A) = \mathrm{P}(Z \in H)$, then the right side of the final equality in the preceding paragraph reduces to $\mathrm{P}(Z \in H^\lambda)$.
The final statement of the corollary follows from the inequality $\mathrm{P}(Z \in A^\lambda) \ge \Phi(r + \lambda)$, where $r \ge 0$ since $\mathrm{P}(Z \in A) \ge 1/2$, so that $\Phi(r + \lambda) \ge \Phi(\lambda)$.
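In one dimension the inequality $\mathrm{P}(Z \in A^\lambda) \ge \Phi\bigl(\Phi^{-1}(\mathrm{P}(Z \in A)) + \lambda\bigr)$ can be inspected numerically for non-extremal sets. The sketch below checks it for the intervals $A = [-a, a]$, with illustrative values of $a$ and $\lambda$:

```python
from statistics import NormalDist

# One-dimensional check of P(Z in A^lam) >= Phi(Phi^{-1}(P(Z in A)) + lam)
# for the intervals A = [-a, a], whose lam-enlargement is [-a-lam, a+lam].
N = NormalDist()
Phi, Phi_inv = N.cdf, N.inv_cdf

triples = []
for a in [0.5, 1.0, 2.0]:
    for lam in [0.1, 0.5, 1.0]:
        p_A = Phi(a) - Phi(-a)
        p_enlarged = Phi(a + lam) - Phi(-a - lam)
        floor = Phi(Phi_inv(p_A) + lam)
        triples.append((p_enlarged, floor))
        assert p_enlarged >= floor - 1e-12
print(triples)
```

Intervals are not half-spaces, so the inequality is strict here, with equality reserved for the extremal half-spaces.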
Corollary A.2.9 (see its proof for the equivalent formulation needed) this is bounded below by $\Phi\bigl(\Phi^{-1}(\mathrm{P}(Z^n \in A_n)) + \lambda\bigr)$, for any $\lambda > 0$. Since the event $\{Z^n \in A_n\}$ contains the event $\{Z \in A_0\}$, this is further bounded below by $\Phi\bigl(\Phi^{-1}(\mathrm{P}(Z \in A_0)) + \lambda\bigr)$, for any $\lambda > 0$.
This concludes the proof when $A_0$ is compact. Because every Borel measure on $\mathbb{R}^\infty$ is inner regular, for a general $A_0$ there exists a sequence of compact sets $K_n \subset A_0$ with $\mathrm{P}(Z \in K_n) \uparrow \mathrm{P}(Z \in A_0)$. The argument of the preceding paragraph applied with $K_n$ instead of $A_0$ gives that $\mathrm{P}\bigl(\|Z - K_n\|_2 \le \lambda\bigr) \ge \Phi\bigl(\Phi^{-1}(\mathrm{P}(Z \in K_n)) + \lambda\bigr)$. Since $K_n \subset A_0$, the left side is smaller than $\mathrm{P}\bigl(\|Z - A_0\|_2 \le \lambda\bigr)$ for every $n$, while the right side increases to $\Phi\bigl(\Phi^{-1}(\mathrm{P}(Z \in A_0)) + \lambda\bigr)$ as $n \to \infty$.
measure) on $[0, 1]^2$. Then $\sigma^2(B^0) = 1/4$, and this supremum is achieved for every $t \in [0, 1]^2$ with $t_1 t_2 = 1/2$. It can be shown that (A.2.14) holds with $V = 4$ and $W = 1$. Therefore, for some constant $M$,
$$\mathrm{P}\Bigl(\sup_{t\in T} B^0_t \ge \lambda\Bigr) \le M\lambda^2\exp(-2\lambda^2).$$
It has been shown by Hogan and Siegmund (1986) and also by Aldous (1989), page 202, that as $\lambda \to \infty$,
$$\mathrm{P}\Bigl(\sup_{t\in T} B^0_t \ge \lambda\Bigr) \sim (4\log 2)\,\lambda^2\exp(-2\lambda^2).$$
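The value $\sigma^2(B^0) = 1/4$ and the locus $t_1 t_2 = 1/2$ can be recovered by maximizing the variance function of the pinned sheet, which equals $t_1 t_2(1 - t_1 t_2)$ at $t = (t_1, t_2)$; this closed form is recalled here as an assumption, not derived in the excerpt.

```python
# Grid search for the maximal variance t1*t2*(1 - t1*t2) of the pinned
# Brownian sheet on [0,1]^2 (variance formula assumed, see lead-in).
best, arg = 0.0, (0.0, 0.0)
n = 400
for i in range(n + 1):
    for j in range(n + 1):
        t1, t2 = i / n, j / n
        v = t1 * t2 * (1.0 - t1 * t2)
        if v > best:
            best, arg = v, (t1, t2)

assert abs(best - 0.25) < 1e-9
assert abs(arg[0] * arg[1] - 0.5) < 1e-9
print(best, arg)
```

The maximum $1/4$ is attained along the whole hyperbola $t_1 t_2 = 1/2$, matching the statement in the example.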
A.2.17 Example (Tucked Brownian sheet on $[0, 1]^2$). The tucked Brownian sheet is the zero-mean Gaussian process indexed by $[0, 1]^2$, with covariance function
$$\mathrm{cov}\bigl(Z(s_1, t_1), Z(s_2, t_2)\bigr) = (s_1 \wedge s_2)(t_1 \wedge t_2 - t_1 t_2).$$
This can be obtained from the Brownian sheet given in Example A.2.15, by pinning it down to 0 on the entire boundary of $[0, 1]^2$. This process is a special case of the $P$-Kiefer process in Chapter 2.12 with $P$ the uniform distribution on $[0, 1]$ and $\mathcal{F} = \{1_{[0,t]}\colon t \in [0, 1]\}$. In this case $\sigma^2(Z) = 1/4$, and this is uniquely achieved for $(s, t) = (1, 1/2)$. It can be shown that (A.2.14) holds with $V = 4$ and $W = 3$. Therefore, for some constant $M$,
$$\mathrm{P}\Bigl(\sup_{t\in T} Z_t \ge \lambda\Bigr) \le M\exp(-2\lambda^2).$$
$$\log N_{[\,]}\bigl(\varepsilon, \mathcal{C}, L_2(P)\bigr) \le K\,\frac{1}{\varepsilon}, \tag{A.2.20}$$
for some Borel probability measure $\mu$ on $(T, \rho)$. Here $B(t, \varepsilon)$ is the $\rho$-ball of radius $\varepsilon$ around $t$.
‡ Ledoux and Talagrand (1991), pages 320–321, and (11.15) on page 317.
A.2 Gaussian Processes 613
This explains the name “majorizing measure” and readily yields one direc-
tion of the preceding proposition. In the other direction, the proposition is
harder to prove. This converse part is used in the proof of Theorem 2.11.11
in combination with the following lemma.
Proof. The set $T$ is totally bounded: the existence of infinitely many disjoint balls $B(t_i, \delta)$ of fixed radius $\delta > 0$ would require that $\mu\bigl(B(t_i, \delta)\bigr) \to 0$ as $i \to \infty$. However,
$$\sup_t \delta\sqrt{\log\frac{1}{\mu\bigl(B(t, \delta)\bigr)}} \le \sup_t \int_0^\delta \sqrt{\log\frac{1}{\mu\bigl(B(t, \varepsilon)\bigr)}}\,d\varepsilon.$$
$$\int_0^{2^{-q_0}} \sqrt{\log\bigl(1/\mu(B(t, \varepsilon))\bigr)}\,d\varepsilon$$
if $\mu\bigl(B(t, \varepsilon)\bigr)$ is bounded away from 1, which may be assumed.
The present sequence of partitions need not be nested. Replace the $q$th partition by the partition in the sets $\cap_{p=1}^q \bar T_{p,i_p}$, where $i_1, \ldots, i_q$ range over all possible values. For each set in the $q$th partition, define
$$m_q\bigl(\cap_{p=1}^q \bar T_{p,i_p}\bigr) = \prod_{p=1}^q \bar m_p(\bar T_{p,i_p}).$$
(The exact location of the mass is irrelevant.) Next, set $m = \sum_q 2^{-q} m_q$.
Then $m$ is a subprobability measure and
$$\sum_{q>q_0} 2^{-q}\sqrt{\log\frac{1}{m\bigl(T_q(t)\bigr)}} \le \sum_{q>q_0} 2^{-q}\sqrt{q\log 2 + \sum_{p=1}^q \log\frac{1}{\bar m_p\bigl(\bar T_p(t)\bigr)}}.$$
6. The Brownian sheet $B$ satisfies the tail bound, for every $\lambda > 0$,
$$\mathrm{P}\Bigl(\sup_{0\le t_1, t_2\le 1} B(t_1, t_2) \ge \lambda\Bigr) \le 4\bigl(1 - \Phi(\lambda)\bigr).$$
7. The Kiefer-Müller process $Z$ satisfies the tail bound, for every $\lambda > 0$,
$$\mathrm{P}\Bigl(\sup_{0\le t_1, t_2\le 1} Z(t_1, t_2) \ge \lambda\Bigr) \le 2\exp(-2\lambda^2).$$
A.3.1 Proposition. Let $M$ be a median of the norm of the Rademacher process $\sum_{i=1}^n \varepsilon_i x_i$. Then there exists a universal constant $C$ such that, for every $\lambda > 0$,
$$\mathrm{P}\Bigl(\Bigl|\bigl\|\textstyle\sum \varepsilon_i x_i\bigr\| - M\Bigr| > \lambda\Bigr) \le 4\exp\Bigl(-\frac{\lambda^2}{8\sigma^2}\Bigr),$$
$$\mathrm{P}\Bigl(\Bigl|\bigl\|\textstyle\sum \varepsilon_i x_i\bigr\| - \mathrm{E}\bigl\|\textstyle\sum \varepsilon_i x_i\bigr\|\Bigr| > \lambda\Bigr) \le C\exp\Bigl(-\frac{\lambda^2}{9\sigma^2}\Bigr).$$
Proof. For the first inequality, see Ledoux and Talagrand (1991), Theo-
rem 4.7, on page 100.
A.3 Rademacher Processes 617
To obtain the second inequality, set $\mu = \mathrm{E}\|\sum \varepsilon_i x_i\|$. Integrate over the first inequality to obtain $|\mu - M| \le 4\sqrt{2\pi}\,\sigma$. For $c\lambda > |\mu - M|$, the event in the second inequality is contained in the event $\bigl\{\bigl|\|\sum \varepsilon_i x_i\| - M\bigr| > (1 - c)\lambda\bigr\}$. Choose $c > 0$ sufficiently small to bound the probability of this event by $4\exp\bigl(-\lambda^2/9\sigma^2\bigr)$. For $c\lambda \le |\mu - M|$, the right side in the second inequality is never smaller than $C\exp\bigl(-(\mu - M)^2/9c^2\sigma^2\bigr)$, which is at least 1 for sufficiently large $C$.
Proof. See Ledoux and Talagrand (1991), Theorem 4.12, on page 112.
The last proposition with the choices $\varphi_i(x) = \frac{1}{2}x^2$ is applied in Chapter 2.14.5 to obtain $\mathrm{E}\bigl\|\sum \varepsilon_i x_i\bigr\|^2 \le 4\bigl(\mathrm{E}\bigl\|\sum \varepsilon_i x_i\bigr\|\bigr)^2$.
A.4 Isoperimetric Inequalities for Product Measures
Combine the last two displays to see that their common left side is bounded above by $q \wedge \min_{1\le l\le q} Z_l^{-1}$ times $\prod_{l=1}^q P(B_l)^{-1}$, for the random variables $Z_l = P\bigl(A_l(Z)\bigr)/P(B_l)$. An application of the inequality in the first paragraph completes the proof.
Actually the preceding proof ignores issues of measurability and is false in general, because maps of the type $X \mapsto f(A_1, \ldots, A_q, X)$ need not be measurable, as required for the application of Fubini's theorem.
With some effort it can be seen that this problem does not arise in the case that the sample space is Polish with Borel $\sigma$-field and compact sets $A_1, \ldots, A_q$. This follows from the identity
$$\bigl\{x\colon n - f(A_1, \ldots, A_q, x) \le k\bigr\} = \bigcup_{|\cup I_i| > k}\; \bigcap_{i=1}^q \bigl\{x\colon \pi_{I_i}x \notin \pi_{I_i}A_i\bigr\},$$
is Borel measurable into the Polish space $\{0, 1\}^\infty \equiv \mathcal{R}$ and $\mathcal{A} = \phi^{-1}(\mathcal{B})$ for the Borel sets $\mathcal{B}$. Let $\phi_n\colon \mathcal{X}^n \to \mathcal{R}^n$ be the map $(x_1, \ldots, x_n) \mapsto \bigl(\phi(x_1), \ldots, \phi(x_n)\bigr)$, and write $P$ for the underlying measure on $\mathcal{X}$.
By construction there exists for every $A \in \mathcal{A}^n$ a Borel set $B \in \mathcal{B}^n$ with $A = \phi_n^{-1}(B)$. Suppose that there exists a Borel set $B' \subset B$ of the same measure under $(P \circ \phi^{-1})^n$ with the property that for every $z' \in B'$, there exists $z \in B \cap \phi(\mathcal{X})^n$ such that $z_i = z_i'$ for all $i$ such that $z_i' \in \phi(\mathcal{X})$. Thus, the coordinates of $z'$ that are not in $\phi(\mathcal{X})$ can be changed, while leaving the other coordinates the same, so as to obtain a point in $B$ with
Consequently,
$$(P^n)^*\bigl(f(A_1, \ldots, A_q, x) > k\bigr) \le \bigl((P \circ \phi^{-1})^n\bigr)^*\bigl(\{z\colon f(B_1', \ldots, B_q', z) > k\}\bigr).$$
Since $(P \circ \phi^{-1})^n(B') = P^n(A)$, the problem has been reduced to the case of a Polish sample space.
The existence of the sets $B'$ can be argued as follows. For every partition $\{1, \ldots, n\} = I \cup J$, define
$$B_{I,J} = \bigl\{x \in \mathcal{R}^n\colon (P \circ \phi^{-1})^J\bigl(B(\pi_I x)\bigr) = 0\bigr\}.$$
This means that the set $B(\pi_I z') \cap \phi(\mathcal{X})^J$ cannot be empty. If the coordinates $\{z_i'\colon i \in J\}$ are not contained in $\phi(\mathcal{X})$, then they can be changed in the desired manner.
In this chapter we present the strong law of large numbers and the central
limit theorem uniformly in the underlying measure, as well as the rank
central limit theorem.
Let $\bar X_n$ be the average of the first $n$ variables from a sequence $X_1, X_2, \ldots$ of independent and identically distributed random vectors in $\mathbb{R}^d$. Let $\mathcal{P}$ be a class of underlying probability measures. For instance, let $X_i$ be the $i$th coordinate projection of $(\mathbb{R}^d)^\infty$, and let $\mathcal{P}$ consist of Borel probability measures on $\mathbb{R}^d$.
Then the strong law of large numbers holds uniformly in $P \in \mathcal{P}$ in the sense that, for every $\varepsilon > 0$,
$$\lim_{n\to\infty}\, \sup_{P\in\mathcal{P}}\, \mathrm{P}_P\Bigl(\sup_{m\ge n} |\bar X_m - \mathrm{E}_P X_1| \ge \varepsilon\Bigr) = 0.$$
Here $C$ is an absolute constant and $g(\varepsilon) = \mathrm{E}\|X\|^2 1\{\|X\| \ge \varepsilon\}$.
For each $n$, let $a_{n1}, \ldots, a_{nn}$ and $b_{n1}, \ldots, b_{nn}$ be real numbers, and let $(R_{n1}, \ldots, R_{nn})$ be a random vector that is uniformly distributed on the $n!$ permutations of $\{1, \ldots, n\}$. Consider the rank statistic
$$S_n = \sum_{i=1}^n b_{ni}\, a_{n,R_{ni}}.$$
The mean and variance of $S_n$ are equal to $\mathrm{E}S_n = n\bar a_n \bar b_n$ and $\mathrm{var}\, S_n = A_n^2 B_n^2/(n - 1)$, where $A_n^2$ and $B_n^2$ are the sums of squared deviations from the mean of the numbers $a_{n1}, \ldots, a_{nn}$ and $b_{n1}, \ldots, b_{nn}$, respectively. Thus $A_n^2 = \sum_{i=1}^n (a_{ni} - \bar a_n)^2$.
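These moment formulas can be verified exactly for a small $n$ by averaging over all $n!$ permutations; the score vectors below are arbitrary illustrative numbers.

```python
from itertools import permutations

# Exact verification of E S_n = n * abar * bbar and
# var S_n = A^2 B^2 / (n - 1) for the rank statistic, by enumeration.
a = [1.0, 3.0, 4.0, 8.0]   # illustrative scores a_{n1}, ..., a_{nn}
b = [0.5, 2.0, 2.5, 3.0]   # illustrative scores b_{n1}, ..., b_{nn}
n = len(a)

values = [sum(b[i] * a[perm[i]] for i in range(n))
          for perm in permutations(range(n))]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)

abar, bbar = sum(a) / n, sum(b) / n
A2 = sum((x - abar) ** 2 for x in a)
B2 = sum((x - bbar) ** 2 for x in b)

assert abs(mean - n * abar * bbar) < 1e-9
assert abs(var - A2 * B2 / (n - 1)) < 1e-9
print(mean, var)
```

Enumeration gives the exact permutation distribution, so the agreement is to rounding error rather than sampling error.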
A.5 Some Limit Theorems 625
Here $g(\mu)$ is equal to $(1 - 2\mu)^{-1}\log(1/\mu - 1)$, for $0 < \mu < 1/2$, and $\bigl(2\mu(1 - \mu)\bigr)^{-1}$, for $1/2 \le \mu < 1$.
This inequality is valid for any n and p. For p close to zero or one, it is
possible to improve on the factor 2 in the exponent considerably.
$$\mathrm{P}\bigl(\sqrt{n}\,|\bar Y_n - p| \ge \lambda\bigr) \le 2\exp\Bigl(-\frac{\lambda^2}{2p}\,\psi\Bigl(\frac{\lambda}{\sqrt{n}\,p}\Bigr)\Bigr), \qquad \lambda > 0.$$
A.6.3 Corollary (Kiefer's inequality). For all $n$ and $0 < p < e^{-1}$,
$$\mathrm{P}\bigl(\sqrt{n}\,|\bar Y_n - p| \ge \lambda\bigr) \le 2\exp\bigl(-\lambda^2(\log(1/p) - 1)\bigr).$$
Proof. Since the probability on the left side is zero if $\lambda > \sqrt{n}$, we may assume that $\lambda \le \sqrt{n}$. Then $\psi\bigl(\lambda/(\sqrt{n}\,p)\bigr) \ge \psi(1/p)$, because $\psi$ is decreasing. Now apply Bennett's inequality and finish the proof by noting that $\psi(1/p)/(2p) \ge \log(1/p) - 1$ for $0 < p < e^{-1}$.
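The final numerical fact, $\psi(1/p)/(2p) \ge \log(1/p) - 1$ for $0 < p < e^{-1}$, can be spot-checked. Bennett's function $\psi$ is not restated in this excerpt; the sketch below assumes the common form $\psi(x) = 2h(1+x)/x^2$ with $h(x) = x(\log x - 1) + 1$.

```python
import math

# Spot check of psi(1/p)/(2p) >= log(1/p) - 1 for p < 1/e, assuming
# Bennett's function psi(x) = 2 h(1 + x) / x^2, h(x) = x(log x - 1) + 1
# (this form of psi is an assumption; it is not stated in the excerpt).

def h(x):
    return x * (math.log(x) - 1.0) + 1.0

def psi(x):
    return 2.0 * h(1.0 + x) / (x * x)

gaps = []
for p in [0.01, 0.05, 0.1, 0.2, 0.3, 0.36]:
    assert p < math.exp(-1.0)
    lhs = psi(1.0 / p) / (2.0 * p)
    rhs = math.log(1.0 / p) - 1.0
    gaps.append(lhs - rhs)
    assert lhs >= rhs
print(gaps)
```

The gap narrows as $p \downarrow 0$ (already under $0.06$ at $p = 0.01$), so the constant in Kiefer's inequality is close to sharp at small $p$.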
A.6.8 Proposition (Mason and van Zwet inequality). Let the random vector $(N_1, \ldots, N_k)$ be multinomially distributed with parameters $n$ and $(p_1, \ldots, p_k)$ such that $p_i > 0$ for $i < k$. Then for every $C > 0$ and $\delta > 0$ there exist constants $a, b, c > 0$ such that, for all $n \ge 1$ and $\lambda, p_1, \ldots, p_{k-1}$ satisfying $\lambda \le Cn\min\{p_i\colon 1 \le i \le k-1\}$ and $\sum_{i=1}^{k-1} p_i \le 1 - \delta$, we have
$$\mathrm{P}\Bigl(\sum_{i=1}^{k-1} \frac{(N_i - np_i)^2}{np_i} > \lambda\Bigr) \le a\exp(bk - c\lambda).$$
We do not give a full derivation of this inequality here, but note that the chi-square statistic is an example of a supremum over an elliptical class of functions. Therefore, the preceding inequality can be deduced from Talagrand's general empirical process inequality, Theorem 2.15.5.
To prove the last inequality, interpret the right side of this identity as an
expectation.]
3. The probability in Kiefer's inequality is bounded below by $\exp\bigl(-\lambda^2(1 - p)^{-2}\log(1/p)\bigr)$ when $\lambda = \sqrt{n}(1 - p)$.
4. The constant $K$ resulting from the proof of Proposition A.6.7 may be minimized by choosing an optimal cut-off point instead of $e^{-12}$. This involves understanding the dependence of the constant $K_2$ in Proposition A.6.4 on $p_0$.
5. Independent binomial $(1, p_i)$ variables $X_1, \ldots, X_n$ satisfy
$$\mathrm{P}\Bigl(\sum_{i=1}^n X_i > k\Bigr) \le \exp\Bigl(-n\bar p_n\, h\Bigl(\frac{k}{n\bar p_n}\Bigr)\Bigr) < \Bigl(\frac{e\, n\bar p_n}{k}\Bigr)^k,$$
for the function $h(x) = x(\log x - 1) + 1$ and $\bar p_n$ the average of the success probabilities.
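In the i.i.d. case $p_i = p$ the left side is an exact binomial tail, so the chain of bounds can be checked directly; the values of $n$, $p$, $k$ below are illustrative.

```python
import math

# Check P(sum X_i > k) <= exp(-n*pbar*h(k/(n*pbar))) < (e*n*pbar/k)^k
# in the i.i.d. case p_i = p, where the tail is an exact binomial sum.

def h(x):
    return x * (math.log(x) - 1.0) + 1.0

n, p = 20, 0.2
npbar = n * p

def tail(k):  # P(sum X_i > k) = P(sum >= k + 1)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k + 1, n + 1))

rows = []
for k in [5, 8, 12, 16]:
    assert k > npbar  # thresholds above the mean
    middle = math.exp(-npbar * h(k / npbar))
    upper = (math.e * npbar / k) ** k
    rows.append((tail(k), middle, upper))
    assert tail(k) <= middle < upper
print(rows)
```

Expanding $h$ shows the middle bound equals $(e\,n\bar p_n/k)^k e^{-n\bar p_n}$, which explains why the rightmost expression, obtained by dropping the factor $e^{-n\bar p_n} < 1$, is strictly larger.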
A Notes
A.1. Ottaviani (1939) proves his inequality for the case of real-valued
random variables. Hoffmann-Jørgensen (1974) proves his inequalities for
Banach-valued random variables. For both cases Dudley (1984) gives careful
proofs for random elements in nonseparable Banach spaces.
Proposition A.1.6 is due to Talagrand (1989).
A.4. The basic inequality in this chapter was first proved by Talagrand (1989) and Ledoux and Talagrand (1991). The present elegant proof by induction is taken from Talagrand (1995), who discusses many extensions and applications. The name “isoperimetric inequality” appears not entirely appropriate. It resulted from the analogy with exponential inequalities for Gaussian variables, which can be derived from the isoperimetric inequality for the standard normal distribution. This asserts that, among all sets $A$ with $\mathrm{P}(A)$ equal to a fixed number, the probability $\mathrm{P}(A^\varepsilon)$ increases the least as $\varepsilon$ increases from zero for $A$ of the form $\{x\colon |x| > c\}$. See Ledoux and Talagrand (1991), Ledoux (1996) and Ledoux (2001) for further references.