
In the study of random graphs or any randomly chosen objects, the tools of the trade mainly concern various concentration inequalities and martingale inequalities. When we wish to predict the outcome of a random process, a reasonable guess is the expected value of the object. However, how can we tell how close the expected value is to the actual outcome of the event? It can be very useful if such a prediction can be accompanied by a guarantee of its accuracy (within a certain error estimate, for example). This is exactly the role that the concentration inequalities play. In fact, analysis can easily go astray without the rigorous control coming from the concentration inequalities.

In our study of random power law graphs, the usual concentration inequalities are simply not enough. The reasons are multi-fold: due to the uneven degree distribution, the error bounds for the vertices of very large degree offset the delicate analysis in the sparse part of the graph. Furthermore, our graph is dynamically evolving, and therefore the probability space is changing at each tick of the clock. The problems arising in the analysis of random power law graphs provide impetus for improving our technical tools.

Indeed, in the course of our study of general random graphs, we need to use

several strengthened versions of concentration inequalities and martingale inequali-

ties. They are interesting in their own right and are useful for many other problems

as well.

In the next several sections, we state and prove a number of variations and

generalizations of concentration inequalities and martingale inequalities. Many of

these will be used in later chapters.

The simplest case concerns a sequence of coin flips. For some fixed value $p$, where $0 \le p \le 1$, each toss of the coin has probability $p$ of coming up heads. Let $S_n$ denote the number of heads after $n$ tosses. We can write $S_n$ as a sum of independent random variables $X_i$ as follows:

$$S_n = X_1 + X_2 + \cdots + X_n,$$


24 2. OLD AND NEW CONCENTRATION INEQUALITIES

where

$$\Pr(X_i = 1) = p, \qquad \Pr(X_i = 0) = 1 - p. \tag{2.1}$$

A classical question is to determine the distribution of $S_n$. It is not too difficult to see that $S_n$ has the binomial distribution $B(n, p)$:

$$\Pr(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{for } k = 0, 1, 2, \ldots, n.$$

The expectation and variance of $B(n, p)$ are

$$\mathrm{E}(S_n) = np, \qquad \mathrm{Var}(S_n) = np(1-p).$$
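These formulas are easy to check numerically; a minimal Python sketch (with illustrative parameters $n$ and $p$):

```python
from math import comb

def binom_pmf(n, p, k):
    # Pr(S_n = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))
# mean matches np, and var matches np(1-p), up to floating-point error
```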

To picture the shape of the binomial distribution, we compare it with the normal distribution $N(\mu, \sigma)$, whose density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$

where $\mu$ denotes the expectation and $\sigma^2$ is the variance.

The case $N(0, 1)$ is called the standard normal distribution, whose density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$

[Two figures: the binomial distribution $B(10000, 0.5)$ and the standard normal distribution $N(0, 1)$.]

The normalized binomial distribution converges to the standard normal distribution; this can be viewed as a special case of the Central Limit Theorem, sometimes called the DeMoivre-Laplace Limit Theorem [53].

2.1. THE BINOMIAL DISTRIBUTION AND ITS ASYMPTOTIC BEHAVIOR

Theorem 2.1. The normalized binomial distribution satisfies, for two constants $a$ and $b$,

$$\lim_{n \to \infty} \Pr\!\left(a < \frac{S_n - np}{\sigma} < b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx,$$

where $\sigma = \sqrt{np(1-p)}$, provided $np(1-p) \to \infty$ as $n \to \infty$.

Proof. We will use Stirling's formula

$$n! = (1 + o(1)) \sqrt{2\pi n}\, \Big(\frac{n}{e}\Big)^n, \quad \text{or equivalently,} \quad n! \approx \sqrt{2\pi n}\, \Big(\frac{n}{e}\Big)^n.$$

For any constants $a$ and $b$, we have

$$\begin{aligned}
\Pr\!\Big(a < \frac{S_n - np}{\sigma} < b\Big)
&= \sum_{a < \frac{k-np}{\sigma} < b} \binom{n}{k} p^k (1-p)^{n-k} \\
&\approx \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n-k)}}\; \frac{n^n}{k^k (n-k)^{n-k}}\; p^k (1-p)^{n-k} \\
&= \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi np(1-p)}} \Big(\frac{np}{k}\Big)^{k+1/2} \Big(\frac{n(1-p)}{n-k}\Big)^{n-k+1/2} \\
&= \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}\,\sigma} \Big(1 + \frac{k-np}{np}\Big)^{-k-1/2} \Big(1 - \frac{k-np}{n(1-p)}\Big)^{-n+k-1/2}.
\end{aligned}$$

To approximate the above sum, we consider the following slightly simpler expression. Here, to estimate the lower order terms, we use the facts that $k = np + O(\sigma)$ and $1 + x = e^{\ln(1+x)} = e^{x - x^2/2 + O(x^3)}$, for $x = o(1)$. To proceed, we have

$$\begin{aligned}
\Pr\!\Big(a < \frac{S_n - np}{\sigma} < b\Big)
&\approx \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}\,\sigma} \Big(1 + \frac{k-np}{np}\Big)^{-k} \Big(1 - \frac{k-np}{n(1-p)}\Big)^{-n+k} \\
&\approx \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{k(k-np)}{np} + \frac{(n-k)(k-np)}{n(1-p)} + \frac{k(k-np)^2}{2n^2p^2} + \frac{(n-k)(k-np)^2}{2n^2(1-p)^2} + O(\frac{1}{\sigma})} \\
&= \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{1}{2}\left(\frac{k-np}{\sigma}\right)^2 + O(\frac{1}{\sigma})} \\
&\approx \sum_{a < \frac{k-np}{\sigma} < b} \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{1}{2}\left(\frac{k-np}{\sigma}\right)^2}.
\end{aligned}$$

Now, we set $x = x_k = \frac{k - np}{\sigma}$; the increment is $dx = x_k - x_{k-1} = 1/\sigma$. Note that $a < x_1 < x_2 < \cdots < b$ forms a $1/\sigma$-net for the interval $(a, b)$. As $n$ approaches infinity, the limit of the Riemann sum exists. We have

$$\lim_{n \to \infty} \Pr\!\Big(a < \frac{S_n - np}{\sigma} < b\Big) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$


Thus, the limit distribution of the normalized binomial distribution is the normal

distribution.
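This convergence is easy to observe numerically. The sketch below (parameters are illustrative; it evaluates the binomial terms via log-gamma for numerical stability) compares the exact probability with the Gaussian integral, written via the error function:

```python
from math import ceil, erf, exp, floor, lgamma, log, sqrt

n, p = 10000, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
a, b = -1.0, 2.0

def log_pmf(k):
    # log Pr(S_n = k) for the binomial distribution B(n, p)
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

# Exact Pr(a < (S_n - np)/sigma < b): sum over integers k strictly inside the range.
lo, hi = floor(mu + a * sigma) + 1, ceil(mu + b * sigma) - 1
exact = sum(exp(log_pmf(k)) for k in range(lo, hi + 1))

# Limit value: (1/sqrt(2*pi)) * integral_a^b exp(-x^2/2) dx, via the error function.
limit = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))
```

For $n = 10000$ the two quantities already agree to within about $1/\sigma$.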

When $np(1-p)$ does not tend to infinity, the above theorem is no longer true. For example, for $p = \frac{\lambda}{n}$, the limit distribution of $B(n, p)$ is the so-called Poisson distribution $P(\lambda)$:

$$\Pr(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad \text{for } k = 0, 1, 2, \ldots.$$

The expectation and variance of the Poisson distribution $P(\lambda)$ are given by

$$\mathrm{E}(X) = \lambda, \quad \text{and} \quad \mathrm{Var}(X) = \lambda.$$

Theorem 2.2. For $p = \frac{\lambda}{n}$, where $\lambda$ is a constant, the limit distribution of the binomial distribution $B(n, p)$ is the Poisson distribution $P(\lambda)$.

Proof. We consider

$$\begin{aligned}
\lim_{n \to \infty} \Pr(S_n = k) &= \lim_{n \to \infty} \binom{n}{k} p^k (1-p)^{n-k} \\
&= \lim_{n \to \infty} \frac{\lambda^k \prod_{i=0}^{k-1} \big(1 - \frac{i}{n}\big)}{k!}\; e^{-p(n-k)} \\
&= e^{-\lambda} \frac{\lambda^k}{k!}.
\end{aligned}$$
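The quality of the Poisson approximation can be checked numerically; a sketch (the truncation point $k \le 30$ is an assumption that both tails beyond it are negligible):

```python
from math import comb, exp, factorial

lam, n = 3.0, 1000
p = lam / n

def binom_pmf(k):
    # Pr(S_n = k) for B(1000, 0.003)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    # Pr(X = k) for P(3)
    return exp(-lam) * lam**k / factorial(k)

# Half the sum of absolute pmf differences (total variation distance),
# truncated at k = 30, where both tails are negligible.
tv = 0.5 * sum(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(31))
```

The computed distance is below $0.01$, consistent with the limit statement above.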

[Two figures: the binomial distribution $B(1000, 0.003)$ and the Poisson distribution $P(3)$.]

As $p$ decreases, the binomial distribution $B(n, p)$ changes from the normal distribution to the Poisson distribution. (Some examples are illustrated in Figures 5 and 6.) Theorem 2.1 states that the asymptotic behavior of $B(n, p)$ within the interval $(np - C\sigma, np + C\sigma)$ (for any constant $C$) is close to the normal distribution. In some applications, we might need asymptotic estimates beyond this interval.

2.2. GENERAL CHERNOFF INEQUALITIES 27

[Figures 5 and 6: the binomial distributions $B(1000, 0.1)$ and $B(1000, 0.01)$.]

When the random variable under consideration can be expressed as a sum of independent variables, it is possible to derive good estimates. The binomial distribution is one such example, where $S_n = \sum_{i=1}^n X_i$ and the $X_i$'s are independent and identically distributed. In this section, we consider sums of independent variables that are not necessarily identically distributed. To control the probability that such a sum is close to its expected value, various concentration inequalities come into play. A typical version of the Chernoff inequalities, attributed to Herman Chernoff, can be stated as follows:

Theorem 2.3. [28] Let $X_1, \ldots, X_n$ be independent random variables such that $\mathrm{E}(X_i) = 0$ and $|X_i| \le 1$ for all $i$. Let $X = \sum_{i=1}^n X_i$ and let $\sigma^2$ denote the variance of $X$. Then

$$\Pr(|X| \ge k\sigma) \le 2 e^{-k^2/4}, \quad \text{for any } 0 \le k \le 2\sigma.$$

For sums of Bernoulli random variables, the following version of the Chernoff inequalities is often useful.

Theorem 2.4. [28] Let $X_1, \ldots, X_n$ be independent random variables with

$$\Pr(X_i = 1) = p_i, \qquad \Pr(X_i = 0) = 1 - p_i.$$

We consider the sum $X = \sum_{i=1}^n X_i$, with expectation $\mathrm{E}(X) = \sum_{i=1}^n p_i$. Then we have

$$\text{(Lower tail)} \qquad \Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\lambda^2/2\mathrm{E}(X)},$$

$$\text{(Upper tail)} \qquad \Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{E}(X) + \lambda/3)}}.$$

We remark that the term $\lambda/3$ appearing in the exponent of the bound for the upper tail is significant. It covers the case when the limit distribution is Poisson as well as normal.
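Both tails of Theorem 2.4 can be verified directly against exact binomial probabilities; a sketch with illustrative values of $n$, $p$, and $\lambda$:

```python
from math import comb, exp

n, p, lam = 100, 0.3, 12.0
EX = n * p  # E(X) = sum of the p_i, here all equal to p

def pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lower_exact = sum(pmf(k) for k in range(n + 1) if k <= EX - lam)
upper_exact = sum(pmf(k) for k in range(n + 1) if k >= EX + lam)

lower_bound = exp(-lam**2 / (2 * EX))              # lower tail of Theorem 2.4
upper_bound = exp(-lam**2 / (2 * (EX + lam / 3)))  # upper tail of Theorem 2.4
```

In this example the exact tails are roughly an order of magnitude below the Chernoff bounds.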


There are many variations of the Chernoff inequalities. Due to the fundamen-

tal nature of these inequalities, we will state several versions and then prove the

strongest version from which all the other inequalities can be deduced. (See Fig-

ure 7 for the flowchart of these theorems.) In this section, we will prove Theorem

2.8 and deduce Theorems 2.6 and 2.5. Theorems 2.10 and 2.11 will be stated and

proved in the next section. Theorems 2.9, 2.7, 2.13, and 2.14 on the lower tail can

be deduced by reflecting X to X.

[Figure 7: flowchart showing how the theorems of this section are deduced from one another.]

Our first variation concerns random variables with the generalized binomial distribution:

$$\Pr(X_i = 1) = p_i, \qquad \Pr(X_i = 0) = 1 - p_i.$$

Theorem 2.5. For $X = \sum_{i=1}^n a_i X_i$ with $a_i > 0$, we have $\mathrm{E}(X) = \sum_{i=1}^n a_i p_i$, and we define $\nu = \sum_{i=1}^n a_i^2 p_i$. Then we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\lambda^2/2\nu}, \tag{2.2}$$

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\nu + a\lambda/3)}}, \tag{2.3}$$

where $a = \max\{a_1, a_2, \ldots, a_n\}$.
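For a small weighted sum, Theorem 2.5 can be checked by exhaustive enumeration; a sketch (the weights $a_i$ and probabilities $p_i$ below are illustrative):

```python
from itertools import product
from math import exp

# Weighted Bernoulli sum X = sum_i a_i X_i with Pr(X_i = 1) = p_i.
a = [1, 2, 1, 3, 1, 2, 1, 1, 2, 1]
p = [0.3, 0.2, 0.5, 0.1, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4]
EX = sum(ai * pi for ai, pi in zip(a, p))      # E(X) = sum a_i p_i
nu = sum(ai**2 * pi for ai, pi in zip(a, p))   # nu = sum a_i^2 p_i
amax = max(a)

lam = 4.0
lower = upper = 0.0
for bits in product([0, 1], repeat=len(a)):    # exact enumeration of all 2^10 outcomes
    prob = 1.0
    for bit, pi in zip(bits, p):
        prob *= pi if bit else 1 - pi
    x = sum(ai * bit for ai, bit in zip(a, bits))
    if x <= EX - lam:
        lower += prob
    if x >= EX + lam:
        upper += prob

lower_bound = exp(-lam**2 / (2 * nu))                     # (2.2)
upper_bound = exp(-lam**2 / (2 * (nu + amax * lam / 3)))  # (2.3)
```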

The cumulative distribution function is $\Pr(X \le x)$. The dotted curve in Figure 8 illustrates the cumulative distribution function of the binomial distribution $B(1000, 0.1)$, with the value ranging from $0$ to $1$ as $x$ goes from $-\infty$ to $\infty$. The solid curve at the lower-left corner is the bound $e^{-\lambda^2/2\nu}$ for the lower tail. The solid curve at the upper-right corner is the bound $1 - e^{-\frac{\lambda^2}{2(\nu + a\lambda/3)}}$ for the upper tail.


[Figure 8: the cumulative distribution function of $B(1000, 0.1)$ together with the lower-tail and upper-tail bounds.]

The inequality (2.3) in the above theorem is a corollary of the following general concentration inequality (also see Theorem 2.7 in the survey paper by McDiarmid [99]).

Theorem 2.6. [99] Let $X_i$ be independent random variables satisfying $X_i \le \mathrm{E}(X_i) + M$, for $1 \le i \le n$. We consider the sum $X = \sum_{i=1}^n X_i$ with expectation $\mathrm{E}(X) = \sum_{i=1}^n \mathrm{E}(X_i)$ and variance $\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(X_i)$. Then we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + M\lambda/3)}}.$$

Theorem 2.7. If $X_1, X_2, \ldots, X_n$ are non-negative independent random variables, we have the following bound for the sum $X = \sum_{i=1}^n X_i$:

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2\sum_{i=1}^n \mathrm{E}(X_i^2)}}.$$

Theorem 2.8. Suppose $X_i$ are independent random variables satisfying $X_i \le M$, for $1 \le i \le n$. Let $X = \sum_{i=1}^n X_i$ and $\|X\| = \sqrt{\sum_{i=1}^n \mathrm{E}(X_i^2)}$. Then we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\|X\|^2 + M\lambda/3)}}.$$

Replacing $X$ by $-X$ in Theorem 2.8, we obtain the following theorem for the lower tail.

Theorem 2.9. Let $X_i$ be independent random variables satisfying $X_i \ge -M$, for $1 \le i \le n$. Let $X = \sum_{i=1}^n X_i$ and $\|X\| = \sqrt{\sum_{i=1}^n \mathrm{E}(X_i^2)}$. Then we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(\|X\|^2 + M\lambda/3)}}.$$


Before we give the proof of Theorem 2.8, we will first show the implications of

Theorems 2.8 and 2.9. Namely, we will show that the other concentration inequal-

ities can be derived from Theorems 2.8 and 2.9.

Proof of Theorem 2.6. Let $X_i' = X_i - \mathrm{E}(X_i)$ and $X' = \sum_{i=1}^n X_i' = X - \mathrm{E}(X)$. We have

$$X_i' \le M \quad \text{for } 1 \le i \le n.$$

We also have

$$\begin{aligned}
\|X'\|^2 &= \sum_{i=1}^n \mathrm{E}({X_i'}^2) \\
&= \sum_{i=1}^n \mathrm{E}\big((X_i - \mathrm{E}(X_i))^2\big) \\
&= \sum_{i=1}^n \mathrm{Var}(X_i) \\
&= \mathrm{Var}(X).
\end{aligned}$$

Applying Theorem 2.8 to $X'$, we get

$$\Pr(X \ge \mathrm{E}(X) + \lambda) = \Pr(X' \ge \lambda) \le e^{-\frac{\lambda^2}{2(\|X'\|^2 + M\lambda/3)}} = e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + M\lambda/3)}}.$$

Proof of Theorem 2.7. The proof is straightforward from Theorem 2.9 by choosing $M = 0$, since non-negative random variables satisfy $X_i \ge 0 = -M$.

Proof of Theorem 2.5. Let $Y_i = a_i X_i$, so that $X = \sum_{i=1}^n Y_i$. Then

$$\|X\|^2 = \sum_{i=1}^n \mathrm{E}(Y_i^2) = \sum_{i=1}^n a_i^2 p_i = \nu.$$

Equation (2.2) now follows from Theorem 2.7 (that is, by applying Theorem 2.9 to $Y$), since the $Y_i$'s are non-negative. For the upper tail, note that

$$Y_i \le a_i \le a \le \mathrm{E}(Y_i) + a.$$

Equation (2.3) now follows from Theorem 2.6.


Finally, we give the complete proof of Theorem 2.8 and thus finish the proofs

for all the above theorems on Chernoff inequalities.

For a positive $t$ (to be chosen later), independence gives

$$\mathrm{E}(e^{tX}) = \mathrm{E}\big(e^{t \sum_i X_i}\big) = \prod_{i=1}^n \mathrm{E}(e^{tX_i}).$$

We define $g(y) = 2 \sum_{k=2}^\infty \frac{y^{k-2}}{k!} = \frac{2(e^y - 1 - y)}{y^2}$, and use the following facts:

- $g(0) = 1$;
- $g(y) \le 1$, for $y < 0$;
- $g(y)$ is monotone increasing, for $y \ge 0$;
- for $y < 3$, we have

$$g(y) = 2 \sum_{k=2}^\infty \frac{y^{k-2}}{k!} \le \sum_{k=2}^\infty \frac{y^{k-2}}{3^{k-2}} = \frac{1}{1 - y/3}.$$

Then we have

$$\begin{aligned}
\mathrm{E}(e^{tX}) &= \prod_{i=1}^n \mathrm{E}(e^{tX_i}) \\
&= \prod_{i=1}^n \mathrm{E}\Big(\sum_{k=0}^\infty \frac{t^k X_i^k}{k!}\Big) \\
&= \prod_{i=1}^n \mathrm{E}\Big(1 + t X_i + \frac{1}{2} t^2 X_i^2\, g(tX_i)\Big) \\
&\le \prod_{i=1}^n \Big(1 + t \mathrm{E}(X_i) + \frac{1}{2} t^2 \mathrm{E}(X_i^2)\, g(tM)\Big) \\
&\le \prod_{i=1}^n e^{t\mathrm{E}(X_i) + \frac{1}{2} t^2 \mathrm{E}(X_i^2) g(tM)} \\
&= e^{t\mathrm{E}(X) + \frac{1}{2} t^2 g(tM) \sum_{i=1}^n \mathrm{E}(X_i^2)} \\
&= e^{t\mathrm{E}(X) + \frac{1}{2} t^2 g(tM) \|X\|^2}.
\end{aligned}$$

Hence, by Markov's inequality,

$$\begin{aligned}
\Pr(X \ge \mathrm{E}(X) + \lambda) &= \Pr\big(e^{tX} \ge e^{t\mathrm{E}(X) + t\lambda}\big) \\
&\le e^{-t\mathrm{E}(X) - t\lambda}\, \mathrm{E}(e^{tX}) \\
&\le e^{-t\lambda + \frac{1}{2} t^2 g(tM) \|X\|^2} \\
&\le e^{-t\lambda + \frac{1}{2} t^2 \|X\|^2 \frac{1}{1 - tM/3}}.
\end{aligned}$$

To minimize the above expression, we choose $t = \frac{\lambda}{\|X\|^2 + M\lambda/3}$. Therefore, $tM < 3$ and we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} t^2 \|X\|^2 \frac{1}{1 - tM/3}} = e^{-\frac{\lambda^2}{2(\|X\|^2 + M\lambda/3)}}.$$

This completes the proof of Theorem 2.8.
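The choice of $t$ in the proof can be examined numerically (a sketch; the values of $\|X\|^2$, $M$, and $\lambda$ are illustrative): it makes the exponent exactly $-\lambda^2/(2(\|X\|^2 + M\lambda/3))$, and a grid search shows this comes within a small margin of the true minimum of the final expression.

```python
norm2, M, lam = 5.0, 1.0, 3.0   # illustrative values of ||X||^2, M, lambda

def exponent(t):
    # -t*lam + (1/2) t^2 ||X||^2 / (1 - t*M/3), valid for t*M < 3
    return -t * lam + 0.5 * t**2 * norm2 / (1 - t * M / 3)

t_star = lam / (norm2 + M * lam / 3)          # the choice made in the proof
closed_form = -lam**2 / (2 * (norm2 + M * lam / 3))

# Grid search over admissible t in (0, 3/M) for comparison.
grid_min = min(exponent(i / 10000 * 2.9 / M) for i in range(1, 10000))
```

The convenience of this choice is that it turns the bound into the clean closed form used in the theorem.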

It is sometimes useful to relax the uniform upper bound in Theorem 2.8 and allow each $X_i$ its own bound. We first consider the upper tail.

Theorem 2.10. Let $X_i$ denote independent random variables satisfying $X_i \le \mathrm{E}(X_i) + a_i + M$, for $1 \le i \le n$. For $X = \sum_{i=1}^n X_i$, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + \sum_{i=1}^n a_i^2 + M\lambda/3)}}.$$

Pn

Proof. Let Xi0 = Xi E(Xi ) ai and X 0 = i=1 Xi0 . We have

Xi0 M for 1 i n.

X

n

X 0 E(X 0 ) = (Xi0 E(Xi0 ))

i=1

X

n

= (Xi0 + ai )

i=1

X

n

= (Xi E(Xi ))

i=1

= X E(X).

Thus,

X

n

2

kX 0 k2 = E(Xi0 )

i=1

X

n

= E((Xi E(Xi ) ai )2 )

i=1

X

n

= E((Xi E(Xi ))2 + a2i

i=1

X

n

= Var(X) + a2i .

i=1

2.3. MORE CONCENTRATION INEQUALITIES 33

Theorem 2.11. Let $X_i$ denote independent random variables satisfying $X_i \le \mathrm{E}(X_i) + M_i$, for $1 \le i \le n$. We order the $X_i$'s so that the $M_i$ are in increasing order. Let $X = \sum_{i=1}^n X_i$. Then for any $1 \le k \le n$, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + \sum_{i=k}^n (M_i - M_k)^2 + M_k\lambda/3)}}.$$

Proof. We apply Theorem 2.10 with

$$a_i = \begin{cases} 0 & \text{if } 1 \le i \le k, \\ M_i - M_k & \text{if } k \le i \le n. \end{cases}$$

We have

$$X_i - \mathrm{E}(X_i) \le M_i \le a_i + M_k \quad \text{for } 1 \le i \le n,$$

and

$$\sum_{i=1}^n a_i^2 = \sum_{i=k}^n (M_i - M_k)^2.$$

Using Theorem 2.10, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + \sum_{i=k}^n (M_i - M_k)^2 + M_k\lambda/3)}}.$$

Example 2.12. For $1 \le i \le n-1$, suppose the $X_i$ follow the same distribution with

$$\Pr(X_i = 0) = 1 - p \quad \text{and} \quad \Pr(X_i = 1) = p,$$

and $X_n$ follows the distribution with

$$\Pr(X_n = 0) = 1 - p \quad \text{and} \quad \Pr(X_n = \sqrt{n}) = p.$$

Consider the sum $X = \sum_{i=1}^n X_i$. We have

$$\mathrm{E}(X) = \sum_{i=1}^n \mathrm{E}(X_i) = (n-1)p + \sqrt{n}\,p,$$

and

$$\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(X_i) = (n-1)p(1-p) + np(1-p) = (2n-1)p(1-p).$$

Apply Theorem 2.6 with $M = (1-p)\sqrt{n}$. We have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2((2n-1)p(1-p) + (1-p)\sqrt{n}\,\lambda/3)}}.$$

In particular, for constant $p \in (0, 1)$ and $\lambda = \Theta(n^{\frac{1}{2} + \epsilon})$, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\Theta(n^{\epsilon})}.$$


Now we apply Theorem 2.11 with $M_1 = \cdots = M_{n-1} = 1 - p$ and $M_n = (1-p)\sqrt{n}$. Choosing $k = n-1$, we have

$$\begin{aligned}
\mathrm{Var}(X) + (M_n - M_{n-1})^2 &= (2n-1)p(1-p) + (1-p)^2 (\sqrt{n} - 1)^2 \\
&\le (2n-1)p(1-p) + (1-p)^2 n \\
&\le (1 - p^2) n.
\end{aligned}$$

Thus,

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2((1-p^2)n + (1-p)\lambda/3)}}.$$

For constant $p \in (0, 1)$ and $\lambda = \Theta(n^{\frac{1}{2} + \epsilon})$, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\Theta(n^{2\epsilon})}.$$
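The gap between the two bounds in this example can be seen numerically; a sketch (the values of $n$, $p$, and $\epsilon$ are illustrative):

```python
from math import sqrt

n, p, eps = 10**6, 0.5, 0.2
lam = n ** (0.5 + eps)

var = (2 * n - 1) * p * (1 - p)

# Exponent lambda^2 / (2(...)) from Theorem 2.6 with M = (1 - p) * sqrt(n):
M = (1 - p) * sqrt(n)
exponent_26 = lam**2 / (2 * (var + M * lam / 3))

# Exponent from Theorem 2.11 with k = n - 1, using the bound (1 - p^2) n:
exponent_211 = lam**2 / (2 * ((1 - p**2) * n + (1 - p) * lam / 3))
```

For these parameters the Theorem 2.11 exponent is several times larger, matching the $n^{\epsilon}$ versus $n^{2\epsilon}$ asymptotics.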

From the above example, we note that Theorem 2.11 gives a significantly better bound than Theorem 2.6 when the random variables $X_i$ have very different upper bounds.

For completeness, we also list the corresponding theorems for the lower tails. (These can be derived by replacing $X$ by $-X$.)

Theorem 2.13. Let $X_i$ denote independent random variables satisfying $X_i \ge \mathrm{E}(X_i) - a_i - M$, for $1 \le i \le n$. For $X = \sum_{i=1}^n X_i$, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + \sum_{i=1}^n a_i^2 + M\lambda/3)}}.$$

Theorem 2.14. Let $X_i$ denote independent random variables satisfying $X_i \ge \mathrm{E}(X_i) - M_i$, for $1 \le i \le n$. We order the $X_i$'s so that the $M_i$ are in increasing order. Let $X = \sum_{i=1}^n X_i$. Then for any $1 \le k \le n$, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(\mathrm{Var}(X) + \sum_{i=k}^n (M_i - M_k)^2 + M_k\lambda/3)}}.$$

Continuing the above example, we choose $M_1 = M_2 = \cdots = M_{n-1} = p$ and $M_n = \sqrt{n}\,p$. We choose $k = n-1$, so we have

$$\begin{aligned}
\mathrm{Var}(X) + (M_n - M_{n-1})^2 &= (2n-1)p(1-p) + p^2 (\sqrt{n} - 1)^2 \\
&\le (2n-1)p(1-p) + p^2 n \\
&\le p(2 - p) n.
\end{aligned}$$

Using Theorem 2.14, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(p(2-p)n + p\lambda/3)}}.$$

For a constant $p \in (0, 1)$ and $\lambda = \Theta(n^{\frac{1}{2} + \epsilon})$, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\Theta(n^{2\epsilon})}.$$

2.4. A CONCENTRATION INEQUALITY WITH A LARGE ERROR ESTIMATE 35

In the previous section, we saw that the Chernoff inequalities give very good probabilistic estimates when a random variable is close to its expected value. Suppose we allow the error bound to be a positive fraction of the expected value. Then we can obtain even better bounds for the probability of the tails. The following two concentration inequalities can be found in [105]. Here $X = \sum_{i=1}^n X_i$ is a sum of independent random variables with $\Pr(X_i = 1) = p_i$ and $\Pr(X_i = 0) = 1 - p_i$.

For any $\epsilon > 0$,

$$\Pr\big(X \ge (1 + \epsilon)\mathrm{E}(X)\big) \le \left(\frac{e^{\epsilon}}{(1 + \epsilon)^{1+\epsilon}}\right)^{\mathrm{E}(X)}. \tag{2.4}$$

For any $0 < \epsilon < 1$,

$$\Pr\big(X \le \epsilon \mathrm{E}(X)\big) \le e^{-(1 - \epsilon)^2 \mathrm{E}(X)/2}. \tag{2.5}$$
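Both multiplicative bounds can be checked against exact binomial tails; a sketch with illustrative parameters:

```python
from math import comb, exp

n, p = 200, 0.1
EX = n * p  # E(X) = 20

def pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

eps = 0.5
upper_exact = sum(pmf(k) for k in range(n + 1) if k >= (1 + eps) * EX)
upper_bound = (exp(eps) / (1 + eps) ** (1 + eps)) ** EX   # (2.4)

lower_exact = sum(pmf(k) for k in range(n + 1) if k <= eps * EX)
lower_bound = exp(-(1 - eps) ** 2 * EX / 2)               # (2.5)
```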

The above inequalities, however, are still not enough for our applications in Chapter 7. We need the following somewhat stronger concentration inequality for the lower tail.

Theorem 2.17. For any $0 \le \epsilon \le e^{-1}$, we have

$$\Pr\big(X \le \epsilon \mathrm{E}(X)\big) \le e^{-(1 - 2\epsilon(1 - \ln \epsilon))\mathrm{E}(X)}.$$

Proof. Suppose that $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random variables with

$$\Pr(X_i = 1) = p_i, \qquad \Pr(X_i = 0) = 1 - p_i.$$


We have

$$\begin{aligned}
\Pr\big(X \le \epsilon \mathrm{E}(X)\big) &= \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \Pr(X = k) \\
&= \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \sum_{|S| = k}\; \prod_{i \in S} p_i \prod_{i \notin S} (1 - p_i) \\
&\le \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \sum_{|S| = k}\; \prod_{i \in S} p_i \prod_{i \notin S} e^{-p_i} \\
&= \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \sum_{|S| = k}\; \prod_{i \in S} p_i\; e^{-\sum_{i \notin S} p_i} \\
&= \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \sum_{|S| = k}\; \prod_{i \in S} p_i\; e^{-\sum_{i=1}^n p_i + \sum_{i \in S} p_i} \\
&\le \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \sum_{|S| = k}\; \prod_{i \in S} p_i\; e^{-\mathrm{E}(X) + k} \\
&\le \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} e^{-\mathrm{E}(X) + k}\, \frac{\big(\sum_{i=1}^n p_i\big)^k}{k!} \\
&= e^{-\mathrm{E}(X)} \sum_{k=0}^{\lfloor \epsilon \mathrm{E}(X) \rfloor} \frac{(e\mathrm{E}(X))^k}{k!}.
\end{aligned}$$

Note that $g(k) = \frac{(e\mathrm{E}(X))^k}{k!}$ increases when $k < e\mathrm{E}(X)$. Let $k_0 = \lfloor \epsilon \mathrm{E}(X) \rfloor \le \epsilon \mathrm{E}(X)$. We have

$$\Pr\big(X \le \epsilon \mathrm{E}(X)\big) \le e^{-\mathrm{E}(X)} \sum_{k=0}^{k_0} \frac{(e\mathrm{E}(X))^k}{k!} \le e^{-\mathrm{E}(X)} (k_0 + 1)\, \frac{(e\mathrm{E}(X))^{k_0}}{k_0!}.$$

Using Stirling's formula

$$n! \approx \sqrt{2\pi n}\, \Big(\frac{n}{e}\Big)^n \ge \Big(\frac{n}{e}\Big)^n,$$

2.5. MARTINGALES AND AZUMAS INEQUALITY 37

we have

$$\begin{aligned}
\Pr\big(X \le \epsilon \mathrm{E}(X)\big) &\le e^{-\mathrm{E}(X)} (k_0 + 1)\, \frac{(e\mathrm{E}(X))^{k_0}}{k_0!} \\
&\le e^{-\mathrm{E}(X)} (k_0 + 1) \Big(\frac{e^2 \mathrm{E}(X)}{k_0}\Big)^{k_0} \\
&\le e^{-\mathrm{E}(X)} (\epsilon \mathrm{E}(X) + 1) \Big(\frac{e^2}{\epsilon}\Big)^{\epsilon \mathrm{E}(X)} \\
&= (\epsilon \mathrm{E}(X) + 1)\, e^{-(1 - 2\epsilon + \epsilon \ln \epsilon)\mathrm{E}(X)}.
\end{aligned}$$

Here we replaced $k_0$ by $\epsilon \mathrm{E}(X)$, since the function $(x + 1)\big(\frac{e^2 \mathrm{E}(X)}{x}\big)^x$ is increasing for $x < e\mathrm{E}(X)$.

Since $0 \le \epsilon \le e^{-1}$, we have $\ln \frac{1}{\epsilon} \ge 1$, and therefore

$$\epsilon \mathrm{E}(X) + 1 \le e^{\epsilon \mathrm{E}(X)} \le e^{\epsilon \ln \frac{1}{\epsilon}\, \mathrm{E}(X)} = e^{-\epsilon \ln \epsilon\, \mathrm{E}(X)}.$$

We have

$$\begin{aligned}
\Pr\big(X \le \epsilon \mathrm{E}(X)\big) &\le (\epsilon \mathrm{E}(X) + 1)\, e^{-(1 - 2\epsilon + \epsilon \ln \epsilon)\mathrm{E}(X)} \\
&\le e^{-\epsilon \ln \epsilon\, \mathrm{E}(X)}\, e^{-(1 - 2\epsilon + \epsilon \ln \epsilon)\mathrm{E}(X)} \\
&= e^{-(1 - 2\epsilon + 2\epsilon \ln \epsilon)\mathrm{E}(X)} \\
&= e^{-(1 - 2\epsilon(1 - \ln \epsilon))\mathrm{E}(X)}.
\end{aligned}$$

The proof of Theorem 2.17 is complete.
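Comparing the Theorem 2.17 bound with (2.5) numerically (a sketch with an illustrative expectation) shows that it is much sharper for small $\epsilon$, while near $\epsilon = e^{-1}$ it can exceed $1$ and becomes trivial:

```python
from math import exp, log

EX = 100.0

def bound_thm(eps):
    # Theorem 2.17: exp(-(1 - 2*eps*(1 - ln eps)) * E(X))
    return exp(-(1 - 2 * eps * (1 - log(eps))) * EX)

def bound_25(eps):
    # Inequality (2.5): exp(-(1 - eps)^2 * E(X) / 2)
    return exp(-(1 - eps) ** 2 * EX / 2)
```

For example, at $\epsilon = 0.01$ the Theorem 2.17 bound is far smaller than the (2.5) bound, while at $\epsilon = 0.3$ it is larger than $1$.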

A martingale is a sequence of random variables $X_0, X_1, \ldots$ such that the conditional expectation of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is equal to $X_n$. The above definition is given in the classical book of Feller (see [53], p. 210). However, the conditional expectation depends on the random variables under consideration and can be difficult to deal with in various cases. In this book we will use the following definition, which is concise and essentially equivalent for the finite cases.

Let $\Omega$ be a probability space and let $\mathcal{F}$ denote a $\sigma$-field on $\Omega$. (A $\sigma$-field on $\Omega$ is a collection of subsets of $\Omega$ which contains $\emptyset$ and $\Omega$, and is closed under unions, intersections, and complementation.) In a $\sigma$-field $\mathcal{F}$ on $\Omega$, the smallest set in $\mathcal{F}$ containing an element $x$ is the intersection of all sets in $\mathcal{F}$ containing $x$. A function $f : \Omega \to \mathbb{R}$ is said to be $\mathcal{F}$-measurable if $f(x) = f(y)$ for any $y$ in the smallest set of $\mathcal{F}$ containing $x$. (For more terminology on martingales, the reader is referred to [80].)

For a function $f : \Omega \to \mathbb{R}$ with probability distribution $p$ on $\Omega$, the expectation of $f$ is defined by

$$\mathrm{E}(f) = \mathrm{E}(f(x) \mid x \in \Omega) := \sum_{x \in \Omega} f(x)\, p(x).$$

The conditional expectation of $f$ with respect to $\mathcal{F}$ is defined by the formula

$$\mathrm{E}(f \mid \mathcal{F})(x) := \frac{1}{\sum_{y \in \mathcal{F}(x)} p(y)} \sum_{y \in \mathcal{F}(x)} f(y)\, p(y),$$

where $\mathcal{F}(x)$ denotes the smallest set in $\mathcal{F}$ containing $x$.

A filter $\mathbf{F}$ is an increasing chain of $\sigma$-subfields

$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}.$$

A martingale (obtained from) $X$ is associated with a filter $\mathbf{F}$ and a sequence of random variables $X_0, X_1, \ldots, X_n$ satisfying $X_i = \mathrm{E}(X \mid \mathcal{F}_i)$ and, in particular, $X_0 = \mathrm{E}(X)$ and $X_n = X$.

Example 2.18. For given independent random variables $Y_1, Y_2, \ldots, Y_n$, we can define a martingale $X = Y_1 + Y_2 + \cdots + Y_n$ as follows. Let $\mathcal{F}_i$ be the $\sigma$-field generated by $Y_1, \ldots, Y_i$. (In other words, $\mathcal{F}_i$ is the minimum $\sigma$-field so that $Y_1, \ldots, Y_i$ are $\mathcal{F}_i$-measurable.) We have a natural filter $\mathbf{F}$:

$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}.$$

Let $X_i = \sum_{j=1}^i Y_j + \sum_{j=i+1}^n \mathrm{E}(Y_j)$. Then $X_0, X_1, X_2, \ldots, X_n$ form a martingale corresponding to the filter $\mathbf{F}$.
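The martingale property of Example 2.18 can be verified by exhaustive enumeration for a few Bernoulli steps; a sketch (the probabilities $q_j$ are illustrative):

```python
from itertools import product

# Independent Bernoulli steps Y_j with Pr(Y_j = 1) = q_j.
q = [0.2, 0.5, 0.7]
n = len(q)

def X(i, prefix):
    # X_i = sum_{j <= i} Y_j + sum_{j > i} E(Y_j), evaluated on the first i outcomes
    return sum(prefix) + sum(q[i:])

# Martingale property: averaging X_i over the i-th step recovers X_{i-1}.
ok = True
for i in range(1, n + 1):
    for prefix in product([0, 1], repeat=i - 1):
        cond = (1 - q[i - 1]) * X(i, prefix + (0,)) + q[i - 1] * X(i, prefix + (1,))
        ok = ok and abs(cond - X(i - 1, prefix)) < 1e-12
```

Note that $X_0 = \mathrm{E}(Y_1) + \cdots + \mathrm{E}(Y_n) = \mathrm{E}(X)$, as required.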

For $c = (c_1, c_2, \ldots, c_n)$ a vector of positive entries, a martingale is said to be $c$-Lipschitz if

$$|X_i - X_{i-1}| \le c_i \tag{2.7}$$

for $i = 1, 2, \ldots, n$. A powerful tool for controlling martingales is the following:

Theorem 2.19 (Azuma's inequality). If a martingale $X$ is $c$-Lipschitz, then

$$\Pr\big(|X - \mathrm{E}(X)| \ge \lambda\big) \le 2 e^{-\frac{\lambda^2}{2 \sum_{i=1}^n c_i^2}}, \tag{2.8}$$

where $c = (c_1, \ldots, c_n)$.

Theorem 2.20. Let $X_1, X_2, \ldots, X_n$ be independent random variables satisfying

$$|X_i - \mathrm{E}(X_i)| \le c_i, \quad \text{for } 1 \le i \le n.$$

Then we have the following bound for the sum $X = \sum_{i=1}^n X_i$:

$$\Pr\big(|X - \mathrm{E}(X)| \ge \lambda\big) \le 2 e^{-\frac{\lambda^2}{2 \sum_{i=1}^n c_i^2}}.$$


Proof of Azuma's inequality. For a fixed $t$, consider the convex function $f(x) = e^{tx}$. For any $|x| \le c$, $f(x)$ lies below the line segment from $(-c, f(-c))$ to $(c, f(c))$. In other words, we have

$$e^{tx} \le \frac{1}{2c}\big(e^{tc} - e^{-tc}\big)\, x + \frac{1}{2}\big(e^{tc} + e^{-tc}\big).$$

Therefore, we can write

$$\begin{aligned}
\mathrm{E}\big(e^{t(X_i - X_{i-1})} \mid \mathcal{F}_{i-1}\big) &\le \mathrm{E}\Big(\frac{1}{2c_i}\big(e^{tc_i} - e^{-tc_i}\big)(X_i - X_{i-1}) + \frac{1}{2}\big(e^{tc_i} + e^{-tc_i}\big) \,\Big|\, \mathcal{F}_{i-1}\Big) \\
&= \frac{1}{2}\big(e^{tc_i} + e^{-tc_i}\big) \\
&\le e^{t^2 c_i^2/2}.
\end{aligned}$$

Here we apply the conditions $\mathrm{E}(X_i - X_{i-1} \mid \mathcal{F}_{i-1}) = 0$ and $|X_i - X_{i-1}| \le c_i$.

Hence,

$$\mathrm{E}\big(e^{tX_i} \mid \mathcal{F}_{i-1}\big) \le e^{t^2 c_i^2/2}\, e^{tX_{i-1}}.$$

Inductively, we have

$$\begin{aligned}
\mathrm{E}(e^{tX_n}) &\le e^{t^2 c_n^2/2}\, \mathrm{E}(e^{tX_{n-1}}) \\
&\le \cdots \\
&\le \prod_{i=1}^n e^{t^2 c_i^2/2}\; \mathrm{E}(e^{tX_0}) \\
&= e^{\frac{1}{2} t^2 \sum_{i=1}^n c_i^2}\, e^{t\mathrm{E}(X)}.
\end{aligned}$$

Therefore, for any $t > 0$,

$$\begin{aligned}
\Pr(X \ge \mathrm{E}(X) + \lambda) &\le e^{-t\lambda}\, \mathrm{E}\big(e^{t(X - \mathrm{E}(X))}\big) \\
&\le e^{-t\lambda}\, e^{\frac{1}{2} t^2 \sum_{i=1}^n c_i^2} \\
&= e^{-t\lambda + \frac{1}{2} t^2 \sum_{i=1}^n c_i^2}.
\end{aligned}$$

We choose $t = \frac{\lambda}{\sum_{i=1}^n c_i^2}$ (in order to minimize the above expression). We have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} t^2 \sum_{i=1}^n c_i^2} = e^{-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}}.$$

To bound the other direction, we apply the same argument to the martingale $-X$ in the above proof. Then we obtain the following bound for the lower tail:

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}}.$$

Together, the two bounds give (2.8).
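Azuma's inequality can be checked against the exact two-sided tail of a $\pm 1$ random walk, which is $c$-Lipschitz with every $c_i = 1$; a sketch:

```python
from math import comb, exp

n, lam = 100, 20

# X = sum of n independent uniform +-1 steps; the partial sums form a
# martingale with |X_i - X_{i-1}| = 1 and E(X) = 0.
# With H heads, X = 2H - n, where H ~ B(n, 1/2).
exact = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) >= lam) / 2**n

azuma = 2 * exp(-lam**2 / (2 * n))   # bound (2.8) with sum c_i^2 = n
```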


Many problems which can be set up as a martingale do not satisfy the Lipschitz condition. It is desirable to be able to use tools similar to Azuma's inequality in such cases. In this section, we will first state and then prove several extensions of Azuma's inequality (see Figure 9).

Our starting point is the following well known concentration inequality (see

[99]):

Theorem 2.21. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_i - X_{i-1} \le M$, for $1 \le i \le n$.

Then, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n \sigma_i^2 + M\lambda/3)}}.$$

For a sum of independent random variables (see Example 2.18), Theorem 2.21 implies Theorem 2.6. In a similar way, the following theorem is associated with Theorem 2.10.

Theorem 2.22. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_i - X_{i-1} \le M_i$, for $1 \le i \le n$.

Then, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2\sum_{i=1}^n (\sigma_i^2 + M_i^2)}}.$$

2.6. GENERAL MARTINGALE INEQUALITIES 41

Theorem 2.23. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_i - X_{i-1} \le a_i + M$, for $1 \le i \le n$.

Then, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3)}}.$$

Theorem 2.24. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_i - X_{i-1} \le M_i$, for $1 \le i \le n$.

Then, for any $M$, we have

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n \sigma_i^2 + \sum_{M_i > M} (M_i - M)^2 + M\lambda/3)}}.$$

This follows from Theorem 2.23 by choosing

$$a_i = \begin{cases} 0 & \text{if } M_i \le M, \\ M_i - M & \text{if } M_i \ge M. \end{cases}$$

It suffices to prove Theorem 2.23, so that all the above stated theorems hold. Recall that $g(y) = 2 \sum_{k=2}^\infty \frac{y^{k-2}}{k!}$ satisfies the following properties:

- $\lim_{y \to 0} g(y) = 1$;
- $g(y)$ is monotone increasing, for $y \ge 0$;
- when $b < 3$, we have $g(b) \le \frac{1}{1 - b/3}$.


Proof of Theorem 2.23. For a positive $t$ (to be chosen later), we have

$$\begin{aligned}
\mathrm{E}\big(e^{t(X_i - X_{i-1} - a_i)} \mid \mathcal{F}_{i-1}\big) &= \mathrm{E}\Big(\sum_{k=0}^\infty \frac{t^k}{k!}(X_i - X_{i-1} - a_i)^k \,\Big|\, \mathcal{F}_{i-1}\Big) \\
&= 1 - t a_i + \mathrm{E}\Big(\sum_{k=2}^\infty \frac{t^k}{k!}(X_i - X_{i-1} - a_i)^k \,\Big|\, \mathcal{F}_{i-1}\Big) \\
&\le 1 - t a_i + \frac{t^2}{2}\, g(tM)\, \mathrm{E}\big((X_i - X_{i-1} - a_i)^2 \mid \mathcal{F}_{i-1}\big) \\
&= 1 - t a_i + \frac{t^2}{2}\, g(tM)\big(\mathrm{E}((X_i - X_{i-1})^2 \mid \mathcal{F}_{i-1}) + a_i^2\big) \\
&\le 1 - t a_i + \frac{t^2}{2}\, g(tM)(\sigma_i^2 + a_i^2) \\
&\le e^{-t a_i + \frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}.
\end{aligned}$$

Here we used $\mathrm{E}(X_i - X_{i-1} \mid \mathcal{F}_{i-1}) = 0$ and $X_i - X_{i-1} - a_i \le M$.

Thus,

$$\begin{aligned}
\mathrm{E}\big(e^{tX_i} \mid \mathcal{F}_{i-1}\big) &= \mathrm{E}\big(e^{t(X_i - X_{i-1} - a_i)} \mid \mathcal{F}_{i-1}\big)\, e^{tX_{i-1} + t a_i} \\
&\le e^{-t a_i + \frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}\, e^{tX_{i-1} + t a_i} \\
&= e^{\frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}\, e^{tX_{i-1}}.
\end{aligned}$$

Inductively, we have

$$\begin{aligned}
\mathrm{E}(e^{tX}) &= \mathrm{E}\big(\mathrm{E}(e^{tX_n} \mid \mathcal{F}_{n-1})\big) \\
&\le e^{\frac{t^2}{2} g(tM)(\sigma_n^2 + a_n^2)}\, \mathrm{E}(e^{tX_{n-1}}) \\
&\le \cdots \\
&\le \prod_{i=1}^n e^{\frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}\; \mathrm{E}(e^{tX_0}) \\
&= e^{\frac{1}{2} t^2 g(tM) \sum_{i=1}^n (\sigma_i^2 + a_i^2)}\, e^{t\mathrm{E}(X)}.
\end{aligned}$$

Then for $t$ satisfying $tM < 3$, we have

$$\begin{aligned}
\Pr(X \ge \mathrm{E}(X) + \lambda) &= \Pr\big(e^{tX} \ge e^{t\mathrm{E}(X) + t\lambda}\big) \\
&\le e^{-t\mathrm{E}(X) - t\lambda}\, \mathrm{E}(e^{tX}) \\
&\le e^{-t\lambda + \frac{1}{2} t^2 g(tM) \sum_{i=1}^n (\sigma_i^2 + a_i^2)} \\
&\le e^{-t\lambda + \frac{1}{2} t^2 \frac{1}{1 - tM/3} \sum_{i=1}^n (\sigma_i^2 + a_i^2)}.
\end{aligned}$$

We choose $t = \frac{\lambda}{\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3}$. Clearly $tM < 3$ and

$$\Pr(X \ge \mathrm{E}(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} t^2 \frac{1}{1 - tM/3} \sum_{i=1}^n (\sigma_i^2 + a_i^2)} = e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3)}}.$$

The proof of the theorem is complete.

2.7. SUPERMARTINGALES AND SUBMARTINGALES 43

For completeness, we state the following theorems for the lower tails. The proofs are almost identical and will be omitted.

Theorem 2.25. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_{i-1} - X_i \le a_i + M$, for $1 \le i \le n$.

Then, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3)}}.$$

Theorem 2.26. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_{i-1} - X_i \le M_i$, for $1 \le i \le n$.

Then, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2\sum_{i=1}^n (\sigma_i^2 + M_i^2)}}.$$

Theorem 2.27. Let $X$ be the martingale associated with a filter $\mathbf{F}$ satisfying

(1) $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$, for $1 \le i \le n$;
(2) $X_{i-1} - X_i \le M_i$, for $1 \le i \le n$.

Then, for any $M$, we have

$$\Pr(X \le \mathrm{E}(X) - \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n \sigma_i^2 + \sum_{M_i > M} (M_i - M)^2 + M\lambda/3)}}.$$

We now relax the condition on the variance in the martingale inequalities that were mentioned so far. Instead of a fixed upper bound for the variance, we will assume that the variance $\mathrm{Var}(X_i \mid \mathcal{F}_{i-1})$ is upper bounded by a linear function of $X_{i-1}$. Here we assume that this linear function is non-negative for all values that $X_{i-1}$ takes. We first need some terminology.

For a filter $\mathbf{F}$:

$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$

a sequence of random variables $X_0, X_1, \ldots, X_n$ is called a submartingale if $X_i$ is $\mathcal{F}_i$-measurable (i.e., $X_i(a) = X_i(b)$ if all elements of $\mathcal{F}_i$ containing $a$ also contain $b$, and vice versa) and $\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1}$, for $1 \le i \le n$. A sequence of random variables $X_0, X_1, \ldots, X_n$ is called a supermartingale if $X_i$ is $\mathcal{F}_i$-measurable and $\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}$, for $1 \le i \le n$.

To avoid repetition, we will first state a number of useful inequalities for submartingales and supermartingales. Then we will give the proof for the general inequalities in Theorem 2.32 for submartingales and in Theorem 2.30 for supermartingales. Furthermore, we will show that all the stated theorems follow from Theorems 2.32 and 2.30 (see Figure 10). Note that the inequalities for submartingales and supermartingales are not quite symmetric.

Theorem 2.28. Suppose that a supermartingale $X$, associated with a filter $\mathbf{F}$, satisfies

$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1}$$

and

$$X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \le M$$

for $1 \le i \le n$. Then we have

$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\frac{\lambda^2}{2((X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3)}}.$$

Theorem 2.29. Suppose that a submartingale $X$, associated with a filter $\mathbf{F}$, satisfies, for $1 \le i \le n$,

$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1}$$

and

$$\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - X_i \le M.$$

Then we have

$$\Pr(X_n \le X_0 - \lambda) \le e^{-\frac{\lambda^2}{2(X_0 (\sum_{i=1}^n \phi_i) + M\lambda/3)}},$$

for any $\lambda \le X_0$.

Theorem 2.30. Suppose that a supermartingale $X$, associated with a filter $\mathbf{F}$, satisfies

$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}$$

and

$$X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \le a_i + M$$

for $1 \le i \le n$. Here $\sigma_i$, $a_i$, $\phi_i$ and $M$ are non-negative constants. Then we have

$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3)}}.$$

Remark 2.31. Theorem 2.30 implies Theorem 2.28 by setting all $\sigma_i$'s and $a_i$'s to zero. Theorem 2.30 also implies Theorem 2.23 by choosing $\phi_1 = \cdots = \phi_n = 0$, so that $X_{i-1}$ no longer appears in the condition on the variance.


Theorem 2.32. Suppose that a submartingale $X$, associated with a filter $\mathbf{F}$, satisfies, for $1 \le i \le n$,

$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}$$

and

$$\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$

where $M$, the $a_i$'s, $\sigma_i$'s, and $\phi_i$'s are non-negative constants. Then we have

$$\Pr(X_n \le X_0 - \lambda) \le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + X_0 (\sum_{i=1}^n \phi_i) + M\lambda/3)}},$$

for any $\lambda \le 2X_0 + \frac{2\sum_{i=1}^n (\sigma_i^2 + a_i^2)}{\sum_{i=1}^n \phi_i}$.

Remark 2.33. Theorem 2.32 implies Theorem 2.29 by setting all $\sigma_i$'s and $a_i$'s to zero. Theorem 2.32 also implies Theorem 2.25 by choosing $\phi_1 = \cdots = \phi_n = 0$.

Proof of Theorem 2.30. For a positive $t$ (to be chosen later), we have

$$\begin{aligned}
\mathrm{E}\big(e^{tX_i} \mid \mathcal{F}_{i-1}\big) &= e^{t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + t a_i}\, \mathrm{E}\big(e^{t(X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i)} \mid \mathcal{F}_{i-1}\big) \\
&= e^{t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + t a_i} \sum_{k=0}^\infty \frac{t^k}{k!}\, \mathrm{E}\big((X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i)^k \mid \mathcal{F}_{i-1}\big) \\
&\le e^{t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + \sum_{k=2}^\infty \frac{t^k}{k!} \mathrm{E}((X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i)^k \mid \mathcal{F}_{i-1})}.
\end{aligned}$$

Recall that $g(y) = 2\sum_{k=2}^\infty \frac{y^{k-2}}{k!}$ satisfies

$$g(y) \le g(b) < \frac{1}{1 - b/3}$$

for all $y \le b$ and $0 \le b < 3$. Since $X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i \le M$, we have

$$\begin{aligned}
\sum_{k=2}^\infty \frac{t^k}{k!}\, \mathrm{E}\big((X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i)^k \mid \mathcal{F}_{i-1}\big)
&\le \frac{g(tM)}{2}\, t^2\, \mathrm{E}\big((X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - a_i)^2 \mid \mathcal{F}_{i-1}\big) \\
&= \frac{g(tM)}{2}\, t^2\, \big(\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) + a_i^2\big) \\
&\le \frac{g(tM)}{2}\, t^2\, (\sigma_i^2 + \phi_i X_{i-1} + a_i^2).
\end{aligned}$$

Since $\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}$, we have

$$\mathrm{E}\big(e^{tX_i} \mid \mathcal{F}_{i-1}\big) \le e^{tX_{i-1} + \frac{g(tM)}{2} t^2 (\sigma_i^2 + \phi_i X_{i-1} + a_i^2)} = e^{(t + \frac{g(tM)}{2} \phi_i t^2) X_{i-1}}\; e^{\frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}.$$


Now we define $t_1, t_2, \ldots, t_n$ by the recurrence

$$t_{i-1} = t_i + \frac{g(t_0 M)}{2}\, \phi_i\, t_i^2,$$

while $t_0$ will be chosen later. Then

$$t_n \le t_{n-1} \le \cdots \le t_0,$$

and, since $g(y)$ is increasing for $y > 0$,

$$\begin{aligned}
\mathrm{E}\big(e^{t_i X_i} \mid \mathcal{F}_{i-1}\big) &\le e^{(t_i + \frac{g(t_i M)}{2} \phi_i t_i^2) X_{i-1}}\; e^{\frac{t_i^2}{2} g(t_i M)(\sigma_i^2 + a_i^2)} \\
&\le e^{(t_i + \frac{g(t_0 M)}{2} \phi_i t_i^2) X_{i-1}}\; e^{\frac{t_i^2}{2} g(t_i M)(\sigma_i^2 + a_i^2)} \\
&= e^{t_{i-1} X_{i-1}}\; e^{\frac{t_i^2}{2} g(t_i M)(\sigma_i^2 + a_i^2)}.
\end{aligned}$$

Inductively, we have

$$\begin{aligned}
\Pr(X_n \ge X_0 + \lambda) &\le e^{-t_n(X_0 + \lambda)}\, \mathrm{E}(e^{t_n X_n}) \\
&= e^{-t_n(X_0 + \lambda)}\, \mathrm{E}\big(\mathrm{E}(e^{t_n X_n} \mid \mathcal{F}_{n-1})\big) \\
&\le e^{-t_n(X_0 + \lambda)}\, \mathrm{E}(e^{t_{n-1} X_{n-1}})\; e^{\frac{t_n^2}{2} g(t_n M)(\sigma_n^2 + a_n^2)} \\
&\le \cdots \\
&\le e^{-t_n(X_0 + \lambda)}\, \mathrm{E}(e^{t_0 X_0})\; e^{\sum_{i=1}^n \frac{t_i^2}{2} g(t_i M)(\sigma_i^2 + a_i^2)} \\
&\le e^{-t_n(X_0 + \lambda) + t_0 X_0 + \frac{t_0^2}{2} g(t_0 M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)}.
\end{aligned}$$

Note that

$$\begin{aligned}
t_n &= t_0 - \sum_{i=1}^n (t_{i-1} - t_i) \\
&= t_0 - \sum_{i=1}^n \frac{g(t_0 M)}{2}\, \phi_i\, t_i^2 \\
&\ge t_0 - \frac{g(t_0 M)}{2}\, t_0^2 \sum_{i=1}^n \phi_i.
\end{aligned}$$

Hence

$$\begin{aligned}
\Pr(X_n \ge X_0 + \lambda) &\le e^{-t_n(X_0 + \lambda) + t_0 X_0 + \frac{t_0^2}{2} g(t_0 M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)} \\
&\le e^{-(t_0 - \frac{g(t_0 M)}{2} t_0^2 \sum_{i=1}^n \phi_i)(X_0 + \lambda) + t_0 X_0 + \frac{t_0^2}{2} g(t_0 M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)} \\
&= e^{-t_0 \lambda + \frac{g(t_0 M)}{2} t_0^2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^n \phi_i\right)}.
\end{aligned}$$

Now we choose $t_0 = \frac{\lambda}{\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3}$. Using the fact that $t_0 M < 3$, we have

$$\begin{aligned}
\Pr(X_n \ge X_0 + \lambda) &\le e^{-t_0 \lambda + t_0^2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda)\sum_{i=1}^n \phi_i\right) \frac{1}{2(1 - t_0 M/3)}} \\
&= e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^n \phi_i) + M\lambda/3)}}.
\end{aligned}$$

The proof of the theorem is complete.


Proof of Theorem 2.32. The proof is quite similar to that of Theorem 2.30. For $t > 0$, the following inequality still holds:

$$\begin{aligned}
\mathrm{E}\big(e^{-tX_i} \mid \mathcal{F}_{i-1}\big) &= e^{-t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + t a_i}\, \mathrm{E}\big(e^{-t(X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + a_i)} \mid \mathcal{F}_{i-1}\big) \\
&= e^{-t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + t a_i} \sum_{k=0}^\infty \frac{t^k}{k!}\, \mathrm{E}\big((\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - X_i - a_i)^k \mid \mathcal{F}_{i-1}\big) \\
&\le e^{-t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + \sum_{k=2}^\infty \frac{t^k}{k!} \mathrm{E}((\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) - X_i - a_i)^k \mid \mathcal{F}_{i-1})} \\
&\le e^{-t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + \frac{g(tM)}{2} t^2 \mathrm{E}((X_i - \mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + a_i)^2 \mid \mathcal{F}_{i-1})} \\
&\le e^{-t\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) + \frac{g(tM)}{2} t^2 (\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) + a_i^2)} \\
&\le e^{-(t - \frac{g(tM)}{2} t^2 \phi_i) X_{i-1}}\; e^{\frac{g(tM)}{2} t^2 (\sigma_i^2 + a_i^2)}.
\end{aligned}$$

Now we define $t_0, t_1, \ldots, t_{n-1}$ by the recurrence

$$t_{i-1} = t_i - \frac{g(t_n M)}{2}\, \phi_i\, t_i^2,$$

while $t_n$ will be defined later. Then we have

$$t_0 \le t_1 \le \cdots \le t_n,$$

and

$$\begin{aligned}
\mathrm{E}\big(e^{-t_i X_i} \mid \mathcal{F}_{i-1}\big) &\le e^{-(t_i - \frac{g(t_i M)}{2} \phi_i t_i^2) X_{i-1}}\; e^{\frac{g(t_i M)}{2} t_i^2 (\sigma_i^2 + a_i^2)} \\
&\le e^{-(t_i - \frac{g(t_n M)}{2} \phi_i t_i^2) X_{i-1}}\; e^{\frac{g(t_n M)}{2} t_i^2 (\sigma_i^2 + a_i^2)} \\
&= e^{-t_{i-1} X_{i-1}}\; e^{\frac{g(t_n M)}{2} t_i^2 (\sigma_i^2 + a_i^2)}.
\end{aligned}$$

Inductively, we have

$$\begin{aligned}
\Pr(X_n \le X_0 - \lambda) &\le e^{t_n(X_0 - \lambda)}\, \mathrm{E}(e^{-t_n X_n}) \\
&= e^{t_n(X_0 - \lambda)}\, \mathrm{E}\big(\mathrm{E}(e^{-t_n X_n} \mid \mathcal{F}_{n-1})\big) \\
&\le e^{t_n(X_0 - \lambda)}\, \mathrm{E}(e^{-t_{n-1} X_{n-1}})\; e^{\frac{g(t_n M)}{2} t_n^2 (\sigma_n^2 + a_n^2)} \\
&\le \cdots \\
&\le e^{t_n(X_0 - \lambda)}\, \mathrm{E}(e^{-t_0 X_0})\; e^{\sum_{i=1}^n \frac{g(t_n M)}{2} t_i^2 (\sigma_i^2 + a_i^2)} \\
&\le e^{t_n(X_0 - \lambda) - t_0 X_0 + \frac{t_n^2}{2} g(t_n M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)}.
\end{aligned}$$

We note that

$$\begin{aligned}
t_0 &= t_n + \sum_{i=1}^n (t_{i-1} - t_i) \\
&= t_n - \sum_{i=1}^n \frac{g(t_n M)}{2}\, \phi_i\, t_i^2 \\
&\ge t_n - \frac{g(t_n M)}{2}\, t_n^2 \sum_{i=1}^n \phi_i.
\end{aligned}$$


Thus, we have

$$\begin{aligned}
\Pr(X_n \le X_0 - \lambda) &\le e^{t_n(X_0 - \lambda) - t_0 X_0 + \frac{t_n^2}{2} g(t_n M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)} \\
&\le e^{t_n(X_0 - \lambda) - (t_n - \frac{g(t_n M)}{2} t_n^2 \sum_{i=1}^n \phi_i) X_0 + \frac{t_n^2}{2} g(t_n M) \sum_{i=1}^n (\sigma_i^2 + a_i^2)} \\
&= e^{-t_n \lambda + \frac{g(t_n M)}{2} t_n^2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (\sum_{i=1}^n \phi_i) X_0\right)}.
\end{aligned}$$

We choose $t_n = \frac{\lambda}{\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (\sum_{i=1}^n \phi_i) X_0 + M\lambda/3}$. We have $t_n M < 3$ and

$$\begin{aligned}
\Pr(X_n \le X_0 - \lambda) &\le e^{-t_n \lambda + t_n^2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (\sum_{i=1}^n \phi_i) X_0\right) \frac{1}{2(1 - t_n M/3)}} \\
&\le e^{-\frac{\lambda^2}{2(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + X_0 (\sum_{i=1}^n \phi_i) + M\lambda/3)}}.
\end{aligned}$$

It remains to verify that all $t_i$'s are non-negative. Indeed,

$$\begin{aligned}
t_i \ge t_0 &\ge t_n - \frac{g(t_n M)}{2}\, t_n^2 \sum_{i=1}^n \phi_i \\
&\ge t_n \Big(1 - \frac{1}{2(1 - t_n M/3)}\, t_n \sum_{i=1}^n \phi_i\Big) \\
&= t_n \Bigg(1 - \frac{\lambda}{2X_0 + \frac{2\sum_{i=1}^n (\sigma_i^2 + a_i^2)}{\sum_{i=1}^n \phi_i}}\Bigg) \\
&\ge 0.
\end{aligned}$$

The proof of the theorem is complete.

In this section, we consider a setting in which the random variable under consideration is not strictly Lipschitz but is nearly Lipschitz. Namely, the (Lipschitz-like) assumptions are allowed to fail on relatively small subsets of the probability space, and we can still obtain similar but weaker concentration inequalities. Similar techniques have been introduced by Kim and Vu [81] in their important work on deriving concentration inequalities for multivariate polynomials. The basic setup for decision trees can be found in [5] and has been used in the work of Alon, Kim and Spencer [4]. Wormald [124] considers martingales with a stopping time that has a similar flavor. Here we use a rather general setting and shall give a complete proof.

We are only interested in finite probability spaces, and we use the following computational model. The random variable $X$ can be evaluated by a sequence of decisions $Y_1, Y_2, \ldots, Y_n$. Each decision has finitely many outputs, and the probability that an output is chosen depends on the previous history. We can describe the process by a decision tree $T$, a complete rooted tree of depth $n$. Each edge $uv$ of $T$ is associated with a probability $p_{uv}$ depending on the decision made from $u$ to $v$. Note that for any node $u$, we have

$$\sum_{v} p_{uv} = 1.$$

2.8. THE DECISION TREE AND RELAXED CONCENTRATION INEQUALITIES 49

We allow $p_{uv}$ to be zero, and thus include the case of having fewer than $r$ outputs, for some fixed $r$. Let $\Omega_i$ denote the probability space obtained after the first $i$ decisions. Suppose $\Omega = \Omega_n$ and $X$ is the random variable on $\Omega$. Let $\pi_i : \Omega \to \Omega_i$ be the projection mapping each point to the subset of points with the same first $i$ decisions. Let $\mathcal{F}_i$ be the $\sigma$-field generated by $Y_1, Y_2, \ldots, Y_i$. (In fact, $\mathcal{F}_i = \pi_i^{-1}(2^{\Omega_i})$ is the full $\sigma$-field via the projection $\pi_i$.) The $\mathcal{F}_i$ form a natural filter:

$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}.$$

The leaves of the decision tree are exactly the elements of $\Omega$. Let $X_0, X_1, \ldots, X_n = X$ denote the sequence of random variables obtained in evaluating $X$ through the decision process. Note that each $X_i$ is $\mathcal{F}_i$-measurable, and can be interpreted as a labeling of the nodes at depth $i$.

Thus, a sequence of random variables $X_0, X_1, \ldots, X_n$, with $X_i$ $\mathcal{F}_i$-measurable for $i = 0, 1, \ldots, n$, is equivalent to a vertex labeling of the decision tree $T$, $f : V(T) \to \mathbb{R}$. In order to simplify and unify the proofs for various general types of martingales, we introduce a definition for a function $f : V(T) \to \mathbb{R}$. We say $f$ satisfies an admissible condition $P$ if $P = \{P_v\}$ holds for every vertex $v$.

Here are the admissible conditions corresponding to the martingale-type properties considered earlier.

(1) Submartingale: For $1 \le i \le n$, we have

$$\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1}.$$

Thus the admissible condition $P_u$ holds if

$$f(u) \le \sum_{v \in C(u)} p_{uv}\, f(v),$$

where $C(u)$ is the set of all children of the node $u$ and $p_{uv}$ is the transition probability at the edge $uv$.

(2) Supermartingale: For $1 \le i \le n$, we have

$$\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1}.$$

In this case, the admissible condition of the supermartingale is

$$f(u) \ge \sum_{v \in C(u)} p_{uv}\, f(v).$$

(3) Martingale: For $1 \le i \le n$, we have

$$\mathrm{E}(X_i \mid \mathcal{F}_{i-1}) = X_{i-1}.$$

The admissible condition of the martingale is then

$$f(u) = \sum_{v \in C(u)} p_{uv}\, f(v).$$

(4) $c$-Lipschitz: For $1 \le i \le n$, we have

$$|X_i - X_{i-1}| \le c_i.$$

The admissible condition of the $c$-Lipschitz property can be described as follows:

$$|f(u) - f(v)| \le c_i, \quad \text{for any child } v \in C(u),$$

where the node $u$ is at level $i$ of the decision tree.

(5) Bounded Variance: For $1 \le i \le n$, we have

$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2$$

for some constants $\sigma_i$. The admissible condition of the bounded variance property can be described as:

$$\sum_{v \in C(u)} p_{uv}\, f^2(v) - \Big(\sum_{v \in C(u)} p_{uv}\, f(v)\Big)^2 \le \sigma_i^2.$$

vC(u) vC(u)

Var(Xi |Fi1 ) i2 + i Xi1

where i , i are non-negative constants, and Xi 0. The admissible

condition of the general bounded variance property can be described as

follows:

X X

puv f 2 (v) ( puv f (v))2 i2 + i f (u), and f (u) 0

vC(u) vC(u)

(7) Upper-bounded: For $1 \le i \le n$, we have
$$X_i - E(X_i \mid \mathcal{F}_{i-1}) \le a_i + M,$$
where the $a_i$'s and $M$ are non-negative constants. The admissible condition of the upper-bounded property can be described as follows:
$$f(v) - \sum_{w \in C(u)} p_{uw} f(w) \le a_i + M \quad \text{for any child } v \in C(u).$$

(8) Lower-bounded: For $1 \le i \le n$, we have
$$E(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$
where the $a_i$'s and $M$ are non-negative constants. The admissible condition of the lower-bounded property can be described as follows:
$$\Big(\sum_{w \in C(u)} p_{uw} f(w)\Big) - f(v) \le a_i + M \quad \text{for any child } v \in C(u).$$
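As a sanity check on the definitions above, the admissible conditions at a single node can be tested numerically. The following Python helper is an illustrative sketch (the function and argument names are ours, not the book's); it covers the martingale-type, Lipschitz, and bounded-variance conditions.

```python
def check_admissible(f_u, children, cond, c=None, sigma2=None):
    """Test one admissible condition P_u at a node u.

    children: list of (p_uv, f_v) pairs over the children v in C(u);
    cond selects which property to test.  Small tolerances guard
    against floating-point noise."""
    eps = 1e-12
    avg = sum(p * fv for p, fv in children)          # sum_v p_uv f(v)
    if cond == "submartingale":                      # f(u) <= sum_v p_uv f(v)
        return f_u <= avg + eps
    if cond == "supermartingale":                    # f(u) >= sum_v p_uv f(v)
        return f_u >= avg - eps
    if cond == "martingale":                         # f(u) =  sum_v p_uv f(v)
        return abs(f_u - avg) <= eps
    if cond == "lipschitz":                          # |f(u) - f(v)| <= c_i
        return all(abs(f_u - fv) <= c + eps for _, fv in children)
    if cond == "variance":                           # conditional Var <= sigma_i^2
        var = sum(p * fv * fv for p, fv in children) - avg * avg
        return var <= sigma2 + eps
    raise ValueError(cond)
```

For instance, a node with $f(u) = 2$ whose children take the values 1 and 3 with probability $1/2$ each satisfies the martingale and $1$-Lipschitz conditions and has conditional variance exactly 1.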

For any labeling $f$ on $T$ and a fixed vertex $r$, we can define a new labeling $f_r$ as follows:
$$f_r(u) = \begin{cases} f(r) & \text{if } u \text{ is a descendant of } r, \\ f(u) & \text{otherwise.} \end{cases}$$
A property $P$ is said to be invariant under subtree-unification if, for any tree labeling $f$ satisfying $P$ and any vertex $r$, the labeling $f_r$ also satisfies $P$.

Lemma 2.34. The properties of submartingale, supermartingale, martingale, $c$-Lipschitz, bounded variance, general bounded variance, upper-bounded, and lower-bounded are all invariant under subtree-unification.

Proof. We note that these properties are all admissible conditions. Let $P$ denote any one of them. For any node $u$ that is not a descendant of $r$, the labelings $f_r$ and $f$ take the same values on $u$ and its children nodes. Hence, $P_u$ holds for $f_r$ since it does for $f$.

If $u$ is a descendant of $r$, then $f_r$ takes the same value $f(r)$ on $u$ and its children nodes. We verify $P_u$ in each case. Assume that $u$ is at level $i$ of the decision tree $T$.

(1) For the submartingale, supermartingale, and martingale properties, we have
$$\sum_{v \in C(u)} p_{uv} f_r(v) = \sum_{v \in C(u)} p_{uv} f(r) = f(r) \sum_{v \in C(u)} p_{uv} = f(r) = f_r(u).$$
Hence, $P_u$ holds for $f_r$.

(2) For the $c$-Lipschitz property, we have
$$|f_r(u) - f_r(v)| = 0 \le c_i \quad \text{for any child } v \in C(u).$$
Again, $P_u$ holds for $f_r$.

(3) For the bounded variance property, we have
$$\sum_{v \in C(u)} p_{uv} f_r^2(v) - \Big(\sum_{v \in C(u)} p_{uv} f_r(v)\Big)^2 = \sum_{v \in C(u)} p_{uv} f^2(r) - \Big(\sum_{v \in C(u)} p_{uv} f(r)\Big)^2 = f^2(r) - f^2(r) = 0 \le \sigma_i^2.$$

(4) For the general bounded variance property, we have $f_r(u) = f(r) \ge 0$ and
$$\sum_{v \in C(u)} p_{uv} f_r^2(v) - \Big(\sum_{v \in C(u)} p_{uv} f_r(v)\Big)^2 = f^2(r) - f^2(r) = 0 \le \sigma_i^2 + \phi_i f_r(u).$$

(5) For the upper-bounded property, we have
$$f_r(v) - \sum_{w \in C(u)} p_{uw} f_r(w) = f(r) - \sum_{w \in C(u)} p_{uw} f(r) = f(r) - f(r) = 0 \le a_i + M,$$
for any child $v$ of $u$.

(6) For the lower-bounded property, we have
$$\Big(\sum_{w \in C(u)} p_{uw} f_r(w)\Big) - f_r(v) = f(r) - f(r) = 0 \le a_i + M,$$
for any child $v$ of $u$.
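The invariance just proved can be observed on a toy example. The sketch below (a made-up depth-2 binary tree with uniform transition probabilities $1/2$) performs one subtree-unification and checks that the martingale condition survives it.

```python
def unify(f, r):
    """Subtree-unification f_r: set f_r(u) = f(r) on every descendant u of r
    (including r itself) and leave f unchanged elsewhere.  Nodes are decision
    prefixes (tuples), so u is a descendant of r iff u starts with r."""
    return {u: (f[r] if u[:len(r)] == r else fu) for u, fu in f.items()}

# A made-up depth-2 binary tree labeling satisfying the martingale condition.
f = {(): 2.0, (0,): 1.0, (1,): 3.0,
     (0, 0): 0.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}

def is_martingale(g):
    """Check the martingale condition f(u) = sum_v p_uv f(v) (with
    p_uv = 1/2) at every internal node."""
    return all(abs(g[u] - 0.5 * (g[u + (0,)] + g[u + (1,)])) < 1e-12
               for u in [(), (0,), (1,)])

g = unify(f, (1,))    # collapse the subtree rooted at the node (1,)
```

After the unification every node in the subtree rooted at $(1,)$ carries the value $f((1,)) = 3$, and the martingale condition still holds at every internal node.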

For two admissible conditions $P$ and $Q$, the condition $P \wedge Q$ holds at a vertex $v$ exactly when both $P_v$ and $Q_v$ are true. If both admissible conditions $P$ and $Q$ are invariant under subtree-unification, then $P \wedge Q$ is also invariant under subtree-unification.

For any vertex $u$ of the tree $T$, an ancestor of $u$ is a vertex lying on the unique path from the root to $u$. For an admissible condition $P$, the associated bad set $B_i$ over the $X_i$'s is defined to be
$$B_i = \{v \mid \text{the depth of } v \text{ is } i, \text{ and } P_u \text{ does not hold for some ancestor } u \text{ of } v\}.$$

Lemma 2.35. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose each random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. For any admissible condition $P$, let $B_i$ be the associated bad set of $P$ over $X_i$. Then there are random variables $Y_0, \ldots, Y_n$ satisfying:
(1) $Y_i$ is $\mathcal{F}_i$-measurable.
(2) $Y_0, \ldots, Y_n$ satisfy the condition $P$.
(3) $\{x : Y_i(x) \ne X_i(x)\} \subset B_i$, for $0 \le i \le n$.

Proof. Let $f$ be the labeling of the decision tree corresponding to $X_0, \ldots, X_n$. Define a new labeling $f'$ as follows:
$$f'(u) = \begin{cases} f(u) & \text{if } f \text{ satisfies } P_v \text{ for every ancestor } v \text{ of } u, \text{ including } u \text{ itself}, \\ f(v) & \text{otherwise, where } v \text{ is the ancestor of smallest depth for which } f \text{ fails } P_v. \end{cases}$$
The labeling $f'$ can be obtained from $f$ through a sequence of subtree-unifications over
$$S = \{u \mid f \text{ fails } P_u, \text{ and } f \text{ satisfies } P_v \text{ for every proper ancestor } v \text{ of } u\},$$
where $S$ is the set of the roots of the unified subtrees. Furthermore, the order of the subtree-unifications does not matter. Since $P$ is invariant under subtree-unification, the number of vertices at which $P$ fails decreases. Now we will show that $f'$ satisfies $P$.

Suppose to the contrary that $f'$ fails $P_u$ for some vertex $u$. Since $P$ is invariant under subtree-unification, $f$ also fails $P_u$. By the definition of $S$, there is an ancestor $v$ of $u$ in $S$. After the subtree-unification on the subtree rooted at $v$, the condition $P_u$ is satisfied. This is a contradiction.

Let $Y_0, \ldots, Y_n$ be the random variables corresponding to the labeling $f'$. Then the $Y_i$'s satisfy the desired properties.
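The construction of $f'$ can be sketched directly: walk from the root toward each node and copy the value of the shallowest ancestor at which $P$ fails. The Python fragment below is an illustrative rendering (the failure set is supplied by hand here rather than computed from an admissible condition, and all names are ours).

```python
def repair(f, fails):
    """Lemma 2.35 construction: f'(u) = f(u) if no ancestor of u
    (including u itself) is in `fails`; otherwise f'(u) = f(v) for the
    shallowest such ancestor v.  Nodes are decision prefixes (tuples)."""
    fp = {}
    for u in f:
        # scan prefixes of u from the root down, stop at the first failure
        anc = next((u[:k] for k in range(len(u) + 1) if u[:k] in fails), None)
        fp[u] = f[anc] if anc is not None else f[u]
    return fp

# Depth-2 binary tree; suppose the condition P fails only at the node (0,).
lab = {(): 2.0, (0,): 1.5, (1,): 3.0,
       (0, 0): 0.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}
fp = repair(lab, fails={(0,)})
```

The repaired labeling agrees with the original outside the subtree rooted at the failing node and is constant (hence trivially admissible) inside it, mirroring property (3) of the lemma.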

A restricted version of the following theorem can be found in [81].

Theorem 2.36. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose the random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ denote the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) = X_{i-1},$$
$$|X_i - X_{i-1}| \le c_i,$$
where $c_1, c_2, \ldots, c_n$ are non-negative numbers. Let $B = \bigcup_{i=1}^n B_i$ denote the union of all bad sets. Then we have
$$\Pr(|X_n - X_0| \ge \lambda) \le 2 e^{-\lambda^2 / (2 \sum_{i=1}^n c_i^2)} + \Pr(B).$$

Proof. We apply Lemma 2.35 to obtain random variables $Y_0, \ldots, Y_n$ satisfying properties (1)-(3) in the statement of Lemma 2.35. In particular, they satisfy
$$E(Y_i \mid \mathcal{F}_{i-1}) = Y_{i-1},$$
$$|Y_i - Y_{i-1}| \le c_i.$$
In other words, $Y_0, \ldots, Y_n$ form a martingale which is $(c_1, \ldots, c_n)$-Lipschitz. By Azuma's inequality, we have
$$\Pr(|Y_n - Y_0| \ge \lambda) \le 2 e^{-\lambda^2 / (2 \sum_{i=1}^n c_i^2)}.$$
Since $Y_0 = X_0$ and $\{x : Y_n(x) \ne X_n(x)\} \subset \bigcup_{i=1}^n B_i = B$, we have
$$\Pr(|X_n - X_0| \ge \lambda) \le \Pr(|Y_n - Y_0| \ge \lambda) + \Pr(X_n \ne Y_n) \le 2 e^{-\lambda^2 / (2 \sum_{i=1}^n c_i^2)} + \Pr(B).$$
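A quick Monte Carlo experiment illustrates the shape of Theorem 2.36: a fair $\pm 1$ martingale that occasionally takes a larger, Lipschitz-violating step still stays within the Azuma term plus the probability of the bad set. The parameters below are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)
n, lam, c = 20, 6.0, 1.0     # steps, deviation lambda, Lipschitz bound c_i
p_bad = 0.01                 # per-step chance of a large (bad) jump

def walk():
    """One trajectory of a martingale X_i: fair +-1 steps, except that with
    probability p_bad the step is a fair +-5, violating |X_i - X_{i-1}| <= c.
    Trajectories containing such a step land in the bad set B."""
    x, bad = 0.0, False
    for _ in range(n):
        step = 5.0 if random.random() < p_bad else 1.0
        bad = bad or step > c
        x += step if random.random() < 0.5 else -step
    return x, bad

trials = 20000
tail = bad_count = 0
for _ in range(trials):
    x, bad = walk()
    tail += abs(x) >= lam
    bad_count += bad

empirical = tail / trials    # estimates Pr(|X_n - X_0| >= lambda)
pr_b = bad_count / trials    # estimates Pr(B)
bound = 2 * math.exp(-lam * lam / (2 * n * c * c)) + pr_b
```

With these (illustrative) parameters the empirical tail probability sits comfortably below the theorem's bound $2e^{-\lambda^2/(2\sum c_i^2)} + \Pr(B)$.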

A sequence of random variables $X_0, X_1, \ldots, X_n$ is said to be near-$c$-Lipschitz with an exceptional probability $\eta$ if
$$\sum_{i} \Pr(|X_i - X_{i-1}| \ge c_i) \le \eta. \qquad (2.9)$$

Theorem 2.36 can be restated as follows:

Theorem 2.37. Suppose a martingale $X$ is near-$c$-Lipschitz with an exceptional probability $\eta$. Then $X$ satisfies
$$\Pr(|X - E(X)| \ge a) \le 2 e^{-a^2 / (2 \sum_{i=1}^n c_i^2)} + \eta.$$

Now, we can use the same technique to relax all the theorems in the previous

sections.

Here are the relaxed versions of Theorems 2.23, 2.28, and 2.30.

Theorem 2.38. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2,$$
$$X_i - E(X_i \mid \mathcal{F}_{i-1}) \le a_i + M,$$
where $\sigma_i$, $a_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2 / \left(2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3\right)\right)} + \Pr(B).$$
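In applications one typically just evaluates the right-hand side of Theorem 2.38 numerically. A small helper (ours, not the book's) makes the dependence on the parameters explicit:

```python
import math

def theorem_2_38_bound(lam, sigma2s, a2s, M, pr_bad=0.0):
    """Right-hand side of Theorem 2.38:
    exp(-lam^2 / (2 * (sum_i (sigma_i^2 + a_i^2) + M*lam/3))) + Pr(B).
    sigma2s and a2s are the lists of the sigma_i^2 and a_i^2 values."""
    denom = 2.0 * (sum(s + a for s, a in zip(sigma2s, a2s)) + M * lam / 3.0)
    return math.exp(-lam * lam / denom) + pr_bad
```

For example, with $M = 0$, all $a_i = 0$, and ten steps of variance bound $\sigma_i^2 = 1$, the bound at $\lambda = 2$ is $e^{-4/20} = e^{-0.2} \approx 0.819$.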

Theorem 2.39. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1},$$
$$X_i - E(X_i \mid \mathcal{F}_{i-1}) \le M,$$
where $\phi_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2 / \left(2 \left((X_0 + \lambda) \sum_{i=1}^n \phi_i + M\lambda/3\right)\right)} + \Pr(B).$$

Theorem 2.40. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \le X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1},$$
$$X_i - E(X_i \mid \mathcal{F}_{i-1}) \le a_i + M,$$
where $\sigma_i$, $\phi_i$, $a_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2 / \left(2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^n \phi_i + M\lambda/3\right)\right)} + \Pr(B).$$

Similarly, here are the relaxed versions of the corresponding lower-tail theorems, including Theorems 2.29 and 2.32.

Theorem 2.41. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2,$$
$$E(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$
where $\sigma_i$, $a_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2 / \left(2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + M\lambda/3\right)\right)} + \Pr(B).$$

Theorem 2.42. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \phi_i X_{i-1},$$
$$E(X_i \mid \mathcal{F}_{i-1}) - X_i \le M,$$
where $\phi_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2 / \left(2 \left(X_0 \sum_{i=1}^n \phi_i + M\lambda/3\right)\right)} + \Pr(B),$$
for all $\lambda \le X_0$.

Theorem 2.43. For a filter $\mathcal{F}$:
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F},$$
suppose a non-negative random variable $X_i$ is $\mathcal{F}_i$-measurable, for $0 \le i \le n$. Let $B_i$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid \mathcal{F}_{i-1}) \ge X_{i-1},$$
$$\mathrm{Var}(X_i \mid \mathcal{F}_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1},$$
$$E(X_i \mid \mathcal{F}_{i-1}) - X_i \le a_i + M,$$
where $\sigma_i$, $\phi_i$, $a_i$ and $M$ are non-negative constants. Let $B = \bigcup_{i=1}^n B_i$ be the union of all bad sets. Then we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2 / \left(2 \left(\sum_{i=1}^n (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^n \phi_i + M\lambda/3\right)\right)} + \Pr(B),$$
for $\lambda < X_0$.


The best way to see the powerful effect of the concentration and martingale

inequalities, as stated in this chapter, is to check out many interesting applications.

Indeed, the inequalities here are especially useful for estimating the error bounds in

the random graphs that we shall discuss in subsequent chapters. The applications

for random graphs of the off-line models are easier than those for the on-line models.

The concentration results in Chapter 3 (for the preferential attachment scheme) and

Chapter 4 (for the duplication model) are all quite complicated. For a beginner,

a good place to start is Chapter 5 on classical random graphs of the Erdős–Rényi model and the generalization to random graph models with given expected degrees. An earlier version of this chapter appeared as a survey paper [36], which includes some further applications.
