
3. Moments and Deviations

• We examine techniques for bounding the tail distribution – the probability that a random variable assumes values that are far from its expectation
• In the analysis of algorithms, these bounds are the major tool for estimating the failure probability of algorithms and for establishing high-probability bounds on their running time


3.1. Markov's Inequality

• Markov's inequality is often too weak to yield useful results, but it is a fundamental tool in developing more sophisticated bounds

Theorem 3.1 [Markov's Inequality]:
Let X be a RV that assumes only nonnegative values. Then, for all a > 0,
  Pr(X ≥ a) ≤ E[X]/a


Proof: For a > 0, let
  I = 1, if X ≥ a
      0, otherwise
and note that, since X ≥ 0,
  I ≤ X/a.
Because I is a 0–1 random variable,
  E[I] = Pr(I = 1) = Pr(X ≥ a).
Taking expectations thus yields
  Pr(X ≥ a) = E[I] ≤ E[X/a] = E[X]/a.

• Let us use Markov's inequality to bound the probability of obtaining more than 3n/4 heads in a sequence of n fair coin flips
• Let
  X_i = 1, if the ith coin flip is heads
        0, otherwise
  and X = Σ_{i=1}^n X_i denote the number of heads in the n coin flips
• Since E[X_i] = Pr(X_i = 1) = 1/2, it follows that E[X] = Σ_{i=1}^n E[X_i] = n/2
• Applying Markov's inequality, we obtain
  Pr(X ≥ 3n/4) ≤ E[X]/(3n/4) = (n/2)/(3n/4) = 2/3

3.2. Variance and Moments of a RV

• Markov's inequality gives the best tail bound possible when all we know is the expectation of the RV and that the variable is nonnegative
• If more information about the distribution of the RV is available, the bound can be improved
• Additional information about a RV is often expressed in terms of its moments

Definition 3.1: The kth moment of a RV X is E[X^k]


• Given the first and second moments, one can compute the variance and standard deviation of the RV
• Intuitively, the variance and standard deviation offer a measure of how far the RV is likely to be from its expectation

Definition 3.2: The variance of a RV X is
  Var[X] = E[(X − E[X])²] = E[X²] − (E[X])²

The standard deviation of a RV X is
  σ[X] = √Var[X]


• The two forms of the variance in the definition are equivalent, as is easily seen by the linearity of expectations
• Keeping in mind that E[X] is a constant, we have
  E[(X − E[X])²] = E[X² − 2X·E[X] + (E[X])²]
                 = E[X²] − 2·E[X]·E[X] + (E[X])²
                 = E[X²] − (E[X])²

• If a RV X is constant – always assumes the same value – then its variance and standard deviation are zero
• More generally, if a RV X takes on the value k·E[X] with probability 1/k and the value 0 with probability 1 − 1/k, then Var[X] = (k − 1)·(E[X])² and σ[X] = √(k − 1)·E[X]
• The variance (and standard deviation) of a RV are small when the RV assumes values close to its expectation and are large when it assumes values far from its expectation
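• As a quick sanity check (our addition, not part of the original slides), the following Python snippet computes these quantities for the two-point RV above; the helper name two_point_stats is ours:

    import math

    def two_point_stats(mu, k):
        # X = k*mu with probability 1/k, and 0 otherwise, so E[X] = mu
        ex  = (k * mu) * (1 / k)            # = mu
        ex2 = (k * mu) ** 2 * (1 / k)       # = k * mu**2
        var = ex2 - ex ** 2                 # = (k - 1) * mu**2
        return ex, var, math.sqrt(var)      # sigma = sqrt(k - 1) * mu

    print(two_point_stats(mu=2.0, k=5))     # (2.0, 16.0, 4.0)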

Definition 3.3: The covariance of X and Y is
  Cov(X, Y) = E[(X − E[X])·(Y − E[Y])].

Theorem 3.2: For any two RVs X and Y,
  Var[X + Y] = Var[X] + Var[Y] + 2·Cov(X, Y).

Proof:
  Var[X + Y] = E[(X + Y − E[X + Y])²]
             = E[((X − E[X]) + (Y − E[Y]))²]
             = E[(X − E[X])²] + E[(Y − E[Y])²] + 2·E[(X − E[X])·(Y − E[Y])]
             = Var[X] + Var[Y] + 2·Cov(X, Y)

Theorem 3.3: If X and Y are two independent RVs, then
  E[X·Y] = E[X]·E[Y]
• This does not necessarily hold if the RVs are dependent
• Let X and Y each correspond to fair coin flips, both taking on the value 0 if the flip is heads and 1 if the flip is tails
• Then E[X] = E[Y] = 1/2
• If the two flips are independent, then X·Y is 1 with probability 1/4 and 0 otherwise, so E[X·Y] = 1/4
• Indeed E[X·Y] = E[X]·E[Y]

• Suppose instead that the coin flips are dependent in the following way:
– the coins are tied together, so X and Y either both come up heads or both come up tails
• Each coin considered individually is still a fair coin flip, but now X·Y is 1 with probability 1/2, and so E[X·Y] ≠ E[X]·E[Y]
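• A small simulation (our addition) illustrates the two cases; with independent coins the empirical E[XY] is about 1/4, with tied coins about 1/2:

    import random

    def avg_xy(tied, trials=100_000):
        # X, Y in {0, 1}: 1 if the flip is tails, 0 if heads
        total = 0
        for _ in range(trials):
            x = random.randint(0, 1)
            y = x if tied else random.randint(0, 1)
            total += x * y
        return total / trials

    print("independent E[XY] ~", avg_xy(tied=False))  # ~ 1/4 = E[X]*E[Y]
    print("tied        E[XY] ~", avg_xy(tied=True))   # ~ 1/2 != E[X]*E[Y]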

Corollary 3.4: If X and Y are independent RVs, then
  Cov(X, Y) = 0
and
  Var[X + Y] = Var[X] + Var[Y]

Proof:
  Cov(X, Y) = E[(X − E[X])·(Y − E[Y])]
            = E[X − E[X]]·E[Y − E[Y]]
            = 0
In the second equation we have used the fact that, since X and Y are independent, so are X − E[X] and Y − E[Y], and hence Theorem 3.3 applies. For the last equation we use the fact that, for any random variable Z,
  E[Z − E[Z]] = E[Z] − E[Z] = 0.
Since Cov(X, Y) = 0, we have
  Var[X + Y] = Var[X] + Var[Y].

3.2.1. Example: Variance of a BRV

• The variance of a binomial RV (BRV) X ~ B(n, p) can be determined by computing E[X²]; a direct calculation gives
  E[X²] = n(n − 1)p² + np
• We conclude that
  Var[X] = E[X²] − (E[X])² = n(n − 1)p² + np − (np)² = np(1 − p)

• An alternative derivation makes use of independence
• Recall that a BRV can be represented as the sum of n independent Bernoulli trials, each with success probability p
• Such a Bernoulli trial X_i has variance
  Var[X_i] = E[X_i²] − (E[X_i])² = p·1² + (1 − p)·0² − p² = p(1 − p)
• By Corollary 3.4, the variance of X is then
  Var[X] = n·p(1 − p)
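• A short empirical check (our addition, with arbitrarily chosen n and p) confirms the formula:

    import random

    n, p, trials = 50, 0.3, 50_000
    xs = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
    mean = sum(xs) / trials
    var  = sum((x - mean) ** 2 for x in xs) / trials
    print("empirical Var[X]:", round(var, 3))    # close to n*p*(1-p)
    print("n*p*(1-p):       ", n * p * (1 - p))  # = 10.5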

3.3. Chebyshev's Inequality

• Using the expectation and the variance of the RV, one can derive a significantly stronger tail bound known as Chebyshev's inequality

Theorem 3.6 [Chebyshev's Inequality]:
For any a > 0,
  Pr(|X − E[X]| ≥ a) ≤ Var[X]/a²


Proof: First observe that
  Pr(|X − E[X]| ≥ a) = Pr((X − E[X])² ≥ a²)
Since (X − E[X])² is a nonnegative random variable, we can apply Markov's inequality to prove:
  Pr((X − E[X])² ≥ a²) ≤ E[(X − E[X])²]/a² = Var[X]/a²


• The following useful variants of Chebyshev's inequality bound the deviation of the RV from its expectation in terms of a constant factor of σ[X] or E[X]

Corollary 3.7: For any t > 1,
  Pr(|X − E[X]| ≥ t·σ[X]) ≤ 1/t²
and
  Pr(|X − E[X]| ≥ t·E[X]) ≤ Var[X]/(t·E[X])²


• Let us use Chebyshev's inequality to bound the probability of obtaining more than 3n/4 heads in a sequence of n fair coin flips
• Recall that X_i = 1 if the ith coin flip is heads and 0 otherwise
• X = Σ_{i=1}^n X_i denotes the number of heads in the n coin flips
• To use Chebyshev's inequality we need to compute the variance of X
• Observe first that, since X_i is a 0–1 RV,
  E[X_i²] = E[X_i] = 1/2

• Thus,
  Var[X_i] = E[X_i²] − (E[X_i])² = 1/2 − 1/4 = 1/4
• Now, since X = Σ_{i=1}^n X_i and the X_i are independent, we can use Thm 3.5 to compute
  Var[X] = Σ_{i=1}^n Var[X_i] = n/4
• Applying Chebyshev's inequality then yields
  Pr(X ≥ 3n/4) ≤ Pr(|X − E[X]| ≥ n/4) ≤ Var[X]/(n/4)² = (n/4)/(n²/16) = 4/n

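• The contrast with Markov's bound is easy to see numerically; the following snippet (our addition) estimates the true tail probability for n = 64 and prints both bounds:

    import random

    def heads(n):
        # number of heads in n fair coin flips
        return sum(random.random() < 0.5 for _ in range(n))

    n, trials = 64, 100_000
    emp = sum(heads(n) >= 3 * n / 4 for _ in range(trials)) / trials
    print("empirical Pr[X >= 3n/4]:", emp)       # tiny in practice
    print("Markov bound:           ", 2 / 3)
    print("Chebyshev bound:        ", 4 / n)     # 0.0625 for n = 64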

• In fact, we can do slightly better
• Chebyshev's inequality yields that 4/n is actually a bound on the probability that X is either smaller than n/4 or larger than 3n/4
• By symmetry, the probability that X is greater than 3n/4 is then at most 2/n
• Chebyshev's inequality gives a significantly better bound than Markov's inequality for large n


3.3.1. Example: Coupon Collector's Problem

• Recall that the time X to collect n coupons has expectation E[X] = nH_n, where
  H_n = Σ_{i=1}^n 1/i = ln n + Θ(1)
• Hence Markov's inequality yields
  Pr(X ≥ 2nH_n) ≤ 1/2
• Recall again that X = Σ_{i=1}^n X_i, where the X_i are geometric random variables with parameter p_i = (n − i + 1)/n


• The X_i are independent, because the time to collect the ith coupon does not depend on how long it took to collect the previous i − 1 coupons
• Hence
  Var[X] = Σ_{i=1}^n Var[X_i],
so we need to find the variance of a geometric random variable

Lemma 3.8: The variance of a geometric RV with parameter p is
  (1 − p)/p²

• We simplify the argument by using the upper bound Var[X_i] ≤ 1/p_i² for a geometric RV, instead of the exact result of Lemma 3.8
• Then
  Var[X] ≤ Σ_{i=1}^n (n/(n − i + 1))² = Σ_{i=1}^n (n/i)² ≤ n²·Σ_{i=1}^∞ 1/i² = π²n²/6,
because
  Σ_{i=1}^∞ 1/i² = π²/6

• Now, by Chebyshev's inequality,
  Pr(X ≥ 2nH_n) ≤ Pr(|X − nH_n| ≥ nH_n) ≤ (π²n²/6)/(nH_n)² = π²/(6H_n²) = O(1/ln² n)
• Chebyshev's inequality again gives a much better bound than Markov's inequality
• But it is still a fairly weak bound, as we can see by considering instead a simple union bound argument

• Consider the probability of not obtaining the ith coupon after 2n ln n steps
• This probability is
  (1 − 1/n)^{2n ln n} ≤ e^{−2 ln n} = 1/n²
• By a union bound, the probability that some coupon has not been collected after 2n ln n steps is only 1/n
• In particular, the probability that not all coupons are collected after 2n ln n steps is at most 1/n – a bound that is significantly better than what can be achieved even with Chebyshev's inequality
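• A quick simulation (our addition) shows how conservative even the union bound is in practice:

    import math
    import random

    def coupon_time(n):
        # number of draws until all n coupon types have been seen
        seen, steps = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            steps += 1
        return steps

    n, runs = 50, 5_000
    cutoff = 2 * n * math.log(n)
    late = sum(coupon_time(n) > cutoff for _ in range(runs)) / runs
    print("empirical Pr[X > 2n ln n]:", late)  # well below the bound
    print("union bound 1/n:          ", 1 / n)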

3.4. Application: A Randomized Algorithm for Computing the Median
• Given a set S of n elements drawn from a totally ordered universe, the median of S is an element m of S s.t.
– at least ⌊n/2⌋ elements in S are ≤ m and
– at least ⌊n/2⌋ + 1 elements in S are ≥ m
• If the elements in S are distinct, then m is the (⌈n/2⌉)th element in the sorted order of S
• The median can be found deterministically in O(n log n) steps by sorting, and a relatively complex deterministic algorithm computes it in O(n) time

• Let us assume that n is odd and that the elements in S are distinct
• The goal is to find two elements that are close together in the sorted order of S and that have the median lie between them
• Specifically, we seek two elements d, u ∈ S s.t.
1. d ≤ m ≤ u (the median is between d and u)
2. for C = {s ∈ S : d ≤ s ≤ u}, |C| = o(n/log n) (the # of elements between d and u is small)
• Sampling gives us a simple and efficient method for finding two such elements


• Once these two elements are identified, m can be found in linear time with the following steps:
– Count (in linear time) the number ℓ_d of elements of S that are smaller than d, and then sort (in sublinear, or o(n), time) the set C
– The set C can be sorted in time o(n) using any standard O(n log n)-comparison sorting algorithm, since |C| = o(n/log n)
– The (⌊n/2⌋ − ℓ_d + 1)th element in the sorted order of C is m, since there are exactly ⌊n/2⌋ elements in S that are smaller than that value (ℓ_d in the set {s ∈ S : s < d} and ⌊n/2⌋ − ℓ_d in C)

• To find d and u, we sample with replacement a multi-set R of ⌈n^{3/4}⌉ elements from S
• Each element in R is chosen uniformly at random from the set S, independent of previous choices
• Thus, the same element of S might appear more than once in the multi-set R
• Sampling w/o replacement might give marginally better bounds, but implementing and analyzing it are significantly harder
• We assume that an element can be sampled from S in constant time

• Since R is a random sample of S, we expect the median of R to be close to the median element of S
• We therefore choose d and u to be elements of R surrounding the median of R
• We require all the steps of our algorithm to work with high probability (w.h.p.), by which we mean with probability at least 1 − O(n^{−c}) for some constant c > 0
• To guarantee that w.h.p. the set C includes the median m, we fix d and u to be, respectively, the (⌊n^{3/4}/2 − √n⌋)th and the (⌈n^{3/4}/2 + √n⌉)th elements in the sorted order of R

• With this choice, the set C includes all the elements of S that are between the 2√n sample points surrounding the median of R
• The analysis will clarify that the choice of the size of R and the choices for d and u are tailored to guarantee both that
a) the set C is large enough to include m with high probability and
b) the set C is sufficiently small so that it can be sorted in sublinear time with high probability


RANDOMIZED MEDIAN ALGORITHM:

Input: A set S of n elements over a totally ordered universe
Output: The median element of S, denoted by m
1. Pick a (multi-)set R of ⌈n^{3/4}⌉ elements in S, chosen independently and uniformly at random with replacement
2. Sort the set R
3. Let d be the (⌊n^{3/4}/2 − √n⌋)th smallest element in the sorted set R
4. Let u be the (⌈n^{3/4}/2 + √n⌉)th smallest element in the sorted set R
5. By comparing every element in S to d and u, compute the set C = {x ∈ S : d ≤ x ≤ u} and the numbers ℓ_d = |{x ∈ S : x < d}| and ℓ_u = |{x ∈ S : x > u}|
6. if ℓ_d > n/2 or ℓ_u > n/2 then FAIL
7. if |C| ≤ 4n^{3/4} then sort the set C, otherwise FAIL
8. output the (⌊n/2⌋ − ℓ_d + 1)th element in the sorted order of C
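• The following Python sketch (our transcription, not code from the course) mirrors the pseudocode above; the exact rank rounding in steps 3–4 is one reasonable reading of the floors and ceilings:

    import math
    import random

    def randomized_median(S):
        # One Monte Carlo attempt; returns None on FAIL.
        # Assumes the elements of S are distinct and n = len(S) is odd.
        n = len(S)
        r = math.ceil(n ** 0.75)
        R = sorted(random.choice(S) for _ in range(r))  # steps 1-2
        k_d = max(1, math.floor(r / 2 - math.sqrt(n)))  # rank of d in R
        k_u = min(r, math.ceil(r / 2 + math.sqrt(n)))   # rank of u in R
        d, u = R[k_d - 1], R[k_u - 1]                   # steps 3-4
        C = [x for x in S if d <= x <= u]               # step 5
        l_d = sum(x < d for x in S)
        l_u = sum(x > u for x in S)
        if l_d > n / 2 or l_u > n / 2:                  # step 6: FAIL
            return None
        if len(C) > 4 * r:                              # step 7: FAIL
            return None
        C.sort()
        return C[n // 2 - l_d]                          # step 8 (0-based)

    S = random.sample(range(10 ** 6), 1001)
    print(randomized_median(S), sorted(S)[len(S) // 2])  # None or the median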


Theorem 3.9: The randomized median algorithm terminates in linear time, and if it does not output FAIL then it outputs the correct median element of the input set S.

Proof: Correctness follows because the algorithm could only give an incorrect answer if the median m were not in the set C. But then either ℓ_d > n/2 or ℓ_u > n/2, and thus step 6 guarantees that, in these cases, the algorithm outputs FAIL.
Similarly, as long as C is sufficiently small, the total work is only linear in the size of S. Step 7 therefore guarantees that the algorithm does not take more than linear time; if the sorting might take too long, the algorithm outputs FAIL without sorting.

• We identify three "bad" events such that, if none of them occurs, the algorithm does not fail:
  E_1: Y_1 = |{r ∈ R : r ≤ m}| < n^{3/4}/2 − √n;
  E_2: Y_2 = |{r ∈ R : r ≥ m}| < n^{3/4}/2 − √n;
  E_3: |C| > 4n^{3/4}

Lemma 3.10: The randomized median algorithm fails iff at least one of E_1, E_2, or E_3 occurs.

Proof: Failure in step 7 is equivalent to E_3. Failure in step 6 occurs iff ℓ_d > n/2 or ℓ_u > n/2. But ℓ_d > n/2 means that d, the (⌊n^{3/4}/2 − √n⌋)th smallest element of R, is larger than m; this happens iff fewer than n^{3/4}/2 − √n of the samples are ≤ m, which is exactly the event E_1. Similarly, ℓ_u > n/2 is equivalent to the event E_2.

Lemma 3.11:
  Pr(E_1) ≤ (1/4)·n^{−1/4}

Proof: Define a random variable X_i by
  X_i = 1, if the ith sample is ≤ the median
        0, otherwise
The X_i are independent, since the sampling is done with replacement. Because there are (n − 1)/2 + 1 elements in S that are ≤ the median, the probability that a randomly chosen element of S is ≤ the median can be written as
  Pr(X_i = 1) = ((n − 1)/2 + 1)/n = 1/2 + 1/(2n)

The event E_1 is equivalent to
  Y_1 = Σ_{i=1}^{n^{3/4}} X_i < n^{3/4}/2 − √n

Since Y_1 is the sum of Bernoulli trials, it is a BRV with parameters n^{3/4} and 1/2 + 1/(2n). Hence, using the earlier result Var[X] = np(1 − p) yields
  Var[Y_1] = n^{3/4}·(1/2 + 1/(2n))·(1/2 − 1/(2n)) = n^{3/4}·(1/4 − 1/(4n²)) < (1/4)·n^{3/4}

Applying Chebyshev's inequality then yields
  Pr(E_1) = Pr(Y_1 < n^{3/4}/2 − √n)
          ≤ Pr(|Y_1 − E[Y_1]| > √n)
          ≤ Var[Y_1]/n < (1/4)·n^{3/4}/n = (1/4)·n^{−1/4}

• We similarly obtain the same bound for the probability of the event E_2

Lemma 3.12: Pr(E_2) ≤ (1/4)·n^{−1/4}

Theorem 3.13: The probability that the randomized median algorithm fails is bounded by n^{−1/4}.
• The proof sums the probabilities of the bad events: Pr(E_1) + Pr(E_2) ≤ (1/2)·n^{−1/4} by Lemmas 3.11 and 3.12, and a similar argument bounds Pr(E_3) by (1/2)·n^{−1/4}
• Repeating the algorithm until it succeeds (finds the median), we obtain an iterative algorithm that never fails but has a random running time
• The samples taken in successive runs are independent, so the success of each run is independent of the others, and hence the number of runs until success is achieved is a geometric random variable
• This variation of the algorithm still has linear expected running time

• Randomized algorithms that may fail or return an incorrect answer are called Monte Carlo (MC) algorithms
• The running time of a MC algorithm often does not depend on the random choices made
• E.g., the median algorithm always terminates in linear time, regardless of its random choices
• A randomized algorithm that always returns the right answer is called a Las Vegas (LV) algorithm
• The median MC algorithm can be turned into a LV algorithm by repeating it until it succeeds, as sketched below
• Turning it into a LV algorithm means the running time is variable, although the expected running time is still linear
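• A minimal Las Vegas wrapper (our addition, reusing the randomized_median sketch above):

    def las_vegas_median(S):
        # Repeat the Monte Carlo attempt until it succeeds; the number of
        # attempts is geometric with success probability at least
        # 1 - n**(-1/4), so the expected running time remains linear.
        while True:
            m = randomized_median(S)
            if m is not None:
                return m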

4. Chernoff Bounds

• Chernoff bounds are extremely powerful, giving exponentially decreasing bounds on the tail distribution
• These bounds are derived by using Markov's inequality on the moment generating function of a random variable

4.1. Moment Generating Functions

Definition 4.1: The moment generating function (MGF) of a random variable X is
  M_X(t) = E[e^{tX}]

• We are mainly interested in the existence and properties of this function in the neighborhood of zero
• M_X(t) captures all of the moments of X


Theorem 4.1: Let X be a RV with MGF M_X(t). Under the assumption that exchanging the expectation and differentiation operands is legitimate, for all n ≥ 1 we then have
  E[X^n] = M_X^(n)(0),
where M_X^(n)(0) is the nth derivative of M_X(t) evaluated at t = 0.

Proof: Assuming that we can exchange the expectation and differentiation operands, then
  M_X^(n)(t) = E[X^n·e^{tX}].
Computed at t = 0, this expression yields
  M_X^(n)(0) = E[X^n].

• Expectation and differentiation operands can be exchanged whenever the MGF exists in a neighborhood of zero
• This holds for all distributions considered in this course
• Consider a RV X ~ Geom(p)
• Then, for t < −ln(1 − p),
  M_X(t) = E[e^{tX}] = Σ_{k=1}^∞ e^{tk}·(1 − p)^{k−1}·p = p·e^t/(1 − (1 − p)·e^t)

• It follows that
  M_X′(t) = p·e^t/(1 − (1 − p)·e^t)²  and
  M_X″(t) = p·e^t·(1 + (1 − p)·e^t)/(1 − (1 − p)·e^t)³
• Evaluating these derivatives at t = 0 and using Theorem 4.1 gives
  E[X] = 1/p  and  E[X²] = (2 − p)/p²,
  matching our previous calculations
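• These derivatives can also be checked symbolically; the following snippet (our addition, assuming the SymPy library is available) differentiates the geometric MGF at t = 0:

    import sympy as sp

    t, p = sp.symbols('t p', positive=True)
    M = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))   # geometric MGF

    EX  = sp.simplify(sp.diff(M, t).subs(t, 0))     # 1/p
    EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))  # (2 - p)/p**2
    print(EX, EX2, sp.simplify(EX2 - EX ** 2))      # variance: (1 - p)/p**2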



• The MGF of a RV – equivalently, all of the moments of the RV – uniquely defines its distribution

Theorem 4.2: Let X and Y be two RVs. If
  M_X(t) = M_Y(t)
for all t ∈ (−δ, δ) for some δ > 0, then X and Y have the same distribution.


Theorem 4.3: If X and Y are independent RVs, then
  M_{X+Y}(t) = M_X(t)·M_Y(t).

Proof:
  M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}·e^{tY}] = E[e^{tX}]·E[e^{tY}] = M_X(t)·M_Y(t)
Here we have used that X and Y are independent – and hence e^{tX} and e^{tY} are independent.

• Thus, if we know M_X(t) and M_Y(t) and if we recognize the function M_X(t)·M_Y(t) as the MGF of a known distribution, then that must be the distribution of X + Y when Theorem 4.2 applies

4.2. Deriving and Applying Chernoff Bounds

• The Chernoff bound for a RV X is obtained by applying Markov's inequality to e^{tX} for some well-chosen value t
• From Markov's inequality, we can derive the following useful inequality: for any t > 0,
  Pr(X ≥ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}]/e^{ta}
• In particular,
  Pr(X ≥ a) ≤ min_{t>0} E[e^{tX}]/e^{ta}

• Similarly, for any t < 0,
  Pr(X ≤ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}]/e^{ta}
• Hence,
  Pr(X ≤ a) ≤ min_{t<0} E[e^{tX}]/e^{ta}
• Bounds for specific distributions are obtained by choosing appropriate values for t
• The value of t that minimizes E[e^{tX}]/e^{ta} gives the best possible bounds
• However, often one chooses a value of t that gives a convenient form

4.2.1. Chernoff Bounds for the Sum of Poisson Trials

• We develop a common version of the Chernoff bound: for the tail distribution of a sum of independent 0–1 RVs, which are also known as Poisson trials
• The distributions of the RVs in Poisson trials are not necessarily identical
• Bernoulli trials are a special case of Poisson trials where the independent 0–1 RVs have the same distribution; i.e., all trials take on the value 1 with the same probability

• Recall that the binomial distribution gives the # of successes in n independent Bernoulli trials
• Our Chernoff bound will hold for the binomial distribution and also for the more general setting of the sum of Poisson trials
• Let X_1, …, X_n be a sequence of independent Poisson trials with Pr(X_i = 1) = p_i
• Let X = Σ_{i=1}^n X_i, and let
  μ = E[X] = Σ_{i=1}^n p_i


• For a given δ > 0, we are interested in bounds on Pr(X ≥ (1 + δ)μ) and Pr(X ≤ (1 − δ)μ)
• I.e., the probability that X deviates from its expectation μ by δμ or more
• To develop a Chernoff bound we need to compute the MGF of X
• We start with the MGF of each X_i:
  M_{X_i}(t) = E[e^{tX_i}] = p_i·e^t + (1 − p_i) = 1 + p_i·(e^t − 1) ≤ e^{p_i(e^t − 1)},
because for any y, 1 + y ≤ e^y



• Applying Thm 4.3, we take the product of the MGFs to obtain
  M_X(t) = Π_{i=1}^n M_{X_i}(t) ≤ Π_{i=1}^n e^{p_i(e^t − 1)} = exp((e^t − 1)·μ)

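• A tiny numeric check (our addition, with arbitrary p_i and t; math.prod needs Python 3.8+) compares the exact product against the exponential upper bound:

    import math

    ps = [0.1, 0.5, 0.3, 0.7]   # arbitrary success probabilities
    t = 0.4
    mgf   = math.prod(1 + p * (math.exp(t) - 1) for p in ps)
    bound = math.exp((math.exp(t) - 1) * sum(ps))
    print(mgf, "<=", bound)     # exact M_X(t) vs exp((e^t - 1) * mu)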
• We can now develop concrete versions of the Chernoff bound for a sum of Poisson trials
• We begin with bounds on the deviation above the mean


Theorem 4.4: Let X_1, …, X_n be independent Poisson trials s.t. Pr(X_i = 1) = p_i. Let X = Σ_{i=1}^n X_i and μ = E[X]. Then the following Chernoff bounds hold:
1. for any δ > 0,
  Pr(X ≥ (1 + δ)μ) < (e^δ/(1 + δ)^{1+δ})^μ;
2. for 0 < δ ≤ 1,
  Pr(X ≥ (1 + δ)μ) ≤ e^{−μδ²/3};
3. for R ≥ 6μ,
  Pr(X ≥ R) ≤ 2^{−R}.
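• To get a feel for these bounds, the following snippet (our addition, for an arbitrary binomial example; math.comb needs Python 3.8+) compares the exact tail of a Bin(n, p) variable with bounds 1 and 2:

    import math

    n, p, delta = 200, 0.5, 0.2
    mu = n * p

    # exact tail Pr[X >= (1 + delta) * mu] for X ~ Bin(n, p)
    k0 = math.ceil((1 + delta) * mu)
    exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                for k in range(k0, n + 1))

    bound1 = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu  # bound 1
    bound2 = math.exp(-mu * delta ** 2 / 3)                        # bound 2
    print(exact, "<", bound1, "<=", bound2)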
