2. Discrete Random
Variables and Expectation
• In tossing two dice we are often interested in the
sum of the dice rather than their separate values
• The sample space in tossing two dice consists of
36 events of equal probability, given by the
ordered pairs of numbers {(1,1), (1,2), … , (6, 6)}
• If the quantity we are interested in is the sum of
the two dice, then we are interested in 11 events
(of unequal probability)
• Any such function from the sample space to the
real numbers is called a random variable

2.1. Random Variables and Expectation

Definition 2.1: A random variable (RV) $X$ on a
sample space $\Omega$ is a real-valued function on $\Omega$;
that is, $X\colon \Omega \to \mathbb{R}$. A discrete random variable is a
RV that takes on only a finite or countably infinite
number of values
• For a discrete RV $X$ and a real value $a$, the event
"$X = a$" includes all the basic events of the
sample space in which $X$ assumes the value $a$
• I.e., "$X = a$" represents the set
$\{s \in \Omega : X(s) = a\}$


• We denote the probability of that event by
$\Pr(X = a) = \sum_{s \in \Omega:\, X(s) = a} \Pr(s)$
• If $X$ is the RV representing the sum of the two
dice, the event $X = 4$ corresponds to the set of
basic events $\{(1, 3), (2, 2), (3, 1)\}$
• Hence,
$\Pr(X = 4) = \frac{3}{36} = \frac{1}{12}$
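As a sanity check, this probability can be verified by
enumerating the 36 equally likely outcomes; a minimal
Python sketch (ours, not part of the original slides):

  from itertools import product
  from fractions import Fraction

  # All 36 equally likely outcomes of tossing two dice.
  outcomes = list(product(range(1, 7), repeat=2))

  # The RV X maps each outcome to the sum of the two dice.
  favorable = [o for o in outcomes if sum(o) == 4]
  print(Fraction(len(favorable), len(outcomes)))  # 1/12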


Definition 2.2: Two RVs $X$ and $Y$ are independent
if and only if
$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$
for all values $x$ and $y$.
Similarly, RVs $X_1, \ldots, X_k$ are mutually
independent if and only if, for any subset $I \subseteq [1, k]$
and any values $x_i$, $i \in I$,
$\Pr\bigl(\bigcap_{i \in I} (X_i = x_i)\bigr) = \prod_{i \in I} \Pr(X_i = x_i)$

Definition 2.3: The expectation of a discrete RV
$X$, denoted by $\mathrm{E}[X]$, is given by
$\mathrm{E}[X] = \sum_i i \Pr(X = i)$,
where the summation is over all values $i$ in the
range of $X$. The expectation is finite if
$\sum_i |i| \Pr(X = i)$ converges; otherwise, it is
unbounded.
• E.g., the expectation of the RV $X$ representing
the sum of two dice is
$\mathrm{E}[X] = \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \cdots + \frac{1}{36} \cdot 12 = 7$

• As an example of where the expectation of a
discrete RV is unbounded, consider a RV $X$ that
takes on the value $2^i$ with probability $1/2^i$ for
$i = 1, 2, \ldots$
• The expected value of $X$ is
$\mathrm{E}[X] = \sum_{i=1}^{\infty} \frac{1}{2^i} \cdot 2^i = \sum_{i=1}^{\infty} 1 = \infty$
• "$= \infty$" expresses that $\mathrm{E}[X]$ is unbounded


2.1.1. Linearity of Expectations

• By this property, the expectation of the sum of
RVs is equal to the sum of their expectations

Theorem 2.1 [Linearity of Expectations]:
For any finite collection of discrete RVs
$X_1, \ldots, X_n$ with finite expectations,
$\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$

Proof: We prove the statement for two random variables
$X$ and $Y$ (the general case follows by induction). The
summations that follow are understood to be over the
ranges of the corresponding RVs:
$\mathrm{E}[X + Y] = \sum_i \sum_j (i + j) \Pr((X = i) \cap (Y = j))$
$= \sum_i \sum_j i \Pr((X = i) \cap (Y = j)) + \sum_i \sum_j j \Pr((X = i) \cap (Y = j))$
$= \sum_i i \sum_j \Pr((X = i) \cap (Y = j)) + \sum_j j \sum_i \Pr((X = i) \cap (Y = j))$
$= \sum_i i \Pr(X = i) + \sum_j j \Pr(Y = j)$
$= \mathrm{E}[X] + \mathrm{E}[Y]$
The first equality follows from Definition 2.3. The
penultimate equality uses Theorem 1.6, the law of total
probability.

• Let us now compute the expected sum of two
standard dice
• Let $X = X_1 + X_2$, where $X_i$ represents the
outcome of die $i$ for $i = 1, 2$
• Then
$\mathrm{E}[X_i] = \sum_{j=1}^{6} \frac{1}{6} \cdot j = \frac{7}{2}$
• Applying the linearity of expectations, we have
$\mathrm{E}[X] = \mathrm{E}[X_1] + \mathrm{E}[X_2] = 7$
• Linearity of expectations holds for any collection
of RVs, even if they are not independent
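To illustrate, a small simulation (our sketch, not from
the slides) estimates $\mathrm{E}[X_1 + X_2]$:

  import random

  random.seed(0)
  trials = 100_000

  # Empirical mean of the sum of two fair dice;
  # linearity predicts 7/2 + 7/2 = 7.
  total = sum(random.randint(1, 6) + random.randint(1, 6)
              for _ in range(trials))
  print(total / trials)  # close to 7.0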

Lemma 2.2: For any constant $c$ and discrete RV $X$,
$\mathrm{E}[cX] = c \, \mathrm{E}[X]$.
Proof: The lemma is obvious for $c = 0$. For $c \neq 0$,
$\mathrm{E}[cX] = \sum_j j \Pr(cX = j)$
$= c \sum_j (j/c) \Pr(X = j/c)$
$= c \sum_k k \Pr(X = k) = c \, \mathrm{E}[X]$

2.1.2. Jensen's Inequality

• Let us choose the length $X$ of a side of a square
uniformly at random from the range $[1, 99]$
• What is the expected value of the area?
• We can write this as $\mathrm{E}[X^2]$
• It is tempting to think of this as being equal to
$(\mathrm{E}[X])^2$, but a simple calculation shows that this is
not correct
• In fact, $(\mathrm{E}[X])^2 = 50^2 = 2500$ whereas
$\mathrm{E}[X^2] = 9950/3 \approx 3316.7 > 2500$
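A quick check of ours (assuming $X$ is uniform over the
integers $1, \ldots, 99$, which matches the value $9950/3$):

  from fractions import Fraction

  values = range(1, 100)
  ex = Fraction(sum(values), len(values))                   # E[X] = 50
  ex2 = Fraction(sum(v * v for v in values), len(values))   # E[X^2]
  print(ex ** 2, ex2)  # 2500 and 9950/3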

• More generally, $\mathrm{E}[X^2] \geq (\mathrm{E}[X])^2$
• Consider $Y = (X - \mathrm{E}[X])^2$
• The RV $Y$ is nonnegative and hence its
expectation must also be nonnegative:
$0 \leq \mathrm{E}[Y] = \mathrm{E}[(X - \mathrm{E}[X])^2]$
$= \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2]$
$= \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2$
$= \mathrm{E}[X^2] - (\mathrm{E}[X])^2$
• To obtain the penultimate line, use the linearity
of expectations
• To obtain the last line, use Lemma 2.2 to simplify
$\mathrm{E}[2X\,\mathrm{E}[X]] = 2\,\mathrm{E}[X]\,\mathrm{E}[X]$

• The fact that $\mathrm{E}[X^2] \geq (\mathrm{E}[X])^2$ is an example of
Jensen's inequality
• Jensen's inequality shows that, for any convex
function $f$, we have $\mathrm{E}[f(X)] \geq f(\mathrm{E}[X])$

Definition 2.4: A function $f$ is said to be
convex if, for any $x_1, x_2$ and $0 \leq \lambda \leq 1$,
$f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)$

Lemma 2.3: If $f$ is a twice differentiable function,
then $f$ is convex if and only if $f''(x) \geq 0$

Theorem 2.4 [Jensen's Inequality]: If $f$ is a
convex function, then $\mathrm{E}[f(X)] \geq f(\mathrm{E}[X])$.
Proof: We prove the theorem assuming that $f$
has a Taylor expansion. Let $\mu = \mathrm{E}[X]$.
By Taylor's theorem there is a value $c$ such that
$f(x) = f(\mu) + f'(\mu)(x - \mu) + f''(c)(x - \mu)^2/2$
$\geq f(\mu) + f'(\mu)(x - \mu)$,
since $f''(c) \geq 0$ by convexity. Taking expectations
of both sides and applying linearity of expectations
and Lemma 2.2 yields:
$\mathrm{E}[f(X)] \geq \mathrm{E}[f(\mu) + f'(\mu)(X - \mu)]$
$= f(\mu) + f'(\mu)(\mathrm{E}[X] - \mu) = f(\mu) = f(\mathrm{E}[X])$

2.2. The Bernoulli and Binomial Random Variables

• We run an experiment that succeeds with
probability $p$ and fails with probability $1 - p$
• Let $Y$ be a RV such that
$Y = 1$ if the experiment succeeds,
$Y = 0$ otherwise
• The variable $Y$ is called a Bernoulli or an
indicator random variable
• Note that, for a Bernoulli RV,
$\mathrm{E}[Y] = p \cdot 1 + (1 - p) \cdot 0 = p = \Pr(Y = 1)$


• If we, e.g., flip a fair coin and consider heads a
success, then the expected value of the
corresponding indicator RV is $1/2$
• Consider a sequence of $n$ independent coin flips
• What is the distribution of the number of heads
in the entire sequence?
• More generally, consider a sequence of $n$
independent experiments, each of which
succeeds with probability $p$
• If we let $X$ represent the number of successes in
the $n$ experiments, then $X$ has a binomial
distribution

Definition 2.5: A binomial RV $X$ with parameters $n$
and $p$, denoted by $B(n, p)$, is defined by the
following probability distribution on $j = 0, 1, 2, \ldots, n$:
$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n-j}$

• I.e., the binomial RV (BRV) equals $j$ when
there are exactly $j$ successes and $n - j$ failures
in $n$ independent experiments, each of which is
successful with probability $p$
• Definition 2.5 ensures that the BRV is a valid
probability function (Definition 1.2):
$\sum_{j=0}^{n} \Pr(X = j) = 1$
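A short check of ours that the $B(n, p)$ probabilities
sum to 1, using math.comb from the standard library:

  import math

  def binom_pmf(n: int, p: float, j: int) -> float:
      # Pr(X = j) for X ~ B(n, p), per Definition 2.5.
      return math.comb(n, j) * p**j * (1 - p)**(n - j)

  n, p = 20, 0.3
  print(sum(binom_pmf(n, p, j) for j in range(n + 1)))  # 1.0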

• We want to gather data about the packets going
through a router
• We want to know the approximate fraction of
packets from a certain source or of a certain type
• We store a random subset – or sample – of the
packets for later analysis
• If each packet is stored with probability $p$ and $n$
packets go through the router each day, then the
number of sampled packets each day is a BRV $X$
with parameters $n$ and $p$
• To know how much memory is necessary for
such a sample, determine the expectation of $X$

• If $X$ is a BRV with parameters $n$ and $p$, then $X$ is
the number of successes in $n$ trials, where each
trial is successful with probability $p$
• Define a set of indicator RVs $X_1, \ldots, X_n$, where
$X_i = 1$ if the $i$th trial is successful and $X_i = 0$
otherwise
• Clearly, $\mathrm{E}[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$, and so, by the
linearity of expectations,
$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np$
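The identity $\mathrm{E}[X] = np$ is easy to see empirically;
a minimal simulation sketch of ours:

  import random

  random.seed(1)
  n, p, trials = 100, 0.25, 10_000

  # Each sample of X is a sum of n independent indicators.
  mean = sum(sum(random.random() < p for _ in range(n))
             for _ in range(trials)) / trials
  print(mean)  # close to n * p = 25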


2.3. Conditional Expectation

Definition 2.6:
$\mathrm{E}[Y \mid Z = z] = \sum_y y \Pr(Y = y \mid Z = z)$,
where the summation is over all $y$ in the range of $Y$

• The conditional expectation of a RV is, like $\mathrm{E}[Y]$, a
weighted sum of the values it assumes
• Now each value is weighted by the conditional
probability that the variable assumes that value

• Suppose that we independently roll two
standard six-sided dice
• Let $X_1$ be the number that shows on the first die,
$X_2$ the number on the second die, and $X$ the sum
of the numbers on the two dice
• Then
$\mathrm{E}[X \mid X_1 = 2] = \sum_x x \Pr(X = x \mid X_1 = 2)$
$= \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$

• As another example, consider $\mathrm{E}[X_1 \mid X = 5]$:
$\mathrm{E}[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5)$
$= \sum_{x=1}^{4} x \cdot \frac{\Pr((X_1 = x) \cap (X = 5))}{\Pr(X = 5)}$
$= \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}$
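Both values can be confirmed by brute-force enumeration
of the 36 outcomes; a small sketch of ours:

  from itertools import product
  from fractions import Fraction

  outcomes = list(product(range(1, 7), repeat=2))

  def cond_exp(value, condition):
      # E[value(o) | condition(o)] over equally likely outcomes o.
      kept = [o for o in outcomes if condition(o)]
      return Fraction(sum(value(o) for o in kept), len(kept))

  print(cond_exp(lambda o: o[0] + o[1], lambda o: o[0] == 2))  # 11/2
  print(cond_exp(lambda o: o[0], lambda o: o[0] + o[1] == 5))  # 5/2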


Lemma 2.5: For any RVs $X$ and $Y$,
$\mathrm{E}[X] = \sum_y \Pr(Y = y) \, \mathrm{E}[X \mid Y = y]$,
where the sum is over all values $y$ in the range of $Y$
and all of the expectations exist.
Proof:
$\sum_y \Pr(Y = y) \, \mathrm{E}[X \mid Y = y]$
$= \sum_y \Pr(Y = y) \sum_x x \Pr(X = x \mid Y = y)$
$= \sum_x \sum_y x \Pr(X = x \mid Y = y) \Pr(Y = y)$
$= \sum_x \sum_y x \Pr((X = x) \cap (Y = y))$
$= \sum_x x \Pr(X = x) = \mathrm{E}[X]$

• The linearity of expectations also extends to
conditional expectations

Lemma 2.6: For any finite collection of discrete
RVs $X_1, \ldots, X_n$ with finite expectations and for
any RV $Y$,
$\mathrm{E}\left[\sum_{i=1}^{n} X_i \,\middle|\, Y = y\right] = \sum_{i=1}^{n} \mathrm{E}[X_i \mid Y = y]$


• Confusingly, the term conditional expectation is
also used to refer to the following RV

Definition 2.7: The expression $\mathrm{E}[Y \mid Z]$ is a RV $f(Z)$
that takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$

• $\mathrm{E}[Y \mid Z]$ is not a real value; it is actually a function
of the RV $Z$
• Hence $\mathrm{E}[Y \mid Z]$ is itself a function from the sample
space to the real numbers and can therefore be
thought of as a RV

• In the previous example of rolling two dice,
$\mathrm{E}[X \mid X_1] = \sum_x x \Pr(X = x \mid X_1) = X_1 + \frac{7}{2}$
• We see that $\mathrm{E}[X \mid X_1]$ is a RV whose value
depends on $X_1$
• If $\mathrm{E}[Y \mid Z]$ is a RV, then it makes sense to consider
its expectation $\mathrm{E}[\mathrm{E}[Y \mid Z]]$
• We found that $\mathrm{E}[X \mid X_1] = X_1 + 7/2$
• Thus,
$\mathrm{E}[\mathrm{E}[X \mid X_1]] = \mathrm{E}\left[X_1 + \frac{7}{2}\right] = \frac{7}{2} + \frac{7}{2} = 7 = \mathrm{E}[X]$
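A quick numeric confirmation of ours that averaging the
conditional expectations recovers $\mathrm{E}[X] = 7$:

  from fractions import Fraction

  # E[X | X1 = x1] = x1 + 7/2 for the sum X of two fair dice.
  cond = [x1 + Fraction(7, 2) for x1 in range(1, 7)]

  # E[E[X | X1]] averages the conditional expectations over X1.
  print(sum(cond) / 6)  # 7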

• More generally,

Theorem 2.7: $\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \mid Z]]$

Proof: From Definition 2.7 we have $\mathrm{E}[Y \mid Z] = f(Z)$,
where $f(Z)$ takes on the value $\mathrm{E}[Y \mid Z = z]$ when
$Z = z$. Hence
$\mathrm{E}[\mathrm{E}[Y \mid Z]] = \mathrm{E}[f(Z)] = \sum_z \mathrm{E}[Y \mid Z = z] \Pr(Z = z)$
The right-hand side equals $\mathrm{E}[Y]$ by Lemma 2.5.



• Consider a program that includes one call to a
process $S$
• Assume that each call to process $S$ recursively
spawns new copies of the process $S$, where the
number of new copies is a BRV with parameters
$n$ and $p$
• We assume that these random variables are
independent for each call to $S$
• What is the expected number of copies of the
process $S$ generated by the program?

• To analyze this recursive spawning process, we
use generations
• The initial process is in generation 0
• Otherwise, we say that a process is in
generation $i$ if it was spawned by another
process in generation $i - 1$
• Let $Y_i$ denote the number of processes in
generation $i$
• Since we know that $Y_0 = 1$, the number of
processes in generation 1 has a binomial
distribution
• Thus, $\mathrm{E}[Y_1] = np$

• Similarly, suppose we knew that the number of
processes in generation $i - 1$ was $y_{i-1}$, so
$Y_{i-1} = y_{i-1}$
• Then
$\mathrm{E}[Y_i \mid Y_{i-1} = y_{i-1}] = np \, y_{i-1}$
• Applying Theorem 2.7, we can compute the
expected size of the $i$th generation inductively
• We have
$\mathrm{E}[Y_i] = \mathrm{E}[\mathrm{E}[Y_i \mid Y_{i-1}]] = \mathrm{E}[np \, Y_{i-1}] = np \, \mathrm{E}[Y_{i-1}]$
• By induction on $i$, and using the fact that $Y_0 = 1$,
we then obtain
$\mathrm{E}[Y_i] = (np)^i$

• The expected total number of copies of process
$S$ generated by the program is given by
$\mathrm{E}\left[\sum_{i \geq 0} Y_i\right] = \sum_{i \geq 0} \mathrm{E}[Y_i] = \sum_{i \geq 0} (np)^i$
• If $np \geq 1$ then the expectation is unbounded; if
$np < 1$, the expectation is $1/(1 - np)$
• The expected number of processes generated by
the program is bounded iff the expected number
of processes spawned by each process is less
than 1
• This is a simple example of a branching process,
a probabilistic paradigm extensively studied in
probability theory
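A simulation sketch of ours for the subcritical case
$np < 1$ (the parameter values are illustrative):

  import random

  random.seed(2)
  n, p = 4, 0.2   # np = 0.8 < 1, so E[total] = 1 / (1 - np) = 5
  trials = 10_000

  def total_copies() -> int:
      # Generation 0 has one process; each process spawns
      # a B(n, p)-distributed number of children.
      total, frontier = 1, 1
      while frontier:
          children = sum(1 for _ in range(frontier * n)
                         if random.random() < p)
          total += children
          frontier = children
      return total

  print(sum(total_copies() for _ in range(trials)) / trials)  # ~ 5.0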

2.4. The Geometric Distribution

• Let us flip a coin until it lands on heads
• What is the distribution of the number of flips?
• This is an example of a geometric distribution
• It arises when we perform a sequence of
independent trials until the first success, where
each trial succeeds with probability $p$

Definition 2.8: A geometric RV $X$ with parameter $p$
is given by the following probability distribution on
$n = 1, 2, \ldots$: $\Pr(X = n) = (1 - p)^{n-1} p$

• Geometric RVs are said to be memoryless
because the probability that you will reach your
first success $n$ trials from now is independent of
the number of failures you have experienced
• Informally, one can ignore past failures – they do
not change the distribution of the number of
future trials until the first success
• Formally, we have the following

Lemma 2.8: For a geometric RV $X$ with parameter $p$
and for $n > 0$,
$\Pr(X = n + k \mid X > k) = \Pr(X = n)$
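Lemma 2.8 can be checked directly from the pmf;
a short sketch of ours:

  # Pr(X = n + k | X > k) == Pr(X = n) for geometric X.
  p, k, n = 0.3, 5, 4

  def geom_pmf(m: int) -> float:
      return (1 - p) ** (m - 1) * p

  pr_x_gt_k = (1 - p) ** k   # Pr(X > k): the first k trials all fail
  print(geom_pmf(n + k) / pr_x_gt_k, geom_pmf(n))  # equal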

• When a RV takes values in the set of natural
numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$, there is an alternative
formula for calculating its expectation

Lemma 2.9: Let $X$ be a discrete RV that takes on
only nonnegative integer values. Then
$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \geq i)$
Proof:
$\sum_{i=1}^{\infty} \Pr(X \geq i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j)$
$= \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X]$

• For a geometric RV $X$ with parameter $p$,
$\Pr(X \geq i) = \sum_{n=i}^{\infty} (1 - p)^{n-1} p = (1 - p)^{i-1}$
• Hence
$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \geq i) = \sum_{i=1}^{\infty} (1 - p)^{i-1}$
$= \frac{1}{1 - (1 - p)} = \frac{1}{p}$
• Thus, for a fair coin where $p = 1/2$, on average
it takes two flips to see the first heads
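A simulation sketch of ours for the fair-coin case:

  import random

  random.seed(3)
  trials = 100_000

  def flips_until_heads() -> int:
      # Flip a fair coin until the first heads (geometric, p = 1/2).
      flips = 1
      while random.random() >= 0.5:
          flips += 1
      return flips

  print(sum(flips_until_heads() for _ in range(trials)) / trials)  # ~ 2.0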

• Let us now find the expectation of a geometric RV
$X$ with parameter $p$ using conditional expectations
and the memoryless property of geometric RVs
• Recall that $X$ corresponds to the number of flips
until the first heads, given that each flip is heads
with probability $p$
• Let $Y = 0$ if the first flip is tails and $Y = 1$ if the
first flip is heads
• By the identity from Lemma 2.5,
$\mathrm{E}[X] = \Pr(Y = 0) \, \mathrm{E}[X \mid Y = 0] + \Pr(Y = 1) \, \mathrm{E}[X \mid Y = 1]$
$= (1 - p) \, \mathrm{E}[X \mid Y = 0] + p \, \mathrm{E}[X \mid Y = 1]$

• If $Y = 1$ then $X = 1$, so $\mathrm{E}[X \mid Y = 1] = 1$
• If $Y = 0$, then $X > 1$
• In this case, let the number of remaining flips
(after the first flip, until the first heads) be $Z$
• Then, by the linearity of expectations,
$\mathrm{E}[X] = (1 - p) \, \mathrm{E}[Z + 1] + p \cdot 1 = (1 - p) \, \mathrm{E}[Z] + 1$
• By the memoryless property of geometric RVs,
$Z$ is also a geometric RV with parameter $p$
• Hence $\mathrm{E}[Z] = \mathrm{E}[X]$, since they both have the
same distribution
• We therefore have $\mathrm{E}[X] = (1 - p) \, \mathrm{E}[X] + 1$,
which yields $\mathrm{E}[X] = 1/p$

2.4.1. Example: Coupon Collector's Problem

• Each box of cereal contains
one of $n$ different coupons
• Once you obtain one of every type of coupon,
you can send in for a prize
• The coupon in each box is chosen independently
and uniformly at random from the $n$ possibilities,
and you do not collaborate with others to collect
coupons
• How many boxes of cereal must you buy before
you obtain at least one of every type of coupon?

• Let $X$ be the number of boxes bought until at
least one of every type of coupon is obtained
• If $X_i$ is the number of boxes bought while you
had exactly $i - 1$ different coupons, then clearly
$X = \sum_{i=1}^{n} X_i$
• The advantage of breaking $X$ into a sum of
random variables $X_i$, $i = 1, \ldots, n$, is that each $X_i$
is a geometric RV
• When exactly $i - 1$ coupons have been found,
the probability of obtaining a new coupon is
$p_i = 1 - \frac{i - 1}{n}$

• Hence, $X_i$ is a geometric RV with parameter $p_i$:
$\mathrm{E}[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$
• Using the linearity of expectations, we have that
$\mathrm{E}[X] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$


• The summation $\sum_{i=1}^{n} 1/i$ is known as the
harmonic number $H(n)$

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} 1/i$
satisfies $H(n) = \ln n + \Theta(1)$.

• Thus, for the coupon collector's problem, the
expected number of random coupons required
to obtain all $n$ coupons is $n \ln n + \Theta(n)$
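A simulation sketch of ours comparing the empirical
mean with $nH(n)$:

  import random

  random.seed(4)
  n, trials = 50, 2_000

  def boxes_needed() -> int:
      # Buy boxes until all n coupon types have been seen.
      seen, boxes = set(), 0
      while len(seen) < n:
          seen.add(random.randrange(n))
          boxes += 1
      return boxes

  harmonic = sum(1 / i for i in range(1, n + 1))
  print(sum(boxes_needed() for _ in range(trials)) / trials,
        n * harmonic)  # both close to 225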

• Given the first and second moments, one can
compute the variance and standard deviation of
the RV
• Intuitively, the variance and standard deviation
offer a measure of how far the RV is likely to be
from its expectation

Definition 3.2: The variance of a RV $X$ is
$\mathrm{Var}[X] = \mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$
The standard deviation of a RV $X$ is
$\sigma[X] = \sqrt{\mathrm{Var}[X]}$


• The two forms of the variance in the definition
are equivalent, as is easily seen by using the
linearity of expectations
• Keeping in mind that $\mathrm{E}[X]$ is a constant, we have
$\mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2]$
$= \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2$
$= \mathrm{E}[X^2] - (\mathrm{E}[X])^2$
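Both forms give the same value; a check of ours for the
sum of two fair dice:

  from itertools import product
  from fractions import Fraction

  sums = [a + b for a, b in product(range(1, 7), repeat=2)]
  ex = Fraction(sum(sums), 36)                  # E[X] = 7
  ex2 = Fraction(sum(x * x for x in sums), 36)  # E[X^2]

  var_centered = sum((x - ex) ** 2 for x in sums) / 36
  print(var_centered, ex2 - ex ** 2)  # both 35/6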
