
Probability (Important Random Variables)

H2 Mathematics

(Properties of common random variables covered in the H2 syllabus are briefly explained here. Basic knowledge of PMFs, PDFs, CDFs, expected value (mean), variance, etc. is required.)

1. Bernoulli Random Variable


1. Definition

A trial is conducted. The trial has two possible outcomes, success and failure, with probabilities $p$ and $1 - p$ respectively. (Such a trial is called a Bernoulli trial.) A Bernoulli random variable $X$ is then defined as
$$X = \begin{cases} 1, & \text{if the trial succeeded} \\ 0, & \text{if the trial failed} \end{cases}$$

2. Probability Mass Function

The PMF of a Bernoulli random variable is evidently
$$P(X = k) = \begin{cases} p, & \text{if } k = 1 \\ 1 - p, & \text{if } k = 0 \end{cases}$$

3. Expected Value

The expected value of a Bernoulli random variable is easily computed.
$$E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$$

4. Variance and Standard Deviation

The variance is
$$\operatorname{var}(X) = E(X^2) - E(X)^2 = 1^2 \cdot p + 0^2 \cdot (1 - p) - p^2 = p(1 - p)$$
thus the standard deviation is
$$\sigma(X) = \sqrt{p(1 - p)}$$

2. Binomial Random Variable


1. Definition

$n$ independent Bernoulli trials are conducted, each with success probability $p$. The binomial random variable $B(n, p)$ counts the number of successes obtained in these $n$ Bernoulli trials. In other words, if $X_1, X_2, \ldots, X_n$ are $n$ independent Bernoulli random variables,
$$B(n, p) = X_1 + X_2 + \cdots + X_n$$

2. Probability Mass Function

Out of $n$ trials, there are $\binom{n}{k}$ ways to choose which $k$ trials are successes, and $p^k (1 - p)^{n-k}$ is the probability of any one particular arrangement of $k$ successes and $n - k$ failures. Hence the PMF of $X \sim B(n, p)$ is
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

3. Expected Value

Recall that $X \sim B(n, p)$ is a sum of $n$ independent Bernoulli random variables $X_1, X_2, \ldots, X_n$. By the linearity of expectation,
$$E(X) = \sum_{k=1}^{n} E(X_k) = \sum_{k=1}^{n} p = np$$

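Alternatively, $E(X) = np$ can be derived from a not-so-elegant brute force approach.
$$\begin{aligned}
E(X) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} = \lim_{x \to p} x \frac{\partial}{\partial x} \sum_{k=0}^{n} \binom{n}{k} x^k (1 - p)^{n-k} \\
&= \lim_{x \to p} x \frac{\partial}{\partial x} (x + 1 - p)^n = \lim_{x \to p} x n (x + 1 - p)^{n-1} = np
\end{aligned}$$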

4. Variance and Standard Deviation

By the independence of the $n$ Bernoulli random variables $X_1, X_2, \ldots, X_n$, the variance of $X \sim B(n, p)$ can be calculated as follows.
$$\operatorname{var}(X) = \operatorname{var}(X_1 + X_2 + \cdots + X_n) = \sum_{k=1}^{n} \operatorname{var}(X_k) = \sum_{k=1}^{n} p(1 - p) = np(1 - p)$$
Thus the standard deviation of $X \sim B(n, p)$ is
$$\sigma(X) = \sqrt{np(1 - p)}$$

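A brute force approach can also be used to prove that $\operatorname{var}(X) = np(1 - p)$.
$$\begin{aligned}
\operatorname{var}(X) &= E(X^2) - E(X)^2 \\
&= \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1 - p)^{n-k} - n^2 p^2 \\
&= \lim_{x \to p} x \frac{\partial}{\partial x} \left( x \frac{\partial}{\partial x} \sum_{k=0}^{n} \binom{n}{k} x^k (1 - p)^{n-k} \right) - n^2 p^2 \\
&= \lim_{x \to p} x \frac{\partial}{\partial x} \left( x \frac{\partial}{\partial x} (x + 1 - p)^n \right) - n^2 p^2 \\
&= \lim_{x \to p} x \frac{\partial}{\partial x} \left( x n (x + 1 - p)^{n-1} \right) - n^2 p^2 \\
&= \lim_{x \to p} n x \left( (x + 1 - p)^{n-1} + x (n - 1)(x + 1 - p)^{n-2} \right) - n^2 p^2 \\
&= np(1 + p(n - 1)) - n^2 p^2 \\
&= np(1 - p)
\end{aligned}$$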
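The binomial results above can be checked numerically. The following is a minimal Python sketch rather than part of the original notes; the helper name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative parameters (assumed, not taken from the notes).
n, p = 10, 0.3
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]

total = sum(pmf)                                       # should equal 1
mean = sum(k * q for k, q in enumerate(pmf))           # should equal n*p
second_moment = sum(k * k * q for k, q in enumerate(pmf))
variance = second_moment - mean**2                     # should equal n*p*(1-p)

print(total, mean, variance)
```

Summing the PMF directly mirrors the brute force derivations, only with numbers in place of the generating-function trick.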

3. Poisson Random Variable
1. Definition

Suppose that events occur randomly and independently across time. It is known that on average, $\lambda$ events occur in a time interval $t$. The Poisson random variable $\text{Pois}(\lambda)$ counts the number of events that occur in a time interval $t$.

2. Probability Mass Function

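Let us break up the time interval $t$ into $n$ sub-intervals of length $\delta t$, i.e. $t = n\,\delta t$. It follows that the average number of events that occur in a time interval of $\delta t$ is $\frac{\lambda}{n}$. Now, make the assumption that $\delta t$ is sufficiently small that the probability of more than one event occurring in $\delta t$ is negligible. Let $X_k$ be the random variable that equals 1 when one event occurs in the $k$-th interval of duration $\delta t$, and 0 otherwise. It is easy to see that $X_k$ is approximately a Bernoulli random variable with mean $\frac{\lambda}{n}$, so $X \sim \text{Pois}(\lambda)$ can be approximated by
$$X \approx X_1 + X_2 + \cdots + X_n \sim B\left(n, \frac{\lambda}{n}\right)$$
for $\delta t$ small enough, i.e. for $n$ large. This implies that as $n \to \infty$, the PMF of $B\left(n, \frac{\lambda}{n}\right)$ approaches the PMF of $X \sim \text{Pois}(\lambda)$. By Stirling's approximation,
$$\begin{aligned}
P(X = k) &= \lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lim_{n \to \infty} \frac{\sqrt{2\pi n}\, n^n e^{-n}}{k!\, \sqrt{2\pi (n-k)}\, (n-k)^{n-k} e^{-(n-k)}} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lim_{n \to \infty} \frac{1}{k!} \sqrt{\frac{n}{n-k}}\, \lambda^k e^{-k} \left(1 + \frac{k - \lambda}{n-k}\right)^{n-k} \\
&= \frac{\lambda^k e^{-\lambda}}{k!}
\end{aligned}$$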

3. Approximation to Binomial Random Variable

The previous section was essentially a proof of what is called the Poisson Limit Theorem, which provides us with a useful approximation of the binomial random variable. That is, for $n$ large (say $n > 50$) and $np$ small (say $np < 5$),
$$B(n, p) \approx \text{Pois}(np)$$
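To see the Poisson Limit Theorem at work, the sketch below compares the PMF of $B(n, \lambda/n)$ with that of $\text{Pois}(\lambda)$ for growing $n$. It is an illustration only; the function names, the choice $\lambda = 2$ and the range of $k$ are assumptions made for this example.

```python
from math import comb, exp, factorial

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam: float, k: int) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2.0  # illustrative mean (assumed)

def max_gap(n: int, k_max: int = 10) -> float:
    """Largest |binomial PMF - Poisson PMF| over k = 0..k_max."""
    return max(
        abs(binomial_pmf(n, lam / n, k) - poisson_pmf(lam, k))
        for k in range(k_max + 1)
    )

for n in (10, 100, 1000, 10000):
    print(n, max_gap(n))
```

The printed gap shrinks as $n$ grows, which is the behaviour the limit argument predicts.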

4. Expected Value, Variance and Standard Deviation

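By definition, the mean of $X \sim \text{Pois}(\lambda)$ is
$$E(X) = \lambda$$
Using the fact that $B\left(n, \frac{\lambda}{n}\right)$ tends to $\text{Pois}(\lambda)$, we may easily derive the variance of $X \sim \text{Pois}(\lambda)$ as well.
$$\operatorname{var}(X) = \lim_{n \to \infty} n \cdot \frac{\lambda}{n} \left(1 - \frac{\lambda}{n}\right) = \lambda$$
and the standard deviation is
$$\sigma(X) = \sqrt{\lambda}$$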

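Alternatively, the variance can be computed using the PMF.
$$\begin{aligned}
\operatorname{var}(X) &= E(X^2) - E(X)^2 \\
&= \sum_{k=0}^{\infty} k^2 \frac{\lambda^k e^{-\lambda}}{k!} - \lambda^2 \\
&= \lambda e^{-\lambda} \sum_{k=0}^{\infty} (k + 1) \frac{\lambda^k}{k!} - \lambda^2 \\
&= \lambda^2 e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} + \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} - \lambda^2 \\
&= \lambda^2 e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda} - \lambda^2 = \lambda
\end{aligned}$$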
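The same result can be confirmed by summing the Poisson PMF numerically. This is a rough sketch under stated assumptions: the helper `poisson_moments`, the truncation at 200 terms and the value $\lambda = 4$ are all chosen for illustration.

```python
from math import exp

def poisson_moments(lam: float, terms: int = 200) -> tuple[float, float]:
    """Mean and variance of Pois(lam) from a truncated sum over the PMF."""
    pmf = exp(-lam)              # P(X = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(1, terms):
        pmf *= lam / k           # P(X = k) obtained from P(X = k - 1)
        mean += k * pmf
        second_moment += k * k * pmf
    return mean, second_moment - mean**2

mean, variance = poisson_moments(4.0)
print(mean, variance)  # both should be close to lambda = 4
```

The truncation is harmless here because the Poisson PMF decays faster than geometrically once $k$ exceeds $\lambda$.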

4. Normal Distribution
1. Deriving the Central Limit Theorem

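Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with $E(X_k) = 0$ and $\operatorname{var}(X_k) = \sigma^2$ for all $k = 1, 2, \ldots, n$. We now introduce the concept of the characteristic function of a random variable $X$, which is defined as
$$\varphi_X(t) = E(e^{itX}) = 1 + itE(X) + \frac{(it)^2 E(X^2)}{2!} + O(t^3)$$
Since $E(X_k) = 0$ and $\operatorname{var}(X_k) = E(X_k^2) = \sigma^2$, the characteristic function of $X_k$ is
$$\varphi_{X_k}(t) = 1 - \frac{\sigma^2}{2} t^2 + O(t^3)$$
Let $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$. Suppose we are interested in the properties of $\sqrt{n}\,\bar{X}$. By the independence of $X_1, X_2, \ldots, X_n$,
$$\begin{aligned}
\varphi_{\sqrt{n}\bar{X}}(t) &= \varphi_{X_1}\left(\frac{t}{\sqrt{n}}\right) \varphi_{X_2}\left(\frac{t}{\sqrt{n}}\right) \cdots \varphi_{X_n}\left(\frac{t}{\sqrt{n}}\right) \\
&= \exp\left\{ n \ln\left( 1 - \frac{\sigma^2}{2n} t^2 + O\left(\frac{t^3}{n^{3/2}}\right) \right) \right\} \\
&= \exp\left\{ -\frac{\sigma^2}{2} t^2 + O\left(\frac{t^3}{n^{1/2}}\right) \right\}
\end{aligned}$$
so as $n \to \infty$, $\varphi_{\sqrt{n}\bar{X}}(t) \to \exp\left(-\dfrac{\sigma^2}{2} t^2\right)$.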

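Let us now determine the probability density function of $\lim_{n \to \infty} \sqrt{n}\,\bar{X}$. Notice that the characteristic function of a random variable is simply a Fourier transform of its PDF. Thus by Fourier inversion,
$$\lim_{n \to \infty} f_{\sqrt{n}\bar{X}}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} e^{-\sigma^2 t^2 / 2}\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left\{ -\frac{\sigma^2}{2} \left(t + \frac{ix}{\sigma^2}\right)^2 - \frac{x^2}{2\sigma^2} \right\} dt$$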

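Let the integrand be $f\left(t + \dfrac{ix}{\sigma^2}\right)$, that is, $f(z) = \exp\left\{ -\dfrac{\sigma^2}{2} z^2 - \dfrac{x^2}{2\sigma^2} \right\}$. By Cauchy's Integral Theorem, the integral of $f\left(t + \dfrac{ix}{\sigma^2}\right)$ over $\mathbb{R}$ is equal to the integral of $f(t)$ over $\mathbb{R}$, since the integral of $f$ over the segment from $R$ to $R + \dfrac{ix}{\sigma^2}$ decays exponentially as $R \to \pm\infty$. Thus
$$\begin{aligned}
\lim_{n \to \infty} f_{\sqrt{n}\bar{X}}(x) &= \frac{1}{2\pi} \exp\left(-\frac{x^2}{2\sigma^2}\right) \int_{-\infty}^{\infty} \exp\left(-\frac{\sigma^2}{2} t^2\right) dt \\
&= \frac{1}{\pi \sigma \sqrt{2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \int_{-\infty}^{\infty} e^{-t^2}\, dt \\
&= \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)
\end{aligned}$$
The last equality follows from the fact that $\int_{\mathbb{R}} e^{-t^2}\, dt = \sqrt{\pi}$. (There are several ways to prove this result. The most common methods include the use of polar coordinates or Euler's reflection formula.)

So as $n \to \infty$,
$$\frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}$$
will approach a probability distribution that has a PDF of $\dfrac{1}{\sigma \sqrt{2\pi}} \exp\left(-\dfrac{x^2}{2\sigma^2}\right)$. This is known as the Central Limit Theorem.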
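The convergence of the characteristic function can be observed on a concrete case. The sketch below assumes, purely for illustration, that each $X_k$ is $+1$ or $-1$ with probability $\frac{1}{2}$, so that $E(X_k) = 0$, $\sigma^2 = 1$ and the characteristic function of $X_k$ is $\cos t$. The characteristic function of $\sqrt{n}\,\bar{X}$ is then $\cos(t/\sqrt{n})^n$, which should approach $e^{-t^2/2}$.

```python
from math import cos, exp, sqrt

def cf_scaled_mean(t: float, n: int) -> float:
    """Characteristic function of sqrt(n) * (mean of n fair +/-1 variables)."""
    return cos(t / sqrt(n)) ** n

def cf_normal(t: float, sigma: float = 1.0) -> float:
    """Limiting characteristic function exp(-sigma^2 t^2 / 2)."""
    return exp(-(sigma**2) * t**2 / 2)

t = 1.5  # illustrative evaluation point (assumed)
for n in (1, 10, 100, 10000):
    print(n, cf_scaled_mean(t, n), cf_normal(t))
```

Working with characteristic functions keeps the check deterministic; a simulation of sample means would show the same limit, but only up to sampling noise.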

2. Definition - Probability Density and Cumulative Distribution Function

We say that a continuous random variable $X \sim N(\mu, \sigma^2)$ is normally distributed if it has mean $\mu$, variance $\sigma^2$ and PDF
$$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
By convention, $Z \sim N(0, 1)$ is said to have a standard normal distribution. The CDF of $X \sim N(\mu, \sigma^2)$ can be expressed in terms of the CDF of $Z$ by the elementary relation
$$F_X(x) = P(X \le x) = F_Z\left(\frac{x - \mu}{\sigma}\right)$$
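The standardisation relation translates directly into code. A minimal sketch follows, using the identity $F_Z(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; the function names and the sample values $\mu = 100$, $\sigma = 15$ are assumptions for illustration.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """F_Z(z) for Z ~ N(0, 1), written via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) for X ~ N(mu, sigma^2), using F_X(x) = F_Z((x - mu) / sigma)."""
    return std_normal_cdf((x - mu) / sigma)

# Illustrative values (assumed): P(X <= 130) when mu = 100 and sigma = 15.
print(normal_cdf(130, 100, 15))
```

Only the standard normal CDF needs to be tabulated or implemented; every other normal CDF reduces to it.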

3. Two (Very) Important Properties

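Property 1: If $X \sim N(\mu, \sigma^2)$ then $aX + b \sim N(a\mu + b, a^2 \sigma^2)$ for any constants $a \neq 0$ and $b$.

The proof of this property is straightforward. For $a > 0$, the CDF of $aX + b$ is
$$P(aX + b \le x) = P\left(X \le \frac{x - b}{a}\right) = F_Z\left(\frac{\frac{x - b}{a} - \mu}{\sigma}\right) = F_Z\left(\frac{x - (a\mu + b)}{a\sigma}\right)$$
and this shows that $aX + b$ is normally distributed with mean $a\mu + b$ and standard deviation $a\sigma$. (For $a < 0$ the inequality reverses, and the symmetry of $Z$ gives the same result with standard deviation $|a|\sigma$.) This completes the proof.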

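Property 2: If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ and $X_1$ and $X_2$ are independent, then $X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

This proof involves characteristic functions. Let $X \sim N(\mu, \sigma^2)$. By a simple calculation, the CF of $X$ is
$$\varphi_X(t) = \exp\left(-\frac{\sigma^2}{2} t^2 + i\mu t\right)$$
Then by the independence of $X_1$ and $X_2$,
$$\varphi_{X_1 + X_2}(t) = \varphi_{X_1}(t)\, \varphi_{X_2}(t) = \exp\left(-\frac{\sigma_1^2 + \sigma_2^2}{2} t^2 + i(\mu_1 + \mu_2) t\right)$$
This shows that $X_1 + X_2$ is also normally distributed and has mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$, hence completing the proof.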

4. Central Limit Theorem - Alternative Form

We may now re-write the Central Limit Theorem (previously derived) in a slightly different form. Let $Y_k = X_k + \mu$ for all $k = 1, 2, \ldots, n$. Then $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed random variables each with mean $\mu$ and variance $\sigma^2$. Applying Property 1,
$$Y_1 + Y_2 + \cdots + Y_n \approx N(n\mu, n\sigma^2)$$
as $n$ grows large.

5. Normal Approximation to Binomial Random Variable

Recall that a binomial random variable $X \sim B(n, p)$ is a sum of $n$ independent Bernoulli random variables $X_1, X_2, \ldots, X_n$, each with mean $p$ and variance $p(1 - p)$. By the Central Limit Theorem, as $n$ grows large,
$$B(n, p) = X_1 + X_2 + \cdots + X_n \approx N(np, np(1 - p))$$
This approximation is usually accurate only if $np > 5$ and $n(1 - p) > 5$.

As we are using a continuous distribution to approximate a discrete one, continuity correction has to be applied to improve the accuracy of the approximation. Thus we instead approximate the CDF of $X \sim B(n, p)$ by
$$P(X \le k) \approx F_Z\left(\frac{k + \frac{1}{2} - np}{\sqrt{np(1 - p)}}\right)$$
where $Z \sim N(0, 1)$.
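The effect of continuity correction is easy to see numerically. The sketch below compares the exact binomial CDF with the normal approximation, with and without the correction. The function names and the values n = 100, p = 0.5, k = 55 are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(n: int, p: float, k: int) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(n: int, p: float, k: int, correction: bool = True) -> float:
    """Normal approximation to P(X <= k), optionally with continuity correction."""
    shift = 0.5 if correction else 0.0
    z = (k + shift - n * p) / sqrt(n * p * (1 - p))
    return NormalDist().cdf(z)

n, p, k = 100, 0.5, 55  # illustrative values (assumed)
exact = binomial_cdf(n, p, k)
with_cc = normal_approx_cdf(n, p, k)
without_cc = normal_approx_cdf(n, p, k, correction=False)
print(exact, with_cc, without_cc)
```

Here `statistics.NormalDist` from the Python standard library supplies $F_Z$; the corrected value lands much closer to the exact one.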

6. Normal Approximation to Poisson Random Variable

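Let $\lambda$ be a positive integer and let $X_1, X_2, \ldots, X_\lambda$ be independent Poisson random variables each with mean 1. The characteristic function of $X \sim \text{Pois}(\lambda)$ is
$$\varphi_X(t) = \sum_{k=0}^{\infty} e^{ikt} \frac{\lambda^k e^{-\lambda}}{k!} = e^{\lambda(e^{it} - 1)}$$
Then by the independence of $X_1, X_2, \ldots, X_\lambda$,
$$\varphi_{X_1 + X_2 + \cdots + X_\lambda}(t) = \varphi_{X_1}(t)\, \varphi_{X_2}(t) \cdots \varphi_{X_\lambda}(t) = e^{\lambda(e^{it} - 1)}$$
which proves that $X$ has the same distribution as the sum, i.e.
$$X = X_1 + X_2 + \cdots + X_\lambda$$
in distribution. Then by the Central Limit Theorem, $X \sim \text{Pois}(\lambda)$, a sum of $\lambda$ independent Poisson random variables with mean and variance 1, can be approximated (for $\lambda$ large) by
$$X = X_1 + X_2 + \cdots + X_\lambda \approx N(\lambda, \lambda)$$
This approximation is usually accurate when $\lambda > 10$. Continuity correction should also be used when applicable.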

5. Worked Examples (Easy)
1. Find the minimum number of rolls of two dice required such that the probability of getting two sixes at least twice is at least $\frac{1}{2}$.

Solution:

Suppose I roll two dice $n$ times. Let $X$ be the random variable that counts the number of times I get two sixes. Evidently, $X \sim B\left(n, \frac{1}{36}\right)$ so its PMF is
$$P(X = k) = \binom{n}{k} \left(\frac{1}{36}\right)^k \left(\frac{35}{36}\right)^{n-k}$$
The probability of getting two sixes at least twice is
$$P(X \ge 2) = 1 - \left(\frac{35}{36}\right)^n - n \left(\frac{1}{36}\right) \left(\frac{35}{36}\right)^{n-1} \ge \frac{1}{2}$$
Solving the inequality (with a graphing calculator), we get that the minimum $n$ is $n = 61$.
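In place of a graphing calculator, the inequality can be solved by a direct search. This is a small sketch, with the function name `prob_at_least_two` chosen for illustration; the search relies on $P(X \ge 2)$ increasing with $n$.

```python
def prob_at_least_two(n: int, p: float = 1 / 36) -> float:
    """P(X >= 2) for X ~ B(n, p), computed as 1 - P(X = 0) - P(X = 1)."""
    q = 1 - p
    return 1 - q**n - n * p * q ** (n - 1)

# Increase n until the probability first reaches 1/2.
n = 2
while prob_at_least_two(n) < 0.5:
    n += 1

print(n, prob_at_least_two(n))
```

The loop stops at the same minimum value of $n$ as stated in the solution.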

2. There is a concert and only 2500 tickets are available, so one must send in applications for a chance to obtain tickets. If you sent in 100 applications, and the total number of applications is 125000, find the probability that you will get a ticket, and explain why this probability is close to $1 - e^{-2}$.

Solution:

Let $X$ be the random variable that counts the number of accepted applications out of the 100 sent in. Its PMF is
$$P(X = k) = \frac{\binom{100}{k} \binom{124900}{2500 - k}}{\binom{125000}{2500}}$$
thus the probability we seek is
$$P(X \ge 1) = 1 - \frac{\binom{124900}{2500}}{\binom{125000}{2500}} \approx 0.867$$

To intuitively understand why this probability is close to $1 - e^{-2}$, we see that the process of choosing 2500 tickets out of 125000 (without replacement) can be roughly estimated by 2500 independent Bernoulli trials (i.e. choosing tickets with replacement). This is because the probability of choosing a given ticket remains almost constant since $125000 \gg 2500$. Thus we can estimate the PMF of $X$ as that of a binomial random variable.
$$P(X = k) \approx \binom{2500}{k} \left(\frac{100}{125000}\right)^k \left(\frac{124900}{125000}\right)^{2500 - k}$$
We can in turn approximate this binomial PMF with that of $\text{Pois}(2)$ as $n = 2500$ is sufficiently large and $np = 2$ is sufficiently small. Hence our desired probability is roughly
$$P(X \ge 1) \approx 1 - \frac{2^0 e^{-2}}{0!} = 1 - e^{-2}$$

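The three levels of approximation in this example (exact hypergeometric, binomial, Poisson) can be compared side by side. The following sketch uses only the numbers given in the question; the variable names are assumptions for illustration.

```python
from math import comb, exp

tickets, applications, mine = 2500, 125000, 100

# Exact (hypergeometric): none of my applications is among the chosen tickets.
p_none = comb(applications - mine, tickets) / comb(applications, tickets)
exact = 1 - p_none

# Binomial approximation: 2500 independent draws with replacement.
binomial = 1 - (1 - mine / applications) ** tickets

# Poisson approximation with mean np = 2500 * 100 / 125000 = 2.
poisson = 1 - exp(-tickets * mine / applications)

print(exact, binomial, poisson)
```

Python integers have arbitrary precision, so the two large binomial coefficients are computed exactly before the division.
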
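3. $n$ dice are simultaneously rolled $4 \cdot 6^{n-1}$ times. You win a round if you get a six on every die rolled in that roll. Show that the probability of winning at least one round is roughly $1 - e^{-2/3}$ when $n$ is large.

Solution:

Let $X$ be the random variable that counts the number of rolls in which all $n$ dice show a six. Then $X \sim B(4 \cdot 6^{n-1}, 6^{-n})$ with PMF
$$P(X = k) = \binom{4 \cdot 6^{n-1}}{k} \left(\frac{1}{6^n}\right)^k \left(1 - \frac{1}{6^n}\right)^{4 \cdot 6^{n-1} - k}$$
Since the number of trials $4 \cdot 6^{n-1}$ is large and $p = 6^{-n}$ is small, with mean $4 \cdot 6^{n-1} \cdot 6^{-n} = \frac{2}{3}$, $X$ is approximately $\text{Pois}\left(\frac{2}{3}\right)$, so the probability we seek is approximately
$$P(X \ge 1) \approx 1 - \frac{\left(\frac{2}{3}\right)^0 e^{-2/3}}{0!} = 1 - e^{-2/3}$$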
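How quickly the exact probability approaches $1 - e^{-2/3}$ can be checked for small $n$. This is an illustrative sketch; the function name `win_probability` and the range of $n$ are assumptions.

```python
from math import exp

def win_probability(n: int) -> float:
    """Exact P(X >= 1) for X ~ B(4 * 6**(n - 1), 6**-n)."""
    rolls = 4 * 6 ** (n - 1)
    return 1 - (1 - 6.0**-n) ** rolls

limit = 1 - exp(-2 / 3)
for n in range(1, 8):
    print(n, win_probability(n), limit)
```

The exact values settle onto the Poisson limit within a few steps, since the success probability $6^{-n}$ shrinks geometrically.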

4. On average, 100 plane crashes happen a year. Estimate the probability
that more than 150 plane crashes will happen in a time period of a year.

Solution:

It is reasonable to model the number of plane crashes as a Poisson random variable $X \sim \text{Pois}(100)$. Then the probability we seek is
$$P(X > 150) = 1 - \sum_{k=0}^{150} \frac{100^k e^{-100}}{k!} \approx 1.23 \times 10^{-6}$$

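A tail probability this small is best computed by summing the upper tail directly rather than subtracting a sum from 1. The sketch below does so; the function name and the cut-off of 500 extra terms are assumptions for illustration.

```python
from math import exp

def poisson_upper_tail(lam: float, threshold: int, extra_terms: int = 500) -> float:
    """P(X > threshold) for X ~ Pois(lam), summing the tail terms directly."""
    term = exp(-lam)  # P(X = 0)
    tail = 0.0
    for k in range(1, threshold + extra_terms + 1):
        term *= lam / k  # P(X = k) obtained from P(X = k - 1)
        if k > threshold:
            tail += term
    return tail

p = poisson_upper_tail(100, 150)
print(p)
```

Building each term from the previous one avoids evaluating $100^k$ and $k!$ separately, both of which become very large.
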
5. IQ can be modelled by a normal distribution. Given that 2.275 percent of the population have an IQ higher than 130, and 9.121 percent of the population have an IQ lower than 80, calculate the probability that out of 5 randomly picked people, their average IQ is greater than 110.

Solution:

Let $X \sim N(\mu, \sigma^2)$ indicate the IQ score of a randomly selected person. We are given in the question that
$$1 - F_Z\left(\frac{130 - \mu}{\sigma}\right) = 0.02275, \qquad F_Z\left(\frac{80 - \mu}{\sigma}\right) = 0.09121$$
So $\dfrac{80 - \mu}{\sigma} = -\dfrac{4}{3}$ and $\dfrac{130 - \mu}{\sigma} = 2$. This yields $\mu = 100$ and $\sigma = 15$. Suppose $X_1, X_2, \ldots, X_5$ are independent and all $N(100, 225)$ distributed. An average above 110 is the same as a total above 550, so the probability we seek is
$$P(X_1 + X_2 + \cdots + X_5 > 550) = 1 - F_Z\left(\frac{550 - 500}{\sqrt{225 \cdot 5}}\right) \approx 0.0680$$

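The steps of this solution can be reproduced with the standard library. The sketch below uses `statistics.NormalDist` to invert $F_Z$, solves the two linear equations for $\mu$ and $\sigma$, then evaluates the tail probability. The variable names are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# Standardised values implied by the two given percentages.
z_high = Z.inv_cdf(1 - 0.02275)  # equals (130 - mu) / sigma
z_low = Z.inv_cdf(0.09121)       # equals (80 - mu) / sigma

# Solve the two linear equations for sigma and mu.
sigma = (130 - 80) / (z_high - z_low)
mu = 130 - z_high * sigma

# The sum of 5 independent IQs is N(5*mu, 5*sigma^2); average > 110 means sum > 550.
p = 1 - NormalDist(5 * mu, sigma * sqrt(5)).cdf(550)
print(mu, sigma, p)
```

Note that `NormalDist` takes the standard deviation, not the variance, as its second argument.
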
6. (continued) Suppose the IQ of aliens is $N(70, 100)$ distributed. Determine the probability that the combined IQ of 3 aliens exceeds that of 2 humans.

Solution:

Let $Y_1, Y_2, Y_3$ respectively measure the IQ of each alien, and let $X_1, X_2$ likewise measure the IQ of each human. The probability we seek is
$$P(X_1 + X_2 - Y_1 - Y_2 - Y_3 < 0) = F_Z\left(\frac{0 - (2 \cdot 100 - 3 \cdot 70)}{\sqrt{2 \cdot 225 + 3 \cdot 100}}\right) \approx 0.642$$
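7. In a betting game, Alex has $\frac{1}{2}$ probability of losing 1 dollar, $\frac{1}{4}$ probability of winning 1 dollar and $\frac{1}{4}$ probability of winning 2 dollars every bet. Estimate the probability that his net win is at most 20 dollars after 100 bets.

Solution:

Let $X_k$ be the amount won from the $k$-th bet. Then the expected value and variance are
$$E(X_k) = -1 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4} + 2 \cdot \frac{1}{4} = \frac{1}{4}$$
$$\operatorname{var}(X_k) = (-1)^2 \cdot \frac{1}{2} + 1^2 \cdot \frac{1}{4} + 2^2 \cdot \frac{1}{4} - \left(\frac{1}{4}\right)^2 = \frac{27}{16}$$
By the Central Limit Theorem, $X_1 + X_2 + \cdots + X_{100}$ is approximately normally distributed with mean $100 \cdot \frac{1}{4} = 25$ and variance $100 \cdot \frac{27}{16} = \frac{675}{4}$. Thus, with continuity correction,
$$P(X_1 + X_2 + \cdots + X_{100} \le 20) \approx F_Z\left(\frac{20 + \frac{1}{2} - 25}{\sqrt{675/4}}\right) \approx 0.365$$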
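The mean, variance and final estimate can be verified with exact fractions. The following is a minimal sketch; the dictionary `payoffs` and the other variable names are assumptions for illustration.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Payoff distribution of a single bet: {amount won: probability}.
payoffs = {-1: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

mean = sum(x * q for x, q in payoffs.items())
variance = sum(x * x * q for x, q in payoffs.items()) - mean**2

bets = 100
total_mean = bets * mean      # 25
total_var = bets * variance   # 675/4

# Normal approximation with continuity correction for P(total <= 20).
z = (20 + 0.5 - float(total_mean)) / sqrt(float(total_var))
p = NormalDist().cdf(z)
print(mean, variance, p)
```

Using `Fraction` keeps the single-bet mean and variance exact, so they can be compared directly with $\frac{1}{4}$ and $\frac{27}{16}$.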
