
Lecture 3

3. Special random variables



Lecture 3
Special random variables
The Binomial distribution
The Poisson distribution
The Uniform distribution
The Exponential distribution
Other useful distributions
The Normal distribution
Checking if data are normally distributed

Additional reading: Sections 1.4-6, 2.4 in the textbook



Introduction
In practice, certain “types” of random variables come up over and over
again from similar (random) experiments.
In this chapter, we will study a variety of those special random
variables.
Try disttool in Matlab to get a feel for these special distributions.
Or you could try the following interactive apps:
http://demonstrations.wolfram.com/BinomialDistribution/
http://demonstrations.wolfram.com/PoissonDistribution/
https://demonstrations.wolfram.com/TheNormalDistribution/
https://demonstrations.wolfram.com/TheExponentialDistribution/
http://demonstrations.wolfram.com/TheContinuousUniformDistribution/



3.1 The Binomial distribution
Consider the following random experiments and random variables:
Flip a coin 10 times. Let X = number of heads obtained
A worn machine tool produces defective parts 1% of the time. Let
X = number of defective parts in the next 25 parts produced
Each sample of air has a 10% chance of containing a particular
molecule. Let X = the number of air samples that contain the
molecule in the next 18 samples analysed
Of all bits transmitted through a digital channel, 15% are received
in error. Let X = the number of bits in error in the next five bits
transmitted
A multiple-choice test contains 10 questions, each with 4 choices,
and you guess at each question. Let X = the number of questions
answered correctly
→ Similar experiments, similar random variables
→ A general framework that includes these experiments as particular
cases would be very useful
Assume:
the outcome of a random experiment can be classified as either a
“Success” or a “Failure” (→ S = {Success, Failure})
we observe a Success with probability π
n independent repetitions of this experiment are performed
Define X = number of Successes observed over the n repetitions. We
say that X is a binomial random variable with parameters n and π:

X ∼ Bin(n, π)

See that SX = {0, 1, 2, . . . , n} (→ discrete r.v.) and the binomial
probability mass function is given by

p(x) = (n choose x) π^x (1 − π)^(n−x),  for x ∈ SX

where (n choose x) is the number of different groups of x objects that can be
chosen from a set of n objects.
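As a minimal Matlab sketch (assuming the Statistics Toolbox is available; the parameter values are just for illustration), this pmf can be evaluated with binopdf:

% Bin(10, 0.5) pmf, e.g. the coin-flip experiment above
n = 10; p = 0.5;            % number of trials, success probability
x = 0:n;                    % the support S_X = {0, 1, ..., n}
pmf = binopdf(x, n, p);     % p(x) for each value in the support
disp([x' pmf'])             % tabulate x against p(x)
sum(pmf)                    % sanity check: the pmf sums to 1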


Note: the coefficients (n choose x) = n!/(x!(n − x)!) are called the binomial
coefficients; they are the coefficients arising in Newton's famous
binomial expansion

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k)

These coefficients are often represented in Pascal’s triangle, named


after the French mathematician Blaise Pascal (1623-1662)



The Binomial distribution: pmf and cdf
[Figure: Binomial pmf and cdf, for n = 5 and π = {0.1, 0.2, 0.5, 0.8, 0.9}]


The Bernoulli distribution
Particular case: if n = 1 → the Bernoulli distribution,
named after the Swiss scientist Jakob Bernoulli (1654-1705)

X ∼ Bern(π)
pmf:

p(x) = 1 − π if x = 0;  π if x = 1;  0 otherwise

Note: if X ∼ Bin(n, π), we can represent it as

X = Σ_{i=1}^{n} X_i

where the X_i's are n independent Bernoulli r.v.'s with parameter π


→ Each repetition of the experiment in the Binomial framework is
called a Bernoulli trial
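To see the representation above in action, here is a hypothetical Matlab sketch (the parameter values are illustrative) that simulates one Bin(n, π) draw as a sum of Bernoulli trials:

% One draw of X ~ Bin(n, p) as a sum of n independent Bernoulli(p) trials
n = 25; p = 0.01;                % e.g. the worn machine tool example
trials = rand(1, n) < p;         % X_i ~ Bern(p): logical 1 with probability p
X = sum(trials);                 % X = X_1 + ... + X_n
% Averaging many such draws, the empirical mean approaches n*p:
mean(sum(rand(10000, n) < p, 2))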
Binomial distribution: properties
First note that

Σ_{x∈SX} p(x) = Σ_{x=0}^{n} (n choose x) π^x (1 − π)^(n−x) = (π + (1 − π))^n = 1

using the binomial expansion


Second, it is easy to see that if X1 ∼ Bin(n1 , π), X2 ∼ Bin(n2 , π) and X1
is independent of X2 , then

X1 + X2 ∼ Bin(n1 + n2 , π)



Binomial distribution: expectation and variance
Recall the representation X = Σ_{i=1}^{n} X_i, with X_i ∼ Bern(π)

We know that

E(X_i) = π and Var(X_i) = π(1 − π)

It follows that E(X) = E(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} E(X_i) = Σ_{i=1}^{n} π = nπ and, by independence,

Var(X) = Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i) = Σ_{i=1}^{n} π(1 − π) = nπ(1 − π)

Mean and variance of the binomial distribution


If X ∼ Bin(n, π),

µ = E(X) = nπ and σ² = Var(X) = nπ(1 − π)



Example
It is known that disks produced by a certain company will be defective with
probability 0.01 independently of each other. The company sells the disk in
packages of 10 and offers a money-back guarantee if more than 1 of the
disks are defective. a) In the long-run, what proportion of packages is
returned? b) If someone buys three packages, what is the probability that
exactly one of them will be returned?



Example (Ex. 54 p.54 in the textbook)
Suppose that 10% of all bits transmitted through a digital communication
channel are erroneously received and that whether any is erroneously
received is independent of whether any other bit is erroneously received.
Consider sending a large number of messages, each consisting of 20 bits. a)
What proportion of these messages will have exactly 2 erroneously received
bits? b) What proportion of these messages will have at least 5 erroneously
received bits? c) What proportion of these messages will more than half the
bits be erroneously received?

Let X be the number of erroneously received bits in a message of 20 bits.


Clearly, we have X ∼ Bin(20, 0.1). Thus we have
a) P(X = 2) = (20 choose 2) × 0.1² × 0.9¹⁸ = 0.2852
b) P(X ≥ 5) = 1 − P(X = 0) − P(X = 1) − P(X = 2) − P(X = 3) − P(X = 4) = . . .
c) P(X > 10) = P(X = 11) + P(X = 12) + . . . + P(X = 20) = . . .
→ very tedious! → use statistical software
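In Matlab (assuming the Statistics Toolbox), these tedious sums reduce to one-liners:

% X ~ Bin(20, 0.1): the three probabilities of the example
binopdf(2, 20, 0.1)       % a) P(X = 2) = 0.2852
1 - binocdf(4, 20, 0.1)   % b) P(X >= 5) = 1 - P(X <= 4)
1 - binocdf(10, 20, 0.1)  % c) P(X > 10) = 1 - P(X <= 10)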



3.2 The Poisson distribution
Assume you are interested in the number of occurrences of some
random phenomenon in a fixed period of time.
Define X = number of occurrences. We say that X is a Poisson
random variable with parameter λ, i.e.

X ∼ P(λ),

if

p(x) = e^(−λ) λ^x / x!  for x ∈ SX = {0, 1, 2, . . .}

Note: Siméon-Denis Poisson (1781-1840) was a French mathematician



Poisson distribution: how does it arise?
Think of the time period of interest as being split up into a large number,
say n, of sub-periods
Assume that the phenomenon could occur at most one time in each of
those subperiods, with some constant probability π
If what happens within one interval is independent of what happens in the others,

X ∼ Bin(n, π)

Now, as n increases, π should decrease (the shorter the period, the less
likely the occurrence of the phenomenon) → let π = λ/n for some λ > 0
Then, for any x ∈ {0, 1, . . . , n},

P(X = x) = [n! / (x!(n − x)!)] (λ/n)^x (1 − λ/n)^(n−x)   (binomial pmf)

         = (λ^x / x!) × [n! / (n^x (n − x)! (1 − λ/n)^x)] × (1 − λ/n)^n



Finally, as n → ∞,

n! / (n^x (n − x)! (1 − λ/n)^x) → 1  and  (1 − λ/n)^n → e^(−λ)

Therefore,

P(X = x) = e^(−λ) λ^x / x!  for x ∈ {0, 1, . . .}

which is the Poisson pmf as given on Slide 16
The Poisson distribution is thus suitable for modelling the number of
occurrences of a random phenomenon satisfying some assumptions of
continuity, stationarity, independence and non-simultaneity
λ is called the intensity of the phenomenon
Note: we defined the P(λ) distribution by partitioning a time period, however
the same reasoning can be applied to any interval, area or volume



Poisson distribution: pmf and cdf
[Figure: Poisson pmf and cdf, for λ = {0.1, 0.5, 1, 2, 10}]


Poisson distribution: properties
First we have, as expected,

Σ_{x∈SX} p(x) = Σ_{x=0}^{∞} e^(−λ) λ^x / x! = e^(−λ) Σ_{x=0}^{∞} λ^x / x! = e^(−λ) e^λ = 1

Similarly,

E(X) = Σ_{x∈SX} x p(x) = Σ_{x=0}^{∞} x e^(−λ) λ^x / x! = λ Σ_{x=1}^{∞} e^(−λ) λ^(x−1) / (x − 1)! = λ

E(X²) = Σ_{x∈SX} x² p(x) = . . . = λ² + λ

→ Var(X) = E(X²) − (E(X))² = λ² + λ − λ² = λ

Mean and variance of the Poisson distribution


If X ∼ P(λ),
E(X ) = λ and Var(X ) = λ
Example
Over a 10-minute period, a counter records an average of 1.3 gamma
particles per millisecond coming from a radioactive substance. To a good
approximation, the distribution of the count, X , of gamma particles during the
next millisecond is Poisson distributed. Determine a) λ, b) the probability of
observing one or more gamma particles during the next millisecond and c)
the variance of this number.



Example (Ex. 56 p.55 in the textbook)
Suppose that the number of drivers who travel between a particular origin
and destination during a designated time period has Poisson distribution with
parameter λ = 20. In the long-run, in what proportion of time periods will the
number of drivers a) be at most 10? b) exceed 20? c) be between 10 and 20,
inclusive? Strictly between 10 and 20?

Let X be the number of drivers. It is given that X ∼ P(20)


a)

P(X ≤ 10) = P(X = 0) + P(X = 1) + . . . + P(X = 10)
          = e^(−20) + e^(−20) × 20 + e^(−20) × 20²/2! + . . . + e^(−20) × 20¹⁰/10!
          = . . .

→ tedious! → use statistical software
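In Matlab (assuming the Statistics Toolbox's poisscdf), the four answers are one-liners:

% X ~ P(20): the probabilities of the example
poisscdf(10, 20)                     % a) P(X <= 10)
1 - poisscdf(20, 20)                 % b) P(X > 20)
poisscdf(20, 20) - poisscdf(9, 20)   % c) P(10 <= X <= 20)
poisscdf(19, 20) - poisscdf(10, 20)  %    P(10 < X < 20)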



Poisson approximation to the Binomial distribution
Since it was derived as a limit case of the Binomial distribution when n
is ‘large’ and π is ‘small’, one can expect the Poisson distribution to be
a good approximation to Bin(n, π) in that case.
As it involves only one parameter, the Poisson pmf or the Poisson
tables are usually easier to handle than the corresponding Binomial
pmf and tables.

Example
It is known that 1% of the books at a certain bindery have defective bindings.
Compare the probabilities that x (x = 0, 1, 2, . . .) of 100 books will have
defective bindings using the (exact) formula for the binomial distribution and
its Poisson approximation.

The exact Binomial pmf is p(x) = (100 choose x) × 0.01^x × 0.99^(100−x), while its Poisson
approximation is

p*(x) = e^(−λ) λ^x / x!

with λ = n × π = 100 × 0.01 = 1
Poisson distribution: examples
Matlab computations give:
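A minimal sketch of that computation (assuming binopdf and poisspdf from the Statistics Toolbox; only the first few values of x are shown):

% Exact Bin(100, 0.01) pmf versus its Poisson(1) approximation
x = 0:5;
pBin  = binopdf(x, 100, 0.01);   % exact binomial probabilities
pPois = poisspdf(x, 1);          % Poisson approximation, lambda = 1
disp([x' pBin' pPois' abs(pBin - pPois)'])   % last column: absolute error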

We see that the error we would make by using the Poisson approximation
instead of the true distribution is only of order 10^(−3)
→ very good approximation
3.3 The Uniform distribution
There are also numerous continuous distributions which are of great
interest. The simplest one is certainly the uniform distribution.
A random variable is said to be uniformly distributed over an interval
[α, β], i.e.
X ∼ U[α,β]

if its probability density function is given by

f(x) = 1/(β − α) if x ∈ [α, β], and 0 otherwise   (→ SX = [α, β])

Constant density → X is just as likely to be “close” to any value in SX .


By integration, it is easy to show that

F(x) = 0 if x < α;  (x − α)/(β − α) if α ≤ x ≤ β;  1 if x > β



The Uniform distribution: cdf and pdf
[Figure: Uniform cdf F(x) and pdf f(x) = F′(x); the pdf is constant at 1/(β − α) on [α, β]]


Uniform distribution: mean and variance
Note that the Uniform density is constant at 1/(β − α) on [α, β] so as to
ensure that ∫_α^β f(x) dx = 1
Now,

E(X) = ∫_α^β x/(β − α) dx = (1/(β − α)) [x²/2]_α^β = (β² − α²)/(2(β − α)) = (α + β)/2

Similarly,

E(X²) = ∫_α^β x²/(β − α) dx = (1/(β − α)) [x³/3]_α^β = (β³ − α³)/(3(β − α)) = (β² + αβ + α²)/3

which implies Var(X) = E(X²) − (E(X))² = . . . = (β − α)²/12

Mean and variance of the Uniform distribution

If X ∼ U[α,β],

E(X) = (α + β)/2 and Var(X) = (β − α)²/12
Uniform distribution: properties
The probability that X lies in any subinterval [a, b] of [α, β] is:

P(a < X < b) = (b − a)/(β − α)   (area of a rectangle)

If X ∼ U[a,b], then any linear transformation of X is also uniform. That
is, if Y = c + dX (for d ≠ 0), then

Y ∼ U[c+ad, c+bd]

(with the endpoints reordered if d < 0)



Example
Buses arrive at a specified stop at 15-minute intervals starting at 7 a.m. That
is, they arrive at 7, 7:15, 7:30, 7:45, etc. If a passenger arrives at the stop at a
time uniformly distributed between 7 and 7:30, find the probability that he
waits less than 5 minutes for a bus
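Sketch of the solution: measure time in minutes after 7, so the arrival time X ∼ U[0, 30]; the passenger waits less than 5 minutes exactly when 10 < X < 15 (catching the 7:15 bus) or 25 < X < 30 (catching the 7:30 bus). In Matlab (assuming unifcdf from the Statistics Toolbox):

% P(10 < X < 15) + P(25 < X < 30) for X ~ U[0, 30]
p = (unifcdf(15, 0, 30) - unifcdf(10, 0, 30)) + ...
    (unifcdf(30, 0, 30) - unifcdf(25, 0, 30))   % = 10/30 = 1/3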



3.4 The Exponential distribution



Recall that a Poisson distributed r.v. counts the number of
occurrences of a given phenomenon over a unit period of time
The (random) amount of time before the first occurrence of that
phenomenon is often of interest as well
If N ∼ P(λ) denotes the number of occurrences over a unit period
of time, then the number of occurrences of the phenomenon by a
time x, say Nx, is ∼ P(λx) (“Poisson process”)
Denote X the amount of time before the first occurrence
This time will exceed x (x ≥ 0) if and only if there have been no
occurrences of the phenomenon by time x, that is, Nx = 0
As Nx ∼ P(λx), it follows that P(X > x) = P(Nx = 0) = e^(−λx) (λx)⁰/0! = e^(−λx),
which yields the cdf of X:

F(x) = P(X ≤ x) = 1 − e^(−λx)  for x ≥ 0

This particular distribution is called the Exponential distribution.



A random variable is said to be an Exponential random variable with
parameter µ (µ > 0), i.e.
X ∼ Exp(µ),
if its probability density function is given by

f(x) = (1/µ) e^(−x/µ) if x ≥ 0, and 0 otherwise   (→ SX = R+)

By integration, we find

F(x) = 0 if x < 0;  1 − e^(−x/µ) if x ≥ 0

This distribution is often useful for representing random amounts of


time, like the amount of time required to complete a task, the waiting
time at a counter, the amount of time until you receive a phone call, etc.
Note: the parameter µ is related to λ by µ = 1/λ.



The Exponential distribution: cdf and pdf

[Figure: Exponential cdf F(x) and pdf f(x) = F′(x)]


Exponential distribution: properties
We can check that (as expected)

∫_{−∞}^{+∞} f(x) dx = ∫_0^{+∞} (1/µ) e^(−x/µ) dx = [−e^(−x/µ)]_0^{+∞} = 1

Moreover, integrating by parts,

E(X) = ∫_0^{+∞} (x/µ) e^(−x/µ) dx = [−x e^(−x/µ)]_0^{+∞} + ∫_0^{+∞} e^(−x/µ) dx = 0 + µ = µ

Similarly, E(X²) = ∫_0^{+∞} (x²/µ) e^(−x/µ) dx = . . . = 2µ², so that Var(X) = µ²

Mean and variance of the Exponential distribution

If X ∼ Exp(µ),

E(X) = µ and Var(X) = µ²
Example
Suppose that, on average, 3 trucks arrive per hour to be unloaded at a
warehouse. What is the probability that the time between the arrivals of two
successive trucks will be a) less than 5 minutes? b) at least 45 minutes?
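Sketch of the solution: with 3 arrivals per hour on average, the time T between successive trucks is Exp(µ) with mean µ = 20 minutes. In Matlab (expcdf uses this mean parametrisation):

% T ~ Exp(mu), mu = 20 minutes
mu = 20;
expcdf(5, mu)       % a) P(T < 5)   = 1 - exp(-5/20)  ~ 0.2212
1 - expcdf(45, mu)  % b) P(T >= 45) = exp(-45/20)     ~ 0.1054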



3.5 Other useful distributions

In the remainder of this course we will also encounter some other
continuous distributions, among which are
the Student-t (or just t) distribution, X ∼ t_ν;
the χ² distribution, X ∼ χ²_ν;
the Fisher-F (or just F) distribution, X ∼ F_{d1,d2}.
We will return to them later when we need them.
The several distributions that we have introduced so far are very useful
in the application of statistics to problems of engineering and physical
science.



3.6 The Normal distribution

The most widely used, and therefore the most important, statistical
distribution is undoubtedly the

Normal distribution

Its prevalence was first highlighted when it was observed that in many
natural processes, random variation among individuals systematically
conforms to a particular pattern:
most of the observations concentrate around one single value
(which is the mean)
the number of observations smoothly decreases, symmetrically on
either side, with the deviation from the mean
it is very unlikely, yet not impossible, to find very extreme values
→ this yields the famous bell-shaped curve



The bell-shaped curve was first spotted by the French mathematician
Abraham de Moivre (1667-1754), who in his 1738 book “The Doctrine
of Chances” showed that the coefficients C(n,k) = (n choose k) in the binomial
expansion of (a + b)^n (see Slide 7) precisely follow the bell-shape
pattern when n is large
[Figure: the binomial coefficients C(n,k) plotted against k, for n = 5, 10, 20 and 1000]


Later, Carl-Friedrich Gauss (1777-1855), a German mathematician
(sometimes referred to as the Princeps mathematicorum, Latin for "the Prince
of Mathematicians" or "the foremost of mathematicians"), was the first to
write an explicit equation for the bell-shaped curve:

φ(x) = (1/√(2π)) e^(−x²/2)

[Figure: the bell-shaped curve φ(x)]

When deriving his distribution, Gauss was primarily interested in errors of


measurement, whose distribution typically follows the bell-shaped curve as
well. He called his curve the “normal curve of errors”, which was to become
the Normal distribution. In honour of Gauss, the Normal distribution is also
often referred to as the Gaussian distribution



It is important to note that the Normal distribution is not just a
convenient mathematical tool, but also occurs in natural phenomena.
For instance, in 1866 Maxwell, a Scottish physicist, determined the
distribution of molecular velocity in a gas at equilibrium. As a result of
unpredictable collisions with other molecules, molecular velocity in a
given direction is randomly distributed, and from basic assumptions,
that distribution can be shown to be the Normal distribution

James Maxwell (1831-1879)



Another famous example is the “bean machine”, invented by Sir
Francis Galton (English scientist, 1822-1911) to demonstrate the
Normal distribution. The machine consists of a vertical board with
interleaved rows of pins. Balls are dropped from the top, and bounce
left and right as they hit the pins. Eventually, they are collected into
bins at the bottom. The height of ball columns in the bins
approximately follows the bell-shaped curve.



The Normal distribution: definition
A random variable is said to be normally distributed with parameters µ
and σ (σ > 0), i.e.
X ∼ N (µ, σ),
if its probability density function is given by

f(x) = (1/(√(2π) σ)) e^(−(x−µ)²/(2σ²))   (→ SX = R)

Unfortunately, no closed form exists for

F(x) = (1/(√(2π) σ)) ∫_{−∞}^{x} e^(−(y−µ)²/(2σ²)) dy
Important remark: Be careful! Many sources use the alternative
notation
X ∼ N(µ, σ²)
→ in the textbook and in Matlab, the notation N (µ, σ) is used, so we
adopt it in these slides as well
The Normal distribution: cdf and pdf

[Figure: Normal cdf F(x) and pdf f(x) = F′(x), marked at µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ]


Normal distribution: properties
It can be shown that, for any µ and σ,

∫_{SX} f(x) dx = ∫_{−∞}^{+∞} (1/(√(2π) σ)) e^(−(x−µ)²/(2σ²)) dx = 1

Similarly, we can find

E(X) = (1/(√(2π) σ)) ∫_{−∞}^{+∞} x e^(−(x−µ)²/(2σ²)) dx = µ

and

Var(X) = (1/(√(2π) σ)) ∫_{−∞}^{+∞} (x − µ)² e^(−(x−µ)²/(2σ²)) dx = σ²

Mean and variance of the Normal distribution

If X ∼ N(µ, σ),

E(X) = µ and Var(X) = σ²   (→ sd(X) = σ)



An important observation is that all normal probability distribution
functions have the same bell shape
They only differ in where they are centred (at µ) and in their spread
(quantified by σ).

[Figure: Normal pdfs N(0,1), N(40,2.5), N(70,10) and N(10,5)]


In Matlab, key in disttool and play with the interactive probability
function display tool (see Lab Week 6)



The Standard Normal distribution
The Standard Normal distribution is the Normal distribution with
µ = 0 and σ = 1. This yields

f(x) = (1/√(2π)) e^(−x²/2)

Usually, in this situation, the specific notation

f(x) ≐ φ(x) and F(x) ≐ Φ(x)

is used, and a standard normal random variable is usually denoted Z.


It directly follows from the preceding that

E(Z ) = 0 and Var(Z ) = 1 (= sd(Z ))



The Standard Normal distribution: cdf and pdf

[Figure: Standard Normal cdf Φ(x) and pdf φ(x) = Φ′(x)]


Normal distribution: standardisation
It is clear from the expression and the shape of the Normal pdf that if
X ∼ N(µ, σ), then Y = aX + b is normally distributed with mean
E(Y) = aµ + b and variance Var(Y) = a²σ².
The following result directly follows from the foregoing:

Property: Standardisation
If X ∼ N(µ, σ), then

Z = (X − µ)/σ ∼ N(0, 1)
This linear transformation is called the standardisation of the normal
random variable X , as it transforms X into a standard normal random
variable Z .
Standardisation will play a paramount role in what follows.



Normal distribution: 68-95-99 rule
This extremely important fact allows us to deduce any required
information for a given Normal distribution N (µ, σ) from the features of
the ‘simple’ standard normal distribution.
For instance, for the standard pdf φ(x), it can be found that

∫_{−1}^{1} φ(x) dx = P(−1 < Z < 1) ≈ 0.6827
∫_{−2}^{2} φ(x) dx = P(−2 < Z < 2) ≈ 0.9545
∫_{−3}^{3} φ(x) dx = P(−3 < Z < 3) ≈ 0.9973

This automatically translates to the general case X ∼ N(µ, σ):

P(µ − σ < X < µ + σ) ≈ 0.6827
P(µ − 2σ < X < µ + 2σ) ≈ 0.9545
P(µ − 3σ < X < µ + 3σ) ≈ 0.9973

This is known as the 68-95-99 rule for normal distributions.



For instance, suppose we are told that women’s heights in a given
population follow a normal distribution with mean µ = 64.5 inches and
σ = 2.5 inches

→ we expect 68.27% of women to be between
µ − σ = 64.5 − 2.5 = 62 inches and µ + σ = 64.5 + 2.5 = 67 inches
tall, etc.



Normal distribution: remark
Theoretically, the domain of variation SX of a normally distributed
random variable X is R = (−∞, +∞)

However, there is a 99.7% chance to find X between µ − 3σ and


µ + 3σ
It is almost impossible to find X outside that interval, and virtually
impossible to find it much further away from µ
→ There is in general no problem in modelling the distribution of a
positive quantity with a Normal distribution, provided µ is large
compared to σ
Typical examples include weight, height, or IQ of people
This also explains why 6σ is sometimes called the width of the
normal distribution



Example
Suppose that Z ∼ N (0, 1). What is P(Z ≤ 1.25)?

In principle, this should be given by


P(Z ≤ 1.25) = Φ(1.25) = ∫_{−∞}^{1.25} (1/√(2π)) e^(−x²/2) dx
However, we know that this integral cannot be evaluated analytically.
→ we must use software (command normcdf in Matlab) or the Standard
Normal table∗ .
This probability is the ‘area under the standard normal curve to the left of z’,
that is
P(Z ≤ z) = Φ(z)
Matlab gives: P(Z ≤ 1.25) = Φ(1.25) = 0.8944
∗ We don’t use Standard Normal Table in this course.



Any other kind of probability must be written in terms of P(Z ≤ z).

Example: standard normal probabilities


Suppose that Z ∼ N (0, 1). What is P(Z < 1.25)? What is P(Z > 1.25)?
What is P(−0.38 ≤ Z < 1.25)?
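All three follow from Φ(1.25) = 0.8944 (computed above) and the continuity of Z; in Matlab:

% Z ~ N(0, 1)
normcdf(1.25)                   % P(Z < 1.25) = P(Z <= 1.25), Z continuous
1 - normcdf(1.25)               % P(Z > 1.25) = 1 - Phi(1.25) = 0.1056
normcdf(1.25) - normcdf(-0.38)  % P(-0.38 <= Z < 1.25)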



Example 1.16 p.38 (textbook)
The time it takes a driver to react to the brake light on a decelerating vehicle
follows a Normal distribution having parameters µ = 1.25 sec and σ = 0.46
sec. In the long run, what proportion of reaction times will be between 1 and
1.75 sec? (Hint: normcdf(1.09) = 0.8621, normcdf(-0.54) =
0.2946.)
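Sketch of the solution, by standardisation and the hinted values:

P(1 ≤ X ≤ 1.75) = P((1 − 1.25)/0.46 ≤ Z ≤ (1.75 − 1.25)/0.46)
                = Φ(1.09) − Φ(−0.54) = 0.8621 − 0.2946 = 0.5675

% or directly in Matlab, without standardising by hand:
normcdf(1.75, 1.25, 0.46) - normcdf(1, 1.25, 0.46)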



Example: coffee
The actual amount of instant coffee that a filling machine puts into “4-ounce”
jars may be looked upon as a random variable having a normal distribution
with σ = 0.04 ounce. If only 2% of the jars are to contain less than 4 ounces,
what should be the mean fill of these jars? (Hint: normcdf(-2.05) =
0.0202.)



Example: Slow reaction times
Assume again that the time it takes a driver to react to the brake light on a
decelerating vehicle follows a Normal distribution having parameters µ = 1.25
sec and σ = 0.46 sec. How long must reaction time be if it is to fall among the
slowest 10% of reaction times? (Hint: normcdf(1.28) ' 0.90.)



Normal distribution: quantiles
As in the previous example, we are sometimes given a probability and
asked to find the corresponding value z

For instance, for any α ∈ (0, 1), let zα be such that

P(Z < zα) = α,  i.e., P(Z > zα) = 1 − α,

for Z ∼ N(0, 1).

[Figure: standard normal pdf φ(x) with the quantile zα marked and the area 1 − α indicated]

This value zα is called the quantile of level α of the standard normal


distribution. Normal quantiles can be calculated using Matlab via the
norminv function.
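For instance (a minimal sketch, reusing the slowest-10% reaction-time example above):

% z_alpha such that P(Z < z_alpha) = alpha
norminv(0.90)               % ~ 1.28, the standard normal 0.90-quantile
norminv(0.90, 1.25, 0.46)   % the same quantile directly on the N(1.25, 0.46) scale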
Some particular quantiles will be used extensively in subsequent
chapters. These are the quantiles of level 0.95, 0.975 and 0.995:

P(Z > 1.645) = 0.05, P(Z > 1.96) = 0.025, P(Z > 2.575) = 0.005
[Figure: standard normal pdf with upper-tail areas 0.05, 0.025 and 0.005 to the right of 1.645, 1.96 and 2.575]

Note: by symmetry of the normal pdf, it is easy to see that

z_(1−α) = −z_α

→ for instance, P(Z < −1.96) = 0.025


Some further properties of the Normal distribution
We know that if X ∼ N (µ, σ), then aX + b is also normally distributed,
for any real values a and b.
This generalises further: if X1 ∼ N (µ1 , σ1 ) and X2 ∼ N (µ2 , σ2 ), and X1
and X2 are independent, then aX1 + bX2 is also normally distributed for
any real values a and b.
Also, we can compute the parameters of the resulting distribution.

Property
Suppose X1 ∼ N (µ1 , σ1 ), X2 ∼ N (µ2 , σ2 ) and X1 and X2 are
independent. Then, for any real values a and b,
aX1 + bX2 ∼ N(aµ1 + bµ2, √(a²σ1² + b²σ2²))



As a direct application of the preceding property, we have, with
X1 ∼ N (µ1 , σ1 ) and X2 ∼ N (µ2 , σ2 ), X1 and X2 independent,
X1 + X2 ∼ N(µ1 + µ2, √(σ1² + σ2²))  and  X1 − X2 ∼ N(µ1 − µ2, √(σ1² + σ2²))

Besides, the previous property can be readily extended to an arbitrary


number of independent normally distributed random variables.

Example: sum of normal variables


Let X1 , X2 , X3 represent the times necessary to perform three successive
repair tasks at a certain service facility. Suppose they are independent normal
random variables with expected values µ1, µ2 and µ3 and variances σ1², σ2²
and σ3², respectively. What can be said about the distribution of X1 + X2 + X3?

From the previous property, we can conclude that


X1 + X2 + X3 ∼ N(µ1 + µ2 + µ3, √(σ1² + σ2² + σ3²))



Example: sum of normal variables (ctd.)
If µ1 = 40 min, µ2 = 50 min and µ3 = 60 min, and σ1² = 10 min², σ2² = 12
min² and σ3² = 14 min², what is the probability that the full task would take
less than 160 min? (Hint: normcdf(1.67) = 0.9525.)
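Sketch of the solution: the total time T = X1 + X2 + X3 satisfies

T ∼ N(40 + 50 + 60, √(10 + 12 + 14)) = N(150, 6),

so P(T < 160) = Φ((160 − 150)/6) = Φ(1.67) ≈ 0.9525.

% equivalently, in Matlab:
normcdf(160, 150, 6)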



3.7 Checking if data are normally distributed

Fact
Many of the statistical techniques presented in the coming chapters
are based on an assumption that the distribution of the random
variable of interest is (almost) normal.

→ In many instances, we will need to check whether a data set


appears to be generated by a normally distributed random variable
How do we do that?
Although they involve an element of subjective judgement, graphical
procedures are the most helpful for detecting serious departures from
normality.
Some of the visual displays we have used earlier, such as the density
histogram, can provide a first insight about the form of the underlying
distribution.



Density histograms to check for normality
Think of a density histogram as a piecewise constant function hn (x),
where n is the number of observations in the data set
→ then, if the r.v. X having generated the data has density f on a
support SX , it can be shown that, under some regularity
assumptions, for any x ∈ SX ,

hn (x) → f (x)

as n → ∞ (and the number of classes → ∞)


(the convergence “hn (x) → f (x)” has to be understood in a particular
probabilistic sense, but details are beyond the scope of this course)
Concretely, the larger the number of observations, the more
similar the density histogram and the ‘true’ (unknown) density f
are
→ look at the histogram and decide whether it looks enough like the
symmetric ‘bell-shaped’ normal curve or not
Suppose we have a data set of size n = 50, drawn from a normal
distribution
[Figure: the true density f(x), a dot chart of the data set (n = 50), and the density histogram superimposed on f(x)]

→ the density histogram ‘looks like’ the bell-shaped curve


Also, as both f(x) and the density histogram are scaled such that the
area under each is 1, they are easily superimposed and compared.



Look again at the density histogram on Slide 24, Lecture 2:
→ the histogram is symmetric and bell-shaped, without outliers
→ the normality assumption is reasonable

Look again at the density histogram on Slide 33, Lecture 2:
→ clear lack of symmetry (skewed to the right)
→ serious departure from normality (Exponential?)

[Figure: the two density histograms]


Quantile plots
Density histograms are easy to use; however, they are usually not
really reliable indicators of the distributional form unless the number of
observations is very large.
→ another special graph, called a normal quantile plot, is more
effective in detecting departure from normality
The plot essentially compares the data, ordered from smallest to largest,
with what one would expect the smallest to largest values in a sample to be
if the theoretical distribution from which the data have come is normal
→ if the data were effectively selected from the normal distribution, the
two sets of values should be reasonably close to one another
Note: a quantile plot is also sometimes called qq-plot
(→ command in Matlab: qqplot)



Procedure for building a quantile plot (a Matlab sketch is given below the Fact box):
observations {x1, x2, . . . , xn}
ordered observations: x(1) ≤ x(2) ≤ . . . ≤ x(n)
cumulative probabilities αi = (i − 0.5)/n, for all i = 1, . . . , n
standard normal quantiles of level αi: zαi chosen such that P(Z ≤ zαi) = αi, where Z ∼ N(0, 1)
Quantile plot: plot the n pairs (x(i), zαi)

If the sample comes from the Standard Normal distribution, x(i) ≈ zαi
and the points would fall close to a 45° straight line passing through (0,0).
If the sample comes from some other normal distribution, the points
would still fall around a straight line, as there is a linear relationship
between the quantiles of N (µ, σ) and the standard normal quantiles.

Fact
If the sample comes from some normal distribution, the points should
follow (at least approximately) a straight line.
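A minimal Matlab sketch of the procedure above (the data vector here is simulated, purely for illustration):

% Normal quantile plot built by hand
xdata = 40 + 2.5*randn(50, 1);   % hypothetical sample of size n = 50
xs = sort(xdata);                % ordered observations x_(1) <= ... <= x_(n)
n  = numel(xs);
ai = ((1:n)' - 0.5) / n;         % cumulative probabilities alpha_i
z  = norminv(ai);                % standard normal quantiles z_{alpha_i}
plot(xs, z, 'o')                 % the n pairs (x_(i), z_{alpha_i})
xlabel('Sample Quantiles'); ylabel('Theoretical Quantiles')
% qqplot(xdata) produces essentially the same display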



Quantile plots: examples
The figure below displays quantile plots for the previous two examples
[Figure: normal quantile plots (Theoretical Quantiles against Sample Quantiles) for the two data sets]

→ the normality assumption appears acceptable for the first data set,
not at all for the second
Transforming observations
When the density histogram and the qq-plot indicate that the
assumption of a normal distribution is invalid, transformations of the
data can often improve the agreement with normality (see lab classes
2 & 4).
Scientists regularly express their observations in natural logs.
Let’s look again at the seeded clouds rainfall data.



Apart from the log, other transformations may be useful:

−1/x, √x, x^(1/4), x², x³
If the observations are positive and the distribution has a long tail on
the right, then concave transformations like log(x) or √x pull the large
values down farther than they pull the central or small values
→ observations become ‘more symmetric’
Convex transformations work the other way
If the transformed observations are approximately normal (check with
a quantile plot), it is usually advantageous to use the normality of this
new scale to perform any statistical analysis.
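As an illustration (a hypothetical, simulated example: the sample is lognormal, so its log is exactly normal):

% Right-skewed sample; compare quantile plots before and after the log
y = exp(1 + 0.8*randn(200, 1));    % lognormal, long right tail
subplot(1, 2, 1); qqplot(y);       title('raw observations')
subplot(1, 2, 2); qqplot(log(y));  title('log scale')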



Objectives
Now you should be able to:
Understand the assumptions for some common discrete probability
distributions
Select an appropriate discrete probability distribution to calculate
probabilities in specific applications
Calculate probabilities, determine means and variances for some
common discrete probability distributions
Understand the assumptions for some common continuous probability
distributions
Select an appropriate continuous probability distribution to calculate
probabilities in specific applications
Calculate probabilities, determine means and variances for some
common continuous probability distributions
Calculate probabilities, determine mean, variance and standard
deviation for normal distributions
Standardise normal random variables, and understand why this is useful

Use the cdf of a standard normal distribution to determine probabilities
Recommended exercises
→ Q53 p.54, Q55 p.55, Q57 p.55, Q33 p.221, Q37 p.222, Q23 p.78,
Q19 p.31, Q23 p.31, Q62 p.56 (2nd edition)
→ Q54 p.56, Q56 p.57, Q58 p.57, Q35 p.226, Q39 p.227, Q24 p.79,
Q19 p.34, Q23 p.35, Q64 p.58 (3rd edition)
