X ∼ Bin(n, π)
[Figure: Binomial pmf p_X(x) (top row) and cdf F_X(x) (bottom row), for n = 5 and π ∈ {0.1, 0.2, 0.5, 0.8, 0.9}]
X ∼ Bern(π)
pmf:
p(x) = \begin{cases} 1 - \pi & \text{if } x = 0 \\ \pi & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}
If X1 ∼ Bin(n1, π) and X2 ∼ Bin(n2, π) are independent, then X1 + X2 ∼ Bin(n1 + n2, π)
We know that
\mathrm{Var}(X) = \mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) \overset{\text{ind.}}{=} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \sum_{i=1}^{n} \pi(1-\pi) = n\pi(1-\pi)
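As a numerical sanity check of this derivation, here is a minimal Matlab sketch (the values of n, π and the number of replications are arbitrary illustrative choices):

    % X = X1 + ... + Xn with Xi ~ Bern(p): compare the empirical mean and
    % variance with the theoretical n*p and n*p*(1-p)
    n = 5;  p = 0.3;  N = 1e5;
    X = sum(rand(n, N) < p);      % each column sums n Bernoulli(p) draws
    [mean(X), n*p]                % empirical vs theoretical mean
    [var(X), n*p*(1-p)]           % empirical vs theoretical variance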
X ∼ P(λ),
if
p(x) = e^{-\lambda}\,\frac{\lambda^x}{x!} \quad \text{for } x \in S_X = \{0, 1, 2, \ldots\}
X ∼ Bin(n, π)
Now, as n increases, π should decrease (the shorter the period, the less
likely the occurrence of the phenomenon) → let π = λ/n for some λ > 0
Then, for any x ∈ {0, 1, . . . , n},
P(X = x) = \frac{n!}{x!\,(n-x)!}\left(\frac{\lambda}{n}\right)^{x}\left(1-\frac{\lambda}{n}\right)^{n-x} \quad \text{(binomial pmf)}

= \frac{\lambda^x}{x!}\cdot\frac{n!}{n^x\,(n-x)!\,(1-\lambda/n)^x}\cdot\left(1-\frac{\lambda}{n}\right)^{n}

Now let n → ∞ with x fixed: the middle factor tends to 1, while (1 − λ/n)^n → e^{−λ}. Therefore,

P(X = x) = e^{-\lambda}\,\frac{\lambda^x}{x!} \quad \text{for } x \in \{0, 1, \ldots\}
which is the Poisson pmf as given on Slide 16
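This convergence is easy to verify numerically; a sketch assuming the Statistics Toolbox functions binopdf and poisspdf (λ and the grid of n values are arbitrary):

    % Bin(n, lambda/n) pmf approaches the Poisson(lambda) pmf as n grows
    lambda = 2;  x = 0:8;
    for n = [10 100 1000 10000]
        d = max(abs(binopdf(x, n, lambda/n) - poisspdf(x, lambda)));
        fprintf('n = %5d: max pmf difference = %.2e\n', n, d);
    end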
The Poisson distribution is thus suitable for modelling the number of
occurrences of a random phenomenon satisfying some assumptions of
continuity, stationarity, independence and non-simultaneity
λ is called the intensity of the phenomenon
Note: we defined the P(λ) distribution by partitioning a time period, however
the same reasoning can be applied to any interval, area or volume
[Figure: Poisson pmf p_X(x) (top row) and cdf F_X(x) (bottom row) for several values of λ, x = 0, …, 20]
Similarly,

E(X) = \sum_{x \in S_X} x\,p(x) = \sum_{x=0}^{\infty} x\,e^{-\lambda}\frac{\lambda^x}{x!} = \lambda \sum_{x=1}^{\infty} e^{-\lambda}\frac{\lambda^{x-1}}{(x-1)!} = \lambda

and

E(X^2) = \sum_{x \in S_X} x^2\,p(x) = \ldots = \lambda^2 + \lambda
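These two moments give Var(X) = E(X²) − (E(X))² = λ. A quick simulation check (λ and the sample size are arbitrary; poissrnd is a Statistics Toolbox function):

    % Empirical mean and variance of Poisson(lambda) draws; both near lambda
    lambda = 3.5;
    X = poissrnd(lambda, 1, 1e5);
    [mean(X), var(X), lambda]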
Example
It is known that 1% of the books at a certain bindery have defective bindings.
Compare the probabilities that x (x = 0, 1, 2, . . .) of 100 books will have
defective bindings using the (exact) formula for the binomial distribution and
its Poisson approximation.
The exact Binomial pmf is

p(x) = \binom{100}{x}\,0.01^x\,0.99^{100-x},

while its Poisson approximation is

p^*(x) = e^{-\lambda}\,\frac{\lambda^x}{x!}
with λ = n × p = 100 × 0.01 = 1
Poisson distribution: examples
Matlab computations give:
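A minimal sketch of this computation, assuming the Statistics Toolbox functions binopdf and poisspdf are available:

    % Exact Bin(100, 0.01) pmf vs its Poisson(1) approximation
    x  = (0:5)';
    pB = binopdf(x, 100, 0.01);   % exact binomial probabilities
    pP = poisspdf(x, 1);          % Poisson approximation, lambda = n*p = 1
    [x, pB, pP]                   % columns: x, exact, approximate

For these parameters the two columns agree closely, with differences of order 10^{-3}.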
The Uniform distribution

[Figure: pdf f_X(x), constant at 1/(β − α) on [α, β], and cdf F_X(x) of X ∼ U[α, β]]

P(a < X < b) = \frac{b-a}{\beta-\alpha} \quad \text{(area of a rectangle)}
If X ∼ U[a, b] and Y = c + dX with d > 0, then Y ∼ U[c + ad, c + bd]
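A simulation sketch of this property (the constants a, b, c, d are arbitrary illustrative values):

    % If X ~ U[a,b] and Y = c + d*X with d > 0, then Y ~ U[c+a*d, c+b*d]
    a = 2; b = 5; c = 1; d = 3;
    X = a + (b - a)*rand(1, 1e5);          % draws from U[a,b]
    Y = c + d*X;
    [min(Y), c + a*d, max(Y), c + b*d]     % empirical range vs stated support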
The Exponential distribution with mean µ > 0 has pdf f(x) = (1/µ) e^{−x/µ} for x ≥ 0. By integration, we find

F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-x/\mu} & \text{if } x \geq 0 \end{cases}
[Figure: exponential pdf f_X(x) and cdf F_X(x); the cdf crosses 1/2 at the median, ln 2 times the mean]
Moreover,

E(X) = \int_0^{+\infty} x\,\frac{1}{\mu}e^{-x/\mu}\,dx = \left[-x\,e^{-x/\mu}\right]_0^{+\infty} + \int_0^{+\infty} e^{-x/\mu}\,dx \quad \text{(by parts)}

= 0 + \left[-\mu\,e^{-x/\mu}\right]_0^{+\infty} = \mu
Similarly, E(X^2) = \int_0^{+\infty} x^2\,\frac{1}{\mu}e^{-x/\mu}\,dx = \ldots = 2\mu^2, so that Var(X) = \mu^2
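Both moments can be checked by numerical integration; a minimal sketch (µ is an arbitrary illustrative value):

    % Verify E(X) = mu, E(X^2) = 2*mu^2 and Var(X) = mu^2 numerically
    mu = 2;
    f  = @(x) exp(-x/mu)/mu;                  % exponential pdf with mean mu
    m1 = integral(@(x) x.*f(x),    0, Inf);   % first moment, should be mu
    m2 = integral(@(x) x.^2.*f(x), 0, Inf);   % second moment, should be 2*mu^2
    [m1, m2, m2 - m1^2]                       % last entry: Var(X) = mu^2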
The most widely used, and therefore the most important, statistical
distribution is undoubtedly the
Normal distribution
Its prevalence was first highlighted when it was observed that in many
natural processes, random variation among individuals systematically
conforms to a particular pattern:
most of the observations concentrate around one single value
(which is the mean)
the number of observations smoothly decreases, symmetrically on
either side, with the deviation from the mean
it is very unlikely, yet not impossible, to find very extreme values
→ this yields the famous bell-shaped curve
[Figure: binomial coefficients C_n^k plotted against k for n = 5, 10, 20 and 1000; the bell shape becomes apparent as n grows]
[Figure: the bell-shaped curve φ(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, with maximum 1/\sqrt{2\pi} at x = 0]
X ∼ N(µ, σ), if

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (\to S_X = \mathbb{R})
Unfortunately, no closed form exists for
F(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,dy
Important remark: Be careful! Many sources use the alternative notation X ∼ N(µ, σ²) → in the textbook and in Matlab, the notation N(µ, σ) is used, so we adopt it in these slides as well
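Concretely, the third argument of Matlab's normpdf/normcdf is the standard deviation σ, not the variance:

    % Matlab parametrises the normal by (mu, sigma): sigma is the std. deviation
    mu = 40; sigma = 2.5;
    normcdf(45, mu, sigma)    % P(X <= 45) for X ~ N(40, 2.5)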
The Normal distribution: cdf and pdf
[Figure: normal pdf f_X(x), with maximum 1/\sqrt{2\pi\sigma^2} at x = µ, and cdf F_X(x), with F_X(µ) = 1/2; ticks at µ ± σ and µ ± 2σ]
and

\mathrm{Var}(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}(x-\mu)^2\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \sigma^2
[Figure: densities of N(0, 1), N(40, 2.5), N(70, 10) and N(10, 5)]
For µ = 0 and σ = 1, the pdf reduces to

f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}

Usually, in this situation, the specific notation f(x) ≐ φ(x) and F(x) ≐ Φ(x) is used
[Figure: the standard normal pdf φ(x), with maximum 1/\sqrt{2\pi}, and cdf Φ(x), with Φ(0) = 1/2]
Property: Standardisation
If X ∼ N (µ, σ), then
Z = \frac{X - \mu}{\sigma} \sim N(0, 1)
This linear transformation is called the standardisation of the normal
random variable X , as it transforms X into a standard normal random
variable Z .
Standardisation will play a paramount role in what follows.
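As a one-line numerical illustration of standardisation (µ, σ and x are arbitrary):

    % P(X <= x) computed directly and via the standardised variable Z
    mu = 10; sigma = 5; x = 13;
    [normcdf(x, mu, sigma), normcdf((x - mu)/sigma)]   % the two values coincide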
For instance, for any α ∈ (0, 1), let zα be such that P(Z > zα) = 1 − α, for Z ∼ N(0, 1).

[Figure: standard normal density; the shaded area 1 − α lies to the right of zα]
P(Z > 1.645) = 0.05, P(Z > 1.96) = 0.025, P(Z > 2.575) = 0.005
[Figure: standard normal densities with right-tail areas 0.05, 0.025 and 0.005 shaded beyond 1.645, 1.96 and 2.575]
By symmetry of φ: z_{1−α} = −z_α
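With the convention P(Z > zα) = 1 − α used above, zα = Φ⁻¹(α), so these quantiles can be recovered with Matlab's norminv:

    % Quantiles with right-tail areas 0.05, 0.025 and 0.005
    norminv([0.95 0.975 0.995])   % 1.6449  1.9600  2.5758
    norminv(0.05)                 % -1.6449, illustrating z_{1-alpha} = -z_alpha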
Property
Suppose X1 ∼ N (µ1 , σ1 ), X2 ∼ N (µ2 , σ2 ) and X1 and X2 are
independent. Then, for any real values a and b,
aX_1 + bX_2 \sim N\!\left(a\mu_1 + b\mu_2,\ \sqrt{a^2\sigma_1^2 + b^2\sigma_2^2}\right)
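A simulation sketch of this property (all parameters are arbitrary illustrative values):

    % Linear combination of independent normals: check the stated mean and std
    m1 = 2; s1 = 3; m2 = -1; s2 = 0.5; a = 2; b = -4;
    Y = a*normrnd(m1, s1, 1, 1e5) + b*normrnd(m2, s2, 1, 1e5);
    [mean(Y), a*m1 + b*m2]                  % empirical vs theoretical mean
    [std(Y), sqrt(a^2*s1^2 + b^2*s2^2)]     % empirical vs theoretical std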
Fact
Many of the statistical techniques presented in the coming chapters
are based on an assumption that the distribution of the random
variable of interest is (almost) normal.
h_n(x) → f(x): as the sample size grows, the density histogram h_n approaches the underlying density f
[Figure: density histograms with the normal density f(x) overlaid (top row); density histograms of two further data sets (bottom row)]
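Such overlays can be produced along these lines (a sketch; x is assumed to be a data vector already in the workspace):

    % Density histogram of the data with a fitted normal density on top
    histogram(x, 'Normalization', 'pdf'); hold on
    t = linspace(min(x), max(x), 200);
    plot(t, normpdf(t, mean(x), std(x)), 'LineWidth', 1.5); hold off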
If the sample comes from the standard Normal distribution, x_{(i)} ≈ z_{α_i} and the points would fall close to a 45° straight line passing through (0, 0).
If the sample comes from some other normal distribution, the points
would still fall around a straight line, as there is a linear relationship
between the quantiles of N (µ, σ) and the standard normal quantiles.
Fact
If the sample comes from some normal distribution, the points should
follow (at least approximately) a straight line.
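In Matlab, such a plot is produced by the Statistics Toolbox function qqplot; a minimal sketch with a simulated normal sample:

    % Normal qq-plot: points near a straight line support the normality assumption
    x = normrnd(40, 2.5, 100, 1);   % illustrative sample from N(40, 2.5)
    qqplot(x)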
[Figure: normal qq-plots (sample quantiles against Theoretical Quantiles) for the two data sets]
→ the normality assumption appears acceptable for the first data set,
not at all for the second
Transforming observations
When the density histogram and the qq-plot indicate that the
assumption of a normal distribution is invalid, transformations of the
data can often improve the agreement with normality (see lab classes
2 & 4).
Scientists regularly express their observations in natural logs.
Let’s look again at the seeded clouds rainfall data.
Commonly used transformations include x^{-1}, \sqrt{x}, \sqrt[4]{x}, x^2 and x^3.
If the observations are positive and the distribution has a long tail on the right, then concave transformations like log(x) or √x pull the large values down farther than they pull the central or small values
→ observations 'more symmetric'
Convex transformations work the other way
If the transformed observations are approximately normal (check with
a quantile plot), it is usually advantageous to use the normality of this
new scale to perform any statistical analysis.
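For instance, one can compare quantile plots before and after a log transformation; a sketch with a simulated right-skewed sample standing in for real data:

    % Right-skewed data: the log transformation improves symmetry
    y = exprnd(10, 100, 1);           % stand-in for a long-right-tailed sample
    subplot(1, 2, 1); qqplot(y);      title('raw data');
    subplot(1, 2, 2); qqplot(log(y)); title('log-transformed');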
Use the cdf of a standard normal distribution to determine probabilities
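That is, for X ∼ N(µ, σ), P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ); a minimal numerical sketch:

    % P(35 < X < 45) for X ~ N(40, 2.5), via the standard normal cdf
    mu = 40; sigma = 2.5; a = 35; b = 45;
    normcdf((b - mu)/sigma) - normcdf((a - mu)/sigma)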
Recommended exercises
→ Q53 p.54, Q55 p.55, Q57 p.55, Q33 p.221, Q37 p.222, Q23 p.78,
Q19 p.31, Q23 p.31, Q62 p.56 (2nd edition)
→ Q54 p.56, Q56 p.57, Q58 p.57, Q35 p.226, Q39 p.227, Q24 p.79,
Q19 p.34, Q23 p.35, Q64 p.58 (3rd edition)