ST11: Statistics & Probability
Continuous Random Variables &
Probability Distributions (chap 4)
This document belongs to ESCP Business School.
It cannot be modified nor distributed without the
author’s consent.
Prof. Lynn FARAH
So far we have reviewed the basic principles of probability –
calculation rules (chap 3).
Our objectives for this chapter (chap 4) are to:
- Look at examples of different types of random variables
- Set up the probability distributions for those random variables
- Learn about pre-defined probability distributions that can be
used to model random outcomes
What for?
- We will rely on theoretical probability distributions to describe
sampling distributions (chap 5) used in inferential statistics (chap
6-7-8-10)
Reminder: Random Variables
A random variable assumes values based on the outcomes of a random
event.
There are two types of random variables:
1. Discrete random variables
2. Continuous random variables
A probability distribution (or model) for a random variable consists of:
• the collection of all possible values of a random variable
• the probabilities that the values occur
(in the form of a table, graph and/or formula)
Requirements:
p(x) ≥ 0 for all values of x and ∑p(x) = 1
Continuous Random Variables
A continuous random variable can take any numeric value within an
interval or range of values.
It usually represents data that are measured: heights, distances, periods
of time…
In general, with a continuous random variable, our concern is the
likelihood that X falls within an interval P(a ≤ X ≤ b) and not that it
assumes a particular value.
Continuous random variables also have means (expected values) and
standard deviations. We won’t worry about how to calculate those in
this course for most cases, and we will work with models for continuous
random variables where we are given the parameters.
Example
The number of hours of life of a calculator battery is a continuous
random variable X.
If the maximum possible life is 1000 hours, then X can assume any
value in the interval [0, 1000].
In a practical sense, the likelihood that X will assume a single specified
value, such as 764.1238, is extremely remote.
It is more meaningful to consider the likelihood of X lying within an
interval, such as that between 764 and 765.
⇒ P(764 ≤ X ≤ 765)
Probability Density Function
We can graphically represent the probabilities associated with a continuous
random variable X.
This is done by means of the graph of a function y = f(x) such that the area under
this graph (and above the x-axis) between the lines x = a and x = b represents the
probability that X assumes a value between a and b.
This curve is variously called a probability density function (pdf), a
frequency function, or a probability distribution.
Example
In our school, a fire drill is conducted at the beginning of every
school year as dictated by safety regulations.
At t=0, the fire alarm goes off.
The probability that a given student evacuates the building between
t1 and t2 minutes is expressed by the area under the curve of
$f(t) = 2e^{-t} - 2e^{-2t}$.
The evolution of the proportion of students who evacuated the building
throughout time is represented by the red curve.
The cumulative proportions through time are represented by the green curve.
We observe that more than 50% of the students have evacuated within the first
2 minutes following the alarm. Within the first 7 minutes, all students are out.
Looking at the green curve of cumulative proportions, we observe that around
40% of the students evacuated the building within the first minute following
the alarm, and 74% within the first two minutes.
Hence we can estimate that approximately 74% − 40% = 34% of the students
evacuated the building between the 1st and 2nd minute after the alarm went off.
Mathematically this corresponds to the following operation: $\int_1^2 f(t)\,dt \approx 0.35$
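Values read off a graph are only approximate, so it can help to check the area numerically. The sketch below assumes the density has the form f(t) = 2e^(−t) − 2e^(−2t) for t ≥ 0; the helper names `f`, `F` and `integrate` are illustrative and not part of the course material.

```python
import math

def f(t: float) -> float:
    """Assumed evacuation-time density: f(t) = 2e^(-t) - 2e^(-2t), t >= 0."""
    return 2 * math.exp(-t) - 2 * math.exp(-2 * t)

def F(t: float) -> float:
    """Cumulative proportion evacuated by time t (antiderivative of f with F(0) = 0)."""
    return 1 - 2 * math.exp(-t) + math.exp(-2 * t)

def integrate(g, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under g between a and b."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

area_1_to_2 = integrate(f, 1, 2)
print(f"F(1) = {F(1):.3f}, F(2) = {F(2):.3f}")
print(f"Area under f between t=1 and t=2: {area_1_to_2:.3f}")
print(f"F(2) - F(1) = {F(2) - F(1):.3f}")
```

The numerical area and the difference of the two cumulative values agree, which is exactly the statement "probability is area under the density curve".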
This is referred to as the Probability Density Function PDF:
- it specifies the probability of the random variable falling within a
particular range of values;
- The probability is given by the area under the density function but
above the horizontal axis and between the lowest and highest values of
the x range.
The Normal Model
There’s a model that shows up over and over in Statistics, representing data
collected in real life or found in nature. It is called the Normal model.
Using a Normal model (or normal distribution) to describe distributions of
continuous random variables makes it possible to say a lot about them,
particularly in a quantitative way.
Normal models are appropriate for distributions whose shapes are unimodal and
(roughly) symmetric (Nearly Normal Condition) => “bell-shaped curves”.
The mean and standard deviation of a Normal model are denoted by μ and σ
respectively, which represent parameters of the model. We write: N(μ,σ)
The 68-95-99.7 Empirical Rule
In a Normal model:
• about 68% of the values fall within one standard deviation of
the mean;
• about 95% of the values fall within two standard deviations of
the mean;
• about 99.7% of the values fall within three standard deviations
of the mean.
[Figure: Normal curve with the horizontal axis marked at μ−3σ, μ−2σ, μ−σ, μ, μ+σ, μ+2σ, μ+3σ]
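The three percentages of the empirical rule can be checked with the cumulative distribution of the Standard Normal model. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `within` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # Standard Normal model N(0, 1)

def within(k: float) -> float:
    """Probability that a Normal value lies within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {within(k):.4f}")
```

Because every Normal model becomes N(0, 1) after standardizing, the same three numbers hold for any μ and σ.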
Normal Distribution Probability
The 68-95-99.7 Rule – Why?
To be able to understand where the percentages come from, we refer to the
PDF of the Normal distribution:
$P(c \le x \le d) = \int_c^d f(x)\,dx$
[Figure: density curve f(x) against x, with c and d marked on the horizontal axis]
Probability is area under the curve!
It is calculated using integrals or read from a table.
Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
where
μ = Mean of the normal random variable x
σ = Standard deviation
π = 3.1415 . . .
e = 2.71828 . . .
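To see the formula at work, the density can be coded directly and compared with a library implementation. The function name `normal_pdf` and the parameter values (μ = 2000, σ = 200) are chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x) of the Normal model N(mu, sigma), written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2000, 200  # illustrative parameters
for x in (1800, 2000, 2200):
    by_formula = normal_pdf(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: formula {by_formula:.6f}, library {by_library:.6f}")
```

Note that f(x) is a density, not a probability: probabilities only appear once the curve is integrated over an interval.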
Varying the Parameters
Changing μ shifts the distribution left or right.
Changing σ increases or decreases the spread.
μ: axis of symmetry; σ: flatness of the distribution
[Figure: Normal curve f(X) against X, with μ and σ marked]
There is a unique Normal model for every possible combination
of mean and standard deviation.
Each model would require its own table to read probabilities,
and that’s an infinite number of tables!
We need to deal with two issues:
- for every combination of mean and standard deviation we get
a different normal distribution
- the 68-95-99.7 rule doesn’t allow us to find all percentiles
⇒ z-scores are the solution!
The Standard Normal Model
Instead of using a different normal model for each distribution (one for each
combination of μ and σ), it is easier and more practical to standardize the data
first, i.e. convert them into z-scores.
z = (x − μ)/σ
In this case we only need to use one model, the Standard Normal Model
N(0,1), with mean 0 and standard deviation 1.
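Standardizing is a one-line computation, and so is going back from a z-score to the original scale. The sketch below uses hypothetical helper names `z_score` and `from_z`, with exam-grade parameters (mean 77, standard deviation 6) picked purely as an illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def from_z(z: float, mu: float, sigma: float) -> float:
    """Convert a z-score back to the original scale."""
    return mu + z * sigma

mu, sigma = 77, 6  # illustrative parameters
print(z_score(83, mu, sigma))   # one standard deviation above the mean
print(z_score(65, mu, sigma))   # two standard deviations below the mean
print(from_z(1.0, mu, sigma))   # back to the original scale
```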
The Standard Normal Table (z-table)
The 68-95-99.7 rule still applies, of course. Used with the standard model, it gives
the percentiles in a normal distribution for z-scores of 0, ±1, ±2 and ±3.
In addition, now the percentiles for any z-scores of a normal distribution can be
found in a z-table (or table of Normal percentiles, or Standard Normal table).
A z-table tells us the percentage of the Standard Normal Distribution which lies
either above or below a certain z-score of a normal distribution, or even between
two z-scores.
The Standard Normal Table
The Cumulative Standard Normal table gives the probability
below (≤) a desired value of Z (i.e., from negative infinity to Z).
Example: P(Z ≤ 2.00) = ?
• The row shows the value of Z to the first decimal point.
• The column gives the value of Z to the second decimal point.
• The value within the table gives the probability from Z = −∞ up to the
desired Z value.
Z     0.00    0.01    0.02    …
0.0
0.1
.
.
2.0   .9772
P(Z ≤ 2.00) = 0.9772 = 97.72%
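A printed z-table is simply the Standard Normal cumulative distribution evaluated on a grid. The sketch below mimics the row/column lookup; the function name `z_table` is hypothetical and the rounding to four decimals only imitates a typical printed table.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal model N(0, 1)

def z_table(row: float, col: float) -> float:
    """Cumulative probability P(Z <= row + col), rounded to 4 decimals like a printed table."""
    return round(Z.cdf(row + col), 4)

# Row 2.0, column 0.00 answers P(Z <= 2.00)
print(z_table(2.0, 0.00))

# A few rows and the first three columns of the table
for row in (0.0, 0.1, 2.0):
    cells = "  ".join(f"{z_table(row, col):.4f}" for col in (0.00, 0.01, 0.02))
    print(f"{row:.1f}  {cells}")
```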
Q. What proportion of data in a normal distribution have z-scores below 2.25?
The area shown in red corresponds to the part of the normal model for which
P(z<2.25).
We can read this left tail directly from the table: P(z<2.25) = 0.9878
That is, 98.78% have a z-score less than 2.25.
Q. What proportion of a normally distributed set of data
have z-scores above 1.5?
P(z>1.5) is the blue section shown in
the picture.
The z-table calculates the left tail, the
red section:
P(z<1.5)=0.9332
The total area under the curve is 1,
so:
P(z>1.5) = 1 − P(z<1.5)
= 1 − 0.9332 = 0.0668
That is: just under 7% have z-scores
greater than 1.5.
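Left tails, right tails and areas between two z-scores all come from the same cumulative function. A minimal sketch with the standard library follows; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal model N(0, 1)

left_tail = Z.cdf(2.25)             # P(z < 2.25): read directly from the table
right_tail = 1 - Z.cdf(1.5)         # P(z > 1.5): total area 1 minus the left tail
between = Z.cdf(2.25) - Z.cdf(1.5)  # P(1.5 < z < 2.25): difference of two left tails

print(f"P(z < 2.25)       = {left_tail:.4f}")
print(f"P(z > 1.5)        = {right_tail:.4f}")
print(f"P(1.5 < z < 2.25) = {between:.4f}")
```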
Quality Control Example
You work in Quality Control for GE.
We assume that light bulb lifespan has a normal distribution with
𝜇= 2000 hrs and 𝜎= 200 hrs.
What’s the probability that a bulb will last
1. less than 1470 hours?
2. between 2000 and 2400 hours?
Let X=lifespan of lightbulb
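The slides leave this example as an exercise; one possible way to work it, assuming X ~ N(2000, 200) as stated, is sketched below. The variable names are illustrative, and the z-scores are printed so the results can be compared with a z-table lookup.

```python
from statistics import NormalDist

lifespan = NormalDist(mu=2000, sigma=200)  # X = lifespan of a light bulb, in hours

# 1. P(X < 1470): standardize, then take the left tail
z1 = (1470 - 2000) / 200
p_less_1470 = lifespan.cdf(1470)

# 2. P(2000 <= X <= 2400): difference of two left tails
z_low, z_high = (2000 - 2000) / 200, (2400 - 2000) / 200
p_2000_2400 = lifespan.cdf(2400) - lifespan.cdf(2000)

print(f"z = {z1:.2f}, P(X < 1470) = {p_less_1470:.4f}")
print(f"z from {z_low:.2f} to {z_high:.2f}, P(2000 <= X <= 2400) = {p_2000_2400:.4f}")
```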
From Percentiles to Scores: z in Reverse
Sometimes we start with areas and need to find the corresponding z-
score or even the original data value.
Example:
Let’s say that the top 10% of students who sit for a school entrance
exam are granted a scholarship. The grades of the entrance exam are
normally distributed, with a mean of 77 and a standard deviation of 6.
What grade cuts off the top 10%?
In this example, we need to start with areas and find the corresponding
z-score, then calculate the original data value.
What z-score cuts off the top 10% in a Normal model?
It’s the z-score for bottom 90%.
Look in the z-table for an area of 0.90.
The exact area is not there, but 0.8997 is pretty close.
This figure is associated with z = 1.28, so the top 10% is 1.28 standard
deviations above the mean.
We can then calculate the original value (cut-off exam grade) since we
know the mean and standard deviation of the distribution.
z = (x − 𝜇)/𝜎 so x = z(𝜎) + 𝜇
x = 77 + 1.28(6) = 84.68, i.e. roughly 85
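Software can do the reverse lookup directly through the inverse of the cumulative distribution. This sketch relies on `NormalDist.inv_cdf` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# z-score that leaves 90% of the area below it (and 10% above)
z_cut = NormalDist().inv_cdf(0.90)

# Same question asked directly on the scale of the exam grades
grades = NormalDist(mu=77, sigma=6)
cutoff = grades.inv_cdf(0.90)

print(f"z-score cutting off the top 10%: {z_cut:.2f}")
print(f"cut-off grade: {cutoff:.2f}")
print(f"check: 77 + z * 6 = {77 + z_cut * 6:.2f}")
```

The small difference from a table-based answer comes from the table listing z only to two decimals.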
Normal Approximation to the Binomial Model
Remember the Tennis Player example?
A certain tennis player makes a successful serve 70% of the time. Assume that each serve
is independent of the others.
We answered questions using the Binomial Model for 6 serves (trials).
If she serves 6 times, what is the probability she gets:
a) All 6 serves in? b) Exactly 4 serves in?
c) At least 4 serves in? d) No more than 4 serves in?
Let X= the number of successful serves in n=6 first trials
X is a random variable following a binomial model with p=0.7 and n=6 so X∼ 𝛃(6;0.7)
a) P(x=6) ≈ 0.118
b) P(x=4) ≈ 0.324
c) P(x≥4) = P(x=4)+P(x=5)+P(x=6) ≈ 0.744
d) P(x≤4)= 1-P(x>4)= 1-[P(x=5)+P(x=6)] ≈ 0.580
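These four answers can be reproduced from the Binomial probability formula P(X = k) = C(n, k) p^k q^(n−k). The helper name `binom_pmf` below is illustrative.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a Binomial model with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.7
p_all_six = binom_pmf(6, n, p)                                  # a) all 6 serves in
p_exactly_4 = binom_pmf(4, n, p)                                # b) exactly 4 serves in
p_at_least_4 = sum(binom_pmf(k, n, p) for k in range(4, n + 1)) # c) at least 4
p_at_most_4 = sum(binom_pmf(k, n, p) for k in range(0, 5))      # d) no more than 4

for label, value in [("a", p_all_six), ("b", p_exactly_4),
                     ("c", p_at_least_4), ("d", p_at_most_4)]:
    print(f"{label}) {value:.3f}")
```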
Now what if we consider an entire game during which the player serves
80 times, and we look for the probability of getting for example between
50 and 65 successful serves?
Adding up 16 binomial probabilities (X = 50, 51, …, 65) by hand is doable, but
it would be tiresome, so we approximate the Binomial model with a Normal
model.
The Normal model which may be used is N(μ, σ) with μ = np and σ = √(npq).
But when is this approximation appropriate?
Only when the Binomial distribution looks more or less normal!
So how do we check that?
1) One method to check that is to draw the distribution
2) Another method is to check the success/failure condition for a large number
of trials.
The Binomial distribution looks approximately Normal when, for a large number of
trials, at least 15 successes and 15 failures are expected: np ≥ 15 & nq ≥ 15
• This comes from the Binomial model being skewed for a small number of
successes or failures expected.
• This condition ensures that there are at least 3 standard deviations on either
side of the mean; so the distribution is not too skewed.
As long as this condition holds, using the Normal model will give a reasonable
approximation to the Binomial, and the z-table can be used.
3) A third method is to calculate the following interval and check that it lies in the range
0 to n:
μ ± 3σ = np ± 3√(npq)
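The two numerical checks (methods 2 and 3) are easy to automate. The sketch below applies them to the tennis player's serve probability p = 0.7 with n = 6 and n = 80 trials; the function name `normal_approx_checks` and the dictionary layout are illustrative choices.

```python
import math

def normal_approx_checks(n: int, p: float, threshold: float = 15) -> dict:
    """Apply the success/failure condition and the mu ± 3 sigma interval check."""
    q = 1 - p
    mu = n * p
    sigma = math.sqrt(n * p * q)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    return {
        "np": mu,
        "nq": n * q,
        "success_failure_ok": mu >= threshold and n * q >= threshold,
        "interval": (low, high),
        "interval_ok": 0 <= low and high <= n,
    }

for n in (6, 80):
    print(n, normal_approx_checks(n, 0.7))
```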
Tennis Player Example
[Figure: bar chart of the Binomial model with n=80 and p=0.70. Vertical axis: Probability; horizontal axis: Number of successes out of 80 trials]
We have a bar chart of a Binomial model with n=80 trials and p=0.70 so 𝛽(80;0.7)
Then we have the curve of Normal distribution: N(np,√npq)=N(56.0, 4.10)
• The graph looks pretty normal
• Success/failure condition checks since
np=80(0.7)=56 and nq=80(0.3)=24, both larger than 15
• Interval μ ± 3σ = np ± 3√(npq) = 56 ± 3(4.10) = [43.7; 68.3]
lies in the interval 0 to 80.
Tennis Player Example (again)
A certain tennis player makes a successful serve 70% of the time.
Assume that each serve is independent of the others. Suppose that
the player serves 80 times in a match.
a) What are the mean and standard deviation of the number of good
first serves expected?
b) Verify that you can use a normal model to approximate the
distribution of the number of good first serves.
c) Use the 68-95-99.7 Rule to describe this distribution.
d) What’s the probability that she makes at least 65 serves?
e) What’s the probability that she makes between 50 and 65 serves?
Tennis Player Example - Solution
Let X= the number of successful serves in n=80 first trials
X is a random variable following a Binomial model with p=0.7 and n=80
⇒ X∼𝛽(80;0.7)
a) E(X)=np=56 and SD(X)=√npq≈ 4.1
b) Nearly Normal conditions check so we can approximate this Binomial
model β(80,0.7) with a Normal model N(56,4.1)
c) Sketch the curve and describe the empirical rule
d) P(X≥65): find z=(65-56)/4.1≈2.19 and read the probability from the z-table:
P(X≥65) ≈ 1-0.9857 = 0.0143 = 1.43%
e) P(50 ≤ X ≤ 65) = P(-1.46 ≤ z ≤ 2.19) = 0.9857-0.0721 = 0.9136 = 91.36%
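As a cross-check, the Normal approximation can be compared with the exact Binomial sums. This is a sketch under the same set-up (n = 80, p = 0.7) and, like the solution above, without a continuity correction; the variable names are illustrative, and small differences from the table-based answers come from rounding z to two decimals.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 80, 0.7
mu, sigma = n * p, sqrt(n * p * (1 - p))   # a) mean and standard deviation
approx = NormalDist(mu, sigma)             # b) Normal model N(56, about 4.1)

# d) and e) with the Normal approximation
p_at_least_65 = 1 - approx.cdf(65)
p_50_to_65 = approx.cdf(65) - approx.cdf(50)

def binom_pmf(k: int) -> float:
    """Exact Binomial probability P(X = k) for n = 80, p = 0.7."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

exact_at_least_65 = sum(binom_pmf(k) for k in range(65, n + 1))
exact_50_to_65 = sum(binom_pmf(k) for k in range(50, 66))

print(f"mean = {mu:.1f}, SD = {sigma:.2f}")
print(f"P(X >= 65): normal {p_at_least_65:.4f}, exact {exact_at_least_65:.4f}")
print(f"P(50 <= X <= 65): normal {p_50_to_65:.4f}, exact {exact_50_to_65:.4f}")
```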
Comparing a Non-Normal Binomial model to the
Normal distribution
The first tennis example had n=6 and p=0.70, so:
np=6(0.70)=4.2 and nq=6(0.3)=1.8, both less than 15
or
np±3√npq=4.2±3(1.12)=[0.84;7.56], doesn’t lie in range 0 to 6
It didn’t satisfy the condition for being nearly normal. Why?
To begin with, the original binomial
model distribution was left-skewed,
so the shape of the normal distribution
will never match up exactly.
If we tried to approximate it with a Normal model, that normal model would
be N(4.2, 1.12). The graph below shows the two distributions – binomial and
Normal – together.
The bar chart shows the binomial model.
The curve shows the corresponding Normal model with the same mean
and standard deviation.
It’s not a good fit!
[Figure: Binomial model vs Normal model. Vertical axis: Probability; horizontal axis: Number of successes out of 6 trials]
Airline example
An airline overbooks its flights, assuming that 5% of passengers will not
show for their flight. If 275 tickets are sold for a flight which has only
265 seats, what’s the probability that someone will get bumped?
Hint: We are looking for the probability that MORE THAN 265 people
show up for their flight, out of the 275 tickets sold, i.e. P(X>265), where
X = number of passengers showing up for the flight
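The slides leave this example open; one possible approach, assuming passengers show up independently with probability 0.95, is sketched below with both the Normal approximation and the exact Binomial sum. The expected number of no-shows is 275(0.05) = 13.75, slightly under 15, so the approximation is borderline here and the exact value is a useful comparison. All variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

tickets, seats, p_show = 275, 265, 0.95

mu = tickets * p_show                           # expected number of passengers showing up
sigma = sqrt(tickets * p_show * (1 - p_show))   # standard deviation of that number
expected_no_shows = tickets * (1 - p_show)      # "failures" for the success/failure check

# Normal approximation of P(X > 265), without continuity correction
p_bumped_normal = 1 - NormalDist(mu, sigma).cdf(seats)

# Exact Binomial probability P(X >= 266)
p_bumped_exact = sum(
    comb(tickets, k) * p_show**k * (1 - p_show) ** (tickets - k)
    for k in range(seats + 1, tickets + 1)
)

print(f"mean = {mu:.2f}, SD = {sigma:.2f}, expected no-shows = {expected_no_shows:.2f}")
print(f"P(X > 265): normal approximation {p_bumped_normal:.4f}, exact {p_bumped_exact:.4f}")
```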
Generally Speaking, how do we recognize a normal distribution?
Using one of the below methods:
Visually:
1) Histogram
• Symmetric
• Unimodal
2) Normal probability plot
• Almost straight line
Numerically:
3) Calculate indicators and check that
• Mean ≈ median
• 1 SD ≈ 3/4 IQR
• Range ≈ 6 SD (can be 4 SD if the data set is small)
4) Check the empirical rule 68-95-99.7
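The numerical checks can be scripted. The sketch below runs them on a hypothetical data set built from 200 evenly spaced quantiles of N(77, 6), so the data are Normal by construction; the 10% tolerances are an arbitrary assumption, not a rule from the course.

```python
from statistics import NormalDist, mean, median, stdev, quantiles

# Hypothetical data: 200 evenly spaced quantiles of a Normal model N(77, 6)
model = NormalDist(mu=77, sigma=6)
n = 200
data = [model.inv_cdf((i + 0.5) / n) for i in range(n)]

m, sd = mean(data), stdev(data)
q1, _, q3 = quantiles(data, n=4)   # quartile cut points Q1, Q2, Q3
iqr = q3 - q1

checks = {
    "mean ≈ median": abs(m - median(data)) < 0.1 * sd,
    "SD ≈ 3/4 IQR": abs(sd - 0.75 * iqr) < 0.1 * sd,
}
# Empirical rule, first part: share of values within one SD of the mean
share_within_1sd = sum(abs(x - m) <= sd for x in data) / n

print(checks)
print(f"share within 1 SD of the mean: {share_within_1sd:.2f}")
```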
Normal Probability Plot
A normal probability plot for a data set is a scatterplot with the
ranked data values on one axis and their corresponding expected z-
scores from a standard normal distribution on the other axis.
[Note: Computation of the expected standard normal z-scores is beyond the
scope of this text. Therefore, we will rely on available statistical software packages
to generate a normal probability plot.]
[Figure: normal probability plot with axes labeled Expected z-score and Observed value]
Other continuous probability distributions
- Uniform Random variables
- Exponential Random variables
Not covered in ST11 (section 4.8)