Statistical Distributions

Head to savemyexams.co.uk for more awesome resources
AS Maths CIE 
3. Statistical Distributions
CONTENTS
3.1 Probability Distributions
3.1.1 Discrete Probability Distributions
3.1.2 E(X) & Var(X) (Discrete)
3.2 Binomial & Geometric Distribution
3.2.1 The Binomial Distribution
3.2.2 Calculating Binomial Probabilities
3.2.3 The Geometric Distribution
3.3 Normal Distribution
3.3.1 The Normal Distribution
3.3.2 Standard Normal Distribution
3.3.3 Normal Distribution - Calculations
3.3.4 Finding Sigma and Mu
3.4 Working with Distributions
3.4.1 Modelling with Distributions
3.4.2 Normal Approximation of Binomial
© 2015-2021 Save My Exams, Ltd. · Revision Notes, Topic Questions, Past Papers
3.1 Probability Distributions


3.1.1 Discrete Probability Distributions
Discrete Random Variables
What is a discrete random variable?
A random variable is a variable whose value depends on the outcome of a random
event
The value of the random variable is not known until the event is carried out
(this is what is meant by 'random' in this case)
Random variables are denoted using upper case letters (X, Y, etc)
Particular outcomes of the event are denoted using lower case letters (x, y, etc)
P(X = x) means "the probability of the random variable X taking the value x"
A discrete random variable (often abbreviated to DRV) can only take certain values
within a set
Discrete random variables usually count something
Discrete random variables can usually only take a finite number of values, but
some can take an infinite number of values (see the examples
below)
Examples of discrete random variables include:
The number of times a coin lands on heads when flipped 20 times
(this has a finite number of outcomes: 0,1,2,…,20)
The number of emails a manager receives within an hour
(this has an infinite number of outcomes: 0,1,2,3,…)
The number of times a dice is rolled until it lands on a 6
(this has an infinite number of outcomes: 1,2,3,…)
The number on a bingo ball when one is drawn at random
(this has a finite number of outcomes: 1,2,3…,90)
Probability Distributions (Discrete)
What is a probability distribution? 
A discrete probability distribution fully describes all the values that a discrete
random variable can take along with their associated probabilities
This can be given in a table
Or it can be given as a function (called a probability mass function)
They can be represented by vertical line graphs (the possible values for X
along the horizontal axis and the probability on the vertical axis)
The sum of the probabilities of all the values of a discrete random variable is 1
This is usually written ΣP(X = x ) = 1
Cumulative Probabilities (Discrete)
How do I calculate probabilities using a discrete probability distribution? 
First draw a table to represent the probability distribution
If it is given as a function then find each probability
If any probabilities are unknown then use algebra to represent them
Form an equation using ∑P(X = x ) = 1
Add together all the probabilities and make the sum equal to 1
To find P (X = k )
If k is a possible value of the random variable X then P (X = k ) will be given in
the table
If k is not a possible value then P (X = k ) = 0
To find P (X ≤ k )
Identify all possible values, x_i, that X can take which satisfy x_i ≤ k
Add together all their corresponding probabilities
P(X ≤ k) = Σ P(X = x_i), where the sum is taken over all x_i ≤ k
Some mathematicians use the notation F(x) to represent the cumulative
distribution
F(x ) = P(X ≤ x )
Using a similar method you can find P (X < k ) , P (X ≥ k ) and P (X > k )
As all the probabilities add up to 1 you can form the following equivalent
equations:
P (X < k ) + P (X = k ) + P (X > k ) = 1
P (X > k ) = 1 − P (X ≤ k )
P (X ≥ k ) = 1 − P (X < k )
To calculate more complicated probabilities such as P(X² < 4)
Identify which values of the random variable satisfy the inequality or event in
the brackets
Add together the corresponding probabilities
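The steps above can be sketched in Python. This is only an illustration: the distribution below is made up, and the helper name prob is a hypothetical choice rather than anything from these notes.

```python
from fractions import Fraction

# Hypothetical distribution: value -> P(X = value), stored as exact fractions
pmf = {
    -1: Fraction(1, 10),
    0: Fraction(2, 10),
    1: Fraction(3, 10),
    3: Fraction(4, 10),
}

def prob(event):
    """Add the probabilities of every value of X that satisfies `event`."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))

assert sum(pmf.values()) == 1                  # ΣP(X = x) = 1

p_at_most_1 = prob(lambda x: x <= 1)           # P(X ≤ 1)
p_more_than_1 = prob(lambda x: x > 1)          # P(X > 1) = 1 − P(X ≤ 1)
p_square_below_4 = prob(lambda x: x**2 < 4)    # P(X² < 4): x = -1, 0, 1
p_equals_2 = prob(lambda x: x == 2)            # 2 is not a possible value, so 0
```

Passing the event as a function means the same helper handles P(X ≤ k), P(X > k) and events such as X² < 4 without any extra cases.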
How do I know which inequality to use?
P(X ≤ k ) would be used for phrases such as:
At most k, no greater than k, etc
P(X < k ) would be used for phrases such as:
Fewer than k
P(X ≥ k ) would be used for phrases such as:
At least k , no fewer than k, etc
P(X > k ) would be used for phrases such as:
Greater than k, etc
Worked Example
The probability distribution of the discrete random variable X is given by the
function
P(X = x) = kx² for x = −3, −1, 2, 4, and P(X = x) = 0 otherwise.
(a) Show that k = 1/30.
(b) Calculate P(X ≤ 3).
(c) Calculate P(X² < 5).
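One possible way to check parts (a) to (c) is sketched below using exact fractions. It is not the worked solution from the notes, and the variable names are chosen purely for illustration.

```python
from fractions import Fraction

values = [-3, -1, 2, 4]

# (a) The probabilities k*x² must add up to 1, so k = 1 / Σx²
k = Fraction(1, sum(x**2 for x in values))

pmf = {x: k * x**2 for x in values}

# (b) Only x = 4 fails X ≤ 3, so subtracting from 1 is quickest
p_b = 1 - pmf[4]

# (c) X² < 5 is satisfied by x = -1 and x = 2
p_c = sum(p for x, p in pmf.items() if x**2 < 5)

print(k, p_b, p_c)   # 1/30 7/15 1/6
```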
Exam Tip
Try to draw a table if there are a finite number of values that the
discrete random variable can take
When finding a probability, it will sometimes be quicker to subtract the
probabilities of the unwanted values from 1 rather than adding together
the probabilities of the wanted values
Always make sure that the probabilities are between 0 and 1, and that
they add up to 1!
3.1.2 E(X) & Var(X) (Discrete)


E(X) & Var(X) (Discrete)
What does E(X) mean and how do I calculate E(X)?
E(X) means the expected value or the mean of a random variable X
For a discrete random variable, it is calculated by:
Multiplying each value of X with its corresponding probability
Adding all these terms together
E(X) = Σ x P(X = x)
Look out for symmetrical distributions (where the values of X are symmetrical and
their probabilities are symmetrical) as the mean of these is the same as the
median
For example if X can take the values 1, 5, 9 with probabilities 0.3, 0.4, 0.3
respectively then by symmetry the mean would be 5
How do I calculate E(X²)?
E(X²) means the expected value or the mean of a random variable defined as X²
For a discrete random variable, it is calculated by:
Squaring each value of X to get the values of X²
Multiplying each value of X² with its corresponding probability
Adding all these terms together
E(X²) = Σ x² P(X = x)
In a similar way E(f(X)) can be calculated for a discrete random variable by:
Applying the function f to each value of X to get the values of f(X)
Multiplying each value of f(X) with its corresponding probability
Adding all these terms together
E(f(X)) = Σ f(x) P(X = x)
Is E(X²) equal to (E(X))²?

Definitely not!
They are only equal if X can take only one value with probability 1
if this was the case X would be a constant, with nothing random about it
E(X²) is the mean of the values of X²
(E(X))² is the square of the mean of the values of X
To see the difference
Imagine a random variable X that can only take the values 1 and -1 with equal
chance
The mean would be 0 so the square of the mean would also be 0
The square values would be 1 and 1 so the mean of the squares would be 1
In general E(f(X)) does not equal f(E(X)) where f is a function
So if you wanted to find something like E(1/X) then you would have to use the
definition and calculate:
Σ (1/x) P(X = x)
What does Var(X) mean and how do I calculate Var(X)?

Var(X) means the variance of a random variable X
For any random variable this can be calculated using the formula
Var(X) = E(X²) − (E(X))²
This is the mean of the squares of X minus the square of the mean of X
Compare this to the definition of the variance of a set of data
Var(X) is never negative (it is zero only when X always takes the same value)
The standard deviation of a random variable X is the square root of Var(X)
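The definitions of E(X), E(f(X)) and Var(X) can be sketched in Python as follows. The function names expectation and variance are hypothetical; the two distributions are the examples mentioned above (the values 1 and -1 with equal chance, and the symmetrical values 1, 5, 9).

```python
from math import sqrt

def expectation(pmf, f=lambda x: x):
    """E(f(X)) = Σ f(x) P(X = x) for a distribution given as {x: P(X = x)}."""
    return sum(f(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X²) − (E(X))²."""
    return expectation(pmf, lambda x: x**2) - expectation(pmf) ** 2

# X takes the values 1 and -1 with equal chance
coin = {1: 0.5, -1: 0.5}
mean = expectation(coin)                               # square of the mean is 0
mean_of_squares = expectation(coin, lambda x: x**2)    # mean of the squares is 1
var = variance(coin)
sd = sqrt(var)                                         # standard deviation

# Symmetrical example: by symmetry the mean is the middle value, 5
symmetric_mean = expectation({1: 0.3, 5: 0.4, 9: 0.3})
```

Here mean squared is 0 while mean_of_squares is 1, which shows numerically that E(X²) and (E(X))² are different quantities.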
Worked Example
The discrete random variable X has the probability distribution shown in
the following table:
x           2     3     5     7
P(X = x)    0.1   0.3   0.2   0.4
(a)
Find the value of E(X).
(b)
Find the value of E(X²).
(c)
Find the value of Var(X).
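A quick numerical check of parts (a) to (c) might look like the sketch below. It applies the formulas for E(X), E(X²) and Var(X) directly to the table; it is not the notes' own worked solution.

```python
xs = [2, 3, 5, 7]
ps = [0.1, 0.3, 0.2, 0.4]

e_x = sum(x * p for x, p in zip(xs, ps))        # (a) E(X) = Σ x P(X = x)
e_x2 = sum(x**2 * p for x, p in zip(xs, ps))    # (b) E(X²) = Σ x² P(X = x)
var_x = e_x2 - e_x**2                           # (c) Var(X) = E(X²) − (E(X))²

print(round(e_x, 2), round(e_x2, 2), round(var_x, 2))   # 4.9 27.7 3.69
```

The mean of 4.9 lies between the smallest value (2) and the largest value (7), as it should.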
Exam Tip
Check if your answer makes sense. The mean should fit within the range
of the values of X.
3.2 Binomial & Geometric Distribution


3.2.1 The Binomial Distribution
Properties of Binomial Distribution
What is a binomial distribution?
A binomial distribution is a discrete probability distribution
The discrete random variable X follows a binomial distribution if it counts the
number of successes when an experiment satisfies the conditions:
There are a fixed finite number of trials ( n )
The outcome of each trial is independent of the outcomes of the other trials
There are exactly two outcomes of each trial (success or failure)
The probability of success (p) is constant
If X follows a binomial distribution then it is denoted X ∼ B (n ,p )
n is the number of trials
p is the probability of success
The probability of failure is 1-p which is sometimes denoted as q
The formula for the probability of r successful trials is given by:
P(X = r) = nCr × p^r × (1 − p)^(n − r) for r = 0, 1, 2, ..., n
(nCr is the binomial coefficient, "n choose r")
This is equal to the term which includes p^r in the expansion of (p + q)^n where
q = 1 − p (this shows the link with the Binomial Expansion)
What are the important properties of a binomial distribution?

The expected number (mean) of successful trials is np
The variance of the number of successful trials is np (1 − p )
Square root to get the standard deviation
The distribution can be represented visually using a vertical line graph
If p is close to 0 then the graph has a tail to the right
If p is close to 1 then the graph has a tail to the left
If p is close to 0.5 then the graph is roughly symmetrical
If p = 0.5 then the graph is symmetrical
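The properties above can be checked numerically. The sketch below assumes hypothetical parameters n = 8 and p = 0.25 (not taken from the notes) and uses math.comb for the binomial coefficient.

```python
from math import comb, sqrt

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 8, 0.25   # hypothetical parameters chosen for illustration
probs = [binomial_pmf(n, p, r) for r in range(n + 1)]

total = sum(probs)                                   # should be 1
mean = sum(r * pr for r, pr in enumerate(probs))     # should equal np
var = sum(r**2 * pr for r, pr in enumerate(probs)) - mean**2   # np(1 − p)
sd = sqrt(var)

# With p close to 0 the most likely value sits near the left (tail to the right)
most_likely = probs.index(max(probs))
```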
Modelling with Binomial Distribution

How do I set up a binomial model? 
Identify what a trial is in the scenario
For example: rolling a dice, flipping a coin, checking hair colour
Identify what the successful outcome is in the scenario
For example: rolling a 6, landing on tails, having black hair
Make sure you clearly state what your random variable is
For example, let X be the number of students in a class of 30 with black hair
What can be modelled using a binomial distribution?
Anything that satisfies the four conditions
For example, let T be the number of times a fair coin lands on tails when flipped
20 times: T ∼ B(20, 1/2)
A trial is flipping a coin: There are 20 trials so n =20
We can assume each coin flip does not affect subsequent coin flips: They are
independent
A success is when the coin lands on tails: Two outcomes - tails or not tails
(heads)
The coin is fair: The probability of tails is constant with p = 1/2
Sometimes it might seem like there are more than two outcomes
For example, let Y be the number of yellow cars that are in a car park full of
100 cars
Although there are more than two possible colours of cars, here the trial is
whether a car is yellow so there are two outcomes (yellow or not yellow)
Y would still need to fulfil the other conditions in order to follow a binomial
distribution
Sometimes a sample may be taken from a population
For example, 30% of people in a city have blue eyes, a sample of 30 people
from the city is taken and X is the number of them with blue eyes
As long as the population is large and the sample is random then it can be
assumed that each person has a 30% chance of having blue eyes
What can not be modelled using a binomial distribution?
Anything where the number of trials is not fixed or is infinite
The number of emails received in an hour
The number of times a coin is flipped until it lands on heads
Anything where the outcome of one trial affects the outcome of the other trials
The number of caramels that a person eats when they eat 5 sweets from a bag
containing 6 caramels and 4 marshmallows
If you eat a caramel for your first sweet then there are fewer caramels left in
the bag when you choose your second sweet
Anything where there are more than two outcomes of a trial
A person's shoe size
The number a dice lands on when rolled
Anything where the probability of success changes
The number of times that a person can swim a length of a swimming pool in
under a minute when swimming 50 lengths
The probability of swimming a length in under a minute will decrease as the
person gets tired
 Worked Example
It is known that 8% of a large population are immune to a particular virus.
Mark takes a sample of 50 people from this population. Mark uses a
binomial model for the number of people in his sample that are immune to
the virus
(a)
State the distribution that Mark uses.
(b)
State the two assumptions that Mark must make in order to use a binomial
model.
Exam Tip
If you are asked to criticise a binomial model always consider whether
the trials are independent; this is usually the one that stops a variable
from following a binomial distribution!
3.2.2 Calculating Binomial Probabilities


Calculating Binomial Probabilities
Throughout this section we will use the random variable X ∼ B(n, p). For a binomial
distribution, the probability of X taking a non-integer or negative value is always zero.
Therefore any values mentioned in this section will be assumed to be non-negative integers.
Where does the formula for a binomial distribution come from?
The formula for calculating an individual binomial probability is
P(X = r) = p_r = nCr × p^r × (1 − p)^(n − r)
If there are r successes then there are (n − r) failures
Any one particular sequence of r successes and (n − r) failures has
probability p^r (1 − p)^(n − r)
The number of times this can happen is calculated by the binomial
coefficient
nCr = n! / (r!(n − r)!)
This can be seen by considering a probability tree diagram with n trials, where p
is the probability of success and the tree diagram is being used to find r successes
nCr is the number of pathways through the tree for which there would be exactly r
successes within the n trials
The formula allows statisticians to quickly find probabilities for larger values of n
without needing to draw the whole tree diagram
Your calculator may have a function that would allow you to calculate binomial
probabilities
You can learn how to use this to check your work but it is important you
always show your working using the formula to get the marks in the exam
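The link between the tree diagram and the formula can be illustrated by listing every pathway for a small case. The values n = 4, r = 2 and p = 0.3 below are hypothetical and chosen only to keep the tree small.

```python
from itertools import product
from math import comb, isclose

n, r, p = 4, 2, 0.3   # hypothetical small case

# Every pathway through the tree is a sequence of n outcomes:
# "S" for success or "F" for failure. Keep those with exactly r successes.
paths = [path for path in product("SF", repeat=n) if path.count("S") == r]

# Each such pathway has probability p^r (1 − p)^(n − r)
path_probability = p**r * (1 - p) ** (n - r)

p_from_tree = len(paths) * path_probability
p_from_formula = comb(n, r) * p**r * (1 - p) ** (n - r)
```

The tree has 6 suitable pathways, which matches the binomial coefficient for n = 4 and r = 2, so both routes give the same probability.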
How do I calculate the cumulative probabilities for a binomial
distribution?
Most of the time you will be required to calculate cumulative binomial
probabilities rather than individual ones
Use the formula to find the individual probabilities and then add them up
Make sure you are confident working with inequalities for discrete values
Only integer values will be included so it is easiest to look at which integer
values you should include within your calculation
Sometimes it is quicker to find the probabilities that are not being asked for
and subtract from one
P (X ≤ r ) is asking you to find the probabilities of all values up to and including r
This means all values that are at most r
Don’t forget to include P(X = 0)
It could also be written as P(X ≤ r ) = 1 − P(X > r )
P (X < r ) is asking you to find the probabilities of all values up to but not including r
This means all values that are less than r
Stop at r - 1
It could also be written as P(X < r ) = 1 − P(X ≥ r )
P (X ≥ r ) is asking you to find the probabilities of all values greater than and
including r
This means all values that are at least r
It could also be written as P (X ≥ r ) = 1 − P (X < r )
P (X > r ) is asking you to find the probabilities of all values greater than but not
including r
This means all values that are more than r
Start at r + 1
It could also be written as P (X > r ) = 1 − P (X ≤ r )
If calculating P (a ≤ X ≤ b ) pay attention to whether the probability of a and b should
be included in the calculation or not
For example, P (4 < X ≤ 10) :
You want the integers 5 to 10
 Worked Example
Given that X ∼ B(10, 0.35), find:
(i)
P(X = 3)
(ii)
P(X ≤ 3)
(iii)
P(X > 3)
(iv)
P(3 < X < 6)
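The four parts can be checked with a short Python sketch. It applies the binomial formula to each integer value that should be counted; the helper name binomial_pmf is a hypothetical choice, and this is not the notes' own worked solution.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 10, 0.35

p_i = binomial_pmf(n, p, 3)                              # P(X = 3)
p_ii = sum(binomial_pmf(n, p, r) for r in range(0, 4))   # P(X ≤ 3): r = 0, 1, 2, 3
p_iii = 1 - p_ii                                         # P(X > 3) = 1 − P(X ≤ 3)
p_iv = sum(binomial_pmf(n, p, r) for r in (4, 5))        # P(3 < X < 6): r = 4, 5

for label, value in [("(i)", p_i), ("(ii)", p_ii), ("(iii)", p_iii), ("(iv)", p_iv)]:
    print(label, round(value, 4))
```

Part (iii) uses the complement rather than adding seven separate probabilities, which is the quicker route described above.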
Exam Tip
Looking carefully at the inequality within the probability is key here;
make sure you consider which integers should be counted within your
calculations.
3.2.3 The Geometric Distribution


Properties of Geometric Distribution
What is a geometric distribution and its notation?
A geometric distribution is a discrete probability distribution
The discrete random variable X follows a geometric distribution if it counts the
number of trials until the first success occurs for an experiment that satisfies the
conditions
Each trial has only two outcomes
Broadly labelled as “success” and “failure” - these are mutually exclusive
Such trials may be referred to as Bernoulli trials – named after the Swiss
mathematician Jacob Bernoulli (1655-1705)
The outcomes of trials are independent
The outcome of one trial does not affect the outcome of another trial
The probability of each outcome is constant across all trials
i.e. The probability of “success” does not change between trials

p is the probability of “success” in a single trial
The probability of “failure” in a single trial is 1- p ; often denoted by q
If X follows a geometric distribution, it is denoted by X ∼ Geo(p )
The formula for finding the probability that the first success occurs on the r-th trial
(that is, after r − 1 failures) is
P(X = r) = (1 − p)^(r − 1) p
e.g. If the probability of success in a single trial is 0.3, then the probability it will
take 6 trials to obtain the first success is given by
P(X = 6) = (1 − 0.3)^(6 − 1) (0.3) = (0.7)^5 (0.3) = 0.050421
What does a geometric distribution look like?

If represented visually, using a vertical line graph, the probabilities in a geometric
distribution decrease but never reach zero
The probabilities form a geometric progression
Question: What would the first term (“a”), the common ratio (“r”) and the
sum to infinity be (“S ∞ ”)?
(Answer below diagram)
The probabilities decrease exponentially; drawing a curve through the
tops of the lines would produce a decreasing exponential curve
The graphs also show how 1 is the mode for every geometric distribution
Answer: a = p, r = q = 1 − p, S∞ = 1
What are the properties of a geometric distribution? 
P(X = r) > 0 for r = 1, 2, 3, ...

Every geometric distribution has an infinite (discrete) sample space which is
the set of natural numbers (ℕ) or positive integers (ℤ +)
P(X = r) < P(X = r − 1) for all r ≥ 2
The mode of every geometric distribution is 1
(the value of r that has the highest probability)
Geometric distributions are memoryless
The number of trials needed for the first success is not dependent on the
number of trials that have already occurred
e.g. If 5 (failed) trials have already occurred, the probability of the first
success happening on the 7th trial is the same as the probability of it happening
on the 2nd trial, which would be (1 − p) p
Mathematically this is written as
P(X = r | X > k) = P(X = r − k)
where k is the number of trials that have already occurred and r > k
e.g. P(X = 7 | X > 5) = P(X = 7 − 5) = P(X = 2)
Geometric distributions have the recurrence relation P (X = r ) = qP (X = r − 1)

The mean, or expected value, of a geometric distribution is E(X) = 1/p
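The properties listed above can be checked numerically. This is a sketch only: it uses p = 0.3 from the earlier example, the function name geometric_pmf is hypothetical, and the infinite sum for the mean is cut off after a large number of terms, so that value is an approximation.

```python
from math import isclose

def geometric_pmf(p, r):
    """P(X = r) for X ~ Geo(p): r − 1 failures followed by one success."""
    return (1 - p) ** (r - 1) * p

p = 0.3
q = 1 - p

example = geometric_pmf(p, 6)   # the example above: (0.7)^5 (0.3)

# Memoryless property: P(X = 7 | X > 5) = P(X = 7) / P(X > 5) = P(X = 2)
p_more_than_5 = q**5            # the first 5 trials all fail
conditional = geometric_pmf(p, 7) / p_more_than_5

# Recurrence relation: P(X = r) = q P(X = r − 1), shown here for r = 4
recurrence_holds = isclose(geometric_pmf(p, 4), q * geometric_pmf(p, 3))

# Mean: Σ r P(X = r), truncated after 499 terms (an approximation of 1/p)
approx_mean = sum(r * geometric_pmf(p, r) for r in range(1, 500))
```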
Modelling with Geometric Distribution

How do I set up a geometric model? 
Identify what a trial is in the context of a problem
Flipping a coin, rolling a dice, a football match
Identify a successful outcome
Heads, a square number, a win
Identify the probability of “success”
0.5, 1/3, 42%
Define your random variable using the correct notation
Let X be the number of trials required to obtain the first heads when flipping a
fair coin, X ∼ Geo(0. 5)
What can be modelled using a geometric distribution?
Anything where the first occurrence of a successful outcome is of significance
Rolling a double with two dice before being allowed to start a game
The number of on/off presses a switch can withstand before wearing out
(In which case the first “success” would be the first failure of the switch!)
In addition, the scenario must satisfy the three conditions
Trials only have two outcomes of interest
Trials are independent
Probability of success is constant for all trials
These are also three of the four conditions for a binomial distribution so are
not enough on their own – it will also depend on the context
Many scenarios may appear as having more than two outcomes but in the context
of the question only two are of significance
e.g. A light that randomly flashes in 8 different colours, but the only colour of
interest is blue
So “blue” is “success” and all other colours, regardless of whether it is
red, yellow, etc – i.e. “not blue” - is “failure”
Sometimes a sample may be taken from a population
As long as the population is large enough and the sample is random the
probability of “success” in the sample is the same as the probability of
“success” in the population
What cannot be modelled using a geometric distribution?
Be careful not to confuse binomial and geometric distributions/models
Binomial is for the number of successes in a fixed number of trials
Geometric is for the number of trials up to and including the first success
Anything where a trial would have more than two outcomes of interest
e.g. Outcome of a football match – win, draw or lose
Where the probability of an outcome of a trial is influenced by a previous trial
i.e. trials are not independent
e.g. drawing counters from a bag without replacement
Anything where the probability of “success” changes with time – or practice
e.g. a skateboarder performing a trick - the probability of success should
increase after practising the trick
Calculating Geometric Probabilities

How do I calculate geometric probabilities? 
Identify p, the probability of “success” and 1 − p, the probability of “failure” (q)
For exact probabilities use P(X = r) = (1 − p)^(r − 1) p
For inequalities use
P(X > r) = (1 − p)^r
This means the first success occurs after the r-th trial, therefore the first r
trials all ended in failure
Similarly, P(X ≥ r) = P(X > r − 1) = q^(r − 1)
P(X < r) = 1 − P(X > r − 1) = 1 − q^(r − 1)
Similarly, P(X ≤ r) = 1 − P(X > r) = 1 − q^r
P(a ≤ X ≤ b) = P(X ≤ b) − P(X < a)
If a and b are close it may be easier to use
P(X = a) + P(X = a + 1) + ... + P(X = b)
Logic can be used to deduce most geometric distribution questions so

memorising these formulae is not essential
Beware of questions that exploit the memorylessness property of geometric
distributions – loosely called “given that” questions
e.g. P(X = 8 | X > 6) means
“the probability that X equals 8 given that X is greater than 6” or
“the probability that the first success is on trial 8 given that the first 6 trials have all failed”
P(X = 8 | X > 6) = P(X = 8 − 6) = P(X = 2)
The mean (expected value) or mode of a geometric distribution may be required
E(X) = 1/p
The mode is 1 for all geometric distributions
 Worked Example
Given that X ∼ Geo(0 . 4) find
(i)
P(X = 5)
(ii)
P(X > 5)
(iii)
P(3 ≤ X < 8)
(iv)
P(X = 8 | X > 5)
(v)
The mode of X .
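The five parts can be checked with the sketch below, which applies the formulas from this section with p = 0.4. It is one possible check rather than the notes' own worked solution.

```python
p = 0.4
q = 1 - p

p_i = q**4 * p        # (i)   P(X = 5): four failures then a success
p_ii = q**5           # (ii)  P(X > 5): the first five trials all fail
p_iii = q**2 - q**7   # (iii) P(3 ≤ X < 8) = P(X > 2) − P(X > 7)
p_iv = q**2 * p       # (iv)  P(X = 8 | X > 5) = P(X = 3) by memorylessness
mode = 1              # (v)   the mode of every geometric distribution is 1

# Cross-check (iii) by adding the individual probabilities for r = 3, ..., 7
p_iii_direct = sum(q ** (r - 1) * p for r in range(3, 8))

print(round(p_i, 5), round(p_ii, 5), round(p_iii, 5), round(p_iv, 5), mode)
```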
Exam Tip
Try not to get bogged down with formulae for the geometric
distribution; most questions can be deduced using logic
If you are asked to criticise a geometric model always consider whether
trials are independent
especially if it involves “practising” or “performing” a skill
most people will improve after they’ve made several attempts at a
skill
so the probability of success should gradually increase over time
If finding the number of trials required (r) then be careful counting
calculator presses; remember you are likely to be finding r − 1 (the
number of failures before success) in the first instance
3.3 Normal Distribution


3.3.1 The Normal Distribution
Properties of Normal Distribution
The binomial distribution is an example of a discrete probability distribution. The
normal distribution is an example of a continuous probability distribution.
What is a continuous random variable?
A continuous random variable (often abbreviated to CRV) is a random variable that can take
any value within a range, so it has infinitely many possible values
Continuous random variables usually measure something
For example, height, weight, time, etc
What is a continuous probability distribution?
A continuous probability distribution is a probability distribution in which the
random variable X is continuous
The probability of X being a particular value is always zero
P(X = k) = 0 for any value k
Instead we define the probability density function f(x) for a specific value
We talk about the probability of X being within a certain range
A continuous probability distribution can be represented by a continuous graph
(the values for X along the horizontal axis and probability density on the vertical
axis)
The area under the graph between the points x = a and x = b is equal to P(a ≤ X ≤ b)
The total area under the graph equals 1
As P(X = k) = 0 for any value k , it does not matter if we use strict or weak
inequalities
P(X ≤ k) = P(X < k) for any value k
What is a normal distribution?
A normal distribution is a continuous probability distribution
The continuous random variable can follow a normal distribution if:
The distribution is symmetrical
The distribution is bell-shaped
If X follows a normal distribution then it is denoted X ∼ N (μ , σ 2)
μ is the mean
σ² is the variance
σ is the standard deviation
If the mean changes then the graph is translated horizontally
If the variance changes then the graph is stretched horizontally (and its height changes so that the total area stays equal to 1)
A small variance leads to a tall curve with a narrow centre
A large variance leads to a short curve with a wide centre
What are the important properties of a normal distribution?

The mean is μ
The variance is σ²
If you need the standard deviation remember to square root this
The normal distribution is symmetrical about x = μ
Mean = Median = Mode = μ
The normal distribution curve has two points of inflection
x = μ ± σ (one standard deviation away from the mean)
The following results are useful:
Approximately two-thirds (68%) of the data lies within one standard
deviation of the mean (μ ± σ)
Approximately 95% of the data lies within two standard deviations of the
mean (μ ± 2σ)
Nearly all of the data (99.7%) lies within three standard deviations of the
mean (μ ± 3σ)
For any value x a z-score (or z-value) can be calculated which measures how many
standard deviations x is away from the mean
z = (x − μ) / σ
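The z-score and the 68%, 95% and 99.7% results can be illustrated with Python's statistics.NormalDist. The mean of 50 and standard deviation of 4 below are hypothetical values, and the helper names z_score and within are chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 50, 4             # hypothetical mean and standard deviation
X = NormalDist(mu, sigma)     # NormalDist takes the standard deviation, not the variance

def z_score(x):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

def within(k):
    """P(μ − kσ < X < μ + kσ)."""
    return X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within(k), 3))
```

For example, z_score(56) is 1.5 because 56 is one and a half standard deviations above the mean of 50.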
Modelling with Normal Distribution

What can be modelled using a normal distribution? 
A lot of real-life continuous variables can be modelled by a normal distribution
provided that the population is large enough and that the variable is symmetrical
with one mode
For a normal distribution X can take any real value, however values far from the
mean (more than 4 standard deviations away from the mean) have a probability
density of practically zero
This fact allows us to model variables that are not defined for all real values
such as height and weight
What can not be modelled using a normal distribution?
Variables which have more than one mode or no mode
For example, the number given by a random number generator
Variables which are not symmetrical
For example, how long a human lives for
 Worked Example
The random variable S represents the speeds (mph) of a certain species of
cheetahs when they run. The variable is modelled using N(40, 100).
(a)
Write down the mean and standard deviation of the running speeds of
cheetahs.
(b)
State the two assumptions that have been made in order to use this model.
Exam Tip
Remember the second number in N(x, y) is the variance; if you want the
standard deviation then you need to square root it.
3.3.2 Standard Normal Distribution


Standard Normal Distribution
What is the standard normal distribution?
The standard normal distribution is a normal distribution where the mean is 0
and the standard deviation is 1
It is denoted by Z
Z ∼ N(0, 1²)
Why is the standard normal distribution important?

Calculating probabilities for the normal distribution can be difficult and lengthy
due to its complicated probability density function
The probabilities for the standard normal distribution have been calculated and
laid out in the table of the normal distribution which can be found in your
formula booklet
Nowadays, many calculators can calculate probabilities for any normal
distribution. If yours does, it is a good idea to learn how to use it to check your
answers, but you must still use the tables of the normal distribution and show
all your working clearly
It is possible to map any normal distribution onto the standard normal distribution
curve
Mapping different normal distributions to the standard normal distribution allows
distributions with different means and standard deviations to be compared with
each other
How is any normal distribution mapped to the standard normal
distribution?
Any normal distribution curve can be transformed to the standard normal
distribution curve by a horizontal translation and a horizontal stretch
Therefore, for X ∼ N(μ, σ²) and Z ∼ N(0, 1²), we have the relationship:
Z = (X − μ) / σ
Probabilities are related by:
P(X < a) = P(Z < (a − μ) / σ)
This is a very useful relationship for calculating probabilities for any normal
distribution
As it is a normal distribution P( z1 ≤ Z ≤ z2) = P( z1 < Z < z2) so you do not need
to worry about whether the inequality is strict (< or >) or weak (≤ or ≥)
A value of z = 1 corresponds with the x-value that is 1 standard deviation above
the mean and a value of z = -1 corresponds with the x-value that is 1 standard
deviation below the mean
If a value of x is less than the mean then the z -value will be negative
The function Φ(z ) is used to represent P(Z < z )
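The standardisation step can be sketched in Python. Here Φ is computed from the error function math.erf instead of being read from the table, and the distribution N(30, 4²) is a hypothetical example; the function names phi and normal_cdf are illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Φ(z) = P(Z < z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(a, mu, sigma):
    """P(X < a) for X ~ N(μ, σ²), found by standardising to z = (a − μ)/σ."""
    return phi((a - mu) / sigma)

# Hypothetical example: X ~ N(30, 4²)
p_below_mean = normal_cdf(30, 30, 4)      # z = 0, so the probability is 0.5
p_one_sd_above = normal_cdf(34, 30, 4)    # z = 1
p_one_sd_below = normal_cdf(26, 30, 4)    # z = −1 (x is below the mean)
```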
How is the table of the normal distribution function used?
In your formula booklet you have the table of the normal distribution which
provides probabilities for the standard normal distribution
The probabilities are provided for Φ(z ) = P(Z ≤ z ) = P(Z < z )
To find other probabilities you should use the symmetry property of the
normal distribution curve
The table gives probabilities for values of z between 0 and 3
For negative values of z, the symmetry property of the normal distribution is
used
For values greater than z = 3 the tail probabilities P(Z > z) are small enough to be
considered negligible
The tables give the probabilities to 4 decimal places
To read probabilities from the normal distribution table for a z value of up to 2
decimal places:
The very first column lists all z values to 1 decimal place from z = 0.0 to z =
2.9
The top row gives the second decimal place for each of these z values
So the value of Φ(1.23) = P(Z ≤ 1.23) would be found at the point where the ‘1.2’
row meets the ‘3’ column
P(Z ≤ 1.23) = 0.8907
To read probabilities from the normal distribution table for a z value of 3 decimal
places:
There is an extra section to the right of the tables that gives the amount to
add on to the probabilities for the third decimal place
The values given in the columns are in units of one ten-thousandth (0.0001)
If the value is 7 we add 0.0007 to the probability
If the value is 23 we add 0.0023 to the probability
To find the value of Φ(1.234) we would need to find the amount to add on to
0.8907
Find the point where the 1.2 row meets the ADD 4 column, this gives us the
number 7
Add the value 0.0007 to the probability for Φ(1.23)
P(Z ≤ 1.234) = 0.8914
How is the table used to find probabilities that are not listed?
The property that the area under the graph is 1 allows probabilities to be found for
P(Z > z)
Use the formula P(Z > z) = 1 − Φ(z)
The symmetrical property of the normal distribution gives the following results:
P(Z ≤ z) = P(Z ≥ −z)
P(Z ≥ z) = P(Z ≤ −z)
This allows probabilities to be found for negative values of z or for P(Z > z)
Φ(−z) = P(Z ≤ −z) = P(Z > z) = 1 − Φ(z)
Therefore:
Φ(−z) = 1 − Φ(z)
P(Z > z) = 1 − Φ(z)
P(Z > −z) = P(Z ≤ z) = Φ(z)
The four cases in terms of Φ(z) are:
P(Z < z) = Φ(z)
P(Z > z) = 1 − Φ(z)
P(Z < −z) = 1 − Φ(z)
P(Z > −z) = Φ(z)
Drawing a sketch of the normal distribution will help find equivalent probabilities
How are z values found from the table of the normal distribution function?
To find the value of z for which P(Z ≤ z) = p look for the value of p from within the
table and find the corresponding value of z
If the probability is given to 4 decimal places most of the time the value will
exist somewhere in the tables
Occasionally you may have to use the ADD columns to find the exact value
If the values in the ADD columns don’t exactly match up use the closest value
or find the midpoint of the z values that are either side of the probability
If your probability is 0.5 or greater look through the tables to find the
corresponding z value
For P(Z < z) ≥ 0.5 use the z value found in the table
For P(Z > z) ≥ 0.5 take the negative of the z value found in the table
If the probability is less than 0.5 you will need to subtract it from one before
using the tables to find the corresponding z value
For P(Z < z) < 0.5 take the negative of the z value found in the table
For P(Z > z) < 0.5 use the z value found in the table
Always draw a sketch so that you can see these clearly
The formula booklet also contains a table of the critical values of z
This gives z values to 3 decimal places for common probabilities
The probabilities in this table are 0.75, 0.9, 0.95, 0.975, 0.99, 0.995, 0.9975,
0.999 and 0.9995
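Reading a z value back from a probability can be sketched in code as follows. This is a check under stated assumptions, not the exam method: it uses Python's statistics.NormalDist in place of the printed tables, and the helper name z_for is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_for(p):
    """Return z such that P(Z <= z) = p, rounded to 3 decimal places."""
    return round(Z.inv_cdf(p), 3)

# Some of the probabilities listed in the table of critical values
for p in (0.75, 0.9, 0.95, 0.975, 0.99):
    print(p, z_for(p))

# A probability below 0.5 gives the negative of the z value for 1 - p
print(z_for(0.3), -z_for(1 - 0.3))
```

The last line illustrates the rule for probabilities less than 0.5: subtract from one, look up z, then take the negative.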
Worked Example
(a)
By sketching a graph and using the table of the normal distribution, find the
following:
(i)
P(Z ≤ 0.957)
(ii)
P(Z > 0.957)
(iii)
P(Z ≤ −0.957)
(iv)
P(−0.957 < Z ≤ 0.957)
(b)
Find the value of z such that P(Z < z) = 0.3
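The answers to this question can be checked numerically. The sketch below is not the original worked solution; it assumes Python's statistics.NormalDist in place of the printed table, and the variable names are illustrative. Each line applies one of the four cases in terms of Φ(z).

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal distribution
phi_z = Z.cdf(0.957)          # Φ(0.957)

p_i = phi_z                   # (i)   P(Z <= 0.957)  = Φ(z)
p_ii = 1 - phi_z              # (ii)  P(Z > 0.957)   = 1 - Φ(z)
p_iii = 1 - phi_z             # (iii) P(Z <= -0.957) = 1 - Φ(z), by symmetry
p_iv = phi_z - (1 - phi_z)    # (iv)  P(-0.957 < Z <= 0.957) = Φ(z) - Φ(-z)
z_b = Z.inv_cdf(0.3)          # (b)   z is negative because 0.3 < 0.5

for label, value in [("(i)", p_i), ("(ii)", p_ii), ("(iii)", p_iii),
                     ("(iv)", p_iv), ("(b)", z_b)]:
    print(label, round(value, 4))
```

As the Exam Tip suggests, compare each result with a sketch: (i) and (iv) shade more than half of the curve, (ii) and (iii) shade less than half.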
Exam Tip
A sketch will always help you to visualise the required probability and
can be used to check your answer. Check whether the area shaded is
more or less than 50% and compare this with your answer.
3.3.3 Normal Distribution - Calculations

Normal Distribution - Calculations
Throughout this section we will use the random variable X ∼ N(μ, σ²). A normally
distributed X can take any real value, so any values mentioned in this section are
assumed to be real numbers.
Calculating Normal Probabilities

How do I find probabilities using a normal distribution? 
The area under a normal curve between the points x = a and x = b is equal to the
probability P(a < X < b )
Remember for a normal distribution P (a ≤ X ≤ b ) = P (a < X < b ) so you do not
need to worry about whether the inequality is strict (< or >) or weak (≤ or ≥)
The equation of a normal distribution curve is complicated so the area must be
calculated numerically
You will be expected to standardise all normal distributions to z and use the
table of the normal distribution to find the probabilities
It is likely that your calculator has a function that can find normal
probabilities, if so it is a good idea to learn to use it so that you can check
your probabilities
However you must show your calculations to get the z values and use the
tables to get all the marks
How do I calculate the probability for a normal distribution?
A random variable X ∼ N(μ, σ²) can be coded to model the standard normal
distribution Z ∼ N(0, 1²) using the formula
Z = (X − μ)/σ
You can calculate a probability P(X < x) using the relationship
P(X < x) = P(Z < (x − μ)/σ)
Always sketch a quick diagram to visualise which area you are looking for
Once you have determined the z value use the table of the normal distribution to
find the probability
Refer to your sketch to decide if you need to subtract the probability from one
The probability of a single value is always zero for a normal distribution
You can picture this as the area of a single line, which is zero
P(X = x) = 0
P(X < μ) = P(X > μ) = 0.5
You can look at which side of the mean x is on and the direction of the
inequality to decide if your answer should be greater or less than 0.5
As P(X = a) = 0 you can use:
P(X < a) + P(X > a) = 1
P(X > a) = 1 − P(X < a) = 1 − Φ((a − μ)/σ)
P(a < X < b) = P(X < b) − P(X < a) = Φ((b − μ)/σ) − Φ((a − μ)/σ)
Worked Example
The random variable X ∼ N(20, 5²). Calculate:
(a)
P(X ≤ 22),
(b)
P(18 ≤ X < 27)
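The standardising steps for this question can be sketched as below. This is a numerical check, not the original worked solution: it assumes Python's statistics.NormalDist in place of the table, and the helper name standardise is hypothetical.

```python
from statistics import NormalDist

mu, sigma = 20, 5          # X ~ N(20, 5^2)
Z = NormalDist()           # standard normal distribution

def standardise(x):
    """Code a value of X to the corresponding z value, z = (x - mu) / sigma."""
    return (x - mu) / sigma

# (a) P(X <= 22) = Φ(0.4)
p_a = Z.cdf(standardise(22))

# (b) P(18 <= X < 27) = Φ(1.4) - Φ(-0.4)
p_b = Z.cdf(standardise(27)) - Z.cdf(standardise(18))

print(round(p_a, 3), round(p_b, 3))
```

Both answers should be greater than 0.5, which matches a sketch: 22 is above the mean, and the interval from 18 to 27 contains the mean and most of the area.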
Inverse Normal Distribution

Given the value of P(X < a) or P(X > a) how do I find the value of a? 
Given a probability you will have to look through the table of the normal
distribution to locate the z-value that corresponds with that probability
Look at whether your probability is greater or less than 0.5 and the direction of
the inequality to determine whether your z-value will be positive or negative
If P (X < a ) is more than 0.5 or P (X > a ) is less than 0.5 then a should be bigger
than the mean
z will be positive
If P (X < a ) is less than 0.5 or P (X > a ) is more than 0.5 then a should be
smaller than the mean
z will be negative
You do not need to remember these, a sketch will help you see it
Always sketch a diagram
If your probability is less than 0.5 you will need to subtract it from one to find the
corresponding z value
Remember that the position of the z-value will not change, only the direction
of the inequality
Once you have the correct z value substitute it into the formula z = (a − μ)/σ and
solve to find the value of a
Always check that your answer makes sense by considering where a is in relation
to the mean
Given the value of P(μ − a < X < μ + a) how do I find the value of a?
A sketch making use of the symmetry of the graph is essential
If you are given P(μ − a < X < μ + a) = α% then P(X < μ + a) will be ((100 + α)/2)%
This is easier to see from a sketch than to remember
You can then look through the tables for the corresponding z-value and
substitute into the formula z = ((μ + a) − μ)/σ = a/σ
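The symmetric case can be sketched in code. The values α = 95 and σ = 6 below are hypothetical, chosen only for illustration, and the function name half_width is an assumption. Note that μ is not needed, because it cancels in z = ((μ + a) − μ)/σ.

```python
from statistics import NormalDist

def half_width(alpha_percent, sigma):
    """Return a such that P(mu - a < X < mu + a) = alpha_percent %."""
    upper_percent = (100 + alpha_percent) / 2      # P(X < mu + a) as a percentage
    z = NormalDist().inv_cdf(upper_percent / 100)  # z value for that probability
    return z * sigma                               # rearranged from z = a / sigma

a = half_width(95, 6)   # hypothetical values: alpha = 95, sigma = 6
print(round(a, 2))
```

Here (100 + 95)/2 = 97.5, so the z value used is the one for a probability of 0.975.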
 Worked Example
The random variable W ∼ N(50, 36)
Find the value of w such that P (W > w ) = 0 . 7676
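A numerical check of this question is sketched below. It is not the original worked solution and assumes Python's statistics.NormalDist in place of the table. Note that N(50, 36) gives the variance, so σ = 6.

```python
from statistics import NormalDist

mu, sigma = 50, 36 ** 0.5        # variance 36, so sigma = 6
p_greater = 0.7676               # P(W > w)

# P(W < w) = 1 - 0.7676 is below 0.5, so w lies below the mean and z is negative
z = NormalDist().inv_cdf(1 - p_greater)
w = mu + sigma * z               # rearranged form of z = (w - mu) / sigma

print(round(z, 3), round(w, 1))
```

The result should be smaller than the mean of 50, since more than half of the area lies to the right of w.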
Exam Tip
The most common mistake students make when finding values from
given probabilities is forgetting to check whether the z-value should be
negative or not. Avoid this by checking early on using a sketch whether
z is positive or negative and writing a note to yourself before starting
the other calculations.
3.3.4 Finding Sigma and Mu


Finding Sigma and Mu
How do I find the mean (μ) or the standard deviation (σ) if one of them is
unknown?
If the mean or standard deviation of X ∼ N(μ, σ²) is unknown then you will
need to use the standard normal distribution
You will need to use the formula z = (x − μ)/σ
or its rearranged form x = μ + σz
You will be given a probability for a specific value of x (P(X < x) = p or P(X > x) = p)
To find the unknown parameter:
STEP 1 : Sketch the normal curve
Label the known value and the mean
STEP 2 : Find the z-value for the given value of x
Use the table of the Normal Distribution to find the value of z such that
P (Z < z ) = p or P (Z > z ) = p
Make sure the direction of the inequality for Z is consistent with X
The table gives the z -value to three decimal places to avoid rounding errors
Use the sketch to help you decide whether your z value is positive or
negative
You should use the 3 decimal places throughout your calculations so that
your final answer can be rounded to 3 significant figures
STEP 3 : Substitute the known values into z = (x − μ)/σ or x = μ + σz
You will be given x and one of the parameters (μ or σ) in the question
You will have calculated z in STEP 2
STEP 4 : Solve the equation
How do I find the mean (μ) and the standard deviation (σ) if both of them
are unknown?
If both of them are unknown then you will be given two probabilities for two
specific values of x
The process is the same as above
You will now be able to calculate two z-values
You can form two equations (rearranging to the form x = μ + σz is helpful)
You now have to solve the two equations simultaneously (you can use your
calculator to do this)
Be careful not to mix up which z-value goes with which value of x
Worked Example
It is known that the times, in minutes, taken by students at a school to eat
their lunch can be modelled using a normal distribution with mean μ
minutes and standard deviation σ minutes.
Given that 10% of students at the school take less than 12 minutes to eat
their lunch and 5% of the students take more than 40 minutes to eat their
lunch, find the mean and standard deviation of the time taken by the
students at the school.
Exam Tip
These questions are normally given in context so make sure you identify
the key words in the question. Check whether your z-values are positive
or negative and be careful with signs when rearranging.
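The lunch-time question above leads to two equations of the form x = μ + σz. A minimal sketch of solving them is given below. It is not the original worked solution: it assumes Python's statistics.NormalDist in place of the table of critical values, and the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
x1, z1 = 12, Z.inv_cdf(0.10)       # P(X < 12) = 0.10, so z1 is negative
x2, z2 = 40, Z.inv_cdf(1 - 0.05)   # P(X > 40) = 0.05, so P(X < 40) = 0.95

# Subtracting x1 = mu + sigma*z1 from x2 = mu + sigma*z2 eliminates mu
sigma = (x2 - x1) / (z2 - z1)
mu = x1 - sigma * z1

print(round(mu, 1), round(sigma, 2))
```

Keeping each z value paired with its own x value (z1 with 12, z2 with 40) is the point where sign errors usually creep in.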
3.4 Working with Distributions


3.4.1 Modelling with Distributions
Modelling with Distributions
When should I use a binomial distribution?
A random variable that follows a binomial distribution is a discrete random
variable
A binomial distribution is used when the random variable counts something
The number of successful trials
The number of members of a sample that satisfy a criterion (satisfying the
criteria can be seen as a successful trial)
There are four conditions that X must fulfil to follow a binomial distribution
There is a fixed finite number of trials (n)
The trials are independent
There are exactly two outcomes of each trial (success or failure)
The probability of success (p) is constant
When should I use a geometric distribution?
A random variable that follows a geometric distribution is a discrete random
variable
A geometric distribution is used when the random variable counts something
The number of trials until a successful trial
The conditions that X must fulfil to follow a geometric distribution are exactly the
same as for a binomial distribution except there is no fixed number of trials
Instead, the trials will continue until the first time a success occurs
When should I use a normal distribution?
A random variable that follows a normal distribution is a continuous random
variable
A normal distribution is used when the random variable measures something and
the distribution is:
Symmetrical
Bell-shaped
A normal distribution can be used to model real-life data provided the histogram
for this data is roughly symmetrical and bell-shaped
If the variable is normally distributed then as more data is collected the outline
of the histogram should get smoother and resemble a normal distribution
curve
Can the binomial distribution and the normal distribution be used in the
same question?
Some questions might require you to first use the normal distribution to find the
probability of success and then use the binomial distribution
Remember that the binomial and geometric distributions are discrete, whereas the normal distribution is continuous
The key is to make sure you are very clear about what each parameter/variable
represents
 Worked Example
In a population of cows, the masses of the cows can be modelled using a
normal distribution with mean 550 kg and standard deviation 80 kg. A
farmer classifies cows as beefy if they weigh more than 700 kg. The farmer
takes a random sample of 10 cows and weighs them.
Find the probability that at most one cow is beefy.
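This question combines both distributions in two stages, which can be sketched as below. It is a numerical check and not the original worked solution: it assumes Python's statistics.NormalDist in place of the table, and the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

# Stage 1 (normal): probability that a single cow is beefy, P(M > 700)
p = 1 - NormalDist(mu=550, sigma=80).cdf(700)

# Stage 2 (binomial): Y ~ B(10, p) counts the beefy cows in the sample
n = 10
p_at_most_one = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in (0, 1))

print(round(p, 4), round(p_at_most_one, 3))
```

The normal distribution models the mass of one cow, while the binomial distribution models the number of beefy cows out of 10, so p links the two stages.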
Exam Tip
Always state what your variables and parameters represent. Make sure
you know the conditions for when each distribution is (or is not) a
suitable model.
3.4.2 Normal Approximation of Binomial


Normal Approximation of Binomial
When can I use a normal distribution to approximate a binomial
distribution?
A binomial distribution X ∼ B(n, p) can be approximated by a normal distribution
X_N ∼ N(μ, σ²) provided
n is large
p is close to 0.5
np > 5
nq > 5 where q = 1 − p
The mean and variance of a binomial distribution can be calculated by:
μ = np
σ² = np(1 − p)
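The conditions and parameters above can be sketched as a small helper. The function name normal_approximation and the values n = 40, p = 0.5 are hypothetical, used only for illustration. The conditions "n is large" and "p is close to 0.5" are judged here solely through np > 5 and nq > 5.

```python
def normal_approximation(n, p):
    """Return (mean, variance) of the approximating normal for X ~ B(n, p).

    Raises ValueError if np > 5 and nq > 5 do not both hold.
    """
    q = 1 - p
    if not (n * p > 5 and n * q > 5):
        raise ValueError("np and nq must both be greater than 5")
    return n * p, n * p * q

mean, variance = normal_approximation(40, 0.5)   # hypothetical n and p
print(mean, variance)
```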
Why do we use approximations?

If there are a large number of values for a binomial distribution there could be a
lot of calculations involved and it is inefficient to work with the binomial
distribution
These days calculators can calculate binomial probabilities so approximations
are no longer necessary
However it is easier to work with a normal distribution
You can calculate the probability of a range of values quickly
You can use the inverse normal distribution function (most calculators
don't have an inverse binomial distribution function)
In your exam you must use the formula and not a calculator to find binomial
probabilities so you are limited to small values of n
What are continuity corrections?
The binomial distribution is discrete and the normal distribution is continuous
A continuity correction takes this into account when using a normal
approximation
The probability being found will need to be changed from a discrete variable, X,
to a continuous variable, X_N
For example, X = 4 for binomial can be thought of as 3.5 ≤ X_N < 4.5 for normal
as every number within this interval rounds to 4
Remember that for a normal distribution the probability of a single value is
zero so P(3.5 ≤ X_N < 4.5) = P(3.5 < X_N < 4.5)
How do I apply continuity corrections?
Think about what the largest/smallest integer is that can be included in the inequality
for the discrete distribution and then find its upper/lower bound
P(X = k) ≈ P(k − 0.5 < X_N < k + 0.5)
P(X ≤ k) ≈ P(X_N < k + 0.5)
You add 0.5 as you want to include k in the inequality
P(X < k) ≈ P(X_N < k − 0.5)
You subtract 0.5 as you don't want to include k in the inequality
P(X ≥ k) ≈ P(X_N > k − 0.5)
You subtract 0.5 as you want to include k in the inequality
P(X > k) ≈ P(X_N > k + 0.5)
You add 0.5 as you don't want to include k in the inequality
For a closed inequality such as P(a < X ≤ b)
Think about each inequality separately and use the rules above
P(X > a) ≈ P(X_N > a + 0.5)
P(X ≤ b) ≈ P(X_N < b + 0.5)
Combine to give
P(a + 0.5 < X_N < b + 0.5)
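To see how well the corrected inequality works, the sketch below compares an exact binomial probability with its normal approximation. The values n = 40, p = 0.5, a = 17 and b = 23 are hypothetical, chosen only so that np > 5 and nq > 5; the helper name binom_cdf is an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5                                    # hypothetical binomial parameters
X_N = NormalDist(n * p, sqrt(n * p * (1 - p)))    # approximating normal N(np, np(1 - p))

def binom_cdf(k):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

a, b = 17, 23
exact = binom_cdf(b) - binom_cdf(a)               # P(a < X <= b), i.e. X = 18, ..., 23
approx = X_N.cdf(b + 0.5) - X_N.cdf(a + 0.5)      # P(a + 0.5 < X_N < b + 0.5)

print(round(exact, 4), round(approx, 4))
```

The lower bound uses a + 0.5 because a itself is excluded, and the upper bound uses b + 0.5 because b is included.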
How do I approximate a probability?

STEP 1 : Find the mean and variance of the approximating distribution
μ = np
σ² = np(1 − p)
STEP 2 : Apply continuity corrections to the inequality
STEP 3 : Find the probability of the new corrected inequality
Find the standard normal probability and use the table of the normal
distribution
The probability will not be exact as it is an approximation but provided n is large
and p is close to 0.5 then it will be a close approximation
To decide if n is large enough and if p is close enough to 0.5 check that:
np > 5
nq > 5 where q = 1 − p
 Worked Example
The random variable X ∼ B(1250, 0.4).
Use a suitable approximating distribution to approximate P(485 ≤ X ≤ 530).
Exam Tip
In the exam, the question will often tell you to use a normal
approximation but sometimes you will have to recognise that you
should do so for yourself. Look for the conditions mentioned in this
revision note: n is large, p is close to 0.5, np > 5 and nq > 5.
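The three steps for the worked example X ∼ B(1250, 0.4) can be sketched as below. It is a numerical check and not the original worked solution: it assumes Python's statistics.NormalDist in place of the table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1250, 0.4
mu = n * p                          # STEP 1: mean = np = 500
sigma = sqrt(n * p * (1 - p))       # variance = np(1 - p) = 300

# STEP 2: P(485 <= X <= 530) includes both 485 and 530,
# so subtract 0.5 from the lower bound and add 0.5 to the upper bound
lower, upper = 485 - 0.5, 530 + 0.5

# STEP 3: standardise both bounds and use the standard normal distribution
Z = NormalDist()
approx = Z.cdf((upper - mu) / sigma) - Z.cdf((lower - mu) / sigma)

print(round(approx, 3))
```

Here np = 500 and nq = 750 are both far greater than 5, so the conditions for the approximation are comfortably met.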