
Random Variables

A random variable is a variable whose value depends on the outcome of a random event.

Random Event: Rolling a fair six-sided dice five times
Random Variable: The number of 2s that were rolled

Random Event: Flipping a fair coin ten times
Random Variable: The number of times the coin landed on tails
The normal distribution
The normal distribution focuses on continuous random variables.
These can take any numerical value and are modelled using a curve instead.

Example: the lengths of a random sample of 100 sidewinder snakes in the Sahara desert
Example: the scores achieved by 250 students in a university entrance exam
Example: the masses of honey badgers in a random sample of 1000
Example: the shoe sizes of 200 randomly selected women in a particular town
Let X be the number of heads obtained.

P(X = 4): the probability of getting 4 heads will be equal to the height of this bar

P(X ≤ 3): the probability of getting 3 heads or fewer will be equal to the sum of the heights of these bars

[Bar chart of P(X = x), horizontal axis from 0 to 8]
The normal distribution
• For a quantity such as height, the majority of people will be close to the average
• As we move further away from the average height, there are fewer people
• i.e. there would be a lower probability of randomly selecting one at that height

[Histogram of heights, horizontal axis from 130 to 180]

If we reduce the class widths used, the shape of the histogram becomes smoother. If we continue indefinitely, we get what is known as the normal distribution curve.
The normal distribution
[Figure: normal distribution curve for height H, horizontal axis from 140 to 210]

The normal distribution has 2 parameters:
• the mean (μ)
• the variance (σ²), so the standard deviation is σ

This is written X ~ N(μ, σ²)

It is symmetrical, so mean = median = mode

It has asymptotes at each end

It has points of inflexion at μ − σ and μ + σ, where the normal distribution curve changes between concave and convex

The continuous random variable Y is normally distributed
with a mean of 50 and a standard deviation of 4.
Sketch the distribution of Y.
The lengths, Xmm, of a bolt produced by a particular
machine are normally distributed with mean 35mm and
standard deviation 0.4mm.
Sketch the distribution of X.
The normal distribution
The area under the curve is used to calculate probabilities.

This means the area under the curve totals 1, or 100%.

Since the mean splits the curve down the middle, the area on either side is 0.5, or 50%.
The normal distribution
Approximately 68% of the data is within one standard deviation of the mean.

For μ − σ < H < μ + σ: Area = 0.68

So what would the area in each tail be? By symmetry it is (1 − 0.68) ÷ 2, so Area = 0.16 on each side.
The normal distribution
Approximately 95% of the data is within 2 standard deviations of the mean.

For μ − 2σ < H < μ + 2σ: Area = 0.95
The normal distribution
Almost all (99.7%) of the data is within 3 standard deviations of the mean.

For μ − 3σ < H < μ + 3σ: Area = 0.997
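The three percentages on these slides can be checked numerically. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the calculator; the helper name `area_within` is made up for illustration and is not part of the original slides.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma^2) within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Areas within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))

# Each tail outside one standard deviation holds half of what is left over
tail = (1 - area_within(1)) / 2
```

The result does not depend on the mean or standard deviation chosen, which is why the same three percentages apply to every normal distribution.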
The normal distribution
EXAMPLE
The heights of males in a city, X cm, are modelled as X ~ N(175, 144)

State the proportion of males that have a height:

a) More than 175cm

b) Between 163cm and 187cm

c) Between 151cm and 199cm

d) Less than 187cm
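The four proportions can be read off from the 68% and 95% facts, because the variance of 144 gives a standard deviation of 12. As a cross-check, here is a small sketch that assumes Python's `statistics.NormalDist` in place of a sketch or calculator; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(175, 144): the second parameter is the variance, so sigma = 12
heights = NormalDist(mu=175, sigma=sqrt(144))

more_than_175 = 1 - heights.cdf(175)                    # a)
between_163_187 = heights.cdf(187) - heights.cdf(163)   # b) one sd either side
between_151_199 = heights.cdf(199) - heights.cdf(151)   # c) two sd either side
less_than_187 = heights.cdf(187)                        # d) 0.5 plus half of b)

for value in (more_than_175, between_163_187, between_151_199, less_than_187):
    print(round(value, 2))
```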


The normal distribution
EXAMPLE
The diameters of a rivet produced by a
particular machine, mm, is modelled as .
Find:

a)

b)

c)

The normal distribution
The lengths of a colony of adders, Y cm, are modelled as Y ~ N(100, σ²).
If 68% of the adders have a length between 93cm and 107cm, find σ².
Exercise 3A – Page 40
Finding probabilities for normal distributions
You need to be able to use the normal cumulative distribution function on your calculator to find probabilities.

Given that X ~ N(30, 4²), so μ = 30 and σ = 4, find:
a) P(X < 33)
b) P(X > 24)
c) P(33.5 < X < 38.2)
d) P(X < 27 or X > 32)

Even though we will use the calculator, we should still sketch the situation.

a) Sketch the curve with 30 and 33 marked
• On your calculator, press menu
• Now press 7 for distributions
• Now press 2 for the normal cumulative distribution
• Now input the lower limit. If there is no lower limit, choose a value at least 5 standard deviations below the mean (so in this case any value less than or equal to 10 would be fine)
• Now input the upper limit (in this case 33), the standard deviation and the mean
• Press equals and the calculator will tell you the area!
P(X < 33) = 0.7734

b) Sketch the curve with 24 and 30 marked
• Input the lower limit (in this case 24; it does not matter whether the inequality also includes 24 or not)
• Now input the upper limit (in this case there is none, so any value above 50), the standard deviation and the mean
• Press equals and the calculator will tell you the area
P(X > 24) = 0.9332

c) Sketch the curve with 30, 33.5 and 38.2 marked
• Now input both limits, the standard deviation and the mean (it does not matter whether the inequality includes the value or not)
• Press equals and the calculator will tell you the area
P(33.5 < X < 38.2) = 0.1706

d) Sketch the curve with 27, 30 and 32 marked
• In this case, we can find the area between 27 and 32, and then subtract it from 1
• Now input the limits, mean and standard deviation
• Press equals: the area between 27 and 32 is 0.4648
• In this case, remember that you need to subtract the answer from 1
P(X < 27 or X > 32) = 1 − 0.4648 = 0.5352

3B
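The four calculator answers can be reproduced in code. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the calculator's normal cumulative distribution function; unlike the calculator, its `cdf` needs no artificial lower or upper limit.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=4)

p_a = X.cdf(33)                     # a) area below 33
p_b = 1 - X.cdf(24)                 # b) area above 24
p_c = X.cdf(38.2) - X.cdf(33.5)     # c) area between 33.5 and 38.2
p_d = 1 - (X.cdf(32) - X.cdf(27))   # d) everything outside 27 to 32

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, round(p, 4))
```

Subtracting two `cdf` values is the same idea as typing a lower and an upper limit into the calculator.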
Finding probabilities for normal distributions
You need to be able to use the normal cumulative distribution function on your calculator to find probabilities.

An IQ test is applied to a population of adults. The scores, X, on the test are found to be normally distributed with X ~ N(100, 15²). Adults scoring more than 140 on the test are classified as ‘genius’.

a) Find the probability that an adult chosen at random achieves a ‘genius’ classification
b) Twenty adults take the test. Find the probability that two or more are classified as ‘genius’

a) Use your calculator as before. The mean is 100 and the standard deviation is 15. The lower limit should be 140, and the upper limit should be at least 175 (5 standard deviations above the mean).
P(X > 140) = 0.00383

3B
Statistical Distributions
If X is binomially distributed with a number of trials, n, and a probability of success, p, we write:

X ~ B(n, p)

The Binomial Distribution
If X ~ B(n, p), then the probability of getting r successful trials is:

$P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$

• $\binom{n}{r}$ is the number of ways of choosing r successes from n trials
• $p^r$ is the probability of success to the power r
• $(1 - p)^{n - r}$ is the probability of failure to the power n − r

This is in the formula booklet in the A-level section!

The Binomial Distribution
EXAMPLE
Gary is playing chess against Nigel and has a 2/3 chance of winning each game. If Gary plays 5 games against Nigel, work out the following probabilities:

X ~ B(5, 2/3)

$P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$

a) $P(X = 2) = \binom{5}{2} \times \left(\frac{2}{3}\right)^2 \times \left(\frac{1}{3}\right)^3 = \frac{40}{243}$ or 0.165 (3 s.f.)

b)

c)
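The worked answer for part a) can be checked directly from the formula. This is a sketch only: `binomial_pmf` is a hypothetical helper name, and exact fractions are used so the result can be compared with 40/243.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): ways of choosing r successes, times the
    probability of r successes, times the probability of n - r failures."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

p_two_wins = binomial_pmf(2, 5, Fraction(2, 3))
print(p_two_wins, round(float(p_two_wins), 3))
```

Passing an ordinary float such as 0.4 for `p` works in the same way when an exact fraction is not needed.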
The Binomial Distribution
The random variable . Find:

a)

b)

c)

6B
The random variable . Find:

a)

b)

c)

d)
The normal distribution
b) Twenty adults take the test. Find the probability that two or more are classified as ‘genius’

For this part you will need to use what you learned in year 12 (the binomial distribution).

There are 20 adults in the sample, and the probability of a genius is 0.00383 (from part a).

Therefore, for this part, n = 20 and p = 0.00383.

If we let Y be the number of adults classified as ‘genius’, so Y ~ B(20, 0.00383), we want:
P(Y ≥ 2)
which equals 1 − P(Y < 2)
which equals 1 − P(Y ≤ 1)

• On your calculator, press 7, down, 1, 2, and input the number of successes we are considering (1), the number of trials (20), and the probability of success (0.00383)
• The calculator will give you the answer 0.9973… Remember to subtract this from 1
• So the answer is 0.00266

3B
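Both parts of the IQ question can be reproduced together. The sketch assumes Python's `statistics.NormalDist` and `math.comb` in place of the calculator menus; `binomial_cdf` is a made-up helper name, and the probability is rounded to 0.00383 first only to mirror the slide.

```python
from math import comb
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
p_genius = 1 - iq.cdf(140)              # part a): area above 140

def binomial_cdf(k, n, p):
    """P(Y <= k) for Y ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

p_rounded = round(p_genius, 5)          # 0.00383, as used on the slide
p_at_most_one = binomial_cdf(1, 20, p_rounded)
p_two_or_more = 1 - p_at_most_one       # part b): subtract from 1

print(round(p_genius, 5), round(p_at_most_one, 4), round(p_two_or_more, 5))
```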
Exercise 3B – Page 43
• All questions
Teachings for
Exercise 3C
Inverse normal distribution
You also need to be able to use the inverse function on your calculator, to find values that will give specific probabilities.

Given that X ~ N(20, 3²), so μ = 20 and σ = 3, find, to two decimal places, the values of a such that:
a) P(X < a) = 0.75
b) P(X > a) = 0.4
c) P(16 < X < a) = 0.3

Draw a sketch to get an idea of where the value should be. You can use the area to estimate the value we are looking for.

a) Area below a = 0.75
• On your calculator, press menu
• Now press 7 for distributions
• Now press 3 for the inverse normal distribution
• Now input the probability we want, as well as the standard deviation and mean
• Press equals and the calculator will tell you the value which has this area below it (as in, below the value, to the left of it)
a = 22.02

b) Area above a = 0.4
• If the area above a value is 0.4, that value must be above the mean
• Be careful here! The calculator bases calculations on the area below the value given. So in this case, we need to put in the probability as 0.6
• After putting the mean and standard deviation in as well, the calculator will tell you the value
a = 20.76

c) Area between 16 and a = 0.3
• The total area below the value a will be equal to 0.3 plus the area below 16. We can find the area below 16 using the function on your calculator we used in section 3B
• It is 0.0912…
• So put in an area of 0.3912, as well as the mean and standard deviation, in order to find the value
a = 19.17

3C
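The same three values can be found in code. This sketch assumes `statistics.NormalDist.inv_cdf` from the Python standard library as the equivalent of the calculator's inverse normal function; like the calculator, it always works with the area below the value.

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=3)

a_part_a = X.inv_cdf(0.75)              # a) area below a is 0.75
a_part_b = X.inv_cdf(1 - 0.4)           # b) area above a is 0.4, so area below is 0.6
a_part_c = X.inv_cdf(X.cdf(16) + 0.3)   # c) area below 16, plus the 0.3 between 16 and a

print(round(a_part_a, 2), round(a_part_b, 2), round(a_part_c, 2))
```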
Inverse normal distribution
You also need to be able to use the inverse function on your calculator, to find values that will give specific probabilities.

Plates made using a particular manufacturing process have a diameter, D cm, which can be modelled using a normal distribution, D ~ N(20, 1.5²), so μ = 20 and σ = 1.5.

a) Given that 60% of plates are less than x cm, find the value of x.
b) Find the interquartile range of the plate diameters

a) Use your calculator as before to find the value x.
• The mean is 20, the standard deviation is 1.5, and the area/probability is 0.6
• The answer is 20.38cm

b)
• Firstly, remember that for normally distributed data, Mean = Median = Mode, so the value of 20 is also the Median
• Secondly, remember that the quartiles and the median split the data into 4 equal parts, with the same quantity of data in each
• Therefore, the area below the lower quartile (and all the other areas) will be 0.25
• So find the value that has an area of 0.25 less than it, using your calculator as before. You should get 18.988…
• Subtract this value from 20 to find the distance between the LQ and the Median. You should get 1.012
• Then, add this to the Median to get the UQ. You should get 21.012
• So the interquartile range will be the difference between the two values we calculated. This is equal to 2.02cm, to 2dp
(note that we could also find the difference between the Median and the LQ, and just double that!)

3C
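The plate example can be checked the same way. A short sketch, again assuming Python's `statistics.NormalDist`; the name D follows the reconstruction of the question above and may differ from the original slide.

```python
from statistics import NormalDist

D = NormalDist(mu=20, sigma=1.5)

x = D.inv_cdf(0.6)                  # a) 60% of plates are below x
lower_quartile = D.inv_cdf(0.25)    # area 0.25 below the LQ
upper_quartile = D.inv_cdf(0.75)    # area 0.75 below the UQ
iqr = upper_quartile - lower_quartile

# The shortcut from the slide: double the distance from the LQ to the median
iqr_by_symmetry = 2 * (D.mean - lower_quartile)

print(round(x, 2), round(lower_quartile, 3), round(upper_quartile, 3), round(iqr, 2))
```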
Teachings for
Exercise 3D
The normal distribution
You need to be able to use the standard normal distribution. If you need to work out an unknown mean or standard deviation, you will need to use this.

A set of normally distributed data can be coded so it fits the standard normal distribution.
• The standard normal distribution has a mean of 0 and a standard deviation of 1
• The coding will change all values in the same way, such that the mean becomes 0 and the standard deviation becomes 1
• All values maintain their relative positions

Let X ~ N(25, 3²), so μ = 25 and σ = 3. Then Z ~ N(0, 1²), so μ = 0 and σ = 1.
• So a value of 31 in the original distribution will become a value of 2 in the standard distribution (2 standard deviations above the mean)

[Two sketches: the original curve with 25 and 31 marked, and the standard curve with 0 and 2 marked. Note that the 2 diagrams are NOT TO SCALE!]

The coding used is shown, and would be applied to all values in the original data set:

Z = (X − μ) / σ

where X is the original set of values, μ is the mean, σ is the standard deviation and Z is the standardised set of values.

3D
The normal distribution
You need to be able to use the standard normal distribution. If you need to work out an unknown mean or standard deviation, you will need to use this.

For example, imagine we had this table of data on shoe sizes (x).

Shoe size (x)   Frequency (f)   Coded data (z)   Frequency (f)
6               1               -2.21            1
7               3               -1.47            3
8               6               -0.73            6
9               10              0                10
10              6               0.73             6
11              3               1.47             3
12              1               2.21             1

We could calculate the mean and standard deviation for this set of data:
Mean of X = 9
Standard Deviation of X = 1.36

• Now, we code the data using the formula Z = (X − μ) / σ, to create a new data set, Z (the frequencies stay the same)

We could calculate the mean and standard deviation for this new set of data:
Mean of Z = 0
Standard Deviation of Z = 1

Notice that the numbers in the new data set are also symmetrical about 0.

3D
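The coding step can be tried out on the shoe-size table. This is a sketch under one assumption: the slide's value of 1.36 matches the sample standard deviation (dividing by n − 1), so `statistics.stdev` is used. The coded values differ very slightly from the table in the second decimal place, because the table divides by the rounded value 1.36.

```python
from statistics import mean, stdev

shoe_sizes = {6: 1, 7: 3, 8: 6, 9: 10, 10: 6, 11: 3, 12: 1}   # size: frequency

# Expand the frequency table into the 30 individual data values
data = [size for size, freq in shoe_sizes.items() for _ in range(freq)]

mu = mean(data)
sigma = stdev(data)   # sample standard deviation (assumption, see above)

# Code every value with z = (x - mu) / sigma
coded = [(x - mu) / sigma for x in data]

print(mu, round(sigma, 2), round(mean(coded), 2), round(stdev(coded), 2))
```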
The normal distribution
You need to be able to use the standard normal distribution. If you need to work out an unknown mean or standard deviation, you will need to use this.

The random variable X ~ N(50, 4²). Write in terms of Φ(z) for some value z:
a) P(X < 53)

You could calculate this using what you have learnt so far this chapter, along with your calculator.
• This question is asking you to write an equivalent probability, based on the standardised normal distribution
• The letter Φ is the Greek capital phi. Φ(z) means the area below the value z in the standardised normal distribution

z = (x − μ) / σ
Sub in values, using the value of x given in the question. A lowercase z is used here as we are now working out a specific value
z = (53 − 50) / 4
Calculate
z = 0.75

This is telling us that when we code the original data set to the standardised one, a value of 53 will become 0.75.

• Therefore:
P(X < 53) = P(Z < 0.75)
= Φ(0.75)

If we compare the diagram for the original distribution (μ = 50, σ = 4) and the standardised one (μ = 0, σ = 1), the probability of getting below 53 on the first is the same as the probability of getting below 0.75 on the second.
• You can also think of it as ‘53 is 0.75 standard deviations above the mean’

3D
The normal distribution
The random variable X ~ N(50, 4²). Write in terms of Φ(z) for some value z:
b) P(X ≥ 55)

z = (x − μ) / σ
Sub in values, using the value of x given in the question. A lowercase z is used here as we are now working out a specific value
z = (55 − 50) / 4
Calculate
z = 1.25

This is telling us that when we code the original data set to the standardised one, a value of 55 will become 1.25.

• Therefore:
P(X ≥ 55) = P(Z ≥ 1.25)
Remember we would need to use the area below the value given
= 1 − P(Z < 1.25)
= 1 − Φ(1.25)

• You can also think of it as ‘55 is 1.25 standard deviations above the mean’

3D
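To see that coding really does preserve the probabilities in parts a) and b), the two sides can be compared numerically. A minimal sketch, assuming Python's `statistics.NormalDist`; the function name `phi` is chosen here to mirror the Φ notation and is not a library function.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=4)
Z = NormalDist()            # standard normal: mean 0, standard deviation 1

def phi(z):
    """Area below z in the standard normal distribution."""
    return Z.cdf(z)

z_a = (53 - 50) / 4         # 0.75
z_b = (55 - 50) / 4         # 1.25

# a) P(X < 53) should equal phi(0.75)
print(round(X.cdf(53), 4), round(phi(z_a), 4))
# b) P(X >= 55) should equal 1 - phi(1.25)
print(round(1 - X.cdf(55), 4), round(1 - phi(z_b), 4))
```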
The normal distribution
You need to be able to use the standard normal distribution. If you need to work out an unknown mean or standard deviation, you will need to use this.

You might be asked to find a value for z which corresponds to a particular probability. You are given some standard values in the formula booklet.
• Be careful here, these values correspond to the region above a value in the standard normal distribution

For example, for Area = 0.3 above the value in the standardised distribution (μ = 0, σ = 1), the table gives z = 0.5244. This is saying that the value with a probability of 0.3 above it is 0.5244.

3D
The normal distribution
The systolic blood pressure (pressure when the heart beats) of an adult population, S, is modelled as a normal distribution with mean 127 and standard deviation 16, so S ~ N(127, 16²).

A medical researcher wants to study adults with blood pressures higher than the 95th percentile. Find the minimum blood pressure needed for an adult to be included in her survey.

• Start by drawing a sketch for the original distribution, S ~ N(127, 16²), and the standardised one, Z ~ N(0, 1²)
• The 95th percentile has 5% of the data above it, so there is a value with Area = 0.05 above it
• We will need to start by finding the equivalent value on the standardised distribution
• Use the table from before to find the value which has an area of 0.05 above it: z = 1.6449
• Now we need to use the formula in reverse. We have the value of z and want the corresponding value of x

z = (x − μ) / σ
Sub in values
1.6449 = (x − 127) / 16
Multiply by 16
26.3184 = x − 127
Add 127 and round if needed
153 = x

3D
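The two-step method (find z, then reverse the coding) can be compared with asking for the percentile directly. This sketch assumes Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 127, 16

# Step 1: the z value with 5% of the standard normal distribution above it
z = NormalDist().inv_cdf(0.95)

# Step 2: reverse the coding z = (x - mu) / sigma
x = mu + z * sigma

# Cross-check: the 95th percentile of S ~ N(127, 16^2) found in one step
direct = NormalDist(mu, sigma).inv_cdf(0.95)

print(round(z, 4), round(x, 1), round(direct, 1))
```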
Teachings for
Exercise 3E
The normal distribution
You need to be able to find unknown values of μ, σ, or both.

The random variable X ~ N(μ, 3²). Given that P(X > 20) = 0.2, find the value of μ.

• As in the previous section, start by drawing 2 sketches, one of the distribution for X, and another for the standard distribution
• Use the table to find the corresponding value of z: an area of 0.2 above the value gives z = 0.8416
• We now know that the point 20 on the original distribution is 0.8416 standard deviations above the mean
• Use the coding formula to find what the mean is

z = (x − μ) / σ
Sub in values
0.8416 = (20 − μ) / 3
Multiply by 3
2.5248 = 20 − μ
Rearrange
μ = 17.5 (3 sf)

So we found that 20 must be 0.8416 standard deviations above the mean.
• We then subtracted 0.8416 standard deviations (0.8416 × 3 = 2.5248) from 20 to get the mean

3E
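The rearrangement for the unknown mean can be checked by substituting the answer back in. A minimal sketch assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

sigma = 3

# z value with an area of 0.2 above it (so 0.8 below it)
z = NormalDist().inv_cdf(1 - 0.2)

# Rearranging z = (20 - mu) / sigma for mu
mu = 20 - z * sigma

# Check: with this mean, the area above 20 should come back as 0.2
check = 1 - NormalDist(mu, sigma).cdf(20)

print(round(z, 4), round(mu, 1), round(check, 4))
```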
The normal distribution
You need to be able to find unknown values of μ, σ, or both.

A machine makes metal sheets with width W cm, modelled as a normal distribution such that W ~ N(50, σ²).

a) Given that P(W < 46) = 0.2119, find the value of σ.
b) Find the 90th percentile of the widths

a)
• Draw the sketches
• You can use the inverse function on your calculator (see section 3C) to find the missing value on the standard distribution
• The probability is 0.2119, the mean is 0 and the standard deviation is 1, which gives z = −0.7998
• We now know that the point 46 on the original distribution is 0.7998 standard deviations below the mean
• Use the coding formula to find what the standard deviation is

z = (x − μ) / σ
Sub in values
−0.7998 = (46 − 50) / σ
Multiply by σ, divide by −0.7998
σ = (46 − 50) / (−0.7998)
Calculate
σ = 5.00 (3 sf)

b)
• The 90th percentile is the value with 90% of the data below it
• You can do this using the inverse function on your calculator
• So for this question, the area is 0.9, μ = 50 and σ = 5
• Using the inverse probability function, the value will be 56.4cm

3E
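Both parts of the metal sheet question follow the same pattern in code. The sketch assumes Python's `statistics.NormalDist`, and part b) uses the rounded value σ = 5 as the slide does.

```python
from statistics import NormalDist

mu = 50

# a) z value with an area of 0.2119 below it on the standard distribution
z = NormalDist().inv_cdf(0.2119)

# Rearranging z = (46 - mu) / sigma for sigma
sigma = (46 - mu) / z

# b) 90th percentile, using sigma = 5 from part a)
p90 = NormalDist(mu, 5).inv_cdf(0.9)

print(round(z, 4), round(sigma, 2), round(p90, 1))
```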
The normal distribution
You need to be able to find unknown values of μ, σ, or both.

The random variable X ~ N(μ, σ²). Given that P(X < 15) = 0.1469 and P(X > 35) = 0.025, find the values of μ and σ.

• Start by drawing the two sketches
• Find the two values for z in the standard distribution diagram. You can use the table of values for the upper tail (z = 1.96), and the inverse function on your calculator for the lower tail (z = −1.05)
• Then use the coding formula with each of these

Lower tail
z = (x − μ) / σ
Sub in values
−1.05 = (15 − μ) / σ
Multiply by σ
−1.05σ = 15 − μ

Upper tail
z = (x − μ) / σ
Sub in values
1.96 = (35 − μ) / σ
Multiply by σ
1.96σ = 35 − μ

• Now we have 2 equations, we can solve them simultaneously

1) −1.05σ = 15 − μ
2) 1.96σ = 35 − μ
2) − 1)
3.01σ = 20
Divide by 3.01
σ = 6.64
Use this value to find the mean
μ = 22.0

3E
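The simultaneous equations can be solved in a couple of lines once both z values are known. This sketch assumes Python's `statistics.NormalDist` and uses unrounded z values, so the results agree with the slide to the accuracy shown rather than digit for digit.

```python
from statistics import NormalDist

Z = NormalDist()
z_lower = Z.inv_cdf(0.1469)       # lower tail: area 0.1469 below 15
z_upper = Z.inv_cdf(1 - 0.025)    # upper tail: area 0.025 above 35

# 1) z_lower * sigma = 15 - mu
# 2) z_upper * sigma = 35 - mu
# Subtracting 1) from 2) removes mu:
sigma = (35 - 15) / (z_upper - z_lower)
mu = 35 - z_upper * sigma

print(round(z_lower, 2), round(z_upper, 2), round(sigma, 2), round(mu, 1))
```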
Teachings for
Exercise 3F
The normal distribution
You need to be able to approximate a binomial distribution.

Let’s compare the two quickly:

Binomial distribution: discrete data, e.g. the number of heads when flipping a coin 7 times
Normal distribution: continuous data, e.g. the height of the flipped coin

You can use the processes you have learnt for the normal distribution to approximate a binomial distribution under 2 conditions:
• The value of n (the number of trials) must be ‘large’ (this will mean the distribution is smoother)
• The probability of success, p, must be close to 0.5 (if it is not, then the distribution will not be symmetrical)

3F
The normal distribution
Using a normal distribution to approximate a binomial distribution:
• n must be large
• p must be close to 0.5

Starting with a binomial distribution, X ~ B(n, p):
• We are going to use a normal distribution to approximate it (for example, the binomial distribution tables you are given only go up to a limited number of trials)
• So we are going to use a normal distribution, X ~ N(μ, σ²)
• We need to be able to get from n and p to the values of μ and σ

Let X ~ B(100, 0.45)
• So X is binomially distributed, we are doing 100 trials with a 0.45 chance of success on each trial
• Therefore, we would expect that on average, we get 45 successes
• So the ‘mean’ is the product of n and p
• Therefore, μ = np
• Finding a relationship for the standard deviation is much more difficult to understand
• We will need to clarify some statements first

3F
The normal distribution
The expected value of a set of data is essentially its mean.

Imagine we had a normal unbiased 6-sided dice. If we rolled it 60 times we would expect to get 10 of each number.

Let x be the number we get when rolling the dice, f be the frequency of that number, and n be the total number of trials.

x    f    fx
1    10   10
2    10   20
3    10   30
4    10   40
5    10   50
6    10   60

n = 60, ∑fx = 210

Mean = ∑fx / n
Sub in values
Mean = 210 / 60
Calculate
Mean = 3.5

3F
The normal distribution
The expected value of a set of data is essentially its mean.

However, rather than doing a trial and using that to calculate the mean (i.e. its expected value), you can calculate it using the probabilities of each event happening.

x    p(x)   x × p(x)
1    1/6    1/6
2    1/6    2/6
3    1/6    3/6
4    1/6    4/6
5    1/6    5/6
6    1/6    6/6

∑ x × p(x) = 21/6
∑ x × p(x) = 3.5

• Doing this calculates the expected value straight away, without needing to divide by n

3F
The normal distribution
What you have just seen is showing that the following two calculations are equivalent:

$\frac{\sum x}{n} = \sum x \times p(x)$

• Left-hand side: add all the outcomes up, and divide by how many there are in total (note that in the example we used ∑fx, but the idea is the same: add up all the values! The fx just sped things up a bit)
• Right-hand side: multiply each outcome by its probability, and then add them all up after

3F
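The equivalence of the two calculations can be confirmed with the dice example. This is a small sketch using exact fractions from Python's standard library; the list `rolls` represents the idealised 60 rolls with 10 of each number, not real experimental data.

```python
from fractions import Fraction

outcomes = range(1, 7)

# Method 1: add up all 60 idealised rolls and divide by how many there are
rolls = [x for x in outcomes for _ in range(10)]
mean_from_data = Fraction(sum(rolls), len(rolls))

# Method 2: multiply each outcome by its probability (1/6) and add up
mean_from_probabilities = sum(x * Fraction(1, 6) for x in outcomes)

print(mean_from_data, mean_from_probabilities)
```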
The normal distribution
It does not matter what set of values we start with.
• For example, if the starting values were all x² instead of x

x    x²    f    fx²
1    1     10   10
2    4     10   40
3    9     10   90
4    16    10   160
5    25    10   250
6    36    10   360

n = 60, ∑fx² = 910

Mean of x² = ∑fx² / n
Sub in values
Mean of x² = 910 / 60
Calculate
Mean of x² = 15.16…

So the ‘expected value’ of x² is 15.16…

3F
The normal distribution
It does not matter what set of values we start with.
• Like the previous example, we can find the expected value of x² by using the original probabilities

x    x²    p(x)   x² × p(x)
1    1     1/6    1/6
2    4     1/6    4/6
3    9     1/6    9/6
4    16    1/6    16/6
5    25    1/6    25/6
6    36    1/6    36/6

∑ x² × p(x) = 91/6
∑ x² × p(x) = 15.16…

3F
The normal distribution
What you have just seen is showing that the following two calculations are equivalent:

$\frac{\sum x^2}{n} = \sum x^2 \times p(x)$

• Left-hand side: add all the squared outcomes up, and divide by how many there are in total
• Right-hand side: multiply each squared outcome by its probability, and then add them all up after

3F
The normal distribution
Can you remember the formula for the variance of a data set from year 12?

$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

The second part of this is the mean squared.
• Remember that we already have an expression for the mean from before!

$\sigma^2 = \frac{\sum x^2}{n} - (np)^2$

• So we need to focus on rewriting the first part, $\frac{\sum x^2}{n}$

What we need to do now is to find a way to replace the first part in terms of n and p, the number of trials and the probability of success in the binomial distribution.
• We can replace the first part using the equivalent calculation that we saw before: $\frac{\sum x^2}{n} = \sum (x^2 \times p(x))$

Replace first term using the above

$\sigma^2 = \sum (x^2 \times p(x)) - (np)^2$

3F
The normal distribution
• So we need to focus on rewriting this part: $\sum (x^2 \times p(x))$

Remember how to find the calculation for an event that is binomially distributed?

$P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$

(We will be summing up the values from 0 to n)

Imagine we are considering the number of heads when tossing a biased coin 5 times. Let n = 5 and p = 0.4.
• Let’s put together a table

x    p(x)                              x² × p(x)
0    $\binom{5}{0} 0.4^0 (0.6)^5$      $0^2 \times \binom{5}{0} 0.4^0 (0.6)^5$
1    $\binom{5}{1} 0.4^1 (0.6)^4$      $1^2 \times \binom{5}{1} 0.4^1 (0.6)^4$
2    $\binom{5}{2} 0.4^2 (0.6)^3$      $2^2 \times \binom{5}{2} 0.4^2 (0.6)^3$
3    $\binom{5}{3} 0.4^3 (0.6)^2$      $3^2 \times \binom{5}{3} 0.4^3 (0.6)^2$
4    $\binom{5}{4} 0.4^4 (0.6)^1$      $4^2 \times \binom{5}{4} 0.4^4 (0.6)^1$
5    $\binom{5}{5} 0.4^5 (0.6)^0$      $5^2 \times \binom{5}{5} 0.4^5 (0.6)^0$

So we are doing this calculation for each value of x, and then adding them up:

$x^2 \times \binom{n}{x} p^x (1 - p)^{n - x}$

$\sum (x^2 \times p(x)) = \sum_{x=0}^{n} \left( x^2 \times \binom{n}{x} p^x (1 - p)^{n - x} \right)$

3F
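The table for the biased coin can be added up numerically to see where the derivation is heading. This is a sketch, with `pmf` as a made-up helper name; the comparison with np(1 − p) uses the standard result for the variance of a binomial distribution, which the slides have not yet reached at this point.

```python
from math import comb

n, p = 5, 0.4   # 5 tosses of the biased coin, P(heads) = 0.4

def pmf(x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Sum of x^2 * p(x): the right-hand column of the table, added up
expected_x_squared = sum(x**2 * pmf(x) for x in range(n + 1))

# Variance = sum of x^2 * p(x), minus the mean squared (the mean is np)
variance = expected_x_squared - (n * p)**2

print(round(expected_x_squared, 4), round(variance, 4), round(n * p * (1 - p), 4))
```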
The normal distribution
You hopefully also remember the following from year 12 (Binomial expansion powerpoint):

$\binom{n}{r} = \frac{n!}{(n - r)!\, r!}$

• So now we need to rewrite this:

$\sum_{x=0}^{n} \left( x^2 \times \binom{n}{x} p^x (1 - p)^{n - x} \right)$

Replace the first bracket using the relationship shown

$= \sum_{x=0}^{n} \left( x^2 \times \frac{n!}{(n - x)!\, x!} \, p^x (1 - p)^{n - x} \right)$

Group together

$= \sum_{x=0}^{n} \left( x^2 \times \frac{n!\, p^x (1 - p)^{n - x}}{(n - x)!\, x!} \right)$

3F
The normal distribution
• So now we need to rewrite this:

$\sum_{x=0}^{n} \left( x^2 \times \frac{n!\, p^x (1 - p)^{n - x}}{(n - x)!\, x!} \right)$

The first term will be when x = 0.
• This first term will therefore be 0, due to the x² part, so we do not need to include it in the summation
• Therefore, we can start the summation from x = 1 instead

$= \sum_{x=1}^{n} \left( x^2 \times \frac{n!\, p^x (1 - p)^{n - x}}{(n - x)!\, x!} \right)$

The x! term is x × (x − 1)!
• We can ‘cancel’ one of the x terms from x² with the x from this factorial
• So this term would now become (x − 1)!

$= \sum_{x=1}^{n} \left( x \times \frac{n!\, p^x (1 - p)^{n - x}}{(n - x)!\, (x - 1)!} \right)$

3F
The normal distribution
$\sum_{x=1}^{n} \left( x \times \frac{n!\, p^x (1 - p)^{n - x}}{(n - x)!\, (x - 1)!} \right)$

Remember that this is a summation of a number of terms.
• The number of trials (n) and the probability (p) are constants
• This means we can factorise them outside of the summation

$= np \sum_{x=1}^{n} \left( x \times \frac{(n - 1)!\, p^{x - 1} (1 - p)^{n - x}}{(n - x)!\, (x - 1)!} \right)$

The n! term has been changed to (n − 1)!, since n! = n × (n − 1)!, as in the previous example. The p^x term has been adjusted to p^(x − 1) as well.

It is important to note that we cannot factorise the x term outside.
• This is because in the summation, x = 1 for the first term, x = 2 for the second term, and so on
• Since x takes different values, we cannot factorise it out, as it would then multiply everything by only a single value

3F
( )
𝑃 ( 𝑋 =𝑟 ) = 𝑛 𝑝 𝑟 ( 1 − 𝑝 )𝑛 − 𝑟
𝑟
𝑋 𝐵 (𝑛, 𝑝)
𝑋 𝑁 (𝜇, 𝜎)

Let $y = x - 1$

So $x = y + 1$

$$np \sum_{x=1}^{n} x \times \frac{(n-1)!\, p^{x-1} (1-p)^{n-x}}{(n-x)!\,(x-1)!}$$

This will affect most terms

→ We will now be starting on $y = 0$ instead of $x = 1$

→ We have not changed the number of terms though, so the final term will now be $y = n - 1$

$$np \sum_{y=0}^{n-1} (y+1) \times \frac{(n-1)!\, p^{y} (1-p)^{n-y-1}}{(n-y-1)!\,y!}$$

$$np \sum_{y=0}^{n-1} (y+1) \times \frac{(n-1)!\, p^{y} (1-p)^{n-y-1}}{(n-y-1)!\,y!}$$

We can write this as two separate summations (by expanding the $(y+1)$ bracket)

$$= np \left[ \sum_{y=0}^{n-1} y \times \frac{(n-1)!\, p^{y} (1-p)^{n-y-1}}{(n-y-1)!\,y!} + \sum_{y=0}^{n-1} \frac{(n-1)!\, p^{y} (1-p)^{n-y-1}}{(n-y-1)!\,y!} \right]$$

Recall that $\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$. What if we replace $n$ with $n-1$ and $r$ with $y$?

$$\binom{n-1}{y} = \frac{(n-1)!}{(n-y-1)!\,y!}$$

$$= np \left[ \sum_{y=0}^{n-1} y \times \binom{n-1}{y} p^y (1-p)^{n-y-1} + \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1} \right]$$

$$np \left[ \sum_{y=0}^{n-1} y \times \binom{n-1}{y} p^y (1-p)^{n-y-1} + \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1} \right]$$

The second summation represents all possible outcomes of the $n-1$ trials, so it will always equal 1 regardless of the values chosen for $n$ and $p$…

Let's think about the second summation, using heads on the biased coin from before as an example

→ Let the number of trials, $n$, equal 3

→ Let $p = 0.4$

Remember this is going to be the sum of the terms from $y = 0$ to $y = 2$

$$\sum_{y=0}^{2} \binom{2}{y} 0.4^{y}\, 0.6^{2-y}$$

When $y = 0$, when $y = 1$ and when $y = 2$:

$$\binom{2}{0} 0.4^{0}\, 0.6^{2} + \binom{2}{1} 0.4^{1}\, 0.6^{1} + \binom{2}{2} 0.4^{2}\, 0.6^{0}$$

What will these add up to?

→ They will add up to 1 as they represent all possible outcomes (0 heads from 2, 1 head from 2 and 2 heads from 2): $0.36 + 0.48 + 0.16 = 1$
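The claim that these three terms add up to 1 can be checked numerically. The following is a minimal Python sketch; the function name `binomial_pmf` is an illustrative choice and not something from the slides, and the values are those of the coin example (2 trials, p = 0.4).

```python
from math import comb, isclose

def binomial_pmf(trials: int, p: float, successes: int) -> float:
    """P(exactly `successes` successes in `trials` independent trials)."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

# The three terms for y = 0, 1, 2 with 2 trials and p = 0.4
terms = [binomial_pmf(2, 0.4, y) for y in range(3)]
total = sum(terms)
print(terms, total)
```

Changing the number of trials or the value of p should leave the total at 1 (up to floating-point rounding), which is why the second summation can be replaced by 1.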

$$np \left[ \sum_{y=0}^{n-1} y \times \binom{n-1}{y} p^y (1-p)^{n-y-1} + \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1} \right]$$

Since the second summation equals 1:

$$np \left[ \sum_{y=0}^{n-1} y \times \binom{n-1}{y} p^y (1-p)^{n-y-1} + 1 \right]$$

If $y$ is the number of heads on a coin, the part $\binom{n-1}{y} p^y (1-p)^{n-y-1}$ represents the probability of each value of $y$ happening

→ i.e. we can rewrite this as $p(y)$

$$np \left[ \sum_{y=0}^{n-1} y \times p(y) + 1 \right]$$

→ The only difference from $P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$ is that the number of trials is $n-1$, which is accounted for in the summation term

$$np \left[ \sum_{y=0}^{n-1} y \times p(y) + 1 \right]$$

The summation is saying to multiply all the possible values of $y$ by their probabilities, and then add the answers up

→ What does that give us?

→ It gives us the expected value of $y$

→ And how do we calculate the expected value if we know the number of trials and the probability of success?

→ Multiply the number of trials by the probability of success!

In this case, the probability of success is $p$, and the number of trials is $n - 1$

So the expected value of $y$ is $(n-1)p$

$$= np \left[ (n-1)p + 1 \right]$$

If you remember (quite a while back), we were finding an expression for the variance:

$$\sigma^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2$$

Earlier, we replaced the mean with $np$:

$$\sigma^2 = \frac{\sum x^2}{n} - (np)^2$$

The expression we just found is that:

$$np \left[ (n-1)p + 1 \right]$$

→ A reminder that we were finding this so we could use it to replace the expected value of $x^2$, which we can now do!

Now we can replace the expected value of $x^2$ with the expression we found…

$$\sigma^2 = np \left[ (n-1)p + 1 \right] - (np)^2$$

Multiply the inner bracket

$$\sigma^2 = np \left[ np - p + 1 \right] - (np)^2$$

Multiply the square bracket

$$\sigma^2 = (np)^2 - np^2 + np - (np)^2$$

Simplify

$$\sigma^2 = np - np^2$$

Factorise

$$\sigma^2 = np(1-p)$$

Square root

$$\sigma = \sqrt{np(1-p)}$$
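The algebra above can be sanity-checked by summing the series directly. This is a rough Python sketch only: the helper names `binomial_pmf` and `second_moment` and the values n = 12, p = 0.3 are arbitrary illustrative choices, not taken from the slides.

```python
from math import comb, isclose, sqrt

def binomial_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def second_moment(n: int, p: float) -> float:
    """Sum of x^2 * P(X = x) for x = 0..n, i.e. the expected value of x^2."""
    return sum(x**2 * binomial_pmf(n, p, x) for x in range(n + 1))

n, p = 12, 0.3  # arbitrary illustrative values, not from the slides
direct = second_moment(n, p)
closed_form = n * p * ((n - 1) * p + 1)  # np[(n - 1)p + 1]
variance = direct - (n * p) ** 2         # subtract the squared mean (np)^2
print(direct, closed_form, variance, sqrt(variance))
```

If the derivation is right, `direct` and `closed_form` agree, and `variance` matches np(1 - p) for any n and p you try.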

A biased coin has $P(\text{head}) = 0.53$. The coin is tossed 100 times and the number of heads, $X$, is recorded.

a) Write down a binomial model for $X$

$X \sim B(100, 0.53)$

b) Explain why $X$ can be approximated using a normal distribution

Since $n$ is large and $p$ is close to 0.5

c) Find the values of $\mu$ and $\sigma$ in this approximation

$$\mu = np$$
Sub in values
$$\mu = (100)(0.53)$$
Calculate
$$\mu = 53$$

$$\sigma = \sqrt{np(1-p)}$$
Sub in values
$$\sigma = \sqrt{(100)(0.53)(1-0.53)}$$
Calculate
$$\sigma = 4.99$$
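The two substitutions in part c) can be reproduced in a couple of lines. A small Python sketch follows; `normal_parameters` is a hypothetical helper name used only for illustration.

```python
from math import sqrt

def normal_parameters(n: int, p: float) -> tuple[float, float]:
    """Return (mu, sigma) of the normal approximation to B(n, p)."""
    return n * p, sqrt(n * p * (1 - p))

# The biased coin: 100 tosses, P(head) = 0.53
mu, sigma = normal_parameters(100, 0.53)
print(round(mu, 2), round(sigma, 2))
```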

→ Something to be careful of is that the normal distribution is continuous and the binomial distribution is discrete

→ You need to apply a 'continuity correction'

The binomial random variable $X$ is approximated by the normal random variable $Y$.

a) Use this approximation to find $P(X \le 70)$

$$P(X \le 70) \approx P(Y < 70.5)$$

Since $Y$ has the continuous distribution, any value up to 70.5 would round to 70 when converted to the binomial distribution. So you need to go up to 70.5 when calculating…

Use your calculator with the $\mu$ and $\sigma$ of $Y$, a lower limit of $-100$ and an upper limit of 70.5

$$= 0.4032$$

b) Also use the approximation to find $P(80 \le X < 90)$

Apply the continuity correction

→ Be careful! The upper limit should not include 90. As such, in the continuous distribution we should not go higher than 89.5

$$P(80 \le X < 90) \approx P(79.5 \le Y < 89.5)$$

Use your calculator with a lower limit of 79.5, an upper limit of 89.5, and the same $\mu$ and $\sigma$

$$= 0.1081$$
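The continuity correction can be tried out with Python's `statistics.NormalDist`. Note that the text above does not show the parameters of the binomial variable, so the sketch below ASSUMES X ~ B(150, 0.48), giving a normal approximation with mean 72 and standard deviation 6.12. That assumption is used here only because it appears to reproduce the two quoted answers (0.4032 and 0.1081); treat it as a guess rather than the original question.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMED parameters (not shown in the text): X ~ B(150, 0.48)
n, p = 150, 0.48
mu = n * p                               # 72
sigma = round(sqrt(n * p * (1 - p)), 2)  # 6.12, rounded as it would be typed into a calculator
Y = NormalDist(mu, sigma)

# a) P(X <= 70): 70 is included, so go up to 70.5
part_a = Y.cdf(70.5)

# b) P(80 <= X < 90): 80 is included (start at 79.5), 90 is excluded (stop at 89.5)
part_b = Y.cdf(89.5) - Y.cdf(79.5)

print(round(part_a, 4), round(part_b, 4))
```

The rule of thumb the code follows: widen the interval by 0.5 at an end that includes its integer, and pull it in by 0.5 at an end that excludes it.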

For a particular type of flower bulb, 55% will produce yellow flowers. A random sample of 80 bulbs is planted.

Calculate the percentage error incurred when using a normal approximation to estimate the probability that there are exactly 50 yellow flowers.

→ First, calculate the actual probability using the binomial distribution

$$P(Y = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

Sub in values, with the number of yellow flowers being 50

$$P(Y = 50) = \binom{80}{50} 0.55^{50}\, (0.45)^{30}$$

Calculate

$$P(Y = 50) = 0.0365$$

→ Now convert the problem to a normal distribution…

$$\mu = np$$
Sub in values
$$\mu = (80)(0.55)$$
Calculate
$$\mu = 44$$

$$\sigma = \sqrt{np(1-p)}$$
Sub in values
$$\sigma = \sqrt{(80)(0.55)(1-0.55)}$$
Calculate
$$\sigma = 4.45\ldots$$

Convert using the values we calculated

$$Y \sim B(80, 0.55) \quad \to \quad X \sim N(44, 4.45^2)$$

Apply continuity corrections and change to a normal distribution calculation

$$P(Y = 50) \approx P(49.5 < X < 50.5)$$

Use your calculator with a lower limit of 49.5, an upper limit of 50.5, $\mu = 44$ and $\sigma = 4.45\ldots$

$$= 0.0362$$

Divide the difference by the correct value…

$$\%\ \text{error} = \frac{0.0003}{0.0365} \times 100 = 0.82\%$$
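The whole flower bulb calculation can be sketched in Python as a check on the working. The variable names are illustrative. Because the code keeps full precision instead of the values rounded to 4 decimal places, its final percentage may differ a little from the 0.82% obtained from the rounded figures.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 80, 0.55, 50  # 80 bulbs, 55% yellow, exactly 50 yellow flowers

# Exact binomial probability P(Y = 50)
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with a continuity correction: P(49.5 < X < 50.5)
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

# Percentage error relative to the exact (binomial) value
percentage_error = abs(exact - approx) / exact * 100
print(round(exact, 4), round(approx, 4), round(percentage_error, 2))
```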
Teachings for
Exercise 3G
The normal distribution
You need to be able to do hypothesis testing using the normal distribution

Imagine we had a set of data as follows:

1 2 3 4 4 5 5 6 6 6 7 7 8 8 9 10 11

For this data, the information is as follows: mean = 6, standard deviation = 2.65, so $X \sim N(6, 2.65^2)$

In this case, the data set is so small that we can include all the data

→ In more practical situations, we would most likely take a sample of the population

→ We need to consider how this would affect our use of the mean and variance
3G
→ Imagine we took samples from this data, all of size 2 (i.e. choosing 2 values at random)

→ For example:

Value 1   Value 2   Mean
4         5         4.5
3         9         6
6         5         5.5
2         8         5
7         10        8.5
6         8         7

→ Let's now calculate the mean and variance of these sample means…

Mean of the sample means = 6.08
Variance (of the sample means) = 1.78
Standard Deviation (of the sample means) = 1.33

Notation for the population: $X \sim N(6, 2.65^2)$
Notation for this sample: $\bar{X} \sim N(6.08, 1.33^2)$

→ If we take enough samples, the sample means will average out to be the same as the population mean

→ How is the variance affected though?

→ When choosing only 2 values for our sample, consider the range of possible means

→ The smallest possible mean would be 1.5 (if 1 and 2 were selected), and the highest would be 10.5 (if 10 and 11 were selected)

→ If we took a sample of size 10, the smallest possible mean would be 4.2, and the largest would be 7.8

Population range: 1 to 11
Sample size 2 range: 1.5 to 10.5
Sample size 10 range: 4.2 to 7.8

→ The key idea to take from this is that with a larger sample size, the range of possible values for the sample mean is smaller

→ Hence, the variance of the sample means is also going to be lower, as there is a smaller range for the values to exist in

→ When dealing with a sample, the variance of the sample mean is found by dividing the population variance by the size of the sample

→ That way, a larger sample will have a smaller value for the variance…
3G
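The figures quoted for this data set can be reproduced with Python's `statistics` module. This is a minimal sketch; `mean_range` is a hypothetical helper that assumes each sample is drawn without replacement. The slides appear to truncate rather than round the standard deviations (2.65 and 1.33), so the last digit printed here may differ.

```python
from statistics import fmean, pstdev

data = [1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10, 11]

def mean_range(values: list[int], sample_size: int) -> tuple[float, float]:
    """Smallest and largest possible mean of a sample drawn without replacement."""
    ordered = sorted(values)
    return fmean(ordered[:sample_size]), fmean(ordered[-sample_size:])

population_mean = fmean(data)
population_sd = pstdev(data)  # population (not sample) standard deviation
range_2 = mean_range(data, 2)
range_10 = mean_range(data, 10)

sample_means = [4.5, 6, 5.5, 5, 8.5, 7]  # the six sample means from the table
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(population_mean, round(population_sd, 3), range_2, range_10)
print(round(mean_of_means, 2), round(sd_of_means, 3))
```

The shrinking interval returned by `mean_range` as the sample size grows is the same effect the slides describe: larger samples leave less room for the sample mean to vary.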
When hypothesis testing with a population, you use values as follows:

$$X \sim N(\mu, \sigma^2)$$

With a sample, we use the following relationship instead:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

We use the population mean

We divide the population variance by the sample size, $n$

→ Remember that this represents the variance of the sample means!
A company sells fruit juice in cartons. The amount of juice in a carton has a normal distribution with a standard deviation of 3ml.

The company claims that the mean amount of juice per carton, $\mu$, is 60ml. A trading inspector has received complaints that the company is overstating the amount of juice per carton and wishes to investigate this complaint.

The inspector takes a random sample of 16 cartons and finds that the mean amount of juice per carton is 59.1ml.

Using a 5% significance level, and stating your hypotheses clearly, test whether or not there is sufficient evidence to uphold the complaints.

1) Write your hypotheses

$$H_0 : \mu = 60 \qquad H_1 : \mu < 60$$

→ Remember that the null hypothesis is the default position, i.e. nothing has changed

→ The alternative hypothesis is where we consider the statement in the question. In this case, the claim is that the mean is lower than the company is saying

2) Convert the population distribution to the sample distribution

Population distribution: $X \sim N(60, 3^2)$
Sample distribution: $\bar{X} \sim N\left(60, \frac{3^2}{16}\right)$

3) Now, using the sample distribution, we need to find the probability that a random sample of 16 gives a mean equal to or lower than 59.1…

The sample distribution has $\mu = 60$ and $\sigma^2 = \frac{3^2}{16}$, so $\sigma = 0.75$

Use your calculator as you have seen in prior sections to calculate the probability of being below 59.1

$$P(\bar{X} < 59.1) = 0.1151 = 11.51\%$$

→ Although getting a sample mean in this range is fairly unlikely, its probability is not below 5%

→ Therefore, it is not unlikely enough for us to reject the null hypothesis that the mean is actually 60ml

→ So we do not reject the null hypothesis: there is insufficient evidence to uphold the complaints
3G
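The three steps of the juice carton test can be followed in a short Python sketch. It uses `statistics.NormalDist` in place of the calculator's normal distribution mode; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 60.0         # mean claimed under H0 (ml)
sigma = 3.0         # population standard deviation (ml)
n = 16              # sample size
sample_mean = 59.1  # observed sample mean (ml)
alpha = 0.05        # significance level

# Distribution of the sample mean under H0: N(mu_0, sigma^2 / n)
sample_mean_dist = NormalDist(mu_0, sigma / sqrt(n))

# One-tailed test (H1: mu < 60): probability of a sample mean this low or lower
p_value = sample_mean_dist.cdf(sample_mean)
z = (sample_mean - mu_0) / (sigma / sqrt(n))  # standardised test statistic
reject_h0 = p_value < alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```

Comparing `p_value` with `alpha` is the same decision as comparing 11.51% with 5% in the working above.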
It is also possible to use the standardised normal distribution when using a sample

Population: $X \sim N(\mu, \sigma^2)$, with variance $\sigma^2$. Square root the variance to get the standard deviation, $\sigma$. Converting values to the standard normal distribution:

$$Z = \frac{X - \mu}{\sigma}$$

Sample: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, with variance $\frac{\sigma^2}{n}$. Square root the variance to get the standard deviation, $\frac{\sigma}{\sqrt{n}}$. Converting values to the standard normal distribution:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
3G

A machine produces bolts of diameter $D$, where $D$ has a normal distribution with mean 0.580cm and standard deviation 0.015cm.

This machine is serviced and after the service a random sample of 50 bolts from the next production is taken to see if the mean diameter of the bolts has changed from 0.580cm. The distribution of the bolts after the service is still normal with a standard deviation of 0.015cm.

a) Find, at the 1% level, the critical region for this test, stating your hypotheses clearly

→ Remember that the critical region is the region where the null hypothesis would be rejected…

1) Write your hypotheses

$$H_0 : \mu = 0.580 \qquad H_1 : \mu \ne 0.580$$

→ Remember that the null hypothesis is the default position, i.e. nothing has changed

→ The alternative hypothesis is where we consider the statement in the question. In this case, the claim is that the mean has changed (could be higher or lower)

2) Convert the population distribution to the sample distribution (using $D$ for diameter)

Population distribution: $D \sim N(0.580, 0.015^2)$
Sample distribution: $\bar{D} \sim N\left(0.580, \frac{0.015^2}{50}\right)$

3) Now, using the sample distribution, we need to find the regions that have a probability of 1% or less (i.e. have a total area of 0.01)

[Diagram: the specific distribution, centred on 0.580, and the standard distribution, centred on 0, each with an area of 0.005 shaded in both tails]

Using the standard table of results given in the booklet, we can find that the value with an area 0.005 above it is 2.5758

→ Due to symmetry, the lower value will be the negative of this, $-2.5758$

Now use the formula above for the upper value…

$$Z = \frac{\bar{D} - \mu}{\sigma / \sqrt{n}}$$
Sub in values
$$2.5758 = \frac{\bar{D} - 0.580}{0.015 / \sqrt{50}}$$
Work out
$$\bar{D} = 0.5854\ldots$$

Now use the formula above for the lower value…

$$Z = \frac{\bar{D} - \mu}{\sigma / \sqrt{n}}$$
Sub in values
$$-2.5758 = \frac{\bar{D} - 0.580}{0.015 / \sqrt{50}}$$
Work out
$$\bar{D} = 0.5745\ldots$$

So the critical region (to 3sf) will be where:

$$\bar{D} \le 0.575 \quad \text{or} \quad \bar{D} \ge 0.585$$

b) The mean diameter of a sample of 50 bolts is found to be 0.587cm. Comment on this in light of the critical region

Since this mean is inside the critical region ($0.587 \ge 0.585$), there is sufficient evidence to reject the null hypothesis

→ It seems that the mean diameter has changed…
3G
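The critical region for the bolts can also be found in Python instead of from the table. This is a sketch under the same assumptions as the question (two-tailed test, 1% level split equally between the tails); `in_critical_region` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 0.580   # mean diameter under H0 (cm)
sigma = 0.015  # population standard deviation (cm)
n = 50         # sample size
alpha = 0.01   # significance level, split equally between the two tails

standard_error = sigma / sqrt(n)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # table value, about 2.5758

lower = mu_0 - z_critical * standard_error
upper = mu_0 + z_critical * standard_error

def in_critical_region(sample_mean: float) -> bool:
    """True if the sample mean falls in either tail of the critical region."""
    return sample_mean <= lower or sample_mean >= upper

print(round(z_critical, 4), round(lower, 4), round(upper, 4), in_critical_region(0.587))
```

Here `inv_cdf` plays the role of the percentage points table: it returns the z value with a given area below it.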
