
Measures of Dispersion

Dispersion
• The measure of the spread or variability

• No Variability – No Dispersion
Measures of Variation
• We will look at three measures of the amount of dispersion or
variation (the spread of the group):

1. Range
2. Standard Deviation
3. Quartile deviation
Why is it Important?
• You want to choose the best brand of
medicine for your patients. You are
interested in how long each drug takes to
cure a disease. The choices are narrowed
down to 2 different drugs. The results are
shown in the chart. Which drug would
you choose?
The chart indicates the number of days a drug takes to cure a particular disease.

         Drug A   Drug B
           10       35
           60       45
           50       30
           30       35
           40       40
           20       25
Total     210      210
Does the Average Help?
• Drug A: Avg = 210/6 = 35 days

• Drug B: Avg = 210/6 = 35 days

• Both drugs take 35 days on average to cure the disease, so the
average is no help in deciding which to buy.
Consider the Spread
• Drug A: Spread = 60 – 10 = 50 days

• Drug B: Spread = 45 – 25 = 20 days

• Drug B has a smaller variability, which means that it
performs more consistently. Choose drug B.
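The comparison above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `spread` is introduced here for illustration and is not part of the slides.

```python
from statistics import mean

# Days to cure, as listed in the chart for each drug.
drug_a = [10, 60, 50, 30, 40, 20]
drug_b = [35, 45, 30, 35, 40, 25]

def spread(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

for name, days in (("Drug A", drug_a), ("Drug B", drug_b)):
    # Same mean for both drugs, but very different spreads.
    print(f"{name}: mean = {mean(days)} days, spread = {spread(days)} days")
```

Both means come out equal, so only the spread separates the two drugs.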
Range
• The range is the difference between the lowest
value in the set and the highest value in the set.

• Range = highest value – lowest value


Example
• Find the range of the data set.

• 40, 30, 15, 2, 100, 37, 24, 99

• Range = 100 – 2 = 98
Deviation from the Mean
• A deviation from the mean, $x - \bar{x}$, is the difference
between the value of $x$ and the mean $\bar{x}$.
• We base our formulas for variance and standard
deviation on the amounts by which the observations deviate from the
mean.
• The mean deviation of a set of observations
$x_1, x_2, \dots, x_N$ is the mean of the absolute deviations
from the mean and equals
\[ \frac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}| \]
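As a rough illustration of the mean deviation formula, the sketch below applies it to the small data set 6, 3, 8, 5, 3 (hours) that the variance example later in these slides also uses. The function name `mean_deviation` is an assumption made for this example.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the mean: (1/N) * sum |x_i - x_bar|."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

hours = [6, 3, 8, 5, 3]
# The mean is 5, so the absolute deviations are 1, 2, 3, 0, 2.
print(mean_deviation(hours))
```

The absolute deviations sum to 8, so the mean deviation is 8/5 = 1.6 hours.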
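Formulae for sample and population variances

Computation formulae:
\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \qquad \sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N} \]

Definition formulae:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]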
Standard Deviation
• The standard deviation is the square root of the
variance.

\[ s = \sqrt{s^2} \]
Example – Using Formula
• Find the variance of the following
dataset: 6, 3, 8, 5, 3 (in hours)

   $x$     $x^2$
    6       36
    3        9
    8       64
    5       25
    3        9
$\sum x = 25$   $\sum x^2 = 143$
Example – Using Formula

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{143 - \frac{25^2}{5}}{4} = \frac{143 - 125}{4} = \frac{18}{4} = 4.5 \]
Find the standard deviation
• The standard deviation is the positive square
root of the variance.

\[ s = \sqrt{4.5} \approx 2.12 \]
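The worked example can be reproduced in Python. The sketch below evaluates both the computation formula and the definition formula for the sample variance and compares them with the standard library's `statistics.variance`; the variable names are chosen here for illustration.

```python
from math import sqrt, isclose
from statistics import variance

hours = [6, 3, 8, 5, 3]
n = len(hours)

# Computation formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
sum_x = sum(hours)
sum_x2 = sum(x * x for x in hours)
s2_computation = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition formula: sum of squared deviations from the mean / (n - 1)
x_bar = sum_x / n
s2_definition = sum((x - x_bar) ** 2 for x in hours) / (n - 1)

s = sqrt(s2_computation)
print(s2_computation, s2_definition, round(s, 2))  # prints: 4.5 4.5 2.12
```

Both formulas give the same value, which is the point of having a computation formula: it avoids working out each deviation separately.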
Example: Mean, variance and standard deviation of data

Example 4.1 in Clarke and Cooke (1998), 4th ed.

• In a city there are six professional football clubs. Last season they had
25, 30, 18, 27, 28 and 22 players respectively on their full-time paid
staffs. Find the mean, variance and standard deviation of the number
of full-time paid staff per club.

• Let us call the number of full-time paid staff r. It is easier to lay out the
calculation in the form of a table.
Example: Mean, variance and standard deviation of data

Club     $r_i$    $r_i - \bar{r}$    $(r_i - \bar{r})^2$    $r_i^2$
A         25          0                  0                   625
B         30          5                 25                   900
C         18         -7                 49                   324
D         27          2                  4                   729
E         28          3                  9                   784
F         22         -3                  9                   484
Total    150                            96                  3846

Mean $\bar{r} = 150/6 = 25$        Variance $= 96/6 = 16$

Hence the standard deviation is $\sqrt{16} = 4$.
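The table divides by N = 6, so the six clubs are being treated as the whole population. Under that reading, the matching standard-library functions are `pvariance` and `pstdev` rather than the sample versions. A short sketch, with variable names chosen here for illustration:

```python
from statistics import mean, pvariance, pstdev

players = [25, 30, 18, 27, 28, 22]  # clubs A to F

r_bar = mean(players)
var = pvariance(players)   # divides by N, matching 96/6 in the table
sd = pstdev(players)

# Cross-check using the r_i^2 column: (sum of squares) / N - mean^2
var_from_squares = sum(r * r for r in players) / len(players) - r_bar ** 2
print(r_bar, var, sd, var_from_squares)
```

The cross-check uses the total 3846 from the last column: 3846/6 − 25² = 641 − 625 = 16.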


Variance and standard deviation of grouped
frequency distribution
Let the table of values of a discrete variable be:

Variable values    $r_1$   $r_2$   $r_3$   ⋯   $r_k$   Total
Frequency          $f_1$   $f_2$   $f_3$   ⋯   $f_k$   $\sum_{i=1}^{k} f_i = N$

The total of all observations is equal to $\sum_{i=1}^{k} f_i r_i$.

Variance and standard deviation of grouped
frequency distribution
The variance of a set of N observations of a discrete variable, grouped so
that the values $r_i$ ($i = 1, 2, \dots, k$) occur with frequencies $f_i$, is

\[ \frac{1}{N} \sum_{i=1}^{k} f_i (r_i - \bar{r})^2 \quad \text{or} \quad \frac{1}{N} \left[ \sum_{i=1}^{k} f_i r_i^2 - \frac{\left( \sum_{i=1}^{k} f_i r_i \right)^2}{N} \right] \]

The expression in the large brackets may be stated in words as “the
sum of the squares of all the observations minus the total squared and
divided by N”.
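The slides state the two forms of the grouped variance without showing why they agree. A short derivation, using only $\sum f_i = N$ and $\sum f_i r_i = N\bar{r}$ (all sums run over $i = 1, \dots, k$), fills in that step:

```latex
\begin{aligned}
\sum f_i (r_i - \bar{r})^2
  &= \sum f_i r_i^2 - 2\bar{r} \sum f_i r_i + \bar{r}^2 \sum f_i \\
  &= \sum f_i r_i^2 - 2N\bar{r}^2 + N\bar{r}^2 \\
  &= \sum f_i r_i^2 - N\bar{r}^2 \\
  &= \sum f_i r_i^2 - \frac{\left( \sum f_i r_i \right)^2}{N}
\end{aligned}
```

Dividing both sides by N gives the two expressions for the variance.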
Example: Mean, variance and standard deviation of grouped
discrete frequency distribution
Number of people     Number of days,    $f_i r_i$    $f_i r_i^2$
absent, $r_i$        $f_i$
0                    44                   0            0
1                    19                  19           19
2                    10                  20           40
3                     8                  24           72
4                     7                  28          112
5                     3                  15           75
6 or more             0                   0            0
Total                $\sum_{i=1}^{k} f_i = N = 91$   106    318
Variance and standard deviation of grouped
frequency distribution
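The mean is
\[ \frac{\sum f_i r_i}{N} = \frac{106}{91} = 1.165 \]
The variance is
\[ \frac{1}{N} \left[ \sum_{i=1}^{k} f_i r_i^2 - \frac{\left( \sum_{i=1}^{k} f_i r_i \right)^2}{N} \right] = \frac{1}{91} \left[ 318 - \frac{106^2}{91} \right] = \frac{194.53}{91} = 2.1377 \]
The standard deviation is $\sqrt{2.1377} = 1.46$.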
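The grouped calculation is easy to get slightly wrong by hand, so a quick check in Python may help. This sketch recomputes the column totals and then applies the bracketed formula; it leaves out the "6 or more" row because its frequency is zero, and the variable names are illustrative only.

```python
from math import sqrt

# Number of people absent (r_i) and the number of days on which that happened (f_i).
absent = [0, 1, 2, 3, 4, 5]
days = [44, 19, 10, 8, 7, 3]

N = sum(days)                                            # total number of days
sum_fr = sum(f * r for r, f in zip(absent, days))        # total of all observations
sum_fr2 = sum(f * r * r for r, f in zip(absent, days))   # sum of squares of all observations

mean = sum_fr / N
variance = (sum_fr2 - sum_fr ** 2 / N) / N
sd = sqrt(variance)

print(N, sum_fr, sum_fr2)
print(round(mean, 3), round(variance, 4), round(sd, 2))
```

The totals come out as N = 91, Σ f r = 106 and Σ f r² = 318, the values used in the mean and variance calculations.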
Variance and standard deviation of grouped
observations of a continuous variable
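The variance of a set of N observations of a continuous variable, in
which $f_i$ observations fall in the interval whose centre is $x_i$
($i = 1, 2, \dots, k$), is

\[ \frac{1}{N} \sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \quad \text{or} \quad \frac{1}{N} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{N} \right] \]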
The semi-inter-quartile range
(or quartile deviation)
• The variance, the standard deviation and the mean deviation go
naturally with the mean.

• They are based on deviations from the mean, and the averaging
process is the same as that for calculating the mean.

• We are now going to look at a measure of variability that is based on the
rank order of a set of observations, and is therefore related to the
median.

• The range, which we looked at earlier, is also based
on the rank order of a set of observations.
Example
• An airline flies a daily service between two cities, using a 100-seat
aircraft for the flight. The numbers of seats sold on the first fifteen
days of September were 87, 67, 98, 57, 74, 100, 83, 60, 99, 88, 54, 72,
78, 75, 93, in date order.
• Let us arrange these observations in rank order:

Rank order              1   2   3   4   5   6   7   8   9  10  11  12  13  14   15
Number of seats sold   54  57  60  67  72  74  75  78  83  87  88  93  98  99  100
Quartiles               .   .   .  Q1   .   .   .   M   .   .   .  Q3   .   .    .
Example
• The median M divides the rank order into two equal parts.
• The first or lower quartile Q1 divides the lower half of the rank
order into two equal parts.
• The third or upper quartile Q3 divides the upper half of the rank
order into two equal parts.
Example
• We could (though we rarely do!) call the median M the second
quartile and label it Q2.
• The point is that Q1, M and Q3 divide the distribution of ranked
observations into quarters, and this is the reason for calling them quartiles.
Definitions
Quartiles: The quartiles of a set of observations are the values below
which fall 25%, 50% and 75% of the observations as arranged in rank
order.
These are called respectively the first, second and third quartiles and
are denoted by 𝑄1 , 𝑄2 or 𝑀, and 𝑄3 .
Definitions
Inter-quartile range (mid-spread): The inter-quartile range, also known
as the mid-spread, is the difference between the upper and lower quartiles.
In our example this is $Q_3 - Q_1 = 93 - 67 = 26$.

Semi-inter-quartile range (quartile deviation): The semi-inter-quartile range,
also known as the quartile deviation, of a set of observations equals
$\frac{1}{2}(Q_3 - Q_1)$.
In our example above, this is $\frac{1}{2}(Q_3 - Q_1) = \frac{1}{2} \times 26 = 13$.
Exercise (Clarke & Cooke, 4th Ed.) 4.6.1
In the first fifteen days of December in the same year, on the same air
service as described above, the numbers of seats sold were 38, 75, 50,
84, 62, 46, 96, 67, 33, 55, 65, 42, 83, 70, 49 in date order. Find the
median, quartiles and semi-inter-quartile range for these observations.
Plot September and December observations on a sheet of graph-paper,
mark in the measures you have calculated, and compare them. Work
out also the range, mean deviation and standard deviation for each
month’s data.
Box-and-whisker plot
• J.W. Tukey introduced the box-and-whisker plot as well as the stem-
and-leaf plot seen before.

Example: Consider the “Airline’s seats sold” data.


• On a linear scale (see figure on next slide), make vertical marks
corresponding to the minimum (54), the first quartile (67), the
median (78), the third quartile (93) and the maximum (100).
• The lines for the first and third quartile are then joined to make a box
corresponding to the central 50% of the observations, and whiskers
are taken out from the box to the maximum and minimum values.
• This is often a convenient way of representing a distribution and is
particularly useful for comparing distributions.
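The construction steps above can be imitated with a rough text-only sketch. This is not the figure from the slides: the function `ascii_boxplot`, the scale end-points 50 and 100, and the characters used are all assumptions chosen for illustration.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, lo, hi, width=51):
    """Draw a one-line box-and-whisker sketch on a linear scale from lo to hi."""
    def column(value):
        # Map a data value to a character position on the linear scale.
        return round((value - lo) * (width - 1) / (hi - lo))

    line = [" "] * width
    for c in range(column(minimum), column(maximum) + 1):
        line[c] = "-"                  # whiskers out to the minimum and maximum
    for c in range(column(q1), column(q3) + 1):
        line[c] = "="                  # box: central 50% of the observations
    for value in (minimum, q1, med, q3, maximum):
        line[column(value)] = "|"      # vertical marks at the five values
    return "".join(line)

# Five values for the "Airline's seats sold" data, as given in the text.
plot = ascii_boxplot(54, 67, 78, 93, 100, lo=50, hi=100)
print(plot)
```

With a width of 51 characters on a scale from 50 to 100, each character stands for one seat, so the marks fall at positions 4, 17, 28, 43 and 50.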
Box-and-whisker plot

Figure: Box-and-whisker plot for the “Airline’s seats sold” data


Quantiles
• If we have a very large mass of data, we might want to divide it into
more than four parts to summarise it.
• It is often helpful to draw a cumulative frequency curve, that is, to plot
the cumulative frequency in the vertical direction against increasing
values of the variable plotted horizontally.
• Such a curve is often called an ogive, a name that describes its general
shape. The required results can be read from it (see the example later).
• The median and quartiles divide a set of ranked observations into four
parts. Two other sets of measures, deciles and percentiles, are also useful.
Quantiles - definitions
Deciles: the deciles of a set of ranked observations on a variable are
the variable values which divide the set into ten equal parts.

Percentiles: the percentiles of a set of ranked observations on a
variable are the variable values which divide the set into one hundred
equal parts.

More generally we can consider measures which divide a set of ranked
observations into q parts; one of these measures will be the p-th q-tile,
where p takes one of the values 1, 2, …, q − 1. Thus for the quartiles q = 4
and p = 1, 2, 3.

The general term for these measures is quantiles; the quartiles, deciles
and percentiles are examples.
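The general idea of q-tiles maps directly onto `statistics.quantiles` in the Python standard library, which returns the q − 1 cut points for a chosen q. The sketch below applies it to the September seat data from the earlier example. Note the assumptions: the library interpolates between ranked values, its `"exclusive"` method is one convention among several, and the wrapper name `q_tiles` is made up for this illustration.

```python
from statistics import quantiles

seats = [87, 67, 98, 57, 74, 100, 83, 60, 99, 88, 54, 72, 78, 75, 93]

def q_tiles(values, q):
    """Return the q - 1 cut points (p = 1, ..., q - 1) dividing values into q parts."""
    return quantiles(values, n=q, method="exclusive")

quartile_points = q_tiles(seats, 4)    # q = 4, p = 1, 2, 3
decile_points = q_tiles(seats, 10)     # q = 10, p = 1, ..., 9
print(quartile_points)
print(decile_points)
```

For these fifteen observations the quartile cut points agree with Q1 = 67, M = 78 and Q3 = 93 found earlier. With so few observations the deciles are of little practical use and are shown only to illustrate the definition.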
Example: Deciles and quartiles of a grouped continuous
variable (Clarke & Cooke, 4th Ed., example 4.6.1)
The incomes of married couples over retiring age in 1973 are shown in columns 1
and 2 of the Table below. Draw a cumulative frequency curve for the data, and
from it estimate the lowest decile, the median, the lower and upper quartiles of
income. Use the curve to estimate the proportion of married couples who had a
gross weekly income between £22 and £28.

Income (£)     Number of married       Cumulative frequency
               couples (thousands)     (thousands)
12-                 144                     144
14-                 330                     474
16-                 329                     803
18-                 247                    1050
20-                 412                    1462
25-                 206                    1668
30 or over          391                    2059
Example: Deciles and quartiles of a grouped continuous
variable (Clarke & Cooke, 4th Ed., example 4.6.1)
• The first step is to add a third column to the table, containing the
cumulative frequency.
• These values of cumulative frequency are then plotted against the upper
end-points of the class-intervals of the income distribution.
• The end of the first interval is £13.99, so that the cumulative frequency for
income £13.99 is 144.
• There were no incomes below £12, so we set the cumulative frequency at £11.99
equal to zero.
• We add the remaining points from the table, as shown by the circles in the Figure
below.
• Put the best smooth curve through the circles on the diagram to get the
cumulative frequency curve as asked for.
Figure: Cumulative frequency curve for income of married couples
(horizontal axis: gross weekly income (£), 0 to 50; vertical axis: frequency, 0 to 2000)
Example: Deciles and quartiles of a grouped continuous
variable (Clarke & Cooke, 4th Ed., example 4.6.1)
• Note that the upper limit has been conveniently set at £50
• The total frequency, in thousands, is 2059.
• One-tenth of the total frequency must lie below the first decile. Thus from the
graph we need to find the income corresponding to 205.9 on the vertical scale: it
is £14.40 as accurately as we can read it. The first decile is thus £14.40.
• From the graph, the median has to have half the total frequency, i.e. 1029.5,
below it. The income corresponding to this is £19.80.
• The quartiles correspond to cumulative frequencies of $\frac{1}{4} \times 2059 = 514.75$ and
$\frac{3}{4} \times 2059 = 1544.25$; they are therefore, from the graph, $Q_1$ = £16.20 and
$Q_3$ = £26.20.
Example: Deciles and quartiles of a grouped continuous
variable (Clarke & Cooke, 4th Ed., example 4.6.1)
• Finally, from the graph the cumulative frequency up to £22 is 1230 thousand,
and up to £28 is 1590 thousand, so that the number of married couples having
incomes between £22 and £28 is 360 thousand. This is a proportion
$\frac{360}{2059} = 0.175$ (i.e. $17\frac{1}{2}\%$) of the whole.
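The readings in this example come from a smooth curve drawn by hand. As a rough numerical cross-check, the sketch below joins the plotted points with straight lines instead, which is a different technique (linear interpolation) and therefore gives slightly different answers. Further assumptions: class boundaries are rounded to whole pounds (14 rather than 13.99), the open-ended top class is closed at £50 as on the slide, and the function names are invented for this illustration.

```python
from bisect import bisect_left

# Upper class boundaries (£) and cumulative frequencies (thousands).
bounds = [12, 14, 16, 18, 20, 25, 30, 50]
cum_freq = [0, 144, 474, 803, 1050, 1462, 1668, 2059]
total = cum_freq[-1]

def income_at(target):
    """Income below which `target` thousand couples fall (straight-line estimate)."""
    i = max(bisect_left(cum_freq, target), 1)
    x0, x1 = bounds[i - 1], bounds[i]
    f0, f1 = cum_freq[i - 1], cum_freq[i]
    return x0 + (x1 - x0) * (target - f0) / (f1 - f0)

def cum_freq_at(income):
    """Cumulative frequency (thousands) up to `income` (straight-line estimate)."""
    i = max(bisect_left(bounds, income), 1)
    x0, x1 = bounds[i - 1], bounds[i]
    f0, f1 = cum_freq[i - 1], cum_freq[i]
    return f0 + (f1 - f0) * (income - x0) / (x1 - x0)

first_decile = income_at(total / 10)
q1 = income_at(total / 4)
median = income_at(total / 2)
q3 = income_at(3 * total / 4)
proportion = (cum_freq_at(28) - cum_freq_at(22)) / total

print(round(first_decile, 2), round(q1, 2), round(median, 2), round(q3, 2))
print(round(proportion, 3))
```

The first decile, lower quartile and median land within a few pence of the graph readings. The upper quartile and the proportion between £22 and £28 differ a little more, because the straight-line segments are a cruder fit than a smooth curve over the wide £20 to £30 classes.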
