Measures of dispersion are descriptive statistics
that show how similar or varied the data are for
a particular variable (or data item).

Measures of spread include the range, quartiles
and the interquartile range, variance, standard
deviation and coefficient of variation.
The mode, median, and mean summarise the
data into a single value that is typical or
representative of all the values in the dataset.
But this is only part of the 'picture' that
summarises a dataset.
Measures of spread summarise the data in a
way that shows how scattered the values are
and how much they differ from the mean value.
Batsman A has four innings and scores 25, 25, 25, 25
Batsman B has four innings and scores 0, 0, 0, 100

They both average 25, but their scores are very different.


Measures of dispersion are sometimes referred
to as variation or spread.
The main measures of dispersion are:
◦ Range
◦ Quartile deviation
◦ Mean deviation
◦ Standard deviation
◦ Variance
◦ Coefficient of variation
The range measures the difference between the highest
and the lowest item of the data.

Range = highest observation – lowest observation

While easy to calculate and understand, the range
can easily be distorted by extreme values.
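As a quick illustration of how an extreme value distorts the range, here is a minimal Python sketch using the two batsmen's scores from earlier. The function name `data_range` and the third, five-innings list are made up for this example.

```python
def data_range(values):
    """Range = highest observation - lowest observation."""
    return max(values) - min(values)

batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(data_range(batsman_a))  # 0
print(data_range(batsman_b))  # 100

# Hypothetical: one extreme innings added to Batsman A's steady scores
print(data_range([25, 25, 25, 25, 100]))  # 75
```

A single unusual score moves the range from 0 to 75, even though the other four scores are identical.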
The quartiles divide the set of measurements into four equal parts. 

• Twenty-five per cent of the measurements are less than the lower quartile.
• Fifty per cent of the measurements are less than the median.
• Seventy-five per cent of the measurements are less than the upper quartile.

So, fifty per cent of the measurements are between the lower quartile and the
upper quartile.

The lower quartile, median and upper quartile are often denoted by Q1, Q2 and
Q3 respectively.

The median is also denoted by m.


The quartiles are found by dividing
the arrayed (sorted) data into four
quarters.

There will be three quartiles (not four!).


To determine the interquartile range,
deduct Q1 from Q3:
Interquartile range = Q3 - Q1
Let n = the number of observations.
Where n/4 is not a whole number,
let m = the next whole number larger than n/4.
• The lower quartile is the mth observation of the
sorted data, counting from the lower end.
• The upper quartile is the mth observation of
the sorted data, counting from the upper end.
Where n/4 is a whole number, let m = n/4.
• The lower quartile is halfway between the mth
observation and the (m + 1)th observation of
the sorted data, counting from the lower end.
• The upper quartile is similarly defined, counting
from the upper end.
The median of a data set with an even number of
observations is calculated as the average of the
(n/2)th and [(n/2) + 1]th observations of the sorted data.
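The quartile rule above can be sketched in Python as follows. This is only an illustration of the rule as stated on these slides. The function name `quartiles` and both data sets are hypothetical, and other textbooks and software packages use slightly different quartile conventions, so their answers may differ.

```python
import math
import statistics

def quartiles(values):
    """Return (Q1, Q3) using the n/4 rule described above."""
    data = sorted(values)
    n = len(data)
    if n % 4 != 0:
        # n/4 is not whole: m is the next whole number larger than n/4
        m = math.ceil(n / 4)
        q1 = data[m - 1]   # mth observation from the lower end
        q3 = data[n - m]   # mth observation from the upper end
    else:
        # n/4 is whole: halfway between the mth and (m + 1)th observations
        m = n // 4
        q1 = (data[m - 1] + data[m]) / 2
        q3 = (data[n - m] + data[n - m - 1]) / 2
    return q1, q3

# Hypothetical data, n = 8 (n/4 is a whole number)
sample = [2, 4, 5, 7, 8, 10, 12, 15]
q1, q3 = quartiles(sample)
print(q1, q3, q3 - q1)            # 4.5 11.0 6.5
print(statistics.median(sample))  # 7.5, the average of the 4th and 5th values

# Hypothetical data, n = 10 (n/4 = 2.5, so m = 3)
print(quartiles(range(1, 11)))    # (3, 8)
```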
By measuring the middle 50% of values only,
the interquartile range overcomes the problem
of outlying observations.

It may be calculated from grouped frequency
distributions that contain open-ended class
intervals.
Deviation is the difference between each item
of data and the mean.
The mean deviation measures the average
distance of each observation away from the
mean of the data.
Mean deviation gives an equal weight to each
observation and is generally more sensitive
than either the range or interquartile range,
since a change in any value will affect it.
1. Calculate the mean of the data
2. Subtract the mean from each observation
and record the difference
3. Write down the absolute value of each of
the differences (i.e. ignore positive and
negative signs)
4. Calculate the mean of the absolute values
The four steps for mean deviation are written as:

1. Find x̅
2. For each x, find x - x̅
3. Now find |x - x̅| for each x
4. Find Σ|x - x̅| and divide by n


The batting scores of two cricketers, Joe and John, were
recorded over their 10 completed innings to date. Their scores
were:
Joe   32  27  38  25  20  32  34  28  40  29
John   3  80  64   5  11  87   0   2  53   0
1. For each cricketer calculate the batting average (mean
score) and the mean deviation

2. There is only one batting position left on the team for
the next match. Would you pick Joe or John? Why?


Joe:  x̅ = (32+27+38+25+20+32+34+28+40+29) / 10
         = 30.5 runs
John: x̅ = (3+80+64+5+11+87+0+2+53+0) / 10
         = 30.5 runs
Mean deviation calculations for Joe (mean = 30.5)

Score (x)    Deviation from mean (x - x̅)    Absolute deviation |x - x̅|
32           +1.5                            1.5
27           -3.5                            3.5
38           +7.5                            7.5
25           -5.5                            5.5
20           -10.5                           10.5
32           +1.5                            1.5
34           +3.5                            3.5
28           -2.5                            2.5
40           +9.5                            9.5
29           -1.5                            1.5
Total        Σ(x - x̅) = 0                   Σ|x - x̅| = 47.0

Mean deviation for Joe = Σ|x - x̅| / n = 47.0 / 10 = 4.7
Mean deviation calculations for John (mean = 30.5)

Score (x)    Deviation from mean (x - x̅)    Absolute deviation |x - x̅|
3            -27.5                           27.5
80           +49.5                           49.5
64           +33.5                           33.5
5            -25.5                           25.5
11           -19.5                           19.5
87           +56.5                           56.5
0            -30.5                           30.5
2            -28.5                           28.5
53           +22.5                           22.5
0            -30.5                           30.5
Total        Σ(x - x̅) = 0                   Σ|x - x̅| = 324.0

Mean deviation for John = Σ|x - x̅| / n = 324.0 / 10 = 32.4
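The same four-step calculation can be checked with a short Python sketch. The function name `mean_deviation` is simply a label chosen here. The scores are those recorded for Joe and John.

```python
def mean_deviation(scores):
    """Mean of the absolute deviations from the mean."""
    mean = sum(scores) / len(scores)               # step 1: find the mean
    abs_devs = [abs(x - mean) for x in scores]     # steps 2 and 3: |x - mean|
    return sum(abs_devs) / len(abs_devs)           # step 4: average them

joe = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
john = [3, 80, 64, 5, 11, 87, 0, 2, 53, 0]

for name, scores in (("Joe", joe), ("John", john)):
    mean = sum(scores) / len(scores)
    print(f"{name}: mean = {mean:.1f}, mean deviation = {mean_deviation(scores):.1f}")
# Joe: mean = 30.5, mean deviation = 4.7
# John: mean = 30.5, mean deviation = 32.4
```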
It depends on your priorities!
If you are looking for a consistent batter, the
choice will be Joe, since he has a much smaller
mean deviation.

While he probably would not make a large
score, his past record indicates he can be relied
on to make a score fairly close to his average
(the mean deviation of his score is less than 5).
If you are looking for a batter who could
possibly obtain a large score (and in
doing so considerably help to win a
match) then John will be the choice.

However, there also seems to be a high risk
that he would get a very low score.
The standard deviation measures the average
distance each item of data is from the mean.
It differs from the mean deviation in that it
squares each deviation, averages the squares and
then takes the square root, rather than taking the
absolute value of each deviation.
Standard deviation is the most commonly used
measure of dispersion for statisticians.
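The population standard deviation, denoted by σ, is:

σ = √( Σ(x - μ)² / N )

where μ is the population mean and N is the number of items in the population.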
In practice, it is rare to calculate the value of σ,
since populations are usually very large.
Instead, it is far more likely that the sample
standard deviation (denoted by S) will be
required.

The formula for calculating S is not the same as
simply substituting S for σ and n for N. There
are good theoretical reasons for not doing so.
If we did this, and used the value of S to
estimate the value of σ, the result would tend to be
too small.
To correct this error, instead of dividing by n
we divide by (n - 1). This results in the following
formula for S:

S = √( Σ(x - x̅)² / (n - 1) )
A market researcher, Gavin, was interested in the
discrepancy in the prices charged by supermarkets
for a leading brand of pet food. To check this he selected
a random sample of 12 stores and recorded the
price displayed for the same 400 gram can.

The prices in cents were


89 72 77 78 82 94
80 88 85 73 78 76

Find
a) the mean
b) the range of prices
c) the mean deviation of prices
d) the standard deviation of prices
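A minimal Python sketch for checking the four answers is given below. It assumes the 12 prices are treated as a sample, because the stores were randomly selected, so the standard deviation divides by n - 1. The variable names are chosen for this illustration only.

```python
import statistics

# Prices in cents for the same 400 gram can at 12 sampled stores
prices = [89, 72, 77, 78, 82, 94, 80, 88, 85, 73, 78, 76]

mean = statistics.mean(prices)                                # a) mean
price_range = max(prices) - min(prices)                       # b) range
mean_dev = sum(abs(p - mean) for p in prices) / len(prices)   # c) mean deviation
sample_sd = statistics.stdev(prices)                          # d) sample SD (divides by n - 1)

print(f"mean = {mean}")                      # mean = 81
print(f"range = {price_range}")              # range = 22
print(f"mean deviation = {mean_dev:.1f}")    # mean deviation = 5.5
print(f"sample SD = {sample_sd:.2f}")        # sample SD = 6.77
```

If the 12 stores were instead the whole population, `statistics.pstdev(prices)` would divide by n and give a slightly smaller value.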
Now use the financial calculator to find the mean and standard
deviation. Check the question to see if it is a sample or a population.
• The standard deviation cannot be negative
• The more scattered the data, the greater the
standard deviation
• The standard deviation of a set of data is zero if,
and only if, the observations are of equal value
• A rough guide to whether a calculated answer is
‘reasonable’ is for the standard deviation to be
approximately 30% of the range
Note: for this data set, is the standard deviation around 30% of the range?

Range is 94 – 72 = 22

The sample standard deviation is about 6.8, and 22 x 0.3 = 6.6. It won’t always be this close.


• The standard deviation can never exceed the range of
the data
• Due to the squaring operation involved in its
calculation, the standard deviation is more influenced
by extreme values than is the mean deviation and is
usually slightly larger than the mean deviation
• The square of the standard deviation is called the variance
Variance measures the spread of the data
using the squared deviations from the mean.

Variance is equal to the square of the
standard deviation, so

Variance = (Standard Deviation)²


Example using standard deviation
Batsman A has four innings & scores 25, 25, 25, 25
Batsman B scores 0, 0, 0, 100
What are their averages ?
What are their Standard Deviations?
Using the calculator: Stat Mode 1,1, then

25, xy, 0, ENT
25, xy, 0, ENT
25, xy, 0, ENT
25, xy, 100, ENT

RCL 4 and RCL 7 will give the mean score
for each batsman.
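• What is the difference between the
population and a sample?

How can I remember that on my calculator?

A sample is smaller than the population, and so are
its key numbers: RCL 5 (sample) < RCL 6 (population) and
RCL 8 (sample) < RCL 9 (population). Or remember “S” for sample.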
Back to our batsmen ….
Batsman A has four innings and scores 25, 25, 25, 25

Batsman B scores 0, 0, 0, 100

What are their standard deviations? That depends on whether the
four scores are a sample (perhaps there were 20 innings and we
sampled 4 of them) or the population (they had only batted
4 times, so these are the complete scores).

Batsman A has a standard deviation of 0 whether it is a
sample or not (RCL 5, RCL 6). Batsman B has a standard
deviation of 50 if it was a sample (RCL 8) and 43.3 if it was
the population (total data) (RCL 9).

Longhand calculation:

Sample variance for A:     (0^2 + 0^2 + 0^2 + 0^2) / 3 = 0, so the standard deviation is 0
Population variance for A: (0^2 + 0^2 + 0^2 + 0^2) / 4 = 0, so the standard deviation is 0

Batsman B (mean = 25)

Innings    Score    Deviation from mean    Deviation squared
1          0        -25                    625
2          0        -25                    625
3          0        -25                    625
4          100      75                     5625
Total                                      7500

Sample: sum of squared deviations divided by 3 = 2500
Now find the square root = 50

Population: sum of squared deviations divided by 4 = 1875
Now find the square root = 43.30127
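The longhand figures can be cross-checked with Python's standard `statistics` module, where `stdev` and `variance` divide by n - 1 (sample) and `pstdev` and `pvariance` divide by n (population). This is just a verification sketch, not a substitute for the calculator steps used in class.

```python
import statistics

batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

# Batsman A: no spread at all, sample or population
print(statistics.stdev(batsman_a), statistics.pstdev(batsman_a))  # 0.0 0.0

# Batsman B treated as a sample (divide by n - 1 = 3)
print(statistics.variance(batsman_b))          # 2500
print(statistics.stdev(batsman_b))             # 50.0

# Batsman B treated as the population (divide by n = 4)
print(statistics.pvariance(batsman_b))         # 1875
print(round(statistics.pstdev(batsman_b), 5))  # 43.30127
```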
The coefficient of variation is a measure of relative variability.
It is used to measure the changes that have
taken place in a population over time, or to
compare the variability of two populations that
are expressed in different units of
measurement.
It is expressed as a percentage rather than in
terms of the units of the particular data.
The formula for the coefficient of variation,
denoted by V, is:

V = (S / x̅) x 100%

where x̅ = the mean of the sample
      S = the standard deviation of the sample

This is the standard deviation divided by the mean, that is, the ratio of
the standard deviation to the mean, expressed as a percentage. The higher
the figure, the greater the relative variation.

Back to Batsman B: we would have a coefficient of variation of 50 / 25 = 2,
or 200%, which is quite a significant variation.
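A minimal Python sketch of the coefficient of variation follows. It assumes the scores are treated as a sample and that the mean is not zero. The function name `coefficient_of_variation` is chosen for this illustration.

```python
import statistics

def coefficient_of_variation(sample):
    """V = 100 * S / mean, as a percentage (the mean must not be zero)."""
    return 100 * statistics.stdev(sample) / statistics.mean(sample)

batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(coefficient_of_variation(batsman_a))  # 0.0
print(coefficient_of_variation(batsman_b))  # 200.0
```

Because V is a percentage rather than a number of runs, it can be compared across data sets measured in different units.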
Using the calculator for the standard deviation: Mode 1,0, then 10, ENT, 15,
ENT, … then RCL 5, since the question said it was a sample (not RCL 6).
Answer is 4.1231.
Using the calculator: Mode 1,0 (2nd f, Alpha, 0, 0 to clear, just in case), then
36, xy, 3, ENT, 37, xy, 3, ENT, … then RCL 4 for the mean and RCL 5 for the
sample standard deviation = 1.70.
Note: we will get the calculator to calculate the standard deviation; the
longhand calculation is shown here only as a demonstration. You also should
not be asked for the mean deviation in a class test.
Suggested Questions from Textbook……

Select a range of questions from the Problems in this chapter – enough so that you feel
comfortable with this topic
