
INTRODUCTION TO STATISTICS &

PROBABILITY
CHAPTER-IV
Measures of Dispersion
(Variation)
4.1 Introduction

• We have seen that averages are representative values of a frequency distribution, but they fail to give a complete picture of it: they tell us nothing about the spread, or dispersion, of the data.
Suppose that we have the yields (kg per plot) of two rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yield is 42 kg for Variety 1 and 41.4 kg for Variety 2, so the two means are nearly equal, yet the yields of Variety 2 are far more scattered. The mean does not tell us how close the observations are to each other. This example suggests that a measure of central tendency alone is not sufficient to describe a frequency distribution; we also need a measure of the spread of the observations.
Objectives of measuring variation
• To describe the dispersion (variability) in a data set.
• To compare the spread in two or more distributions.
• To determine the reliability of an average.
4.2 Absolute and relative measures
Measures of variation may be either absolute or relative.
a) Absolute measures
Absolute measures of variation are expressed in the same unit of measurement as the original data.
They may be used to compare the variation in two distributions provided that the variables are in the same units and of about the same average size.
Absolute measures cannot be used:
1) When the two sets of data are expressed in different units.
E.g.: quintals of sugar versus tonnes of sugarcane.
2) When the average sizes are very different.
E.g.: a manager's salary versus a worker's salary.
In such cases measures of relative dispersion should be used.
b) Relative measures
A relative measure of dispersion is the ratio of an absolute measure of dispersion to an appropriate measure of central tendency. It is a unitless measure.
Example: relative range
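4.3 Types of measures of variation
a) The range and relative range
The range is the difference between the largest observation (L) and the smallest observation (S), and the relative range is the range divided by their sum:
$$\text{Range} = L - S, \qquad RR = \frac{L - S}{L + S}$$
The range is greatly affected by extreme scores, so it may give a distorted picture of the data. The following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability.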
Merits and Demerits of range
Merits:
• It is rigidly defined.
• It is easy to calculate and simple to understand.
Demerits:
• It is not based on all observations.
• It is highly affected by extreme observations.
• It is affected by fluctuations in sampling.
• It is not amenable to further algebraic treatment.
• It cannot be computed for an open-ended distribution.
• It is very sensitive to the size of the sample.
Example:
1. Find the relative range of the above two distributions.
Solution: In both distributions the range is 13, the largest value is 45 and the smallest value is 32, so
$$RR = \frac{\text{Range}}{\text{Max. value} + \text{Min. value}} = \frac{13}{45 + 32} = 0.17$$
2. If the range and relative range of a series are 4 and 0.25 respectively, what is the value of:
a) the smallest observation
b) the largest observation
Solution:
Given that Range = 4 and RR = 0.25:
I) $\text{Range} = L - S \iff 4 = L - S$
II) $RR = \dfrac{\text{Range}}{L + S} \iff 0.25 = \dfrac{4}{L + S} \iff L + S = \dfrac{4}{0.25} = 16$
Solving equations I and II simultaneously gives L = 10 and S = 6.
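The two calculations above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the list `[32, 40, 45]` is a hypothetical data set chosen to have the same extremes (32 and 45) as the example.

```python
def relative_range(values):
    """Relative range: (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


def extremes_from_range(range_, rr):
    """Recover (largest, smallest) from Range = L - S and RR = Range / (L + S)."""
    total = range_ / rr                 # L + S
    largest = (total + range_) / 2      # add the two equations
    smallest = (total - range_) / 2     # subtract the two equations
    return largest, smallest


# Hypothetical data with minimum 32 and maximum 45
print(round(relative_range([32, 40, 45]), 2))
# Range = 4 and RR = 0.25, as in the second example
print(extremes_from_range(4, 0.25))
```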
b) The Quartile Deviation (Semi-interquartile range), Q.D
The interquartile range is the difference between the third and the first quartiles of a set of items, and the quartile deviation (or semi-interquartile range) is half of the interquartile range.
$$\text{Interquartile range} = Q_3 - Q_1$$
$$Q.D = \frac{Q_3 - Q_1}{2}$$
Q.D gives the average amount by which the two quartiles differ from the median.
c) Coefficient of Quartile Deviation (C.Q.D)
The coefficient of quartile deviation is the corresponding relative measure:
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Example: Compute Q.D and its coefficient for the following distribution (LCF = less-than cumulative frequency).

Values     Frequency   LCF
140-150    17          17
150-160    29          46
160-170    42          88
170-180    72          160
180-190    84          244
190-200    107         351
200-210    49          400
210-220    34          434
220-230    31          465
230-240    16          481
240-250    12          493
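Solution:
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{i\,n}{4} - cf\right), \quad i = 1, 2, 3,$$
where $L_{Q_i}$ is the lower boundary of the class containing $Q_i$, $w$ is the class width, $f_{Q_i}$ is the frequency of that class and $cf$ is the cumulative frequency of the preceding class.
The first quartile is the $\frac{i\,n}{4} = \frac{1 \times 493}{4} = 123.25$th item, which lies between the 123rd and 124th items, so it belongs to the fourth class (170-180).
$$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{1 \times n}{4} - cf\right) = 170 + \frac{10}{72}\left(\frac{1 \times 493}{4} - 88\right) = 174.9$$
The third quartile is the $\frac{i\,n}{4} = \frac{3 \times 493}{4} = 369.75$th item, which lies between the 369th and 370th items, so it belongs to the seventh class (200-210).
$$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3 \times n}{4} - cf\right) = 200 + \frac{10}{49}\left(\frac{3 \times 493}{4} - 351\right) = 203.83$$
Then
$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.9}{2} = 14.47$$
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{203.83 - 174.9}{203.83 + 174.9} = 0.076$$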

Remark: Q.D and C.Q.D are based only on the middle 50% of the observations.
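As a rough check of the grouped-data quartile formula, the following Python sketch interpolates inside the class that contains the quartile. The function name `grouped_quartile` and its argument layout are assumptions made for this illustration; the data are the class limits and frequencies of the example above.

```python
def grouped_quartile(lower_bounds, freqs, width, i):
    """i-th quartile of a grouped distribution by linear interpolation."""
    n = sum(freqs)
    position = i * n / 4           # rank of the quartile item
    cf = 0                         # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cf + f >= position:     # the quartile falls in this class
            return lower + (width / f) * (position - cf)
        cf += f
    raise ValueError("quartile position exceeds total frequency")


lowers = list(range(140, 250, 10))                       # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q1 = grouped_quartile(lowers, freqs, 10, 1)
q3 = grouped_quartile(lowers, freqs, 10, 3)
qd = (q3 - q1) / 2                 # quartile deviation
cqd = (q3 - q1) / (q3 + q1)        # coefficient of quartile deviation
print(round(q1, 2), round(q3, 2), round(qd, 2), round(cqd, 3))
```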
d) The Mean Deviation (M.D):
• The mean deviation of a set of items is defined as the arithmetic mean of the absolute deviations from a given average.
• Depending on the type of average used, we have different mean deviations.
1) Mean deviation about the mean, denoted by $M.D(\bar{X})$ and given by:
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |x_i - \bar{X}|}{n}, \text{ for ungrouped data}$$
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} f_i\,|x_i - \bar{X}|}{\sum_{i=1}^{n} f_i}, \text{ for grouped data}$$
Steps to calculate $M.D(\bar{X})$:
1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviation of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
2) Mean deviation about the median, denoted by $M.D(\tilde{X})$ and given by:
$$M.D(\tilde{X}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{X}|}{n}, \text{ for ungrouped data}$$
$$M.D(\tilde{X}) = \frac{\sum_{i=1}^{n} f_i\,|x_i - \tilde{X}|}{\sum_{i=1}^{n} f_i}, \text{ for grouped data}$$
Steps to calculate $M.D(\tilde{X})$:
1. Find the median, $\tilde{X}$.
2. Find the deviation of each reading from $\tilde{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
3) Mean deviation about the mode, denoted by $M.D(\hat{X})$ and given by:
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |x_i - \hat{X}|}{n}, \text{ for ungrouped data}$$
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} f_i\,|x_i - \hat{X}|}{\sum_{i=1}^{n} f_i}, \text{ for grouped data}$$
Steps to calculate $M.D(\hat{X})$:
1. Find the mode, $\hat{X}$.
2. Find the deviation of each reading from $\hat{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
Example-1:
The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
Find the mean deviation about the mean, median and mode.
Solution:
First calculate the three averages: $\bar{X} = 6$, $\tilde{X} = 5.5$ and $\hat{X} = 5$.
Then take the absolute deviations of each observation from these averages.

X_i     |X_i - 6|   |X_i - 5.5|   |X_i - 5|
4       2           1.5           1
4       2           1.5           1
5       1           0.5           0
5       1           0.5           0
5       1           0.5           0
6       0           0.5           1
7       1           1.5           2
7       1           1.5           2
8       2           2.5           3
9       3           3.5           4
Total   14          14            14

• $M.D(\bar{X}) = \dfrac{\sum_{i=1}^{n} |x_i - \bar{X}|}{n} = \dfrac{\sum_{i=1}^{n} |x_i - 6|}{n} = \dfrac{14}{10} = 1.4$
• $M.D(\tilde{X}) = \dfrac{\sum_{i=1}^{n} |x_i - \tilde{X}|}{n} = \dfrac{\sum_{i=1}^{n} |x_i - 5.5|}{n} = \dfrac{14}{10} = 1.4$
• $M.D(\hat{X}) = \dfrac{\sum_{i=1}^{n} |x_i - \hat{X}|}{n} = \dfrac{\sum_{i=1}^{n} |x_i - 5|}{n} = \dfrac{14}{10} = 1.4$
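The three steps (find the average, take absolute deviations, average them) translate directly into code. The sketch below uses Python's standard `statistics` module for the averages; the helper name `mean_deviation` is an assumption made for this illustration.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

# Mean deviation about each of the three averages
for name, center in [("mean", mean(visits)),
                     ("median", median(visits)),
                     ("mode", mode(visits))]:
    print(name, center, mean_deviation(visits, center))
```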

Example-2:
Find the mean deviation about the mean, median and mode for the following distribution.

Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3

Solution: First find the mean deviation about the arithmetic mean.

Class    Frequency f_i   Class mark X_i   f_i*X_i   |X_i - 55|   f_i*|X_i - 55|
40-44    7               42               294       13           91
45-49    10              47               470       8            80
50-54    22              52               1144      3            66
55-59    15              57               855       2            30
60-64    12              62               744       7            84
65-69    6               67               402       12           72
70-74    3               72               216       17           51
Total    75                               4125                   474

$$\text{Arithmetic mean} = \bar{X} = \frac{\sum_{i=1}^{7} f_i X_i}{\sum_{i=1}^{7} f_i} = \frac{4125}{75} = 55$$
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{7} f_i\,|X_i - \bar{X}|}{\sum_{i=1}^{7} f_i} = \frac{474}{75} = 6.3$$
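Solution: Second find the mean deviation about the median and the mode.
• The median:
$$\tilde{X} = L_m + \frac{w}{f_m}\left(\frac{n}{2} - cf\right) = 49.5 + \frac{5}{22}\left(\frac{75}{2} - 17\right) = 54.2$$
• The mode:
$$\hat{X} = L_m + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,w = 49.5 + \frac{12}{12 + 7} \times 5 = 52.7$$

Class    f_i   X_i   LCF   |X_i - 54.2|   f_i*|X_i - 54.2|   |X_i - 52.7|   f_i*|X_i - 52.7|
40-44    7     42    7     12.2           85.4               10.7           74.9
45-49    10    47    17    7.2            72                 5.7            57
50-54    22    52    39    2.2            48.4               0.7            15.4
55-59    15    57    54    2.8            42                 4.3            64.5
60-64    12    62    66    7.8            93.6               9.3            111.6
65-69    6     67    72    12.8           76.8               14.3           85.8
70-74    3     72    75    17.8           53.4               19.3           57.9
Total    75                               471.6                             467.1

• $M.D(\tilde{X}) = \dfrac{\sum_{i=1}^{7} f_i\,|X_i - \tilde{X}|}{\sum_{i=1}^{7} f_i} = \dfrac{471.6}{75} = 6.3$
• $M.D(\hat{X}) = \dfrac{\sum_{i=1}^{7} f_i\,|X_i - \hat{X}|}{\sum_{i=1}^{7} f_i} = \dfrac{467.1}{75} = 6.2$
• Remark: For ungrouped data, the mean deviation is smallest when it is taken about the median. With grouped data, the use of class marks and an interpolated median can blur this, which is why the value about the mode is slightly smaller here.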
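The grouped-data version weights each absolute deviation by its class frequency. A minimal sketch, assuming the class marks stand in for the observations and taking the rounded median (54.2) and mode (52.7) from the solution as given inputs; the function name is made up for this illustration.

```python
def grouped_mean_deviation(marks, freqs, center):
    """Frequency-weighted mean of |class mark - center|."""
    total = sum(f * abs(x - center) for x, f in zip(marks, freqs))
    return total / sum(freqs)


marks = [42, 47, 52, 57, 62, 67, 72]     # class marks
freqs = [7, 10, 22, 15, 12, 6, 3]        # class frequencies
grouped_mean = sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)

md_mean = grouped_mean_deviation(marks, freqs, grouped_mean)
md_median = grouped_mean_deviation(marks, freqs, 54.2)   # rounded median
md_mode = grouped_mean_deviation(marks, freqs, 52.7)     # rounded mode
print(grouped_mean, round(md_mean, 1), round(md_median, 1), round(md_mode, 1))
```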
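e) Coefficient of Mean Deviation (C.M.D)
The coefficient of mean deviation is the relative measure obtained by dividing a mean deviation by the average about which it is taken.
Example:
Calculate the C.M.D about the mean, median and mode for the data in Example 1 above.
Solution:
$$C.M.D(\bar{X}) = \frac{M.D(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.233, \qquad C.M.D(\tilde{X}) = \frac{M.D(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.255, \qquad C.M.D(\hat{X}) = \frac{M.D(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28$$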
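f) Variance
Definition: The variance is the arithmetic mean of the squared distances of each value from the mean. The symbol for the population variance is $\sigma^2$ (σ is the Greek lowercase letter sigma). Let $x_1, x_2, \dots, x_N$ be the measurements on N population units; then the population variance is given by the formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{and} \quad \sigma^2 = \frac{\sum_{i} f_i (x_i - \mu)^2}{N} \text{ for a frequency distribution,}$$
where, in the frequency-distribution form, the sum runs over the distinct values (or classes) and $N = \sum_i f_i$,
$$\mu = \text{population mean} = \frac{\sum_{i=1}^{N} x_i}{N}$$
and N = population size.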
g) Standard deviation
Definition: The standard deviation is the square root of the variance. The symbol for the population standard deviation is $\sigma$. The corresponding formula for the standard deviation is
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}.$$
Example: The heights of the members of a certain committee were measured in inches and the data are presented below. Find the population variance and standard deviation.
Height (x): 69 66 67 69 64 63 65 68 72
Solution:

x_i     x_i - µ   (x_i - µ)^2
63      -4        16
64      -3        9
65      -2        4
66      -1        1
67      0         0
68      1         1
69      2         4
69      2         4
72      5         25
Total   603                 64

$$\mu = \text{population mean} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{69 + 66 + \dots + 72}{9} = \frac{603}{9} = 67 \text{ inches}$$
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{64}{9} = 7.11 \text{ inch}^2$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{7.11} = 2.67 \text{ inches}$$
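For a quick numerical check, Python's standard `statistics` module provides population versions of both measures (`pvariance` and `pstdev`, which divide by N). The sketch below applies them to the committee heights.

```python
from statistics import fmean, pstdev, pvariance

heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]

mu = fmean(heights)             # population mean
sigma2 = pvariance(heights)     # population variance (divides by N)
sigma = pstdev(heights)         # population standard deviation

print(mu, round(sigma2, 2), round(sigma, 2))
```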
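Definition: The sample variance is denoted by $S^2$, and its formula is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \quad \text{or for a frequency distribution} \quad S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}.$$
Definition: The sample standard deviation, denoted by S, is the square root of the sample variance:
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \quad \text{or for a frequency distribution} \quad S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}}.$$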
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the sum from step 4 by the number of observations minus one, i.e., n - 1 (where n is the number of observations in the data set).
Examples: Find the variance and standard deviation of the following sample data.
1) 5, 17, 12, 10.
2) The data are given in the form of a frequency distribution.

Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3
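Solution 1:

x_i     x_i - x̄   (x_i - x̄)^2
5       -6        36
10      -1        1
12      1         1
17      6         36
Total   44                  74

$$\bar{x} = \text{sample mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{44}{4} = 11$$
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{74}{4 - 1} = 24.67$$
$$\text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{24.67} = 4.97$$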
Solution 2:

Class    Frequency f_i   Class mark X_i   f_i*X_i   x_i - x̄   (x_i - x̄)^2   f_i*(x_i - x̄)^2
40-44    7               42               294       -13       169           1183
45-49    10              47               470       -8        64            640
50-54    22              52               1144      -3        9             198
55-59    15              57               855       2         4             60
60-64    12              62               744       7         49            588
65-69    6               67               402       12        144           864
70-74    3               72               216       17        289           867
Total    75                               4125                              4400

$$\text{Arithmetic mean} = \bar{X} = \frac{\sum_{i=1}^{7} f_i X_i}{\sum_{i=1}^{7} f_i} = \frac{4125}{75} = 55$$
$$\text{Sample variance} = S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{4400}{75 - 1} = 59.46$$
$$\text{Sample standard deviation} = S = \sqrt{S^2} = \sqrt{59.46} = 7.71$$
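Both solutions can be reproduced in Python. For raw data the standard library's `statistics.variance` and `statistics.stdev` already use the n - 1 divisor; for the frequency table, the helper `grouped_sample_variance` below is a name invented for this sketch and treats each class mark as repeated f_i times.

```python
from statistics import stdev, variance


def grouped_sample_variance(marks, freqs):
    """Sample variance from class marks and frequencies (n - 1 divisor)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / (n - 1)


# 1) Ungrouped sample
sample = [5, 17, 12, 10]
print(round(variance(sample), 2), round(stdev(sample), 2))

# 2) Frequency distribution (class marks and frequencies)
marks = [42, 47, 52, 57, 62, 67, 72]
freqs = [7, 10, 22, 15, 12, 6, 3]
s2 = grouped_sample_variance(marks, freqs)
print(round(s2, 2), round(s2 ** 0.5, 2))
```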

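Example: Suppose a distribution has mean 50 and standard deviation 6. What percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62
d) Less than 32 or more than 68
Solutions:
By Chebyshev's theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data lie within k standard deviations of the mean, for k > 1.
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
$ks = 12 \iff k = 2$, since $s = 6$.
Then at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = \left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$ of the data lie between 38 and 62.
b) 32 and 68 are at equal distance from the mean, 50, and this distance is 18.
$ks = 18 \iff k = 3$, since $s = 6$.
Then at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = \left(1 - \frac{1}{3^2}\right) \times 100\% = \frac{8}{9} \times 100\% = 88.89\%$ of the data lie between 32 and 68.
c) This is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = \frac{1}{2^2} \times 100\% = 25\%$ of the numbers are less than 38 or more than 62.
d) This is just the complement of b), i.e. at most $\frac{1}{k^2} \times 100\% = \frac{1}{3^2} \times 100\% = 11.11\%$ of the numbers are less than 32 or more than 68.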
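The bound used in these solutions depends only on k, the half-width of the interval measured in standard deviations. A minimal Python sketch, assuming an interval symmetric about the mean; the function name is hypothetical.

```python
def chebyshev_at_least(mean, sd, low, high):
    """Minimum percent of data in (low, high), an interval symmetric about the mean."""
    if abs((mean - low) - (high - mean)) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / sd             # half-width in standard deviations
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return (1 - 1 / k ** 2) * 100


for low, high in [(38, 62), (32, 68)]:
    inside = chebyshev_at_least(50, 6, low, high)
    # "at least" inside the interval, "at most" outside it
    print(low, high, round(inside, 2), round(100 - inside, 2))
```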
Example:
The scores on a special test of knowledge of wood refinishing have a mean of 53 and a standard deviation of 6. Find the range of values in which at least 75% of the scores will lie.
Solution:
From Chebyshev's theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data belong to the interval $(\bar{x} - ks,\ \bar{x} + ks)$.
But $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$, so
$$1 - \frac{1}{k^2} = \frac{3}{4} \iff \frac{k^2 - 1}{k^2} = \frac{3}{4} \iff 4k^2 - 4 = 3k^2 \iff k^2 - 4 = 0 \iff k = 2 \text{ or } k = -2,$$
but k cannot be negative. Thus k = 2.
Then at least 75% of the scores belong to the interval
$$(\bar{x} - ks,\ \bar{x} + ks) = (53 - 2 \times 6,\ 53 + 2 \times 6) = (41, 65).$$
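Going the other way, from a required percentage to an interval, amounts to solving $1 - 1/k^2 = p$ for k. The following Python sketch does this; the function name `chebyshev_interval` is an assumption made for illustration.

```python
import math


def chebyshev_interval(mean, sd, percent):
    """Interval guaranteed by Chebyshev's theorem to hold at least `percent` of the data."""
    k = math.sqrt(1 / (1 - percent / 100))   # positive root of 1 - 1/k^2 = p
    return mean - k * sd, mean + k * sd


print(chebyshev_interval(53, 6, 75))
```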
Solutions:
Using property c) above, the new standard deviation $= |k|\,s = 2 \times 3 = 6$.

Example:
The mean and the standard deviation of a set of numbers are 500 and 10 respectively.
a. If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?
b. If each of the numbers in the set is multiplied by -5, what will be the variance and standard deviation of the new set?
Solutions:
a. They will remain the same: the standard deviation is 10 and the variance is $10^2 = 100$.
b. New standard deviation $= |k|\,S = |-5| \times 10 = 50$, so the new variance is $50^2 = 2500$.
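These two properties are easy to confirm numerically. The sketch below uses a small hypothetical data set (the values are not from the text) and the standard library's `pstdev`; adding a constant leaves the standard deviation unchanged, while multiplying by k scales it by |k|.

```python
from statistics import pstdev

data = [490, 500, 510]                 # hypothetical values, not from the text
shifted = [x + 10 for x in data]       # add 10 to every value
scaled = [-5 * x for x in data]        # multiply every value by -5

sd, sd_shifted, sd_scaled = pstdev(data), pstdev(shifted), pstdev(scaled)
print(sd, sd_shifted, sd_scaled)
print(sd_scaled / sd)                  # ratio should be |k| = 5
```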

h) Coefficient of Variation (C.V)
It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
The distribution with the smaller C.V is said to be less variable, or more consistent.
Examples:
An analysis of the monthly income paid (in Birr) to workers in FTVTI for administrative staff and academic staff gives the following results.

Value            Admin. staff (A)   Aca. staff (B)
Mean income      52.5               47.5
Median income    50.5               45.5
Variance         100                121

In which staff, administrative or academic, is there greater variability in individual incomes?
Solution:
• Calculate the coefficient of variation for both staffs.
$$C.V_A = \frac{\sqrt{100}}{52.5} \times 100\% = 19.05\%, \qquad C.V_B = \frac{\sqrt{121}}{47.5} \times 100\% = 23.16\%$$
Since C.V (academic staff) > C.V (admin. staff), there is greater variability in the individual incomes of the academic staff.
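The comparison can be sketched in Python from the summary figures in the table. The function name is an assumption made for this illustration; it takes the variance, so the square root is applied first.

```python
import math


def coefficient_of_variation(mean, variance):
    """C.V in percent: standard deviation divided by the mean."""
    return math.sqrt(variance) / mean * 100


cv_admin = coefficient_of_variation(52.5, 100)      # administrative staff
cv_academic = coefficient_of_variation(47.5, 121)   # academic staff
print(round(cv_admin, 2), round(cv_academic, 2))
```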
Example:
A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for five days of the week in the three cities were:

City 1   25   24   23   26   17
City 2   22   21   24   22   20
City 3   32   27   35   24   28

Which city has the most consistent temperature, based on these data?
Solution:
Compute each city's mean, the squared deviations from it, the sample standard deviation (dividing by n - 1 = 4) and the C.V.

City 1 (mean 23)     25     24     23      26      17
(x_i - x̄)^2          4      1      0       9       36
City 2 (mean 21.8)   22     21     24      22      20
(x_i - x̄)^2          0.04   0.64   4.84    0.04    3.24
City 3 (mean 29.2)   32     27     35      24      28
(x_i - x̄)^2          7.84   4.84   33.64   27.04   1.44

Cities   Arithmetic mean   Standard deviation   C.V
City-1   23                3.54                 0.154
City-2   21.8              1.48                 0.068
City-3   29.2              4.32                 0.148

The temperature in City-2 is the most consistent because it has the smallest C.V.
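A short Python sketch of the same comparison, working from the raw temperatures as listed in the problem statement. It assumes the sample standard deviation (n - 1 divisor, `statistics.stdev`) is the intended convention.

```python
from statistics import fmean, stdev

temperatures = {
    "City 1": [25, 24, 23, 26, 17],
    "City 2": [22, 21, 24, 22, 20],
    "City 3": [32, 27, 35, 24, 28],
}

# C.V as a proportion: sample standard deviation divided by the mean
cv = {city: stdev(t) / fmean(t) for city, t in temperatures.items()}
most_consistent = min(cv, key=cv.get)   # smallest C.V

for city, value in cv.items():
    print(city, round(value, 3))
print(most_consistent)
```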
i) Standard Scores (Z-scores)
• If X is a measurement from a distribution with mean $\bar{X}$ and standard deviation S, then its value in standard units is
$$Z = \frac{X - \bar{X}}{S}$$
• Z gives the deviation from the mean in units of standard deviation.
• Z gives the number of standard deviations a particular observation lies above or below the mean.
• It is used to compare two observations coming from different groups.
Example:
Two sections were given an Introduction to Statistics examination. The following information was given.

Value             Section 1   Section 2
Mean              78          90
Stan. deviation   6           5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?
Solutions:
Calculate the standard score of both students.
$$Z_A = \frac{90 - 78}{6} = 2, \qquad Z_B = \frac{95 - 90}{5} = 1$$
Student A performed better relative to his section, because the score of student A is two standard deviations above the mean score of his section, while the score of student B is only one standard deviation above the mean score of his section.
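The standard scores in this comparison take one line each in Python. The helper name `z_score` is an assumption made for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(90, 78, 6)   # student A, section 1
z_b = z_score(95, 90, 5)   # student B, section 2
print(z_a, z_b)
```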
Example:
Two groups of people were trained to perform a certain task and tested to find out which group is faster to learn the task. For the two groups the following information was given:

Value        Group one   Group two
Mean         10.4 min    11.9 min
Stan. dev.   1.2 min     1.3 min

Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B from group two takes 9.3 minutes. Who was faster in performing the task? Why?
Solution:
a) $C.V_1 = \frac{1.2}{10.4} \times 100\% = 11.54\%$ and $C.V_2 = \frac{1.3}{11.9} \times 100\% = 10.92\%$. Group two is more consistent because it has the smaller C.V.
b) $Z_A = \frac{9.2 - 10.4}{1.2} = -1$ and $Z_B = \frac{9.3 - 11.9}{1.3} = -2$.
• Person B is faster, because the time taken by person B is two standard deviations shorter than the average time taken by group two, while the time taken by person A is only one standard deviation shorter than the average time taken by group one.
