
Chapter 4:

Measures of Dispersion, Moments and Skewness


When two or more data sets are compared using only a measure of central tendency (an average), they can give the same result. Consider the runs scored by two batsmen in their last ten matches:
Batsman A: 30, 91, 0, 64, 42, 80, 30, 5, 117, 71
Batsman B: 53, 46, 48, 50, 53, 53, 58, 60, 57, 52
Both batsmen have the same mean (\(\bar{X} = 53\)) but differ in their variation: the scores of batsman A vary far more than those of batsman B. Thus, a measure of central tendency alone does not give complete information about a data set, and comparison on that basis becomes difficult. We therefore need additional information about how the data are dispersed (spread out) about the average. This is obtained by measuring the dispersion. The extent to which observations tend to spread about an average is called dispersion. A quantity that measures this characteristic is called a measure of dispersion, scatter or variability.
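
The batsmen comparison can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption, and the range and standard deviation it reports are the measures defined later in this chapter.

```python
from statistics import mean, pstdev

batsman_a = [30, 91, 0, 64, 42, 80, 30, 5, 117, 71]
batsman_b = [53, 46, 48, 50, 53, 53, 58, 60, 57, 52]

def summarize(scores):
    """Return (mean, range, standard deviation with divisor n) for a list of scores."""
    return mean(scores), max(scores) - min(scores), pstdev(scores)

mean_a, range_a, sd_a = summarize(batsman_a)
mean_b, range_b, sd_b = summarize(batsman_b)

# Same average, very different spread.
print(f"A: mean={mean_a}, range={range_a}, sd={sd_a:.2f}")
print(f"B: mean={mean_b}, range={range_b}, sd={sd_b:.2f}")
```

Both means come out equal, while the range and standard deviation of batsman A are much larger than those of batsman B.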

Why Study Dispersion?


- A measure of location, such as the mean or the median, only describes the centre of the data. It is valuable from that standpoint, but it tells us nothing about the spread of the data.
- A measure of dispersion gives us additional information that enables us to judge the reliability of our measure of central tendency. A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data. Conversely, a large value indicates that the mean is not a reliable representative of the data.
- Another reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

4.1 Measures of Dispersion:


There are two types of measures of dispersion:
i. Absolute measures of dispersion
ii. Relative measures of dispersion

Types of Measures of Dispersion

Absolute Measures        Relative Measures
Range                    Coefficient of Range
Quartile Deviation       Coefficient of Q.D.
Mean Deviation           Coefficient of M.D.
Variance and S.D.        Coefficient of Variation and Coefficient of S.D.

An absolute measure of dispersion is one that measures the dispersion in terms of the same
units or in the square of units, as the units of the data. For example, if the units of the data are
rupees, meters, kilograms, etc., the units of the measures of dispersion will also be rupees,
meters, kilograms, etc.
On the other hand, a relative measure of dispersion is one that is expressed in the form of a ratio, coefficient or percentage and is independent of the units of measurement. Such measures are useful for comparing data sets of different nature or measured in different units.
The main measures of dispersion are the following:
i) Range
ii) Semi-Interquartile Range or Quartile Deviation
iii) Mean Deviation or Average Deviation
iv) Variance and Standard Deviation
The range is based on the largest and the smallest values in the data set, that is, only two values are considered. The mean deviation, the variance, and the standard deviation use all the values in a data set and are based on deviations from an average, usually the arithmetic mean.

4.2 Range
The simplest measure of dispersion is the range. It is defined as the difference between the
largest value and smallest value in the data set.
For ungrouped data, let \(x_0\) be the smallest value and \(x_m\) the largest value in a data set. Then the range, denoted by R, is defined as
\[ R = x_m - x_0 \]
For grouped data, the range may be defined as the difference between the upper class boundary of the highest class and the lower class boundary of the lowest class. It may also be defined as the difference between the class mark of the highest class and the class mark of the lowest class.
The range is easy to understand and to find, but its usefulness as a measure of dispersion is
limited. The range considers only the highest and lowest values of a distribution and fails to
take account of any other observation in the data set. As a result, it ignores the nature of the
variation among all the other observations, and it is heavily influenced by extreme values.
Because it measures only two values, the range is likely to change drastically from one sample
to the next in a given population, even though the values that fall between the highest and
lowest values may be quite similar. Keep in mind, too, that open-ended distributions have no
range because no “highest” or “lowest” value exists in the open-ended class.

The range is an absolute measure of dispersion. Its relative measure, called the coefficient of range or coefficient of dispersion, is defined by the relation:
\[ \text{Coefficient of Dispersion} = \frac{x_m - x_0}{x_m + x_0} \]

Example 1: Find range and the coefficient of range from the following data:
23, 51, 50, 90, 40, 75, 20, 60, 44, 30
Solution: Here \(x_m = 90\) and \(x_0 = 20\). So,
\[ R = x_m - x_0 = 90 - 20 = 70 \]
\[ \text{Coefficient of dispersion} = \frac{x_m - x_0}{x_m + x_0} = \frac{90 - 20}{90 + 20} = \frac{70}{110} = 0.64 \]
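
The calculation in Example 1 can be reproduced with a short sketch. The function names `value_range` and `coefficient_of_range` are illustrative choices, not part of any library.

```python
def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """Relative measure: (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)

data = [23, 51, 50, 90, 40, 75, 20, 60, 44, 30]  # data of Example 1
r = value_range(data)
coef = coefficient_of_range(data)
print(r, round(coef, 2))
```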

Example 2: The following two data sets show the salaries (in thousands of dollars) for five
industrial salesmen from each of two different companies. Compute the range in salaries for
each sample.
Company A: 27, 51, 69, 66, 37
Company B: 56, 48, 41, 53, 52
Solution: For company A: R = Largest salary - Smallest salary
= 69 - 27 = 42
The salaries of these salesmen in company A have a range of $42,000. Similarly, for company B:
R = 56 - 41 = 15
The salaries in company B have a range of $15,000. The sample salaries from company A therefore show more variation than those from company B; that is, there is less dispersion in the salesmen's salaries of company B than in those of company A.

Example 3: Find the range of weights of the students of a university.


Weights(kg) 60-62 63-65 66-68 69-71 72-74
No. of students 5 18 42 27 8

Solution:

Weights    f     Class boundaries    Midpoints
60-62      5     59.5-62.5           61
63-65      18    62.5-65.5           64
66-68      42    65.5-68.5           67
69-71      27    68.5-71.5           70
72-74      8     71.5-74.5           73

\[ R = \text{Upper class boundary of the highest class} - \text{Lower class boundary of the lowest class} = 74.5 - 59.5 = 15 \text{ kg} \]
Or
\[ R = \text{Mid value of the highest class} - \text{Mid value of the lowest class} = 73 - 61 = 12 \text{ kg} \]

4.3 Semi-Interquartile Range or Quartile Deviation


A measure of variation that tends to overcome the range’s susceptibility to extreme values is
called the interquartile range.
The difference between the third and first quartiles is called the interquartile range (I.Q.R.):
\[ \text{Interquartile range} = Q_3 - Q_1 \]
Half of the difference between the third and first quartiles is called the semi-interquartile range (S.I.Q.R.) or quartile deviation (Q.D.). Symbolically, we have
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
where \(Q_1\) and \(Q_3\) are the first and third quartiles of the data.
The quartile deviation is superior to the range as it is not affected by extremely large or small observations. It is simple to understand and easy to calculate. It has certain disadvantages, however: it gives no information about the position of observations lying outside the two quartiles, and it is not capable of further mathematical treatment. The quartile deviation is also an absolute measure of dispersion. Its relative measure, called the coefficient of quartile deviation or of semi-interquartile range, is defined by the relation:
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]

Example 4: Calculate interquartile range, quartile deviation and the coefficient of quartile deviation by using the data of Example 23 (available in Measures of Central Tendency).
Solution: Given \(Q_1 = 15\) and \(Q_3 = 80\). Then
\[ \text{Interquartile range} = Q_3 - Q_1 = 80 - 15 = 65 \]
\[ Q.D. = \frac{Q_3 - Q_1}{2} = \frac{80 - 15}{2} = 32.5 \]
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{80 - 15}{80 + 15} = \frac{65}{95} = 0.684 \]

Example 5: Calculate quartile deviation and the coefficient of quartile deviation from the
following data: 20, 21, 22, 23, 24, 25, 26, 27
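Solution:
\[ Q_1 = \frac{(n+1)}{4}\text{th value} = \frac{(8+1)}{4}\text{th value} = 2.25\text{th value} \]
\[ = 2\text{nd value} + 0.25\,(3\text{rd value} - 2\text{nd value}) = 21 + 0.25(22 - 21) = 21.25 \]

\[ Q_3 = \frac{3(n+1)}{4}\text{th value} = \frac{3(8+1)}{4}\text{th value} = 6.75\text{th value} \]
\[ = 6\text{th value} + 0.75\,(7\text{th value} - 6\text{th value}) = 25 + 0.75(26 - 25) = 25.75 \]

Here \(Q_1 = 21.25\) and \(Q_3 = 25.75\). So,
\[ Q.D. = \frac{Q_3 - Q_1}{2} = \frac{25.75 - 21.25}{2} = 2.25 \]
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{25.75 - 21.25}{25.75 + 21.25} = \frac{4.5}{47} = 0.0957 \]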
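
The positional rule used in Example 5 (take the k(n+1)/4-th ordered value and interpolate between neighbours) can be sketched as follows. The function name `quartile` is illustrative, and clamping positions that fall outside the data is an assumption added here for safety, not something the text specifies.

```python
def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) as the k(n+1)/4-th ordered value,
    with linear interpolation between neighbouring values."""
    values = sorted(data)
    n = len(values)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    lower = int(position)           # whole part of the position
    fraction = position - lower     # fractional part used for interpolation
    if lower < 1:                   # assumption: clamp below the first value
        return values[0]
    if lower >= n:                  # assumption: clamp above the last value
        return values[-1]
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

data = [20, 21, 22, 23, 24, 25, 26, 27]  # data of Example 5
q1 = quartile(data, 1)
q3 = quartile(data, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, quartile_deviation, round(coefficient_qd, 4))
```

Other software may use a different quartile convention and return slightly different values for the same data.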

Example 6: Calculate the quartile deviation and coefficient of quartile deviation from the data
given below:
Maximum load (short-tons) No. of cables
9.25-9.75 2
9.75-10.25 5
10.25-10.75 12
10.75-11.25 17
11.25-11.75 14
11.75-12.25 6
12.25-12.75 3
12.75-13.25 1

Solution:
Maximum load(short-tons) No. of cables (f ) c.f
9.25-9.75 2 2
9.75-10.25 5 7
10.25-10.75 12 19
10.75-11.25 17 36
11.25-11.75 14 50
11.75-12.25 6 56
12.25-12.75 3 59
12.75-13.25 1 60
Total ∑f = 60

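For \(Q_1\):

\(\left(\frac{n}{4}\right)\)th value = \(\left(\frac{60}{4}\right)\)th value = 15th value, which lies in the class 10.25-10.75. Therefore
\[ Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) \]
where \(l = 10.25\) is the lower boundary of the quartile class, \(h = 0.5\) is its width, \(f = 12\) is its frequency and \(c = 7\) is the cumulative frequency of the preceding class. Hence
\[ Q_1 = 10.25 + \frac{0.5}{12}(15 - 7) = 10.25 + 0.33 = 10.58 \]

For \(Q_3\):

\(\left(\frac{3n}{4}\right)\)th value = \(\left(\frac{3 \times 60}{4}\right)\)th value = 45th value, which lies in the class 11.25-11.75. Therefore
\[ Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) \]
where \(l = 11.25\), \(h = 0.5\), \(f = 14\), \(c = 36\). Hence
\[ Q_3 = 11.25 + \frac{0.5}{14}(45 - 36) = 11.25 + 0.32 = 11.57 \]

\[ \text{Quartile Deviation } (Q.D.) = \frac{Q_3 - Q_1}{2} = \frac{11.57 - 10.58}{2} = \frac{0.99}{2} = 0.495 \]
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{11.57 - 10.58}{11.57 + 10.58} = \frac{0.99}{22.15} = 0.045 \]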
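
The grouped-data quartile formula of Example 6 can be sketched in code. The function name `grouped_quartile` and its argument layout (a list of lower class boundaries, a common class width, and a list of frequencies) are assumptions made for this illustration.

```python
def grouped_quartile(lower_bounds, width, freqs, k):
    """k-th quartile of a frequency distribution using l + (h/f)(k*n/4 - c)."""
    n = sum(freqs)
    target = k * n / 4              # position of the quartile in the ordered data
    cumulative = 0                  # c: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("k must be 1, 2 or 3")

lower_bounds = [9.25, 9.75, 10.25, 10.75, 11.25, 11.75, 12.25, 12.75]
freqs = [2, 5, 12, 17, 14, 6, 3, 1]  # data of Example 6

q1 = grouped_quartile(lower_bounds, 0.5, freqs, 1)
q3 = grouped_quartile(lower_bounds, 0.5, freqs, 3)
qd = (q3 - q1) / 2
coef = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 3), round(coef, 3))
```

Because the code keeps the quartiles unrounded, the quartile deviation comes out as about 0.494 rather than the 0.495 obtained from the rounded quartiles in the worked solution.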

4.4 Mean Deviation (Or Average Deviation)


The arithmetic mean of the absolute deviations from an average (from mean or from median)
is called mean deviation (M.D). By absolute deviations, we mean that we consider all
deviations as positive. The reason to count the deviations as positive, i.e. to disregard the
algebraic signs (+ and -) is to avoid the difficulty arising from the property that the sum of
deviations of the observations from their mean is zero.
The difference (X – average) is called deviation and when we ignore the negative sign, this
deviation is written as |𝑋 − 𝑎𝑣𝑒𝑟𝑎𝑔𝑒| and is read as modulus deviation. The mean of these
modulus or absolute deviations is called mean deviation or mean absolute deviation (M.A.D).
Symbolically, we have
Mean Deviation from Mean:
\[ M.D. = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{(for ungrouped data)} \]
where \(x_i\) is the value of each observation, \(\bar{x}\) is the mean of all values, \(n\) is the number of observations in the sample, and \(|\ |\) indicates the absolute value.
\[ M.D. = \frac{\sum f|x_i - \bar{x}|}{\sum f} \quad \text{(for grouped data)} \]

The mean deviation is also defined in terms of absolute deviations from the median in a similar
way. Theory tells us that the mean deviation is least when the deviations are measured from
the median. But in practice, it is generally calculated from the arithmetic mean. The mean
deviation gives more information than the range or the quartile deviation as it is based on all
the observed values. It is easily calculated and readily understood. As it is not amenable to
mathematical treatment, its usefulness is limited.
The mean deviation is an absolute measure of dispersion. Its relative measure, called the coefficient of mean deviation, is obtained by dividing the mean deviation by the average used in the calculation of deviations. Thus
\[ \text{Coefficient of M.D.} = \frac{M.D.}{\text{Mean}} \quad \text{or} \quad \frac{M.D.}{\text{Median}} \]
Example 7: Calculate the mean deviation from (i) mean, (ii) median, of the following set of
examination marks: 45, 32, 37, 46, 39, 36, 41, 48, 36.
Also calculate the co-efficient of mean deviation.
Solution: (i) Mean deviation from Mean
\(x_i\)      \(x_i - \bar{x} = x_i - 40\)      \(|x_i - \bar{x}|\)
45      45 - 40 = 5      5
32      -8               8
37      -3               3
46      6                6
39      -1               1
36      -4               4
41      1                1
48      8                8
36      -4               4
\(\sum x_i = 360\)      0      \(\sum |x_i - \bar{x}| = 40\)

\[ \bar{x} = \frac{\sum x}{n} = \frac{360}{9} = 40 \text{ marks} \]
\[ M.D. = \frac{\sum |x_i - \bar{x}|}{n} = \frac{40}{9} = 4.4 \text{ marks} \]
\[ \text{Coefficient of M.D.} = \frac{M.D.}{\text{Mean}} = \frac{4.4}{40} = 0.11 \]
(ii) Mean deviation from Median
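Arrange the data in ascending order as 32, 36, 36, 37, 39, 41, 45, 46, 48. Here n = 9 (odd). So
\[ \text{Median} = \left(\frac{n+1}{2}\right)\text{th value} = \left(\frac{9+1}{2}\right)\text{th value} = 5\text{th value} = 39 \text{ marks} \]

\(x_i\)      \(x_i - \text{median}\)      \(|x_i - \text{median}|\)
32      -7      7
36      -3      3
36      -3      3
37      -2      2
39      0       0
41      2       2
45      6       6
46      7       7
48      9       9
\(\sum x_i = 360\)      9      \(\sum |x_i - \text{median}| = 39\)

Note that, unlike deviations from the mean, the deviations from the median need not sum to zero; here they sum to 9.
\[ M.D. = \frac{\sum |x_i - \text{median}|}{n} = \frac{39}{9} = 4.3 \text{ marks} \]
\[ \text{Coefficient of M.D.} = \frac{M.D.}{\text{Median}} = \frac{4.3}{39} = 0.11 \]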
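
Both parts of Example 7 can be verified with a small sketch. The function name `mean_deviation` is an illustrative choice; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

def mean_deviation(data, center):
    """Average of the absolute deviations of the data from the given center."""
    return sum(abs(x - center) for x in data) / len(data)

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]  # data of Example 7

md_mean = mean_deviation(marks, mean(marks))        # deviations taken from the mean
md_median = mean_deviation(marks, median(marks))    # deviations taken from the median
coef_mean = md_mean / mean(marks)
coef_median = md_median / median(marks)
print(round(md_mean, 1), round(md_median, 1))
```

The result also illustrates the statement that the mean deviation is least when measured from the median: 39/9 is smaller than 40/9.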
Example 8: The weights of containers being shipped to Ireland are (in thousands of pounds):
95, 103, 105, 110, 104, 105, 112, 90
(a) What is the range of the weights?
(b) Compute the mean deviation of the weights.
Example 9: Calculate the mean deviation of the following frequency distribution showing the
weights of apples (grams):
Weight 65-84 85-104 105-124 125-144 145-164 165-184 185-204
f 9 10 17 10 5 4 5

Solution:
Weight     f     \(x_i\)    \(f_i x_i\)    \(x_i - \bar{x}\)    \(f_i |x_i - \bar{x}|\)
65-84      9     74.5     670.5     -48.0    432.0
85-104     10    94.5     945.0     -28.0    280.0
105-124    17    114.5    1946.5    -8.0     136.0
125-144    10    134.5    1345.0    12.0     120.0
145-164    5     154.5    772.5     32.0     160.0
165-184    4     174.5    698.0     52.0     208.0
185-204    5     194.5    972.5     72.0     360.0
Total      60    ---      7350.0    ---      1696.0

\[ \bar{x} = \frac{\sum f x}{\sum f} = \frac{7350.0}{60} = 122.5 \text{ grams} \]
\[ M.D. = \frac{\sum f|x_i - \bar{x}|}{\sum f} = \frac{1696.0}{60} = 28.27 \text{ grams} \]
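
The grouped calculation of Example 9 can be sketched as below. The function name `grouped_mean_deviation` is illustrative; class midpoints are computed from the class limits given in the example.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, mean deviation from the mean) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
    return mean, md

class_limits = [(65, 84), (85, 104), (105, 124), (125, 144),
                (145, 164), (165, 184), (185, 204)]
freqs = [9, 10, 17, 10, 5, 4, 5]                        # data of Example 9
midpoints = [(low + high) / 2 for low, high in class_limits]

apple_mean, apple_md = grouped_mean_deviation(midpoints, freqs)
print(apple_mean, round(apple_md, 2))
```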

4.5 Variance and Standard Deviation
The most comprehensive descriptions of dispersion are those that deal with the average
deviation from some measure of central tendency. Two of these measures are important to our
study of statistics: the variance and the standard deviation. Both of these tell us an average
distance of any observation in the data set from the mean of the distribution.

Variance:
The variance of a set of observations is defined as the mean of the squares of deviations of all
observations from their mean.
When it is calculated from the entire population, the variance is called the population variance, traditionally denoted by \(\sigma^2\) (\(\sigma\) is the Greek lower-case 'sigma'). If, instead, the data from a sample are used to calculate the variance, it is referred to as the sample variance and is denoted by \(S^2\). Symbolically, we have
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N}, \quad \text{for population data} \]
\[ S^2 = \frac{\sum (x_i - \bar{x})^2}{n}, \quad \text{for sample data} \]

It may be remembered that the population variance \(\sigma^2\) is usually not calculated. The sample variance \(S^2\) is calculated and, if needed, is used to make inferences about the population variance. The variance is also denoted by Var(X).
It should be noted that the variance is expressed in the square of the units in which the observations are expressed, so its numerical value is not directly comparable with the observations themselves. For a frequency distribution, the sample variance \(S^2\) is defined as:
\[ S^2 = \frac{\sum f(x_i - \bar{x})^2}{\sum f} \]
Using the alternative method (short-cut formula):
\[ S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2, \quad \text{for ungrouped data} \]
\[ S^2 = \frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2, \quad \text{for grouped data, where } n = \sum f \]

Note:
The variance is non-negative and is zero only if all observations are the same.

Standard Deviation:
The positive square root of the variance is called the standard deviation. When it is calculated from the entire population, it is denoted by \(\sigma\):
\[ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \]
If it is calculated from sample data, it is called the sample standard deviation (S). Symbolically,
\[ S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}, \quad \text{for ungrouped data} \]
\[ S = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{\sum f}}, \quad \text{for grouped data} \]

The standard deviation is expressed in the same units as the observations themselves and is a
measure of the average spread around the mean. It has a definite mathematical meaning, utilizes
all the observed values and is amenable to mathematical treatment but it is affected by extreme
values. The standard deviation is an absolute measure of dispersion. Its relative measure, called the coefficient of standard deviation, is defined as
\[ \text{Coefficient of S.D.} = \frac{\text{Standard Deviation}}{\text{Mean}} \]
Multiplying this quantity by 100, we obtain a very important and well-known measure called the coefficient of variation. In other words, it is defined as the ratio of the standard deviation to the mean, expressed as a percentage. It is denoted by C.V. and its formula is
\[ C.V. = \frac{S}{\bar{x}} \times 100, \quad \text{for sample data} \]
Because it is a pure number, independent of the units of measurement, the coefficient of variation is used to compare the variation in two or more data sets or distributions that are measured in different units, e.g. one may be measured in hours and the other in kilograms or rupees.
A large value of C.V. shows that the values have much variation, while a smaller value of C.V. shows that the data are more consistent (less variation). It is also used to compare the consistency of performance of two sets of data.
Example 10: Calculate the variance and standard deviation from the following marks obtained
by 9 students:
45, 32, 37, 46, 39, 36, 41, 48, 36.
Also calculate the co-efficient of variation.
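Solution:

\(x_i\)      \(x_i - \bar{x} = x_i - 40\)      \((x_i - \bar{x})^2\)
45      45 - 40 = 5      25
32      -8               64
37      -3               9
46      6                36
39      -1               1
36      -4               16
41      1                1
48      8                64
36      -4               16
\(\sum x_i = 360\)      0      \(\sum (x_i - \bar{x})^2 = 232\)

\[ \bar{x} = \frac{\sum x}{n} = \frac{360}{9} = 40 \text{ marks} \]
\[ \text{Variance} = S^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{232}{9} = 25.78 \text{ (marks)}^2 \]
\[ \text{Standard deviation} = S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{25.78} = 5.08 \text{ marks} \]
\[ C.V. = \frac{S}{\bar{x}} \times 100 = \frac{5.08}{40} \times 100 = 12.7\% \]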
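Using the alternative method:

\(x_i\)      \(x_i^2\)
45      2025
32      1024
37      1369
46      2116
39      1521
36      1296
41      1681
48      2304
36      1296
\(\sum x_i = 360\)      \(\sum x_i^2 = 14632\)

\[ \text{Variance} = S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = \frac{14632}{9} - \left(\frac{360}{9}\right)^2 = 1625.78 - 1600 = 25.78 \text{ (marks)}^2 \]
\[ \text{Standard deviation} = S = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} = \sqrt{25.78} = 5.08 \text{ marks} \]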
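
Both routes to the variance in Example 10 can be compared in a short sketch. It follows this chapter's convention of dividing by n, which matches `statistics.pvariance` in Python's standard library; note that `statistics.variance` divides by n - 1 instead and would give a different figure. Variable names are illustrative.

```python
from math import isclose, sqrt
from statistics import pvariance

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]  # data of Example 10
n = len(marks)
mean = sum(marks) / n

# Definition: mean of squared deviations from the mean (divisor n).
var_definition = sum((x - mean) ** 2 for x in marks) / n
# Short-cut formula: mean of squares minus square of the mean.
var_shortcut = sum(x * x for x in marks) / n - (sum(marks) / n) ** 2

sd = sqrt(var_definition)
cv = sd / mean * 100
print(round(var_definition, 2), round(var_shortcut, 2), round(sd, 2), round(cv, 1))
```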

Example 11: Calculate the variance and standard deviation of the following frequency
distribution showing the weights of apples (grams):
Weight 65-84 85-104 105-124 125-144 145-164 165-184 185-204
f 9 10 17 10 5 4 5

Solution:
Weight     f     \(x_i\)    \(f_i x_i\)    \(x_i - \bar{x}\)    \((x_i - \bar{x})^2\)    \(f_i (x_i - \bar{x})^2\)
65-84      9     74.5     670.5     -48    2304    20736
85-104     10    94.5     945       -28    784     7840
105-124    17    114.5    1946.5    -8     64      1088
125-144    10    134.5    1345      12     144     1440
145-164    5     154.5    772.5     32     1024    5120
165-184    4     174.5    698       52     2704    10816
185-204    5     194.5    972.5     72     5184    25920
Total      60    ---      7350.0    ---    ---     72960

\[ \bar{x} = \frac{\sum f x}{\sum f} = \frac{7350.0}{60} = 122.5 \text{ grams} \]
\[ \text{Variance} = S^2 = \frac{\sum f(x_i - \bar{x})^2}{\sum f} = \frac{72960}{60} = 1216 \text{ (grams)}^2 \]
\[ \text{Standard deviation} = S = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{\sum f}} = \sqrt{1216} = 34.87 \text{ grams} \]

Using the alternative method:

Weight     \(f_i\)    \(x_i\)    \(f_i x_i\)    \(x_i^2\)    \(f_i x_i^2\)
65-84      9     74.5     670.5     5550.25     49952.25
85-104     10    94.5     945       8930.25     89302.5
105-124    17    114.5    1946.5    13110.25    222874.25
125-144    10    134.5    1345      18090.25    180902.5
145-164    5     154.5    772.5     23870.25    119351.25
165-184    4     174.5    698       30450.25    121801
185-204    5     194.5    972.5     37830.25    189151.25
Total      60    ---      7350      ---         973335

\[ \text{Variance} = S^2 = \frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2 = \frac{973335}{60} - \left(\frac{7350}{60}\right)^2 = 16222.25 - 15006.25 = 1216 \text{ (grams)}^2 \]
\[ \text{Standard deviation} = S = \sqrt{\frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2} = \sqrt{1216} = 34.87 \text{ grams} \]
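
The two grouped-data formulas used in Example 11 can be checked against each other with a brief sketch. Variable names are illustrative, and n is taken as the total frequency.

```python
from math import isclose, sqrt

midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
freqs = [9, 10, 17, 10, 5, 4, 5]          # data of Example 11

n = sum(freqs)                             # n is the total frequency
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Definition: frequency-weighted mean of squared deviations.
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
# Short-cut formula: weighted mean of squares minus square of the mean.
var_shortcut = sum(f * x * x for x, f in zip(midpoints, freqs)) / n - mean ** 2

sd = sqrt(var_definition)
print(mean, var_definition, var_shortcut, round(sd, 2))
```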

Example 12: Calculate the sample standard deviation and co-efficient of variation from the
following data:
Heart rates 67 68 69 70 72 75
f 1 1 3 5 8 2

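Solution:
\(x_i\)    \(f_i\)    \(f_i x_i\)    \(x_i^2\)    \(f_i x_i^2\)
67       1
68       1
69       3
70       5
72       8
75       2
Total    20    1418    ---    100618

\(\bar{x} = \frac{\sum f x}{\sum f}\) = __________

\(S = \sqrt{\frac{\sum f x_i^2}{n} - \left(\frac{\sum f x_i}{n}\right)^2}\) = ___________

\(C.V. = \frac{S}{\bar{x}} \times 100\) = _______%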
Example 13: Suppose that, in a particular year, the mean weekly earnings of skilled factory
workers in one particular country were $ 19.50 with a standard deviation of $ 4, while for its
neighboring country the figures were Rs. 75 and Rs. 28 respectively. From these figures, it is
not immediately apparent which country has the greater variability in earnings. The coefficient
of variation quickly provides the answer:
For country No. 1:
\[ C.V. = \frac{S}{\bar{x}} \times 100 = \frac{4}{19.50} \times 100 = 20.5 \text{ per cent} \]
For country No. 2:
\[ C.V. = \frac{S}{\bar{x}} \times 100 = \frac{28}{75} \times 100 = 37.3 \text{ per cent} \]
From these calculations, it is immediately obvious that the spread of earnings in country No. 2
is greater than that in country No. 1, and the reasons for this could then be sought.
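
The comparison in Example 13 can be reproduced in a few lines. The function name `coefficient_of_variation` is an illustrative choice; because the result is a percentage, the two currencies never need to be converted.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean (unit-free)."""
    return sd / mean * 100

cv_country_1 = coefficient_of_variation(4, 19.50)   # earnings in dollars
cv_country_2 = coefficient_of_variation(28, 75)     # earnings in rupees
print(round(cv_country_1, 1), round(cv_country_2, 1))
```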

4.6 Properties of Variance:


(i) The variance of a constant is zero, i.e. \(Var(c) = 0\).
(ii) The variance is independent of origin, i.e. if we add or subtract any constant from all the observations, the variance remains unchanged: \(Var(X \pm c) = Var(X)\). This follows because
\[ Var(X + c) = Var(X) + Var(c) = Var(X) + 0 = Var(X) \]
\[ Var(X - c) = Var(X) + Var(c) = Var(X) + 0 = Var(X) \]
(iii) The variance is affected by change of scale, i.e. when all the values are multiplied or divided by a constant, the variance of the values is multiplied or divided by the square of the constant:
\[ Var(cX) = c^2\, Var(X) \]
\[ Var\left(\frac{X}{c}\right) = \frac{1}{c^2}\, Var(X), \quad c \neq 0 \]
(iv) \(Var(X \pm Y) = Var(X) + Var(Y)\), where X and Y are two independent variables.
(v) Variance is non-negative, i.e. \(Var(X) \geq 0\).

4.7 Properties of Standard Deviation:

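(i) \(S.D.(c) = 0\), where c is a constant.
(ii) \(S.D.(X \pm c) = S.D.(X)\)
(iii) \(S.D.(cX) = |c|\, S.D.(X)\)
(iv) \(S.D.\left(\frac{X}{c}\right) = \frac{1}{|c|}\, S.D.(X)\), \(c \neq 0\)
(v) \(S.D.(X \pm Y) = \sqrt{Var(X) + Var(Y)}\), where X and Y are two independent variables.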
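
The origin and scale properties can be confirmed numerically. The sketch below reuses the marks of Example 10 and an arbitrarily chosen constant c = -2 (a hypothetical value picked only for illustration); it uses `pvariance` and `pstdev` from Python's `statistics` module, which divide by n as this chapter does. The independence property (iv) is not checked, since it concerns two random variables rather than one data set.

```python
from math import isclose
from statistics import pstdev, pvariance

x = [45, 32, 37, 46, 39, 36, 41, 48, 36]   # marks from Example 10
c = -2                                      # hypothetical non-zero constant

shifted = [value + c for value in x]        # change of origin
scaled = [c * value for value in x]         # change of scale (multiplication)
divided = [value / c for value in x]        # change of scale (division)

checks = {
    "Var(c) = 0": pvariance([c] * len(x)) == 0,
    "Var(X + c) = Var(X)": isclose(pvariance(shifted), pvariance(x)),
    "Var(cX) = c^2 Var(X)": isclose(pvariance(scaled), c ** 2 * pvariance(x)),
    "Var(X/c) = Var(X)/c^2": isclose(pvariance(divided), pvariance(x) / c ** 2),
    "S.D.(cX) = |c| S.D.(X)": isclose(pstdev(scaled), abs(c) * pstdev(x)),
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```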

4.8 Interpretation of Standard Deviation:
The standard deviation (\(\sigma\) or \(S\)) does not have a simple interpretation like the arithmetic mean (\(\mu\) or \(\bar{x}\)), which is interpreted as the balancing point of the distribution. The standard deviation is a very important concept that serves as a basic measure of variability. We have stressed that a small value of the standard deviation indicates that most of the observations in a data set are close to the mean, while a large value implies that the observations are scattered widely about the mean.
The Russian mathematician P.L. Chebychev (1821-1894) developed a theorem that allows us to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. We interpret the standard deviation using two rules.
Chebychev Rule or Theorem:
For any set of data (population or sample), the proportion of the values that lie within k standard deviations of the mean is at least \(1 - \frac{1}{k^2}\), where k is any number greater than 1.

"Within k standard deviations" means the interval from \(\bar{x} - kS\) to \(\bar{x} + kS\).


Chebychev’s rule applies to any data set, regardless of the shape of the frequency distribution.
Example 14: The arithmetic mean biweekly amount contributed by the Dupree Paint
employees to the company’s profit-sharing plan is $51.54, and the standard deviation is $7.51.
At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5
standard deviations of the mean?
Solution: About 92 percent, found by
\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92 \]
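
Chebychev's bound in Example 14 is easy to evaluate for any k. In this sketch the function name `chebyshev_lower_bound` is illustrative; the interval endpoints are computed from the mean and standard deviation given in the example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

mean, sd, k = 51.54, 7.51, 3.5              # figures from Example 14
bound = chebyshev_lower_bound(k)
low, high = mean - k * sd, mean + k * sd    # the interval mean ± k standard deviations
print(round(bound, 2), round(low, 3), round(high, 3))
```

So at least about 92 percent of the contributions lie between roughly $25.26 and $77.83.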
Empirical Rule:
This is a rule of thumb that applies to data sets with frequency distributions that are mound-
shaped and symmetrical. According to this empirical rule:
- Approximately 68% of the data values will lie within 1 standard deviation of the mean, i.e. within the interval \((\bar{x} - S, \bar{x} + S)\).
- Approximately 95% of the data values will lie within 2 standard deviations of the mean, i.e. within the interval \((\bar{x} - 2S, \bar{x} + 2S)\).
- Approximately 99.7% (practically all) of the data values will lie within 3 standard deviations of the mean, i.e. within the interval \((\bar{x} - 3S, \bar{x} + 3S)\).
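
The empirical rule can be illustrated by simulation. The sketch below generates hypothetical mound-shaped data from a normal distribution (the mean of 100, standard deviation of 15, sample size and seed are arbitrary assumptions) and counts the share of values within 1, 2 and 3 standard deviations of the mean.

```python
import random
from statistics import mean, pstdev

random.seed(0)                                   # fixed seed so the run is repeatable
sample = [random.gauss(100, 15) for _ in range(50_000)]   # hypothetical data
centre, spread = mean(sample), pstdev(sample)

def fraction_within(data, centre, spread, k):
    """Share of values lying in the interval centre ± k * spread."""
    lower, upper = centre - k * spread, centre + k * spread
    return sum(lower <= x <= upper for x in data) / len(data)

fractions = {k: fraction_within(sample, centre, spread, k) for k in (1, 2, 3)}
for k, share in fractions.items():
    print(f"within {k} S.D.: {share:.3f}")
```

The shares should come out close to 0.68, 0.95 and 0.997. For data that are not mound-shaped and symmetrical, only the weaker Chebychev bound is guaranteed.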

4.9 Skewness:
In Chapter 3, we described measures of central location for a set of observations by reporting
the mean, median, and mode. We also described measures that show the amount of spread or
variation in a set of data, such as the range and the standard deviation.
Another characteristic of a set of data is the shape. There are four shapes commonly observed:
symmetric, positively skewed, negatively skewed, and bimodal. In a symmetric set of
observations the mean and median are equal and the data values are evenly spread around these
values. The data values below the mean and median are a mirror image of those above.
A set of values is skewed to the right or positively skewed if there is a single peak and the
values extend much further to the right of the peak than to the left of the peak. In this case, the
mean is larger than the median.
In a negatively skewed distribution there is a single peak but the observations extend further
to the left, in the negative direction, than to the right. In a negatively skewed distribution, the
mean is smaller than the median. Positively skewed distributions are more common. Salaries
often follow this pattern. Think of the salaries of those employed in a small company of about
100 people. The president and a few top executives would have very large salaries relative to
the other workers and hence the distribution of salaries would exhibit positive skewness. A
bimodal distribution will have two or more peaks. This is often the case when the values are
from two or more populations. This information is summarized in Chart-1.

CHART-1: Shapes of Frequency Polygons
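
The relationship between the mean and the median under skewness can be seen on a small made-up data set. The salary figures below are hypothetical (in thousands), chosen only so that two large values stretch the right tail; reflecting them produces a negatively skewed set.

```python
from statistics import mean, median

# Hypothetical salaries: most are similar, two executives earn far more.
salaries = [28, 30, 31, 32, 33, 34, 35, 36, 150, 400]

# Mirror image of the same data: the long tail now points to the left.
pivot = max(salaries) + min(salaries)
reflected = [pivot - s for s in salaries]

print(mean(salaries), median(salaries))      # positively skewed: mean > median
print(mean(reflected), median(reflected))    # negatively skewed: mean < median
```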

4.10 Kurtosis
Kurtosis is a measure of the tailedness of a distribution. Tailedness is how
often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal
distribution.
-Distributions with medium kurtosis (medium tails) are mesokurtic.
-Distributions with low kurtosis (thin tails) are platykurtic.
-Distributions with high kurtosis (fat tails) are leptokurtic.

Exercise:
Q 1: The net operating incomes (in millions of dollars) of 12 leading banks in the United States last year are given below:
257, 182, 180, 105, 81, 56, 313, 127, 91, 96, 85, 38
Find the range in operating incomes for these banks.
Q 2: Two corporations each hired 10 graduates. The starting salaries (in thousands of dollars)
for each graduate are shown below. Find the range of starting salaries for each corporation.
Corporation A: 41, 38, 39, 45, 47, 41, 44, 41, 37, 42
Corporation B: 40, 23, 41, 50, 49, 32, 41, 29, 52, 58.
Q 3: Calculate interquartile range, quartile deviation and the coefficient of quartile deviation
from the following marks obtained by 20 students on a test in statistics.
53, 74, 82, 42, 39, 20, 81, 68, 58, 28, 67, 54, 93, 70, 30, 55, 36, 37, 29, 61
Q 4: Calculate quartile deviation and coefficient of quartile deviation from the following
distribution of aptitude scores.

Scores 50-59 60-69 70-79 80-89 90-99
No. of applicants 7 12 15 4 2

Q 5: Calculate the mean deviation from mean from the following marks obtained by students
on a test.
2, 5, 6, 6, 8, 9, 12, 13, 16, 23
Q 6: There were five customer service representatives on duty at the Electronic Super Store
during last weekend’s sale. The numbers of HDTVs these representatives sold are: 5, 8, 4, 10,
3. Calculate the (a) range, (b) quartile deviation (c) mean deviation
Q 7: The Department of Statistics at Western State University offers eight sections of basic
statistics. Following are the numbers of students enrolled in these sections:
34, 46, 52, 29, 41, 38, 36, 28. Calculate the (a) range, (b) quartile deviation (c) mean deviation
Q 8: Find the mean deviation and co-efficient of mean deviation from mean of the following
data given below:
x 88 93 98 103 108 113
f 6 4 10 6 3 1

Q9: Calculate the variance, standard deviation and coefficient of variation from the following
data:
102, 104, 106, 108, 110
Q 10: Suppose a small company has started a production line to build computers. The output in a sample of five weeks of production is
5, 9, 16, 17, 18
Calculate the variance and standard deviation.
Q11: Calculate the variance and standard deviation from the following data:
x 10 15 20 25 30
f 1 2 3 2 1

Q12: Calculate the variance and standard deviation from the following distribution of wages:
Wages 30-35 35-40 40-45 45-50 50-55 55-60
f 12 18 29 32 16 8
