
CHAPTER 4

Measures of Dispersion

Chapter Contents
4.1. Definition of dispersion
4.2. Objectives of measuring variability
4.3. Describing variability
4.4. Measures of shape
4.5. Dispersion percentages
4.6. Review questions

In unit two we learned how to condense data in a table (frequency
distribution table) and then represent it in graphs and diagrams.
In unit three we discussed how to describe data by computing simple
descriptive measures called measures of central tendency and position.
In this unit (unit four) we continue our study of data description
by computing descriptive measures known as measures of dispersion and
measures of shape.

Unit objectives
• Define what we mean by measures of dispersion
• Compare and contrast absolute and relative forms.
• Enumerate some measures of dispersion used to describe data.
• Classify shape of a given data set via symmetry and peakedness.

Learning outcomes
After completing this section, successful students will be able to:
- Define what is meant by measure of dispersion
- Tell the objectives of knowing measures of dispersion.
- Understand the properties of a good measure of dispersion.

Key words
• Coefficient of Variation • Dispersion (Variation) • Kurtosis • Mean
Absolute Deviation • Quartile Deviation • Range • Skewness • Standard
Deviation • Standard Score • Variance

4.1 Definition of measures of dispersion


Definition 4.1. Dispersion means scattering of the observations among
themselves or from a central value (Mean/ Median/ Mode) of data.

Definition 4.2. Measures of dispersion are numerical values that indicate
the extent or degree to which items (values) tend to spread (vary)
about an average value. They are also called measures of variation.

4.2 Objectives of measuring variation


The idea of dispersion is important in the study of wages of workers, prices
of commodities, standard of living of different people, distribution of wealth,
distribution of land among farmers and various other fields of life.
The term dispersion is used to indicate the fact that, within a given group,
the items differ from one another in size or, in other words, that there is a
lack of uniformity in their magnitudes.
Why is it necessary to measure the variation of a data set?
- To judge the reliability of measures of central tendency.
- To control variability itself.
- To compare two or more data sets in terms of their variability.
- To support further statistical analysis.

Properties of a good measure of variation


⇒ All properties of a good measure of average also apply to measures
of variation.

4.3 Describing the variation of the data set:


⇒ The most common summary measures used to describe the variation of
a distribution are depicted in the flow chart in Figure 4.1. Among these
measures, the standard deviation and the coefficient of variation are the
most frequently used.

4.3.1 Absolute and relative forms


There are two types of measures of dispersion: absolute measures and
relative measures.
Lecture notes (set by: Z)

1. Absolute form: a measure of dispersion is said to be absolute if
it is expressed in the same units as the original observations. When
the observations are in kilograms, the absolute measure is also in
kilograms.

Example 4.1. Range, standard deviation

2. Relative form: a measure of dispersion is said to be relative if
it is free of the units in which the original data are measured. If
the original data are in dollars or kilometers, the relative measure
carries neither unit, so it can be used to compare data sets measured
in different units.

Example 4.2. Relative range, coefficient of mean deviation

Figure 4.1. Measures of dispersion

⇒ We now turn to the properties and formulas of the above measures,
one by one.

4.3.2 Some measures of variations


⇒ Let's start with the class activity problems.

Activity 4.1. Let there be two small groups of boys and girls whose scores
in an achievement test are such as the following. Is the performance of
two groups the same?

Table 4.1. Achievement test scores of boys and girls

Group            Test scores
Group A (boys)   40, 38, 36, 17, 20, 19, 18, 3, 5, 4
Group B (girls)  19, 20, 22, 18, 21, 23, 17, 20, 22, 18

Activity 4.2. A testing lab wishes to test two experimental brands of
outdoor paint to see how long each lasts before fading. The lab makes
six gallons of paint from each brand to test. The results (in months)
are given below. Can we say both brands are equally good?

Table 4.2. Outdoor paint life (months)

Brand     Life (months)
Brand A   10, 60, 50, 30, 40, 20
Brand B   35, 45, 30, 35, 40, 25

(1) The Range and the Coefficient of Range


⇒ The range is the simplest measure of dispersion, but it is a crude
(poor) one because it uses only the two extreme values.

Definition 4.1. The range is the difference between the largest and
smallest values of the data set. It is denoted by the letter R.

Computing Range (R)

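⇒ For raw data $X_1, X_2, \ldots, X_n$:

$$R = \text{Largest value} - \text{Smallest value} = L - S \tag{4.3.1}$$

The relative form of the range is called the relative range (coefficient
of range) and is given by

$$RR = \frac{L - S}{L + S} = \frac{R}{L + S} \tag{4.3.2}$$

⇒ For grouped data with class marks $x_{m1}, x_{m2}, \ldots, x_{mk}$ and
frequencies $f_1, f_2, \ldots, f_k$, the range is taken either as the
difference between the last and first class marks or as the difference
between the upper class boundary of the last class and the lower class
boundary of the first class:

$$R = x_{mk} - x_{m1} \quad \text{or} \quad R = UCB_k - LCB_1 \tag{4.3.3}$$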

Merits and demerits of range

⇒Merits
• It is simple to calculate.
• It gives a general idea about the total spread of the observations.
• It is used to approximate the standard deviation:

$$s \approx \frac{\text{Range}}{4} \tag{4.3.4}$$

⇒Demerits
• It is sensitive to outliers.
• It misses the middle n − 2 observations (a lot of information).
• It cannot be computed in the case of open end distribution.


ILLUSTRATION 4.1. Calculating range for raw data


Given a sample of 5 test grades (90, 95, 80, 60, and 75), find the range.
Solution: Follow the steps below.
Step 1 Arrange the data in ascending order: 60, 75, 80, 90, 95.
Step 2 Use formula (4.3.1) with L = 95 and S = 60: R = L − S = 95 − 60 = 35.
Step 3 Interpretation: There is a 35-point difference between the highest
and lowest marks.

ILLUSTRATION 4.2. Find the range and relative range for the achievement
test scores data in Activity 4.1.
Solution: Follow the steps below.
Step 1 Find the largest (L) and smallest (S) scores. For boys: L = 40 and
S = 3; for girls: L = 23 and S = 17.
Step 2 Use formulas (4.3.1) and (4.3.2).
i. R(boys) = L − S = 40 − 3 = 37 and R(girls) = L − S = 23 − 17 = 6
ii. RR(boys) = R/(L + S) = 37/43 = 0.86 and RR(girls) = 6/40 = 0.15
Step 3 Interpretation:
i. The highest and lowest marks differ by 37 points among the boys
and by 6 points among the girls.
ii. The variation is larger among the boys than among the girls (since 0.86 > 0.15).
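
The calculation in Illustration 4.2 can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `data_range` and `relative_range` are illustrative choices, not part of these notes.

```python
def data_range(values):
    """Range R = L - S (formula 4.3.1)."""
    return max(values) - min(values)


def relative_range(values):
    """Relative range RR = (L - S) / (L + S) (formula 4.3.2)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Test scores from Table 4.1
boys = [40, 38, 36, 17, 20, 19, 18, 3, 5, 4]
girls = [19, 20, 22, 18, 21, 23, 17, 20, 22, 18]

for label, scores in (("boys", boys), ("girls", girls)):
    print(label, data_range(scores), round(relative_range(scores), 2))
```

The two printed lines give the range and relative range for each group, so the groups can be compared both in score points and on a unit-free scale.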

Activity 4.3. Following are the wages of 8 workers of a factory. Find the
range and the coefficient of range. Wages in ($): 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.

(2) The Quartile deviation and its coefficient


Definition 4.3. The quartile deviation is half of the interquartile range.

$$QD = \frac{Q_3 - Q_1}{2} \tag{4.3.5}$$

Note 4.1. It gives the average amount by which the two quartiles differ
from the median.

Coefficient of Quartile Deviation (C.Q.D)

$$CQD = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2\,QD}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \tag{4.3.6}$$

Note 4.2. Q.D and C.Q.D use only the middle 50% of the observations.
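
As a rough sketch of formulas (4.3.5) and (4.3.6), the quartiles can be taken from Python's standard `statistics.quantiles`. Its default "exclusive" method is an assumption here; other quartile conventions can give slightly different values. The helper name `quartile_deviation` is illustrative, and the marks are those of the nine students in Illustration 4.3.

```python
import statistics


def quartile_deviation(values):
    """Return (QD, CQD) following formulas (4.3.5) and (4.3.6)."""
    # statistics.quantiles with n=4 returns [Q1, Q2, Q3]; the default
    # "exclusive" method is an assumption, other conventions exist.
    q1, _, q3 = statistics.quantiles(values, n=4)
    qd = (q3 - q1) / 2
    cqd = (q3 - q1) / (q3 + q1)
    return qd, cqd


marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]  # marks of nine students
qd, cqd = quartile_deviation(marks)
print(qd, round(cqd, 3))
```

Under this convention Q1 = 7 and Q3 = 11, so QD = 2 and CQD = 4/18.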


(3) The mean deviation and its Coefficient


⇒ Mean deviation is a measure of dispersion better than range and quartile
deviation.

Definition 4.4. The Mean Deviation is defined as the mean of the absolute
deviations of observations from some suitable average which may be the
arithmetic mean, median or mode. It is denoted by the letter MD .

Computing mean deviation

⇒ For raw data $X_1, X_2, \ldots, X_n$:

$$MD(A) = \frac{\sum_{i=1}^{n} |X_i - A|}{n} \tag{4.3.7}$$

The relative form of mean deviation is called the coefficient of mean
deviation (CMD) and is given by

$$CMD(A) = \frac{MD(A)}{A} \tag{4.3.8}$$

⇒ For grouped data with class marks $x_{m1}, x_{m2}, \ldots, x_{mk}$ and
frequencies $f_1, f_2, \ldots, f_k$:

$$MD(A) = \frac{\sum_{i=1}^{k} |x_{mi} - A|\, f_i}{\sum_{i=1}^{k} f_i} \tag{4.3.9}$$

where A is a suitable average (mean, median or mode).

Note 4.3. Mean deviation is the least if the deviations are taken from the
median.

Merits and demerits of MD

⇒Merits
• It is based on all the observations.

⇒Demerits
• It relies on absolute values, which ignore the signs of the deviations
and are awkward to handle algebraically.
• For that reason it is rarely used in statistical inference.

ILLUSTRATION 4.3. Calculating MD for raw data


For the marks obtained by nine students, given as Marks (out of 25): 7,
4, 10, 9, 15, 12, 7, 9, 7
(a) Calculate the mean deviation about the (i) arithmetic mean, (ii) median,
(iii) mode.
(b) Verify that the mean deviation about the median is the least.
Solution: Let $X_i$ represent the mark of the ith student.
Step 1 Arrange the marks in ascending order: 4, 7, 7, 7, 9, 9, 10, 12, 15.
Step 2 Find the averages. Mean $\bar{X}$ = 80/9 = 8.89, median $\tilde{X}$ = 9,
mode $\hat{X}$ = 7 (since 7 occurs most often).
Step 3 Sum the absolute deviations (ignoring the negative signs) and divide
by n.

(a)
(i) About the mean:

$$MD(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{4.89 + 1.89 + \cdots + 6.11}{9} = \frac{21.11}{9} = 2.35$$

(ii) About the median:

$$MD(\tilde{X}) = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n} = \frac{5 + 2 + \cdots + 6}{9} = \frac{21}{9} = 2.33$$

(iii) About the mode:

$$MD(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n} = \frac{3 + 0 + \cdots + 8}{9} = \frac{23}{9} = 2.56$$

(b) From the above calculations, the mean deviation about the median (2.33)
is smaller than that about the mean (2.35) or the mode (2.56).
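
The three mean deviations in this illustration can be checked numerically. The sketch below applies formula (4.3.7) to the same nine marks; `mean_deviation` is an illustrative helper name, and the averages come from Python's standard `statistics` module.

```python
import statistics


def mean_deviation(values, center):
    """Mean of absolute deviations about `center` (formula 4.3.7)."""
    return sum(abs(x - center) for x in values) / len(values)


marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]  # marks of the nine students

md_mean = mean_deviation(marks, statistics.mean(marks))
md_median = mean_deviation(marks, statistics.median(marks))
md_mode = mean_deviation(marks, statistics.mode(marks))

print(round(md_mean, 2), round(md_median, 2), round(md_mode, 2))
```

The value taken about the median is the smallest of the three, in line with Note 4.3.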

(4)The standard deviation and its coefficient (s, c.v)


⇒ Among the important statistical tools, the first is the mean $\bar{X}$
and the second is the standard deviation s.

Definition 4.5. The standard deviation is defined as the positive square
root of the mean of the squared deviations taken from the arithmetic mean
(A.M) of the data. It indicates how far, on average, the observations are
from the mean. It is denoted by s for sample data and σ for population data.

Note 4.4. Large standard deviation implies more variation.

Computing standard deviation

⇒ For raw data $X_1, X_2, \ldots, X_N$ (population) or $X_1, X_2, \ldots, X_n$ (sample):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \tag{4.3.10}$$

⇒ For grouped data with values $x_1, x_2, \ldots, x_k$ and frequencies
$f_1, f_2, \ldots, f_k$:

$$s = \sqrt{\frac{\sum_{i=1}^{k} (X_i - \bar{X})^2 f_i}{\sum f_i - 1}} \tag{4.3.11}$$

Short cut formula (used to calculate s by hand) for standard deviation

⇒ For raw data and for grouped data:

$$s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}}, \qquad s = \sqrt{\frac{\sum_{i=1}^{k} X_i^2 f_i - \left(\sum f_i\right)\bar{X}^2}{\sum f_i - 1}} \tag{4.3.12}$$

Merits and demerits of s, σ

⇒Merits
• It always has a definite value.
• It is based on all observations.
• It is used to compare data sets when the means are equal.
• It is used in data analysis and in various statistical inferences.

⇒Demerits
• It is relatively difficult to calculate by hand.
• It cannot be used to compare data sets expressed in different units of
measurement.
• It is sensitive to outliers.

ILLUSTRATION 4.4. Calculating SD for raw data


Compute the standard deviation for the following sample data using the direct
method and the short cut method: 2, 4, 8, 6, 10, and 12.
Solution:
Step 1 Find the mean: $\bar{X} = \frac{1}{6}\sum X_i = 42/6 = 7$
Step 2 Direct method:

$$s = \sqrt{\frac{1}{n-1}\sum (X_i - \bar{X})^2} = \sqrt{\frac{1}{6-1}\left[(2-7)^2 + (4-7)^2 + (8-7)^2 + (6-7)^2 + (10-7)^2 + (12-7)^2\right]} = \sqrt{\frac{70}{5}} = 3.74$$

Short cut method:

$$s = \sqrt{\frac{1}{n-1}\left(\sum X_i^2 - n\bar{X}^2\right)} = \sqrt{\frac{2^2 + 4^2 + 8^2 + 6^2 + 10^2 + 12^2 - 6 \times 7^2}{5}} = \sqrt{\frac{364 - 294}{5}} = 3.74$$

Step 3 Interpretation: On average, the sample observations lie about 3.74
units from the mean, 7.
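
As a cross-check, the direct and short cut formulas can be evaluated on the same six sample values. The sketch below uses the n − 1 divisor of formulas (4.3.10) and (4.3.12); the variable names are illustrative. Both routes reduce to the square root of 70/5 = 14.

```python
import math
import statistics

sample = [2, 4, 8, 6, 10, 12]
n = len(sample)
mean = sum(sample) / n

# Direct method: sum of squared deviations over n - 1
s_direct = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Short cut method: (sum of squares - n * mean^2) over n - 1
s_shortcut = math.sqrt((sum(x * x for x in sample) - n * mean ** 2) / (n - 1))

# statistics.stdev uses the same n - 1 divisor
s_library = statistics.stdev(sample)

print(round(s_direct, 2), round(s_shortcut, 2), round(s_library, 2))
```

All three values agree, which is the point of the short cut formula: it is algebraically identical to the direct one and only saves arithmetic when working by hand.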

Example 4.3. Exercise: a) Compute the standard deviation for the achievement
test score data in Activity 4.1 and compare the two groups.
b) Repeat part a) for the paint life data in Activity 4.2.

Activity 4.4. Consider the data set 5, 5, 5, 5. Calculate the standard
deviation. What do you observe?

(5) The Variance ($\sigma^2$, $s^2$)
⇒ Variance is another absolute measure of dispersion.

Definition 4.6. The variance is defined as the average of the squared
differences between each observation in a set of data and the mean. The
sample variance is denoted by $s^2$ and the population variance by
$\sigma^2$ (sigma squared).
# The variance formula: In simple words, the variance is the square of
the standard deviation.

$$\text{variance} = (\text{standard deviation})^2 \tag{4.3.13}$$

Coefficient of Variation (c.v)
The most important of all the relative measures of dispersion is the
coefficient of variation. It is a pure number (it has no units).

Definition 4.7. The coefficient of variation is defined as the ratio of the
standard deviation to the mean, expressed as a percentage. In other words,
c.v is the value the standard deviation would take if the mean were equal
to 100. It is denoted by C.V.

Note 4.5. A distribution with a smaller c.v than another is taken as more
consistent (less variable) than the other.

Computing coefficient of variation

$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\% \tag{4.3.14}$$

# Symbolically, for sample and population data,

$$CV = \frac{s}{\bar{x}} \times 100\%, \qquad CV = \frac{\sigma}{\mu} \times 100\% \tag{4.3.15}$$

where n = sample size, N = population size, $\bar{x}$ = sample mean,
µ = population mean, s = sample standard deviation and σ = population
standard deviation.


⇒ Merits of c.v
• The c.v is used to compare the dispersion in different sets of data
particularly the data which differ in their means or differ in the units
of measurement.
• Used to know the consistency of the data. By consistency we mean
the uniformity in the values of the data distribution.
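
To see how the c.v ranks consistency, the sketch below applies the sample formula in (4.3.15) to the paint life data of Table 4.2. The helper name `coefficient_of_variation` is illustrative, and the sample standard deviation (n − 1 divisor) is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample c.v in percent: s / mean * 100 (formula 4.3.15)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Paint life in months from Table 4.2
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

cv_a = coefficient_of_variation(brand_a)
cv_b = coefficient_of_variation(brand_b)
print(round(cv_a, 1), round(cv_b, 1))
```

Both brands have the same mean life of 35 months, but Brand B has the smaller c.v, so by Note 4.5 it is the more consistent of the two.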

ILLUSTRATION 4.5. Calculating c.v

Properties of the Standard Deviation

(1) Combined variance and standard deviation: $\sigma_{12}^2$, $\sigma_{12}$.
Like the mean, we can calculate the combined standard deviation or variance
to measure the overall variation of different sets of data.
Suppose we have two sets of data containing $n_1$ and $n_2$ observations with
means $\bar{X}_1$ and $\bar{X}_2$, and variances $\sigma_1^2$ and $\sigma_2^2$.
Then the combined variance, $\sigma_{12}^2$, is given by

$$\sigma_{12}^2 = \frac{1}{n_1 + n_2}\left[n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(\bar{X}_1 - \bar{X}_{12})^2 + n_2(\bar{X}_2 - \bar{X}_{12})^2\right] \tag{4.3.16}$$

where $\bar{X}_{12}$ is the combined mean.

⇒ The combined standard deviation is given by

$$\sigma_{12} = \sqrt{\sigma_{12}^2} \tag{4.3.17}$$

Example 4.4. For a group of 50 male workers the mean and standard deviation
of their daily wages are 63 and 9 respectively. For a group of 40
female workers these values are 54 and 6 respectively. Find the mean and
variance of the combined group of 90 workers.
Solution: Here $n_1 = 50$, $\bar{X}_1 = 63$, $S_1^2 = 81$, $n_2 = 40$, $\bar{X}_2 = 54$, $S_2^2 = 36$.
1. Combined arithmetic mean:

$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{50(63) + 40(54)}{50 + 40} = \frac{5310}{90} = 59$$

2. Combined variance:

$$S_c^2 = \frac{n_1[S_1^2 + (\bar{X}_1 - \bar{X}_c)^2] + n_2[S_2^2 + (\bar{X}_2 - \bar{X}_c)^2]}{n_1 + n_2} = \frac{50[81 + (63 - 59)^2] + 40[36 + (54 - 59)^2]}{50 + 40} = \frac{4850 + 2440}{90} = \frac{7290}{90} = 81$$
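
The arithmetic of Example 4.4 can be verified with a short function that implements formula (4.3.16). This is a minimal sketch; the name `combined_mean_variance` and its argument order are illustrative.

```python
def combined_mean_variance(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups (formula 4.3.16)."""
    total = n1 + n2
    mean_c = (n1 * mean1 + n2 * mean2) / total
    var_c = (n1 * (var1 + (mean1 - mean_c) ** 2)
             + n2 * (var2 + (mean2 - mean_c) ** 2)) / total
    return mean_c, var_c


# Example 4.4: standard deviations 9 and 6, so variances 81 and 36
mean_c, var_c = combined_mean_variance(50, 63, 9 ** 2, 40, 54, 6 ** 2)
print(mean_c, var_c)
```

The function returns a combined mean of 59 and a combined variance of 81, so the combined standard deviation is 9.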

(2) Standard deviation is used to transform individual scores to standard
scores.

Standardized (Z) scores

Definition 4.8. A standard score is a measure of performance (or relative
standing). It is used to compare two observations coming from different
groups. It is denoted by Z, and it uses both the mean and the standard
deviation in its computation.

Properties of z scores
• An individual's z score has the same percentile rank as that
individual's original score.
• The shape of the distribution of z scores is the same as that of the
original data.
• The mean of a set of z scores is zero.
• The variance of a set of z scores is 1 (so the SD is also 1).


Computing z-score or standard score


$$\text{Standard score} = \frac{\text{individual value} - \text{mean}}{\text{standard deviation of individual values}} \tag{4.3.18}$$

# Symbolically, for sample and population data,

$$Z_i = \frac{X_i - \bar{X}}{s}, \qquad Z_i = \frac{X_i - \mu}{\sigma} \tag{4.3.19}$$
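
The sketch below standardizes a small sample with the sample form of (4.3.19) and then checks two of the listed properties: the z scores have mean 0 and standard deviation 1. The function name `z_scores` is illustrative, and the data are the six sample values 2, 4, 8, 6, 10, 12 used for the standard deviation illustration.

```python
import statistics


def z_scores(values):
    """Sample z scores: (x - mean) / s (formula 4.3.19)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


sample = [2, 4, 8, 6, 10, 12]
z = z_scores(sample)

# The z scores should have mean 0 and standard deviation 1
print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.stdev(z))
```

Because each z score is unit-free, values standardized in this way can be compared across groups that have different means and spreads.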

ILLUSTRATION 4.6. Calculating Z-score

(3) It is used to calculate measures of skewness and kurtosis based on
moments.


4.4 Measures of Shape(MOS)


Moments
1. The rth raw moment (moment about the origin) is defined as:

$$\mu'_r = \frac{\sum_{i=1}^{n} X_i^r}{n} \tag{4.4.1}$$

2. The rth central moment (moment about the mean) is given by:

$$\mu_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n} \tag{4.4.2}$$

3. The rth moment about any number a is defined as:

$$\mu'_r(a) = \frac{\sum_{i=1}^{n} (X_i - a)^r}{n} \tag{4.4.3}$$

Skewness is lack of symmetry. In other words, it is the degree of asymmetry,
or departure from symmetry, of a distribution. If a distribution is not
symmetrical, it is called a skewed distribution.
• Positively skewed distribution: If the frequency curve has a longer
tail to the right, the distribution is known as positively skewed,
and Mean > Median > Mode.
• Negatively skewed distribution: If the frequency curve has a longer
tail to the left, the distribution is known as negatively skewed,
and Mean < Median < Mode. See Figure 4.2.
Measure of skewness: The difference between the mean and the mode gives
an absolute measure of skewness. If we divide this difference by the standard
deviation, we obtain a relative measure of skewness known as the coefficient
of skewness, denoted by SK.
# Karl Pearson coefficient of skewness:

$$SK = \frac{\bar{X} - \hat{X}}{s} \tag{4.4.4}$$

where $\hat{X}$ is the mode. Sometimes the mode is difficult to find. In that
case we use the median $\tilde{X}$ instead, through the formula
$SK = 3(\bar{X} - \tilde{X})/s$.


Figure 4.2. shape of curves based on Symmetry

# Bowley's coefficient of skewness (coefficient of skewness based on
quartiles):

$$S_B = \frac{Q_1 + Q_3 - 2\tilde{X}}{Q_3 - Q_1} \tag{4.4.5}$$

# The moment coefficient of skewness is denoted by $B_1$ and is given by:

$$B_1 = \frac{\mu_3}{\mu_2^{3/2}} \tag{4.4.6}$$

The shape of the curve is determined by the value of SK or SB or B1 .


Kurtosis
It is the degree of peakedness of a distribution, usually taken relative to a normal distribution. A
distribution having relatively high peak is called leptokurtic. If a curve representing a distribution
is flat topped, it is called platykurtic. The normal distribution which is not very high peaked or flat
topped is called mesokurtic.

Measures of kurtosis: The moment coefficient of kurtosis is denoted by $B_2$
and is given by

$$B_2 = \frac{\mu_4}{\mu_2^2} \tag{4.4.7}$$

The peakedness depends on the value of B2 .


Interpretation of B2
If B2 =3, the distribution is said to be normal and the curve is mesokurtic.
If B2 >3, the distribution is said to be more peaked and the curve is leptokurtic.
If B2 < 3, the distribution is said to be flat topped and the curve is platykurtic.
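
The moment-based coefficients can be computed directly from the central moments in (4.4.2). The sketch below does this for the six sample values 2, 4, 8, 6, 10, 12; the function names are illustrative, and the moments divide by n as in the definitions of this section.

```python
def central_moment(values, r):
    """rth central moment (formula 4.4.2), dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def moment_skewness(values):
    """B1 = mu_3 / mu_2^(3/2) (formula 4.4.6)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def moment_kurtosis(values):
    """B2 = mu_4 / mu_2^2 (formula 4.4.7)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


sample = [2, 4, 8, 6, 10, 12]
b1 = moment_skewness(sample)
b2 = moment_kurtosis(sample)
print(round(b1, 3), round(b2, 3))
```

For this sample the deviations from the mean are symmetric, so B1 is 0, and B2 is about 1.73, which is below 3 and therefore platykurtic by the rule above.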


Figure 4.3.

4.5 Dispersion Percentages


Suppose we have a set of data with mean X̄ = 10 and standard deviations
s = 5. How do we interpret this information? The answer is given by the
following rules.

Interpretation of s: Tchebychev's Rule and the Empirical Rule

Note 4.6. The Empirical Rule: For data with a "bell-shaped" graph,
about 68% of the values lie within one standard deviation of the mean,
about 95% lie within two standard deviations, and over 99% lie within
three standard deviations of the mean.

Note 4.7. Tchebychev's Rule: For any set of data and any k > 1, at least
$(1 - \frac{1}{k^2})$ of the values lie within k standard deviations of the
mean (that is, have z-scores between −k and +k).
Restated: at least $(1 - \frac{1}{k^2})$ of the observations lie in the
interval $(\bar{X} - ks, \bar{X} + ks)$ for a sample.
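
To make the rule concrete, the sketch below evaluates Tchebychev's lower bound for the data set mentioned at the start of this section (mean 10, standard deviation 5). The function name `chebyshev_lower_bound` is illustrative, and the bound is assumed to be used only for k > 1, where it is informative.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Tchebychev's rule is informative only for k > 1")
    return 1 - 1 / k ** 2


mean, s = 10, 5  # example values from the start of this section
for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"at least {chebyshev_lower_bound(k):.0%} of values lie in ({low}, {high})")
```

With k = 2 the bound is 75% for the interval (0, 20). The Empirical Rule gives the sharper figure of about 95% for the same interval, but only when the data are bell-shaped.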

Activity 4.5. Matt reads at an average (mean) rate of 20.6 pages per hour,
with a standard deviation of 3.2. What percent of the time will he read
between 15 and 26.2 pages per hour?

4.6 Review questions
