4. Measures of Dispersion
Chapter Contents
4.1. Definition of dispersion
4.2. Objectives of Measuring Variability
4.3. Describing Variability
4.4. Measures of Shape
4.5. Dispersion percentages
4.6. Review questions
In Unit Two we learned how to condense data into tables (frequency distribution tables) and represent them in graphs and diagrams. In Unit Three we discussed how to describe data by computing simple descriptive measures called measures of central tendency and position.
In this unit (Unit Four) we continue our study of data description by computing descriptive measures known as measures of dispersion and measures of shape.
Unit objectives
• Define what we mean by measures of dispersion
• Compare and contrast absolute and relative measures of dispersion.
• Enumerate some measures of dispersion used to describe data.
• Classify the shape of a given data set by its symmetry and peakedness.
Learning outcomes
After completing this section, successful students will be able to:
- Define what is meant by a measure of dispersion.
- State the objectives of measuring dispersion.
- Understand the properties of a good measure of dispersion.
Key words
• Coefficient of Variation • Dispersion (Variation) • Kurtosis • Mean Absolute Deviation • Quartile Deviation • Range • Skewness • Standard Deviation • Standard Score • Variance
⇒ Now it is time to deal with the properties and formulas of the above measures one by one.
Activity 4.1. Consider two small groups, one of boys and one of girls, whose scores on an achievement test are as follows. Is the performance of the two groups the same?
Activity 4.2. A testing lab wishes to test two experimental brands of outdoor paint to see how long each lasts before fading. The lab makes six gallons of each brand to test. The results (in months) are given below. Can we say that both brands are equally good?
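Definition 4.1. The range is the difference between the largest and smallest values of the data set. It is denoted by the letter R:
$$R = L - S$$
where L is the largest and S the smallest value. Its relative form, the relative range or coefficient of range (RR), is
$$RR = \frac{L - S}{L + S} = \frac{R}{L + S} \tag{4.3.2}$$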
⇒Merits
• It is simple to calculate.
• It gives a general idea about the total spread of the observations.
• It can be used to approximate the standard deviation (for roughly bell-shaped data):
$$s \approx \frac{\text{Range}}{4} \tag{4.3.4}$$
⇒Demerits
• It is sensitive to outliers.
• It ignores the middle n − 2 observations, discarding a lot of information.
• It cannot be computed for open-ended distributions.
ILLUSTRATION 4.2. Find the range and relative range of the achievement test scores in Activity 4.1.
Solution: Follow the steps below.
Step 1 Find the largest and smallest scores. For boys: L = 40 and S = 3; for girls: L = 23 and S = 17.
Step 2 Use formula (4.3.2).
i. $R_{\text{boys}} = L - S = 40 - 3 = 37$ and $R_{\text{girls}} = L - S = 23 - 17 = 6$.
ii. $RR_{\text{boys}} = R/(L + S) = 37/43 = 0.86$ and $RR_{\text{girls}} = 6/40 = 0.15$.
Step 3 Interpretation:
i. The boys' highest and lowest scores differ by 37 points, while the girls' differ by only 6 points.
ii. The variation is larger among the boys than among the girls (since 0.86 > 0.15).
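The range and relative range of (4.3.2) are simple enough to compute directly. Below is a minimal Python sketch; the function name `range_and_relative_range` is hypothetical. Because the full score tables of Activity 4.1 are not reproduced here, the usage lines pass only the highest and lowest scores, which is all the range depends on.

```python
def range_and_relative_range(data):
    """Return (R, RR) with R = L - S and RR = (L - S) / (L + S), as in (4.3.2).

    L is the largest and S the smallest observation.
    Assumes L + S != 0 (for example, all values positive).
    """
    largest, smallest = max(data), min(data)
    r = largest - smallest
    rr = r / (largest + smallest)
    return r, rr


# Only the extremes matter for the range, so the highest and lowest
# scores from Illustration 4.2 are enough to reproduce its results.
boys_r, boys_rr = range_and_relative_range([40, 3])
girls_r, girls_rr = range_and_relative_range([23, 17])
print(boys_r, round(boys_rr, 2))    # 37 0.86
print(girls_r, round(girls_rr, 2))  # 6 0.15
```

The same function applies unchanged to a full data list, such as the wages in Activity 4.3.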
Activity 4.3. The following are the wages of 8 workers in a factory. Find the range and the coefficient of range. Wages ($): 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.
Quartile Deviation (Q.D), also called the semi-interquartile range:
$$QD = \frac{Q_3 - Q_1}{2} \tag{4.3.5}$$
Note 4.1. It gives the average amount by which the two quartiles differ
from the median.
Coefficient of Quartile Deviation (C.Q.D)
$$CQD = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2\,QD}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \tag{4.3.6}$$
Note 4.2. Q.D and C.Q.D are based only on the middle 50% of the observations.
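A sketch of (4.3.5) and (4.3.6) in Python. Quartile conventions differ between textbooks; this sketch assumes the (n + 1)p positioning rule implemented by the standard library's `statistics.quantiles` with `method="exclusive"`, which may not match the convention used elsewhere in these notes. The function name is hypothetical.

```python
import statistics


def quartile_deviation(data):
    """Return (QD, CQD) following (4.3.5) and (4.3.6).

    Quartiles come from statistics.quantiles with the 'exclusive'
    method, i.e. the (n + 1)p positioning rule (an assumed convention).
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    qd = (q3 - q1) / 2
    cqd = (q3 - q1) / (q3 + q1)
    return qd, cqd


print(quartile_deviation([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 0.5)
```

Here Q1 = 2.5 and Q3 = 7.5, so QD = 2.5 and CQD = 5/10 = 0.5. Only the middle half of the data affects either value, which is exactly Note 4.2.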
Definition 4.4. The Mean Deviation is defined as the mean of the absolute
deviations of observations from some suitable average which may be the
arithmetic mean, median or mode. It is denoted by the letter MD .
$$MD(A) = \frac{\sum_{i=1}^{n} |X_i - A|}{n} \tag{4.3.7}$$
The relative form of mean deviation is called coefficient of mean deviation
(CMD) and is given by
$$CMD(A) = \frac{MD(A)}{A} \tag{4.3.8}$$
⇒ For grouped data with class marks $x_{m1}, x_{m2}, \ldots, x_{mk}$ and frequencies $f_1, f_2, \ldots, f_k$:
$$MD(A) = \frac{\sum_{i=1}^{k} |X_{mi} - A|\, f_i}{\sum_{i=1}^{k} f_i} \tag{4.3.9}$$
Note 4.3. Mean deviation is the least if the deviations are taken from the
median.
⇒Merits
• It is based on all the observations.
⇒Demerits
• It uses absolute deviations, ignoring the signs of the deviations, which is not mathematically convenient.
• It cannot be used in statistical inference.
(b) Verify that the mean deviation about the median is the least.
Solution: Let $X_i$ represent the mark of the $i$th student.
Step 1 Arrange the marks in ascending order: 4, 7, 7, 7, 9, 9, 10, 12 and 15.
Step 2 Find the averages: mean = 80/9 ≈ 8.89, median = 9, mode = 7 (since 7 occurs most often).
Step 3 Compute the sum of the absolute deviations (ignoring the negative signs) and divide by n.
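(a) With $\bar{X}$, $\hat{X}$ and $\tilde{X}$ denoting the mean, mode and median:
$$MD(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{4.89 + 1.89 + \cdots + 6.11}{9} = \frac{21.11}{9} = 2.35$$
$$MD(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n} = \frac{3 + 0 + \cdots + 8}{9} = \frac{23}{9} = 2.56$$
$$MD(\tilde{X}) = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n} = \frac{5 + 2 + \cdots + 6}{9} = \frac{21}{9} = 2.33$$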
(b) From the above calculations, the mean deviation about the median (2.33) is the least, compared with 2.35 about the mean and 2.56 about the mode.
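The calculations above can be checked with a short Python sketch of (4.3.7), (4.3.8) and the grouped form (4.3.9). The function names are hypothetical; the mean, median and mode come from the standard library's `statistics` module.

```python
import statistics


def mean_deviation(data, a):
    """MD(A) from (4.3.7): the mean of |X_i - A| over all observations."""
    return sum(abs(x - a) for x in data) / len(data)


def coefficient_of_mean_deviation(data, a):
    """CMD(A) from (4.3.8): MD(A) divided by the average A."""
    return mean_deviation(data, a) / a


def grouped_mean_deviation(values, freqs, a):
    """MD(A) for grouped data, (4.3.9): sum |x_mi - A| f_i / sum f_i."""
    return sum(abs(x - a) * f for x, f in zip(values, freqs)) / sum(freqs)


marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]
averages = {
    "mean": statistics.mean(marks),      # 80/9, about 8.89
    "median": statistics.median(marks),  # 9
    "mode": statistics.mode(marks),      # 7
}
for name, a in averages.items():
    print(name, round(mean_deviation(marks, a), 2))
# prints: mean 2.35 / median 2.33 / mode 2.56

# The same marks as a frequency table give the same MD about the median.
print(round(grouped_mean_deviation([4, 7, 9, 10, 12, 15], [1, 3, 2, 1, 1, 1], 9), 2))  # 2.33
```

Using the exact mean 80/9 rather than the rounded deviations gives a sum of 190/9 ≈ 21.11, which is why MD about the mean is 2.35.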
⇒ For grouped data with values $x_1, x_2, \ldots, x_k$ and frequencies $f_1, f_2, \ldots, f_k$:
$$s = \sqrt{\frac{\sum_{i=1}^{k} (X_i - \bar{X})^2 f_i}{\sum f_i - 1}} \tag{4.3.11}$$
⇒Merits
• It is rigidly defined (it has a definite value).
• It is based on all observations.
• It can be used to compare the variability of data sets whose means are equal.
• It is used in further analysis of data and in various statistical inferences.
⇒Demerits
• It is relatively difficult to calculate.
• It cannot be used to compare data sets expressed in different units of measurement.
• It is sensitive to outliers.
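As a check on (4.3.11), the sketch below computes the sample standard deviation from a frequency table and compares it with `statistics.stdev` on the raw data; both use the divisor n − 1. The function name `grouped_sample_sd` is hypothetical, and the data are the marks from the mean deviation example written as a frequency table.

```python
import math
import statistics


def grouped_sample_sd(values, freqs):
    """Sample SD for grouped data, following (4.3.11).

    values are the class values (or class marks) x_i, freqs the f_i.
    The divisor is sum(f_i) - 1, as in the sample formula.
    """
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    return math.sqrt(ss / (n - 1))


# The marks 4, 7, 7, 7, 9, 9, 10, 12, 15 written as a frequency table.
values = [4, 7, 9, 10, 12, 15]
freqs = [1, 3, 2, 1, 1, 1]
raw = [4, 7, 7, 7, 9, 9, 10, 12, 15]

print(round(grouped_sample_sd(values, freqs), 2))  # 3.22
print(round(statistics.stdev(raw), 2))             # 3.22, same n - 1 divisor
```

Squaring either result gives the sample variance $s^2$.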
Example 4.3 (Exercise). (a) Compute the standard deviation of the achievement test scores in Activity 4.1 and compare the two groups.
(b) Repeat part (a) for the paint lifetime data in Activity 4.2.
Activity 4.4. Consider the data set 5, 5, 5, 5. Calculate the SD. What do you observe?
(5) The Variance ($\sigma^2$, $s^2$)
⇒ The variance is another absolute measure of dispersion. It is the square of the standard deviation.
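Coefficient of Variation (c.v)
The most important of all the relative measures of dispersion is the coefficient of variation, the ratio of the standard deviation to the mean, usually expressed as a percentage:
$$\text{c.v} = \frac{s}{\bar{X}} \times 100\%$$
It is a pure number (it has no units).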
Note 4.5. Of two distributions, the one with the smaller c.v is regarded as more consistent (less variable).
⇒ Merits of c.v
• The c.v is used to compare the dispersion of different data sets, particularly data sets that differ in their means or in their units of measurement.
• It is used to judge the consistency of data, where consistency means uniformity in the values of the distribution.
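The comparison in Note 4.5 can be sketched as follows, assuming the common percentage form c.v = (s / X̄) × 100%. The function name is hypothetical; the inputs are the summary figures (mean and standard deviation of daily wages) from Example 4.4.

```python
def coefficient_of_variation(sd, mean):
    """c.v = (SD / mean) x 100%, a unit-free (pure) number.

    Assumes a positive mean; the c.v is not meaningful otherwise.
    """
    return sd / mean * 100


# Summary figures from Example 4.4 (daily wages of male and female workers).
cv_male = coefficient_of_variation(9, 63)
cv_female = coefficient_of_variation(6, 54)
print(round(cv_male, 2), round(cv_female, 2))  # 14.29 11.11
```

Since 11.11% < 14.29%, the female workers' wages would be judged more consistent in the sense of Note 4.5, even though the two groups have different means.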
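$$\sigma_{12}^2 = \frac{1}{n_1 + n_2}\left[n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(\bar{X}_1 - \bar{X}_{12})^2 + n_2(\bar{X}_2 - \bar{X}_{12})^2\right] \tag{4.3.16}$$
where $\bar{X}_{12} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$ is the combined mean of the two groups.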
Some measures of variations Lecture notes (set by: Z)
Example 4.4. For a group of 50 male workers the mean and standard de-
viation of their daily wages are 63 and 9 respectively. For a group of 40
female workers these values are 54 and 6 respectively. Find the mean and
variance of the combined group of 90 workers.
Solution: Here $n_1 = 50$, $\bar{X}_1 = 63$, $S_1^2 = 81$, $n_2 = 40$, $\bar{X}_2 = 54$, $S_2^2 = 36$.
1. Combined arithmetic mean:
$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{50(63) + 40(54)}{50 + 40} = \frac{5310}{90} = 59$$
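2. Combined variance:
$$S_c^2 = \frac{n_1\left[S_1^2 + (\bar{X}_1 - \bar{X}_c)^2\right] + n_2\left[S_2^2 + (\bar{X}_2 - \bar{X}_c)^2\right]}{n_1 + n_2}$$
$$= \frac{50\left[81 + (63 - 59)^2\right] + 40\left[36 + (54 - 59)^2\right]}{50 + 40} = \frac{4850 + 2440}{90} = \frac{7290}{90} = 81$$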
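The two steps of Example 4.4 can be packaged as one function. This is a minimal sketch of (4.3.16) with a hypothetical name; like that formula, it treats the given group variances as computed with divisor n and divides the combined sum by $n_1 + n_2$.

```python
def combine_groups(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups, as in (4.3.16).

    var1 and var2 are the group variances; like (4.3.16), the
    combined variance uses the divisor n1 + n2.
    """
    n = n1 + n2
    mean_c = (n1 * mean1 + n2 * mean2) / n
    var_c = (n1 * (var1 + (mean1 - mean_c) ** 2)
             + n2 * (var2 + (mean2 - mean_c) ** 2)) / n
    return mean_c, var_c


print(combine_groups(50, 63, 81, 40, 54, 36))  # (59.0, 81.0)
```

The squared terms $(\bar{X}_i - \bar{X}_c)^2$ account for the spread between the group means, which is why the combined variance is not simply a weighted average of 81 and 36.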
Properties of z scores
• An individual's z score has the same percentile rank as that individual's original score.
• The shape of the distribution of z scores is the same as that of the original data.
• The mean of a set of z scores is zero.
• The variance of a set of z scores is 1 (so their SD is also 1).
$$Z_i = \frac{X_i - \bar{X}}{s}, \qquad Z_i = \frac{X_i - \mu}{\sigma} \tag{4.3.19}$$
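A short Python sketch of the sample form of (4.3.19) that also checks the listed properties numerically. The function name is hypothetical, and the marks from the mean deviation example serve as data.

```python
import statistics


def z_scores(data):
    """Sample z-scores Z_i = (X_i - mean) / s, as in (4.3.19)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, divisor n - 1
    return [(x - mean) / s for x in data]


marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]
z = z_scores(marks)
print(statistics.mean(z))   # approximately 0 (up to floating-point error)
print(statistics.stdev(z))  # approximately 1
```

Because each score is shifted and rescaled by the same amounts, the ordering (and hence every percentile rank) is unchanged.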
2. The rth central moment (moment about the mean) is given by:
$$\mu_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n} \tag{4.4.2}$$
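The central moment in (4.4.2) translates directly into code. A minimal sketch with a hypothetical function name and an illustrative data set; note that $\mu_1$ is always 0 and $\mu_2$ is the variance with divisor n.

```python
def central_moment(data, r):
    """r-th central moment mu_r = sum((X_i - mean)^r) / n, as in (4.4.2)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean 5
print(central_moment(data, 1))  # 0.0, the first central moment is always 0
print(central_moment(data, 2))  # 4.0, the variance with divisor n
print(central_moment(data, 3))  # 5.25, positive: a longer right tail
```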
Pearson's coefficient of skewness:
$$SK = \frac{\bar{X} - \hat{X}}{s} \tag{4.4.4}$$
Bowley's (quartile) coefficient of skewness:
$$SB = \frac{Q_1 + Q_3 - 2\tilde{X}}{Q_3 - Q_1} \tag{4.4.5}$$
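Both skewness coefficients can be sketched in a few lines of Python. The function names are hypothetical; quartiles again assume the (n + 1)p rule of `statistics.quantiles(..., method="exclusive")`, whose middle cut point equals the median. The data are the marks from the mean deviation example.

```python
import statistics


def pearson_skewness(data):
    """SK = (mean - mode) / s, as in (4.4.4)."""
    return (statistics.mean(data) - statistics.mode(data)) / statistics.stdev(data)


def bowley_skewness(data):
    """SB = (Q1 + Q3 - 2 * median) / (Q3 - Q1), as in (4.4.5).

    Quartiles use the 'exclusive' (n + 1)p rule of statistics.quantiles,
    an assumed convention; the middle cut point is the median.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (q1 + q3 - 2 * median) / (q3 - q1)


marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]
print(round(pearson_skewness(marks), 2))  # 0.59, positive: mean > mode
print(bowley_skewness(marks))             # 0.0: Q1 = 7, median = 9, Q3 = 11
```

The two measures need not agree: here the mean exceeds the mode, but the quartiles sit symmetrically about the median, so Bowley's coefficient is zero.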
Figure 4.3.
Note 4.7. Tchebychev’s Rule: For any set of data and any k > 1, at least $(1 - 1/k^2)$ of the values lie within k standard deviations of the mean (that is, have z-scores between −k and +k).
Restated: at least $(1 - 1/k^2)$ of the observations lie in the interval $(\bar{X} - ks,\ \bar{X} + ks)$ for a sample.
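A small Python sketch of Tchebychev's rule: it computes the guaranteed lower bound $1 - 1/k^2$ and, for comparison, the observed share of a data set lying strictly inside $(\bar{X} - ks, \bar{X} + ks)$. Function names are hypothetical, and the marks from the mean deviation example are used only as illustration.

```python
import statistics


def chebyshev_lower_bound(k):
    """Tchebychev's rule: at least 1 - 1/k^2 of the data lie within k SDs."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed share of data strictly inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, as in the restated rule
    inside = [x for x in data if mean - k * s < x < mean + k * s]
    return len(inside) / len(data)


marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]
for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = proportion_within(marks, k)
    # The observed share should never fall below the guaranteed bound.
    print(k, round(bound, 4), round(observed, 4))
```

The rule holds for any data set, so the bound is often conservative; for these marks every observation already lies within 2 standard deviations of the mean, well above the guaranteed 75%.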
Activity 4.5. Matt reads at an average (mean) rate of 20.6 pages per hour, with a standard deviation of 3.2. Using Tchebychev's rule, at least what percent of the time will he read between 15 and 26.2 pages per hour?