
Probability & Statistics 11/4/2021

CHAPTER FOUR
MEASURES OF
VARIATION (DISPERSION)

The measures of central tendency show that most of the values in a data set concentrate around a central value, called the average, with the remaining values scattered (distributed) on either side of that value.
However, these measures do not reveal how the values are dispersed (spread or scattered) on each side of the central value.
The dispersion of values is indicated by the extent to which the values spread over an interval rather than cluster closely around an average.

Combined by Bacha E., ASTU



The term dispersion is generally used in two senses.


o Firstly, dispersion refers to the variation of the items among themselves.
-If all the items of a series have the same value, there is no variation among the items of the series.
o Secondly, dispersion refers to the variation of the items around an average.
-If the differences between the values of the items and the average are large, the dispersion is high; if the differences are small, the dispersion is low.

Thus, dispersion is defined as the scatter or spread of the individual items in a given series.
Objectives of measuring Variation:
o To judge the reliability of measures of central
tendency
o To control variability itself.
o To compare two or more groups of numbers in
terms of their variability.
o To make further statistical analysis.


Absolute and Relative Measures of Variation

Measures of variation are divided into two groups.
1. Absolute Measures of Variation:
- Range
- Interquartile range
- Quartile deviation (Q.D)
- Mean deviation (M.D)
- Variance
- Standard deviation
2. Relative Measures of Variation:
i) Relative range (R.R)
ii) Coefficient of Q.D
iii) Coefficient of M.D
iv) Coefficient of variation (C.V)

Absolute Measures of Variation

- An absolute measure is expressed in the same unit as the original data, such as kilograms, tonnes, liters, etc.
- It is suitable for comparing the variability of two distributions whose variables are expressed in the same unit.
- It is not suitable for comparing the variability of two distributions whose variables are expressed in different units.

Relative Measures of Variation
- Relative measures of variation are unitless.
- They are therefore suitable for comparing the variability of two distributions whose variables are expressed in the same or in different units.


Reading assignments
- Range
- Mean deviation (M.D)
- Interquartile range
- Quartile deviation (Q.D)
- Relative range (R.R)
- Coefficient of Q.D
- Coefficient of M.D

Variance and Standard deviation


Variance is the average of the squared deviations from the mean.
i) Population Variance ($\sigma^2$)
- If we divide the variation (the sum of the squared deviations from the mean) by the number of values in the population, we get the population variance.
- Variance for an individual series of data:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
- Variance for a discrete frequency distribution (F.D) and for continuous data:
\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} \]
where $N$ is the total frequency and $k$ is the number of classes; for continuous data, $x_i$ is the $i$th class mark.


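ii) Sample Variance ($S^2$)

- Sample variance for an individual series of data:
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] \]
- Sample variance for discrete data arranged in a frequency distribution: if the values $x_i$ have frequencies $f_i$ ($i = 1, 2, \dots, k$), then the sample variance is given by
\[ S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\right] \]
- Sample variance for continuous data:
\[ S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\right] \]
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum_{i=1}^{k} f_i$.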
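
The two forms of the sample variance formula are equal. A short derivation for the individual-series case is sketched here; it uses only the fact that $\sum x_i = n\bar{x}$, and the frequency case follows the same steps with the $f_i$ as weights.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{align*}
```

Dividing both sides by $n - 1$ gives the shortcut form, which needs only the sum of the values and the sum of their squares.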

Standard Deviation
Variances have one drawback: because the deviations are squared, the units are squared as well. To return to the units of the original data values, the square root must be taken.
Standard deviation is the positive square root of the variance.
Population Standard Deviation ($\sigma$)
\[ \sigma = \sqrt{\sigma^2} \]
where $\sigma^2$ is the population variance.
Sample Standard Deviation ($S$)
\[ S = \sqrt{S^2} \]
where $S^2$ is the sample variance.


Examples
1. Find the variance and standard deviation of the
following marks of 10 students (out of 50): 21, 23,
25, 28, 30, 32, 38, 39, 46, 48.
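Solution: From the data, $n = 10$. Using the formulas,
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{21 + 23 + \dots + 48}{10} = 33 \]
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{(21-33)^2 + (23-33)^2 + \dots + (48-33)^2}{10-1} = \frac{798}{9} = 88.67 \]
The s.d is: $S = \sqrt{S^2} = \sqrt{88.67} = 9.42$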
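
The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch: the function names `sample_variance` and `sample_variance_shortcut` are illustrative and not part of the course material, and it uses only the standard library.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance from the definition: squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Same quantity via the shortcut form (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

marks = [21, 23, 25, 28, 30, 32, 38, 39, 46, 48]
s2 = sample_variance(marks)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 88.67 9.42

# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(s2, statistics.variance(marks)))  # True
```

Both functions return the same value; the shortcut form is convenient for hand calculation, while the definitional form is less exposed to rounding of the mean.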

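2. Find the variance and standard deviation of the following discrete data.

$x_i$           1     2     3     4     6     7    Total
$f_i$           5     3     4     1     2     3     18
$f_i x_i$       5     6    12     4    12    21     60
$f_i x_i^2$     5    12    36    16    72   147    288

Solution: $n = 18$
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{5 \times 1 + 3 \times 2 + \dots + 3 \times 7}{18} = \frac{60}{18} \approx 3.33 \]
The mean is kept unrounded in the variance formula, because $n\bar{x}^2$ is sensitive to rounding: $n\bar{x}^2 = 18 \times (60/18)^2 = 200$.
\[ S^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n-1} = \frac{288 - 200}{18 - 1} = \frac{88}{17} \approx 5.18 \]
\[ S = \sqrt{S^2} = \sqrt{5.18} \approx 2.28 \]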


3. Find the variance and standard deviation of the following continuous data.

Class limits (C.L)     1-5    6-10    11-15    16-20    Total
$f_i$                   4       1       2        3      10 = n
Class mark ($x_i$)      3       8      13       18
$f_i x_i$              12       8      26       54      100
$f_i x_i^2$            36      64     338      972     1410

\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{100}{10} = 10 \]
\[ S^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n-1} = \frac{1410 - (10)(10^2)}{10 - 1} = \frac{410}{9} = 45.56 \]
\[ S = \sqrt{S^2} = \sqrt{45.56} = 6.75 \]
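
The grouped-data calculation of Example 3 can be sketched in Python as follows. The sketch assumes the classes are supplied as (lower, upper) class-limit pairs; the helper names `class_marks` and `grouped_sample_variance` are illustrative only.

```python
import math

def class_marks(class_limits):
    """Midpoint of each (lower, upper) class-limit pair."""
    return [(lower + upper) / 2 for lower, upper in class_limits]

def grouped_sample_variance(marks, freqs):
    """Sample variance of a frequency distribution via the shortcut formula."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    return (sum_fx2 - n * mean ** 2) / (n - 1)

limits = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [4, 1, 2, 3]
marks = class_marks(limits)  # [3.0, 8.0, 13.0, 18.0]
s2 = grouped_sample_variance(marks, freqs)
print(round(s2, 2), round(math.sqrt(s2), 2))  # 45.56 6.75
```

The same function handles a discrete frequency distribution if the distinct values are passed in place of the class marks. The mean is never rounded inside the function, which matters because the shortcut formula subtracts two large, nearly equal quantities.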

Properties of Variance & Standard deviation


If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., for any constant $k$, $Var(x_i \pm k) = Var(x_i)$.
If every value is multiplied by a non-zero constant $k$, the S.D is multiplied by $|k|$ and the variance is multiplied by $k^2$; i.e., $Var(kx_i) = k^2 Var(x_i)$.
Generally,
\[ \text{New variance} = k^2 \times \text{old variance} \]
\[ \text{New standard deviation} = |k| \times \text{old standard deviation} \]


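Example: Suppose that the values of $x_i$ are 7, 3, 4, 2.
i) If 10 is added to all of the above observations, find the new variance and s.d.
ii) Find the variance and s.d of $y_i = 4x_i - 6$.
Solution: $n = 4$.
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{7 + 3 + 4 + 2}{4} = \frac{16}{4} = 4 \]
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{(7-4)^2 + (3-4)^2 + (4-4)^2 + (2-4)^2}{4-1} = \frac{14}{3} = 4.67, \qquad S = \sqrt{4.67} = 2.16 \]
i) The new observations are $y_i = x_i + 10$, so $y_i$: 17, 13, 14, 12 and $\bar{y} = 14$.
\[ S_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{(17-14)^2 + (13-14)^2 + (14-14)^2 + (12-14)^2}{4-1} = \frac{14}{3} = 4.67 \]
\[ S_y = \sqrt{4.67} = 2.16 \]
Adding the constant leaves both the variance and the s.d unchanged.
ii) $y_i$: 22, 6, 10, 2 and $\bar{y} = 4\bar{x} - 6 = 4 \times 4 - 6 = 10$.
\[ S_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{(22-10)^2 + (6-10)^2 + (10-10)^2 + (2-10)^2}{4-1} = \frac{224}{3} = 74.67 \]
\[ S_y = \sqrt{74.67} = 8.64 \]
This agrees with the properties: $74.67 = 4^2 \times \frac{14}{3}$ and $8.64 = 4 \times 2.16$.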
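
The shift and scale properties can also be checked numerically. The following is a small sketch using Python's standard `statistics` module (whose `variance` and `stdev` use the $n - 1$ divisor); the variable names are illustrative, and the last case, multiplying by $-4$, is an added illustration of why the s.d scales by $|k|$ rather than $k$.

```python
import math
import statistics

x = [7, 3, 4, 2]
shifted = [xi + 10 for xi in x]     # y = x + 10
scaled = [4 * xi - 6 for xi in x]   # y = 4x - 6
negated = [-4 * xi for xi in x]     # y = -4x, a negative multiplier

var_x = statistics.variance(x)      # sample variance, divisor n - 1
sd_x = statistics.stdev(x)

print(round(var_x, 2), round(statistics.variance(shifted), 2))                    # 4.67 4.67
print(round(statistics.variance(scaled), 2), round(statistics.stdev(scaled), 2))  # 74.67 8.64

# A negative multiplier still gives a positive s.d: |k| * old s.d.
print(math.isclose(statistics.stdev(negated), 4 * sd_x))  # True
```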
Exercise: Consider the 6 sample values $x_i$: 54, 52, 53, 50, 51 and 52. If every observation is multiplied by 3, find the variance and s.d of the new values.


Properties…
The unit of variance is the square of the unit of measurement of the values. For example, if the variable is measured in kg, the unit of variance is kg².
Variance and s.d are calculated from all the observations in the series.
Both the variance and the standard deviation give more weight to extreme values and less to those near the mean.
The s.d is considered to be the best measure of dispersion.
If the values of two series have different units of measurement, we cannot compare their variability just by comparing their standard deviations.

Coefficient of Variation (C.V)

- The coefficient of variation is used to compare the variability of two or more different series.
- It is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent.
\[ C.V = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\% \]
i) $C.V = \frac{\sigma}{\mu} \times 100\%$ for population data, where $\sigma$ is the population standard deviation and $\mu$ is the population mean.
ii) $C.V = \frac{S}{\bar{x}} \times 100\%$ for sample data, where $S$ is the sample standard deviation and $\bar{x}$ is the sample mean.
Remark: A distribution with a smaller coefficient of variation is said to be less variable, or equivalently more consistent, more uniform or more homogeneous.


Example: Last semester, the students of the Civil Engineering and Chemical Engineering departments took the Probability and Statistics course. At the end of the semester, some students were randomly selected and the following information was recorded.

Department             Civil Engineering    Chemical Engineering
Mean score                    85                     65
Standard deviation            25                     12

Compare the relative variation of the two departments’ scores in an appropriate way.
Solution:
\[ C.V_{civil} = \frac{S_{civil}}{\bar{x}_{civil}} \times 100\% = \frac{25}{85} \times 100\% = 29.41\% \]
\[ C.V_{chem} = \frac{S_{chem}}{\bar{x}_{chem}} \times 100\% = \frac{12}{65} \times 100\% = 18.46\% \]
Since the C.V of the Civil Engineering students is greater than that of the Chemical Engineering students, the Civil Engineering students’ scores show more dispersion relative to the mean than the Chemical Engineering students’ scores.
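
The comparison above reduces to one division per group. A minimal Python sketch is given here; the function name `coefficient_of_variation` and the guard against a zero mean are assumptions added for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """C.V in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("C.V is undefined when the mean is zero")
    return std_dev / mean * 100

cv_civil = coefficient_of_variation(25, 85)
cv_chem = coefficient_of_variation(12, 65)
print(round(cv_civil, 2), round(cv_chem, 2))  # 29.41 18.46

# The group with the smaller C.V is the more consistent one.
more_consistent = "Chemical Engineering" if cv_chem < cv_civil else "Civil Engineering"
print(more_consistent)  # Chemical Engineering
```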

Standard Scores (Z-Scores)

- A standard score is a measure that describes the relative position of a single score in the entire distribution of scores in terms of the mean and standard deviation.
- A standard score tells us how many standard deviations a specific value lies above (positive z-score) or below (negative z-score) the mean of the data set.
i) $Z = \frac{X - \mu}{\sigma}$ for population data
ii) $Z = \frac{X - \bar{X}}{S}$ for sample data


Example: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution: $\bar{X} = 8$, $S = 3.8173$, $X = 14$. Thus,
\[ Z = \frac{X - \bar{X}}{S} = \frac{14 - 8}{3.8173} \approx 1.57 \]
Therefore, the data value 14 lies 1.57 standard deviations above the mean of 8, since the z-score is positive.
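
A minimal Python sketch of this calculation follows. The helper names `standardize` and `z_score` are illustrative; `statistics.stdev` is the standard-library sample standard deviation (divisor $n - 1$).

```python
import statistics

def standardize(value, mean, std_dev):
    """Z-score from a given mean and standard deviation."""
    return (value - mean) / std_dev

def z_score(value, data):
    """Sample z-score: the mean and s.d are computed from the data."""
    return standardize(value, statistics.mean(data), statistics.stdev(data))

data = [3, 8, 6, 14, 4, 12, 7, 10]
z = z_score(14, data)
print(round(statistics.stdev(data), 4), round(z, 2))  # 3.8173 1.57
```

When only summary figures are available (a mean and a standard deviation), `standardize` can be called directly without the raw data.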
Example: Suppose that a student scored 66 in Statistics and 80 in Mathematics. The summary of the scores in the two courses is given below.

Course          Mean score    Standard deviation
Statistics          51               12
Mathematics         72               16

In which course did the student score better compared with his/her classmates?
Solution:
\[ Z_{stat} = \frac{X - \bar{x}}{S} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25 \]
\[ Z_{math} = \frac{X - \bar{x}}{S} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5 \]
From these two standard scores, we can conclude that the student scored better relative to his/her classmates in Statistics than in Mathematics.


Moments, Skewness and Kurtosis

- The measures of central tendency and variation discussed so far do not reveal the entire story about a frequency distribution.
- Two distributions may have the same mean and standard deviation but differ in shape.
- A further description of their characteristics is therefore necessary; it is provided by measures of skewness and kurtosis.
Moments
- The moments of a distribution are the arithmetic means of the various powers of the deviations of the items from some number.
- We shall use them in the study of the skewness and kurtosis of a statistical distribution.

Moments about the Origin ($M_r$)

If $X$ is a variable that assumes the values $X_1, X_2, \dots, X_n$, then the $r$th moment about the origin is
\[ M_r = \frac{\sum_{i=1}^{n} X_i^r}{n}, \quad \text{where } r = 0, 1, 2, 3, \dots \]
- The $r$th moment about the origin for a discrete or continuous frequency distribution is
\[ M_r = \frac{\sum_{i=1}^{k} f_i X_i^r}{n} \]
where $f_i$ is the frequency of $X_i$, and $X_i$ is the class midpoint in the case of a continuous frequency distribution.
Note that: $M_0 = 1$, $M_1 = \bar{X}$


Moments about the Mean (Central Moments)

If $X$ is a variable that assumes the values $X_1, X_2, \dots, X_n$, then the $r$th moment about the mean is
\[ M'_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n} \]
The $r$th moment about the mean for a discrete or continuous frequency distribution is
\[ M'_r = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^r}{n} \]
where $f_i$ is the frequency of $X_i$, and $X_i$ is the class midpoint in the case of a continuous frequency distribution.
Note that $M'_2$ equals the sample variance $S^2$ (the square of the s.d) if the divisor $n$ is replaced by $n - 1$.

Example: Find the first four moments about the mean for the following individual series.
$X_i$: 3 6 8 10 18
Solution: $n = 5$, $\bar{X} = \frac{45}{5} = 9$

S.No     $X_i$    $X_i - \bar{X}$    $(X_i - \bar{X})^2$    $(X_i - \bar{X})^3$    $(X_i - \bar{X})^4$
1          3          -6                  36                   -216                  1296
2          6          -3                   9                    -27                    81
3          8          -1                   1                     -1                     1
4         10           1                   1                      1                     1
5         18           9                  81                    729                  6561
Total     45           0                 128                    486                  7940

\[ M'_1 = \frac{\sum (X_i - 9)}{5} = \frac{0}{5} = 0, \qquad M'_2 = \frac{\sum (X_i - 9)^2}{5} = \frac{128}{5} = 25.6 \]
\[ M'_3 = \frac{\sum (X_i - 9)^3}{5} = \frac{486}{5} = 97.2, \qquad M'_4 = \frac{\sum (X_i - 9)^4}{5} = \frac{7940}{5} = 1588 \]
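
The table-based calculation can be reproduced with a short Python sketch. The function names `raw_moment` and `central_moment` are illustrative; both use the divisor $n$, as in the definitions of the moments.

```python
def raw_moment(data, r):
    """r-th moment about the origin."""
    return sum(x ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th moment about the mean, with divisor n (not n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

series = [3, 6, 8, 10, 18]
moments = [central_moment(series, r) for r in (1, 2, 3, 4)]
print(moments)  # [0.0, 25.6, 97.2, 1588.0]
print(raw_moment(series, 0), raw_moment(series, 1))  # 1.0 9.0
```

The last line confirms the two notes on moments about the origin: the zeroth moment is 1 and the first moment is the mean.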


Skewness
- Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve, not its size.
Symmetry
- A distribution is said to be symmetrical when the values are evenly distributed around the mean (the distribution of the data below the mean mirrors that above the mean).
- In a symmetrical distribution, the mean, median and mode coincide (i.e., mean = median = mode).

Negatively skewed distribution: one or more extremely small observations are present, so the mean is smaller than the median and the mode.
If the mode is greater than the mean, the skewness is said to be negative.
A negatively skewed distribution contains some values that are much smaller than the majority of the observations.
mean < median < mode, i.e., the distribution is negatively skewed (skewed to the left).


Positively skewed distribution: one or more extremely large observations are present, so the mean is greater than the median and the mode.
If the mean is greater than the mode, the skewness is said to be positive.
A positively skewed distribution contains some values that are much larger than the majority of the observations.
mode < median < mean, i.e., the distribution is positively skewed (skewed to the right).

Note that in moderately skewed distributions the averages approximately satisfy the following relationship:
(Mean – Mode) = 3(Mean – Median)


How to check the presence of skewness in a distribution?
Skewness is present in the data if:
- the graph is not symmetrical.
- the mean, median and mode do not coincide.
- the sum of the positive and negative deviations from the median is not zero.
- the frequencies are not similarly distributed on either side of the mode.

Measures of skewness ($\alpha_3$)

A measure of skewness gives a numerical expression for the extent and the direction of asymmetry in a distribution.
It gives information about the shape of the distribution and the degree of variation on either side of the central value.
The three most commonly used measures of skewness are
- Pearson’s coefficient of skewness,
- Bowley’s coefficient of skewness and
- Coefficient of skewness based on moments.


1. Pearson’s coefficient of skewness (Pearsonian coefficient of skewness)
\[ \alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \]
2. Bowley’s Coefficient of Skewness
Bowley’s coefficient of skewness is based on quartiles.
The formula for calculating the coefficient of skewness is:
\[ \alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} \]

3. Moment Coefficient of Skewness

The moment coefficient of skewness is based on moments.
The formula for calculating the coefficient of skewness is:
\[ \alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{M'_3}{\sigma^3} \]
where $M'_r = \sum_{i=1}^{n} (x_i - \bar{x})^r / n$ and $\sigma = \sqrt{M'_2}$.
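
The three coefficients can be sketched as small Python functions. The function names are illustrative; the series 3, 6, 8, 10, 18 is the one used in the moments example, while the quartiles passed to `bowley_skewness` are hypothetical values chosen only to exercise the formula.

```python
def pearson_skewness(mean, mode, std_dev):
    """Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev

def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def moment_skewness(data):
    """Moment coefficient: M'3 / M'2**1.5, both moments with divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

series = [3, 6, 8, 10, 18]                 # series from the moments example
print(round(moment_skewness(series), 2))   # 0.75

# Hypothetical quartiles, not taken from the course data.
print(bowley_skewness(q1=4, q2=7, q3=12))  # 0.25
```

For the example series the moment coefficient is $97.2 / 25.6^{3/2} \approx 0.75$, a positive value driven by the single large observation 18.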

- The shape of the curve is determined by the value of $\alpha_3$.
- $\alpha_3 > 0$: the distribution is positively skewed (skewed to the right), i.e., mode < median < mean. Smaller observations are more frequent than larger observations; that is, the majority of the observations have a value below the average.
- $\alpha_3 = 0$: the distribution is symmetric, i.e., mean = median = mode.
- $\alpha_3 < 0$: the distribution is negatively skewed (skewed to the left), i.e., mean < median < mode. Smaller observations are less frequent than larger observations; that is, the majority of the observations have a value above the average.

Kurtosis
Kurtosis is a measure of the peakedness of a distribution.
The degree of kurtosis of a distribution is measured relative to the peakedness of a normal curve.
If a curve is more peaked than the normal curve, it is called ‘leptokurtic’; if it is flatter than the normal curve, it is called ‘platykurtic’ or flat-topped.
The normal curve itself is known as ‘mesokurtic’.


Measures of Kurtosis ($\alpha_4$)

The moment coefficient of kurtosis:
\[ \alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{M'_4}{\sigma^4} \]
where $\sigma^2 = M'_2$.
The peakedness depends on the value of $\alpha_4$:
- $\alpha_4 > 3$: the curve is leptokurtic,
- $\alpha_4 = 3$: the curve is mesokurtic,
- $\alpha_4 < 3$: the curve is platykurtic.

Example: Based on the following data:
$M'_0 = 1$, $M'_1 = 0$, $M'_2 = 1.6$, $M'_3 = -2.4$, $M'_4 = 5.8$
a/ Find the coefficient of skewness and discuss the distribution type.
b/ Find the coefficient of kurtosis and discuss the distribution type.
Solution:
a) $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$, so the distribution is negatively skewed.
b) $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{5.8}{1.6^2} = 2.27 < 3$, so the curve is platykurtic.
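
Both coefficients, together with the classification rules for $\alpha_3$ and $\alpha_4$, can be sketched in Python as follows. The function names and the tolerance used to test for "equal to 0" or "equal to 3" are assumptions made for this illustration.

```python
import math

def skewness_from_moments(m2, m3):
    """Moment coefficient of skewness: M'3 / M'2**1.5."""
    return m3 / m2 ** 1.5

def kurtosis_from_moments(m2, m4):
    """Moment coefficient of kurtosis: M'4 / M'2**2."""
    return m4 / m2 ** 2

def classify_skewness(alpha3, tol=1e-9):
    if abs(alpha3) <= tol:
        return "symmetric"
    return "positively skewed" if alpha3 > 0 else "negatively skewed"

def classify_kurtosis(alpha4, tol=1e-9):
    if math.isclose(alpha4, 3, abs_tol=tol):
        return "mesokurtic"
    return "leptokurtic" if alpha4 > 3 else "platykurtic"

m2, m3, m4 = 1.6, -2.4, 5.8
a3 = skewness_from_moments(m2, m3)
a4 = kurtosis_from_moments(m2, m4)
print(round(a3, 2), classify_skewness(a3))  # -1.19 negatively skewed
print(round(a4, 2), classify_kurtosis(a4))  # 2.27 platykurtic
```

The unrounded kurtosis value is 2.265625, which rounds to 2.27 at two decimal places.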
