
Lesson 3.2 Measures of Variation/Dispersion

LEARNING OBJECTIVES

At the end of this lesson, you will be able to:

1. calculate the various measures of variation; and
2. interpret the numerical values obtained from descriptive measures of variation, skewness and kurtosis.

LET’S EXAMINE
In previous lessons, we discussed different ways of describing a set of data by locating a typical value that summarizes the set. These are the measures of central tendency: the mean, median and mode.

One interesting question that we might ask ourselves is, “What happens if two sets of data have the same mean? How do we interpret these mean values?” Answering these questions is the goal of the measures of variation. A measure of variability usually accompanies a measure of central tendency as the basic descriptive statistics for a set of values.

Let us examine the three sets of scores obtained in a Statistics quiz:

Set A: 8, 9, 9, 10, 10, 10, 11, 11, 11, 11
Set B: 6, 7, 8, 9, 10, 10, 11, 12, 13, 14
Set C: 2, 3, 4, 6, 10, 10, 13, 16, 17, 19

1. What are the mean, median and modal scores of each set of data?
2. Which set of scores has the lowest variability? Medium variability? Highest variability? Explain.

You may have noticed that the scores in Sets A, B and C are arranged in ascending order, which makes it easier to identify the three measures of central tendency. The mean, median and modal scores are shown in the table below:
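| Data Set | Mean | Median | Mode |
|---|---|---|---|
| Set A | $\bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$ | 10 | 11 |
| Set B | $\bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$ | 10 | 10 |
| Set C | $\bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$ | 10 | 10 |

The three sets are centered at essentially the same value: each has a mean and median of 10, and the modes are 10 or 11. Agreement among the mean, median and mode suggests a roughly symmetric, approximately normal shape, although it does not guarantee one.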

But which set of scores do you think has the lowest variability? Medium variability? Highest variability? Without doing any mathematical computation, it is evident that Set A has the lowest variability because its scores are so close to each other, which makes them more homogeneous. Set B has medium variability because its scores are a little more scattered, and Set C shows the highest variability because it has the most scattered scores.

The data sets above can be represented graphically with the use of
different normal curves below:

[Figure: normal curves with a legend distinguishing Set A, Set B and Set C]

Source: https://cyntegrity.com/clinical-data-quality-article/variability-graph/

LET’S EXPLORE

In order to describe a set of values in a distribution, it is necessary to calculate a measure of central tendency, such as the mean, together with a measure of variation that indicates how the scores are spread out or scattered along the scale of the distribution. In inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population. When the population variability is small, all of the scores are clustered close together and any individual score or sample will necessarily provide a good representation of the entire set. However, when variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

The measure of spread is known as the measure of variation or dispersion. Five popularly known measures of variation are the Range (R), Quartile Deviation (QD), Mean Deviation (Md), Variance ($s^2$) and Standard Deviation (s).

Measures of Variation/Dispersion for Ungrouped Data

1. The Range (R)

The Range is defined as the difference between the highest (maximum) value and the lowest (minimum) value in a set of data. It is the simplest measure of variability to compute. Typically, a large range indicates greater dispersion in the data, while a small range indicates less dispersion.

Range = Highest Value - Lowest Value

R = HV - LV

Example 1. Calculate the Range of the following sets of scores obtained in a Statistics quiz:

Set A: 8, 9, 9, 10, 10, 10, 11, 11, 11, 11
Set B: 6, 7, 8, 9, 10, 10, 11, 12, 13, 14
Set C: 2, 3, 4, 6, 10, 10, 13, 16, 17, 19

Solution:

Set A: R = 11 - 8 = 3
Set B: R = 14 - 6 = 8
Set C: R = 19 - 2 = 17

Among the three sets of data above, Set C has the highest variability in terms of Range.
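
The range computation for the three sets can be sketched in a few lines of Python. This is only an illustration and not part of the original lesson; the function name `data_range` is a hypothetical name chosen for this example.

```python
def data_range(scores):
    """Return the range R = HV - LV of a list of scores."""
    return max(scores) - min(scores)

set_a = [8, 9, 9, 10, 10, 10, 11, 11, 11, 11]
set_b = [6, 7, 8, 9, 10, 10, 11, 12, 13, 14]
set_c = [2, 3, 4, 6, 10, 10, 13, 16, 17, 19]

# Compute the range of each set and collect the results by set name.
ranges = {
    name: data_range(scores)
    for name, scores in [("A", set_a), ("B", set_b), ("C", set_c)]
}
print(ranges)  # {'A': 3, 'B': 8, 'C': 17}
```

Because the range uses only the two extreme scores, a single unusually high or low score can change it considerably.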
2. Quartile Deviation (QD)

When you take half of the difference between the 3rd and the 1st quartiles of a distribution, the result is known as the quartile deviation. The quartile deviation is generally more informative than the range because it is based on the middle 50% of the values in a distribution rather than on the two extreme values alone. It is preferable to the range when the distribution to be described is skewed or when the median is the only measure of central tendency available.

Quartile deviation formula:

$QD = \frac{Q_3 - Q_1}{2}$

https://www.slideshare.net/kathy_mac/quartile-deviation-72029607
Example 2. Find the quartile deviation of the set of scores below:

Data Set: 2, 3, 4, 6, 10, 10, 13, 16, 17, 19

Let us recall the method of solving for the lower quartile ($Q_1$) and the upper quartile ($Q_3$) using the Mendenhall and Sincich Method of Finding Quartiles (Lesson 4 - Other Measures of Position).

a. In calculating $Q_1$:

i. Determine where $Q_1$ is located in the data set.
   Position of $Q_1 = \frac{1}{4}(n + 1) = \frac{1}{4}(10 + 1) = \frac{1}{4}(11) = 2.75$
   This is the 2.75th score, which implies that $Q_1$ is located between the 2nd and the 3rd scores of the data set above.
ii. Subtract the 2nd score from the 3rd score: 4 - 3 = 1
iii. Multiply this difference by the decimal part of the answer in i: 1(0.75) = 0.75
iv. Add this product to the 2nd score: 3 + 0.75 = 3.75
v. $Q_1 = 3.75$. This implies that 25% of the scores are below or equal to 3.75.

b. In solving for $Q_3$, follow the same steps used for $Q_1$ above.

i. Position of $Q_3 = \frac{3}{4}(n + 1) = \frac{3}{4}(10 + 1) = \frac{3}{4}(11) = 8.25$
   This is the 8.25th score, which implies that $Q_3$ is located between the 8th and the 9th scores of the data set above.
ii. Subtract the 8th score from the 9th score: 17 - 16 = 1
iii. Multiply this difference by the decimal part of the answer in i: 1(0.25) = 0.25
iv. Add this product to the 8th score: 16 + 0.25 = 16.25
v. $Q_3 = 16.25$. This implies that 75% of the scores are below or equal to 16.25.

Solving for the quartile deviation:

$QD = \frac{Q_3 - Q_1}{2} = \frac{16.25 - 3.75}{2} = \frac{12.5}{2} = 6.25$

This implies that the difference between the two values $Q_1$ and $Q_3$, known as the interquartile range (IQR), is 12.5, and half of this difference is 6.25, which is the quartile deviation (QD) of the distribution.
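
The steps in Example 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a general-purpose quartile routine: the helper name `value_at_position` is hypothetical, and it assumes the fractional position falls strictly inside the data set, as it does here.

```python
def value_at_position(sorted_scores, position):
    """Interpolate the value at a 1-based, possibly fractional, position.

    Assumes 1 <= position < len(sorted_scores); other cases are not
    handled in this sketch.
    """
    whole = int(position)             # e.g. 2 for position 2.75
    fraction = position - whole       # e.g. 0.75
    lower = sorted_scores[whole - 1]  # score at the whole-number position
    upper = sorted_scores[whole]      # next score in the ordered list
    return lower + fraction * (upper - lower)

scores = sorted([2, 3, 4, 6, 10, 10, 13, 16, 17, 19])
n = len(scores)

q1 = value_at_position(scores, 0.25 * (n + 1))  # position 2.75
q3 = value_at_position(scores, 0.75 * (n + 1))  # position 8.25
iqr = q3 - q1
qd = iqr / 2
print(q1, q3, iqr, qd)  # 3.75 16.25 12.5 6.25
```

Other quartile conventions exist and can give slightly different values for the same data, so results from software may not match this hand method exactly.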
3. The Mean (Average) Deviation (Md)
It is a measure of dispersion derived by computing the average of the
absolute deviations of the individual scores of a set of data from the measure of
central tendency such as the mean. The value of the mean deviation about the
mean is a measure of how closely grouped data values are. It answers the
question, “How close to the mean, on average, are the data values?”

Sample mean deviation formula:

$Md = \frac{\sum \lvert x - \bar{x} \rvert}{n}$

Example 3. Find the sample mean deviation of the set of scores below:

Data Set: 6, 7, 8, 9, 10, 10, 11, 12, 13, 14

Steps in finding the mean deviation of a set of values/scores in a distribution:

i. Find the mean of all values.

$\bar{x} = \frac{6+7+8+9+10+10+11+12+13+14}{10} = \frac{100}{10} = 10$
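ii. Set up a table and find the distance of each value from the mean, expressing the difference as an absolute value.

| Score $x$ | Mean $\bar{x}$ | Difference of each value from the mean, $x - \bar{x}$ | Absolute value of each deviation, $\lvert x - \bar{x} \rvert$ |
|---|---|---|---|
| 6 | 10 | -4 | 4 |
| 7 | 10 | -3 | 3 |
| 8 | 10 | -2 | 2 |
| 9 | 10 | -1 | 1 |
| 10 | 10 | 0 | 0 |
| 10 | 10 | 0 | 0 |
| 11 | 10 | 1 | 1 |
| 12 | 10 | 2 | 2 |
| 13 | 10 | 3 | 3 |
| 14 | 10 | 4 | 4 |
| Total | | $\sum (x - \bar{x}) = 0$ | $\sum \lvert x - \bar{x} \rvert = 20$ |

The negative differences add up to -10 and the positive differences add up to 10.

Take note: The sum of all of the plus and minus differences is always equal to zero.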

iii. Calculate the mean of the absolute deviations, which is equal to the mean deviation.

$Md = \frac{\sum \lvert x - \bar{x} \rvert}{n} = \frac{20}{10} = 2$

iv. Interpret the result.

With this data set, we can say that the mean is 10 and the average distance from that mean is 2.0. Note that some scores are closer to the mean than 2.0 and some are farther, but 2.0 is the average distance.
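
The same steps can be sketched in Python. This is only an illustration of the mean deviation about the mean; the function name `mean_deviation` is a hypothetical name chosen for this example.

```python
def mean_deviation(scores):
    """Return the mean of the absolute deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n

scores = [6, 7, 8, 9, 10, 10, 11, 12, 13, 14]
mean = sum(scores) / len(scores)

# The signed deviations cancel out, which is why absolute values are used.
signed_sum = sum(x - mean for x in scores)
md = mean_deviation(scores)
print(mean, signed_sum, md)  # 10.0 0.0 2.0
```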
4. Sample Variance ($s^2$)

The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is large; if they vary little, the variance is small. In short, it measures how far a set of data is spread out. A variance of zero indicates that all of the data values are identical. A high variance indicates that the data points are very spread out from the mean and from one another.

The sample variance is approximately the average of the squared differences between each point and the mean: the squared differences are summed and divided by $n - 1$ rather than $n$. Hence it has the following formulas:

Formula 1. $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Formula 2. $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$
Example 4. Find the sample variance of the set of scores below:

8, 9, 9, 10, 10, 10, 11, 11, 11, 11

Solution 1. Steps in finding the sample variance ($s^2$) of a set of values/scores in a distribution using Formula 1: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

i. Find the mean of all values.

$\bar{x} = \frac{8+9+9+10+10+10+11+11+11+11}{10} = \frac{100}{10} = 10$
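ii. Set up a table and find the sum of the squared differences from each point to the mean.

| Score $x$ | Mean $\bar{x}$ | Difference of each value from the mean, $x - \bar{x}$ | Squared difference from each point to the mean, $(x - \bar{x})^2$ |
|---|---|---|---|
| 8 | 10 | -2 | 4 |
| 9 | 10 | -1 | 1 |
| 9 | 10 | -1 | 1 |
| 10 | 10 | 0 | 0 |
| 10 | 10 | 0 | 0 |
| 10 | 10 | 0 | 0 |
| 11 | 10 | 1 | 1 |
| 11 | 10 | 1 | 1 |
| 11 | 10 | 1 | 1 |
| 11 | 10 | 1 | 1 |
| Total | | | $\sum (x - \bar{x})^2 = 10$ |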

iii. Find the sample variance by substituting the sum of the squared differences from each point to the mean obtained in the table above.

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{10}{10 - 1} = \frac{10}{9} = 1.11$

Solution 2. Steps in finding the sample variance ($s^2$) of a set of values/scores in a distribution using Formula 2: $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$
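i. Set up a table and square each value/score.

ii. Find the values of $\sum x$ and $\sum x^2$.

| Score $x$ | Square of each value/score, $x^2$ |
|---|---|
| 8 | 64 |
| 9 | 81 |
| 9 | 81 |
| 10 | 100 |
| 10 | 100 |
| 10 | 100 |
| 11 | 121 |
| 11 | 121 |
| 11 | 121 |
| 11 | 121 |
| $\sum x = 100$ | $\sum x^2 = 1010$ |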

iii. Find the sample variance by substituting the values of $\sum x$ and $\sum x^2$ found in the table above.

$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{10(1010) - (100)^2}{10(10 - 1)} = \frac{10100 - 10000}{90} = \frac{100}{90} = 1.11$

A small variance such as 1.11 indicates that the data points tend
to be very close to the mean, and to each other.
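
Both formulas can be checked against each other in Python. The sketch below is an illustration only; the function names `variance_formula_1` and `variance_formula_2` are hypothetical names for this example, while `statistics.variance` is the standard-library function for the sample variance and is used here as a cross-check.

```python
import statistics

def variance_formula_1(scores):
    """Sample variance as the sum of squared deviations over n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

def variance_formula_2(scores):
    """Sample variance from the sums of x and x squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x ** 2 for x in scores)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

scores = [8, 9, 9, 10, 10, 10, 11, 11, 11, 11]
v1 = variance_formula_1(scores)
v2 = variance_formula_2(scores)
v_check = statistics.variance(scores)  # standard-library sample variance
print(round(v1, 2), round(v2, 2), round(v_check, 2))  # 1.11 1.11 1.11
```

Formula 2 avoids computing each deviation separately, which is convenient by hand, while Formula 1 follows the definition more directly.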
5. Sample Standard Deviation (s)

The standard deviation (s) describes how spread out a group of values is from the mean. It is calculated as the square root of the variance ($s^2$), which is itself based on the deviation of each data point from the mean.

If the points are farther from the mean, there is a higher deviation within the set of data; if they are closer to the mean, there is a lower deviation. This implies that the more spread out the group of numbers is, the higher the standard deviation.

Formulas for finding the sample standard deviation:

Formula 1: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Formula 2: $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
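Since the standard deviation is calculated as the square root of the variance, we can now use the values obtained for the variance in #4 above.

Using Formula 1:

Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{10 - 1}} = \sqrt{\frac{10}{9}} = \sqrt{1.11} = 1.05$

Using Formula 2:

Variance: $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$

Standard Deviation: $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}} = \sqrt{\frac{10(1010) - (100)^2}{10(10 - 1)}} = \sqrt{\frac{100}{90}} = \sqrt{1.11} = 1.05$

A standard deviation of 1.05 is small relative to the mean of 10: the scores typically lie about one point away from the mean, which indicates minimal dispersion in the given set of data.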
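
As a quick check, the standard deviation can be computed in Python by taking the square root of the sample variance. This is a minimal sketch for illustration; `statistics.stdev` is the standard-library function for the sample standard deviation and is used only as a cross-check.

```python
import math
import statistics

scores = [8, 9, 9, 10, 10, 10, 11, 11, 11, 11]
n = len(scores)
mean = sum(scores) / n

# Sample variance (Formula 1), then its square root.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
s = math.sqrt(variance)

s_check = statistics.stdev(scores)  # standard-library sample standard deviation
print(round(s, 2), round(s_check, 2))  # 1.05 1.05
```

Unlike the variance, the standard deviation is expressed in the same unit as the original scores, which makes it easier to interpret.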
Measures of Variation/Dispersion for Grouped Data

We have learned that the measures of central tendency are descriptive statistics that represent the center point or typical value of a data set and indicate where most values in a distribution fall. On the other hand, measures of dispersion describe how the values are spread out across the data set.

For a grouped frequency distribution, there are also specific formulas that can be applied in calculating the Range, Quartile Deviation, Mean Deviation, Sample Variance and Sample Standard Deviation, as shown in the succeeding discussions.

Let us again use Table 1 as the basis of our computations of the measures of variation/dispersion for grouped data.

Table 1
Frequency Distribution of the Test Scores in Statistics of Fifty Students

| Class Interval X | Class Frequency f |
|---|---|
| 21-25 | 5 |
| 26-30 | 8 |
| 31-35 | 14 |
| 36-40 | 12 |
| 41-45 | 6 |
| 46-50 | 4 |
| 51-55 | 1 |

i = 5, n = 50

21-25 is the lowest class interval, with a lower boundary of 20.5. 51-55 is the highest class interval, with an upper boundary of 55.5.

1. The Range (R)

The Range is determined by subtracting the lowest (minimum) value from the highest (maximum) value. In the case of grouped data, the highest value refers to the upper boundary of the highest class interval ($UB_{\text{highest}}$) and the lowest value refers to the lower boundary of the lowest class interval ($LB_{\text{lowest}}$).

Thus, the Range of the above frequency distribution is computed using the formula

$R = UB_{\text{highest}} - LB_{\text{lowest}}$
R = 55.5 - 20.5
R = 35
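
The grouped range can be sketched in Python as follows. This illustration assumes, as in Table 1, that class limits are whole numbers, so each class boundary lies 0.5 beyond the stated limit; the variable names are hypothetical and chosen only for this example.

```python
# Each class is (lower limit, upper limit, frequency), taken from Table 1.
classes = [
    (21, 25, 5), (26, 30, 8), (31, 35, 14), (36, 40, 12),
    (41, 45, 6), (46, 50, 4), (51, 55, 1),
]

# Assumption: integer class limits, so boundaries are 0.5 beyond the limits.
lower_boundary = min(low for low, high, f in classes) - 0.5
upper_boundary = max(high for low, high, f in classes) + 0.5

grouped_range = upper_boundary - lower_boundary
print(lower_boundary, upper_boundary, grouped_range)  # 20.5 55.5 35.0
```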
2. The Quartile Deviation (QD)

Again, the quartile deviation is obtained by taking one-half of the difference between the upper quartile $Q_3$ and the lower quartile $Q_1$.

Its formula is

$QD = \frac{Q_3 - Q_1}{2}$

The computed values of $Q_3$ and $Q_1$ for the above data (Table 1) were 39.88 and 30.19, respectively. (Refer to previous Lesson 4 - Other Measures of Position.)

Therefore,

$QD = \frac{Q_3 - Q_1}{2} = \frac{39.88 - 30.19}{2} = \frac{9.69}{2} = 4.85$

3. The Mean Deviation (Md)

For grouped data, the mean deviation can be solved using the formula below:

$Md = \frac{\sum f \lvert X_m - \bar{x} \rvert}{n}$

where
$\sum$ denotes taking the sum
$f$ = frequency
$X_m$ = midpoint/class mark
$\bar{x}$ = mean
$n$ = total number of observations

Let us add to Table 1 the necessary columns, $\lvert X_m - \bar{x} \rvert$ and $f \lvert X_m - \bar{x} \rvert$, that will help us determine the mean deviation of the given grouped frequency distribution.
Table 1
Frequency Distribution of the Test Scores in Statistics of Fifty Students

| Class Interval X | Class Frequency f | Midpoint/Class mark $X_m$ | Mean $\bar{x}$ | Absolute value of the difference of $X_m$ and $\bar{x}$, $\lvert X_m - \bar{x} \rvert$ | Product of the frequency and the difference of $X_m$ and $\bar{x}$, $f \lvert X_m - \bar{x} \rvert$ |
|---|---|---|---|---|---|
| 21-25 | 5 | 23 | 35.2 | $\lvert 23 - 35.2 \rvert = 12.2$ | 5(12.2) = 61.0 |
| 26-30 | 8 | 28 | 35.2 | $\lvert 28 - 35.2 \rvert = 7.2$ | 8(7.2) = 57.6 |
| 31-35 | 14 | 33 | 35.2 | $\lvert 33 - 35.2 \rvert = 2.2$ | 14(2.2) = 30.8 |
| 36-40 | 12 | 38 | 35.2 | $\lvert 38 - 35.2 \rvert = 2.8$ | 12(2.8) = 33.6 |
| 41-45 | 6 | 43 | 35.2 | $\lvert 43 - 35.2 \rvert = 7.8$ | 6(7.8) = 46.8 |
| 46-50 | 4 | 48 | 35.2 | $\lvert 48 - 35.2 \rvert = 12.8$ | 4(12.8) = 51.2 |
| 51-55 | 1 | 53 | 35.2 | $\lvert 53 - 35.2 \rvert = 17.8$ | 1(17.8) = 17.8 |
| i = 5 | n = 50 | | | | $\sum f \lvert X_m - \bar{x} \rvert = 298.8$ |

Note: The mean value of 35.2 ($\bar{x} = 35.2$) is taken from the previous computation of the mean found in Lesson 2.2 - Measures of Central Tendency.

Hence, the mean deviation is

$Md = \frac{\sum f \lvert X_m - \bar{x} \rvert}{n} = \frac{298.8}{50} = 5.98$
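
The table computation above can be sketched in Python. This is an illustration only; it recomputes the grouped mean from the class midpoints rather than taking it from the earlier lesson, and the variable names are hypothetical.

```python
# Each class is (lower limit, upper limit, frequency), taken from Table 1.
classes = [
    (21, 25, 5), (26, 30, 8), (31, 35, 14), (36, 40, 12),
    (41, 45, 6), (46, 50, 4), (51, 55, 1),
]

n = sum(f for low, high, f in classes)
# Pair each class midpoint (class mark) with its frequency.
midpoints = [((low + high) / 2, f) for low, high, f in classes]

# Grouped mean: frequency-weighted average of the midpoints.
mean = sum(f * xm for xm, f in midpoints) / n

# Grouped mean deviation: frequency-weighted average absolute deviation.
md = sum(f * abs(xm - mean) for xm, f in midpoints) / n
print(n, round(mean, 2), round(md, 2))  # 50 35.2 5.98
```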

4. The Variance ($s^2$)

For a grouped frequency distribution, the variance formula is

$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1}$

Using the data in Table 1 again and adding the necessary columns makes the computation of the variance for grouped data easier to follow, because every intermediate computation is shown.
Table 1
Frequency Distribution of the Test Scores in Statistics of Fifty Students

| Class Interval X | Class Frequency f | Midpoint/Class mark $X_m$ | Mean $\bar{x}$ | Square of the difference of $X_m$ and $\bar{x}$, $(X_m - \bar{x})^2$ | Product of the frequency and the square of the difference of $X_m$ and $\bar{x}$, $f (X_m - \bar{x})^2$ |
|---|---|---|---|---|---|
| 21-25 | 5 | 23 | 35.2 | $(23 - 35.2)^2 = 148.84$ | 5(148.84) = 744.20 |
| 26-30 | 8 | 28 | 35.2 | $(28 - 35.2)^2 = 51.84$ | 8(51.84) = 414.72 |
| 31-35 | 14 | 33 | 35.2 | $(33 - 35.2)^2 = 4.84$ | 14(4.84) = 67.76 |
| 36-40 | 12 | 38 | 35.2 | $(38 - 35.2)^2 = 7.84$ | 12(7.84) = 94.08 |
| 41-45 | 6 | 43 | 35.2 | $(43 - 35.2)^2 = 60.84$ | 6(60.84) = 365.04 |
| 46-50 | 4 | 48 | 35.2 | $(48 - 35.2)^2 = 163.84$ | 4(163.84) = 655.36 |
| 51-55 | 1 | 53 | 35.2 | $(53 - 35.2)^2 = 316.84$ | 1(316.84) = 316.84 |
| i = 5 | n = 50 | | | | $\sum f (X_m - \bar{x})^2 = 2658$ |

Solving for the variance:

$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1} = \frac{2658}{49} = 54.24$

5. The Standard Deviation (s)

The standard deviation is just the square root of the variance, thus

$s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}} = \sqrt{\frac{2658}{49}} = \sqrt{54.24} = 7.36$
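
The grouped variance and standard deviation can be sketched together in Python. This is an illustration only, with hypothetical variable names. Note that the code takes the square root of the unrounded variance, so its second decimal place can differ slightly from a hand computation that first rounds the variance to 54.24.

```python
import math

# Each class is (lower limit, upper limit, frequency), taken from Table 1.
classes = [
    (21, 25, 5), (26, 30, 8), (31, 35, 14), (36, 40, 12),
    (41, 45, 6), (46, 50, 4), (51, 55, 1),
]

n = sum(f for low, high, f in classes)
# Pair each class midpoint (class mark) with its frequency.
midpoints = [((low + high) / 2, f) for low, high, f in classes]
mean = sum(f * xm for xm, f in midpoints) / n

# Sum of frequency-weighted squared deviations of the midpoints from the mean.
sum_sq = sum(f * (xm - mean) ** 2 for xm, f in midpoints)

variance = sum_sq / (n - 1)   # grouped sample variance
std_dev = math.sqrt(variance) # grouped sample standard deviation

print(round(sum_sq, 2))    # 2658.0
print(round(variance, 2))  # 54.24
print(std_dev)
```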
