
CHAPTER THREE

MEASURE OF CENTRAL TENDENCY

When we want to compare groups of numbers, it is useful to have a single value that is considered a good representative of each group. This single value is called the average of the group. Averages are also called measures of central tendency.

3.2 Objective of measure of central tendency

- To determine a single value around which the other values concentrate.
- To facilitate comparison.
- To make further statistical analysis possible.

Types of measures of central tendency

The most commonly used types of measures of central tendency are:

- The Mean (Arithmetic, Geometric and Harmonic)
- The Mode
- The Median
- Quantiles (Quartiles, Deciles and Percentiles)

The choice among these averages depends on which best fits the property under discussion.

Important characteristics of a good average (measure of central tendency)

- It is easy to calculate and understand.
- It is based on all the observations during computation.
- It is rigidly defined.
- It is not affected by extreme values when a few very small or very large items are present in the data.

The Mean

The mean, also known as the arithmetic average, is the sum of all values divided by the total number of values.

A. Arithmetic Mean (AM) of Individual Series:

Let X be a variable that takes the values x1, x2, x3, ..., xn in a sample of size n drawn from a population of size N (n < N). The A.M. of a set of observations is the sum of all values in the series divided by the number of items in the series.

Their arithmetic mean is

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(for raw data)}$$

Arithmetic mean of discrete frequency distribution:

In discrete frequency distribution, the arithmetic mean becomes:

PREPARED BY: ABDULMENAN M. (MSc) Page 1 of 21


$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $\sum f_i x_i$ is the sum of the products of the observations with their respective frequencies and $\sum f_i = n$ is the sum of the frequencies.

Example: The following table gives the wages paid to 125 workers in a factory. Calculate the
arithmetic mean of the wages.

Wages (in birr): 200 210 220 230 240 250 260

No. of workers: 5 15 32 42 15 12 4

Solution:

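$\sum x_i f_i = (200 \times 5) + (210 \times 15) + (220 \times 32) + \cdots + (260 \times 4) = 28490$

$N = \sum f_i = 5 + 15 + 32 + \cdots + 4 = 125$

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{28490}{125} = 227.92 \text{ birr}$$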
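A minimal Python sketch of the discrete-frequency formula, applied to the wage table above. The function name `frequency_mean` is illustrative and not part of the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

wages = [200, 210, 220, 230, 240, 250, 260]   # birr
workers = [5, 15, 32, 42, 15, 12, 4]

print(sum(workers))                    # 125
print(frequency_mean(wages, workers))  # 227.92
```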

Arithmetic Mean of Grouped Data (Continuous frequency distribution):

Simple arithmetic mean for continuous frequency distribution is given by


$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$

where $m_i$ is the midpoint of the ith class interval.

Example: The following table gives the marks of 58 students in Introduction to Statistics. Calculate the average mark of this group.

Marks:             0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of students:      4      8     11     15     12      6      2

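Solution:

Midpoints: $m_1 = (0+10)/2 = 5$, $m_2 = (10+20)/2 = 15$, ..., $m_7 = (60+70)/2 = 65$

$\sum f_i = 4 + 8 + 11 + \cdots + 2 = 58$

$\sum m_i f_i = (5 \times 4) + (15 \times 8) + \cdots + (65 \times 2) = 1940$

$$\bar{X} = \frac{\sum f_i m_i}{\sum f_i} = \frac{1940}{58} = 33.45 \text{ marks}$$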
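The same computation as a sketch for a continuous distribution: each class is represented by its midpoint, which assumes the observations within a class are centred there. The helper `grouped_mean` is a hypothetical name.

```python
def grouped_mean(lower_limits, upper_limits, freqs):
    """Mean of a continuous frequency distribution using class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

lower = [0, 10, 20, 30, 40, 50, 60]
upper = [10, 20, 30, 40, 50, 60, 70]
students = [4, 8, 11, 15, 12, 6, 2]

print(round(grouped_mean(lower, upper, students), 2))  # 33.45
```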

Merits of arithmetic mean:

- It is easy to calculate and understand.
- All observations are involved in its calculation.
- The mean is used in computing other statistics, such as the variance.
- It is unique: a set of data has only one mean.
- It can be used for further statistical treatment, such as comparison of means and tests of means.

Demerits of arithmetic mean:

- The mean cannot be computed for data in a frequency distribution that has an open-ended class.
- The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
- It cannot be computed for qualitative data (intelligence, honesty, beauty), which cannot be measured quantitatively.

Special properties of Arithmetic mean:

1. The sum of the deviations of a set of items from their mean is always zero, i.e.

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

2. The effect of transforming the original series on the mean:
   a) If a constant k is added to (subtracted from) every observation, the new mean is the old mean plus (minus) k.
   b) If every observation is multiplied by a constant k, the new mean is k times the old mean.
3. If $\bar{x}_1$ is the mean of $n_1$ observations, $\bar{x}_2$ is the mean of $n_2$ observations, ..., and $\bar{x}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups, often called the combined mean, is given by:

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i}$$
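Properties 1 and 2 can be checked numerically. The data list and the constant k below are made-up values chosen only for illustration; they are not from the notes.

```python
from statistics import mean

data = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical observations
k = 5.0                                 # hypothetical constant
xbar = mean(data)                       # 14.0

# Property 1: deviations from the mean sum to zero
dev_sum = sum(x - xbar for x in data)

# Property 2a: adding k to every observation adds k to the mean
shifted_mean = mean([x + k for x in data])

# Property 2b: multiplying every observation by k multiplies the mean by k
scaled_mean = mean([x * k for x in data])

print(dev_sum, shifted_mean, scaled_mean)   # 0.0 19.0 70.0
```

With arbitrary real-valued data, the deviation sum is zero only up to floating-point rounding; the values here were picked so the arithmetic is exact.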

Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.

Solution:

Females: $\bar{x}_1 = 60$, $n_1 = 30$
Males: $\bar{x}_2 = 72$, $n_2 = 70$

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{(60 \times 30) + (72 \times 70)}{30 + 70} = \frac{6840}{100} = 68.40$$
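The combined-mean formula in code form, as a short sketch using the class example's figures (`combined_mean` is an illustrative name):

```python
def combined_mean(means, sizes):
    """Combined mean of k groups: sum(mean_i * n_i) / sum(n_i)."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

print(combined_mean([60, 72], [30, 70]))   # 68.4
```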



4. Correcting the Misread (wrong) observation for arithmetic mean:

Suppose we compute the mean of n observations and later find that one, two or more of the observations were wrongly copied down. It is then required to compute the corrected mean by replacing the wrong observations with the correct ones. To do this, we first find the corrected $\sum x$: we subtract the wrong items from the incorrect $\sum x$ and add the correct items to it. Finally, dividing the corrected $\sum x$ by the number of observations gives the corrected mean.

Example: The average mark of 80 students was found to be 40. Later, it was discovered that a score of 54 was misread as 84. Find the corrected mean of the 80 students.

Solution: We are given N = 80 and $\bar{x} = 40$. Since

$$\bar{x} = \frac{\sum x}{N}, \quad \text{we obtain} \quad \sum x = N \times \bar{x} = 80 \times 40 = 3200.$$

Because of the error discovered, $\sum x = 3200$ is not correct:

Correct $\sum x$ = incorrect $\sum x$ − misread observation + correct observation = 3200 − 84 + 54 = 3170.

Therefore the corrected mean is

$$\bar{x} = \frac{\text{correct} \sum x}{N} = \frac{3170}{80} = 39.625$$
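The correction procedure can be sketched as a small function (hypothetical name `corrected_mean`) that rebuilds the incorrect total, swaps out the misread values, and divides again:

```python
def corrected_mean(n, wrong_mean, wrong_values, correct_values):
    """Recompute a mean after replacing misread observations."""
    total = n * wrong_mean                                   # incorrect sum of x
    total = total - sum(wrong_values) + sum(correct_values)  # corrected sum of x
    return total / n

print(corrected_mean(80, 40, wrong_values=[84], correct_values=[54]))  # 39.625
```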

Exercise 1: The mean of a set of 100 observations was found to be 40, but by mistake a value of 50 was taken in place of 40 for one observation. Recalculate the correct mean.

B. Weighted Mean:

One of the limitations of the arithmetic mean is that it gives equal importance (weight) to all the
items in the Series.

- When proper importance is to be given to different items, a weighted mean is appropriate. For example, salaries paid should be weighted according to their relative importance.
- Weights are assigned to each item in proportion to its relative importance.
- Let X1, X2, ..., Xn be the values of the items of a series and W1, W2, ..., Wn their corresponding weights; then the weighted mean, denoted $\bar{x}_w$, is defined as:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

where $\bar{x}_w$ is the weighted mean, $W_i$ are the weights attached to the values of the variable and $X_i$ are the values of the variable.

Example: Suppose a student has secured the following marks in three tests: Mid-term test = 30, Laboratory = 25 and Final exam = 20. The simple arithmetic mean is (30 + 25 + 20)/3 = 25. However, this is misleading if the three tests carry different weights based on their relative importance. Assume that the weights assigned to the three tests are 2, 3 and 5 points. On the basis of this information, the weighted mean is

$$\bar{x}_w = \frac{\sum X_i W_i}{\sum W_i} = \frac{W_1X_1 + W_2X_2 + W_3X_3}{W_1 + W_2 + W_3} = \frac{60 + 75 + 100}{2 + 3 + 5} = 23.5 \text{ marks}$$
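A minimal sketch contrasting the simple and weighted means for the test-marks example; `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*x) / sum(w)."""
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_w

marks = [30, 25, 20]     # mid-term, laboratory, final exam
weights = [2, 3, 5]

print(sum(marks) / len(marks))        # 25.0 (simple mean)
print(weighted_mean(marks, weights))  # 23.5
```

Setting every weight to 1 recovers the simple arithmetic mean, which is why the simple mean can be viewed as the equal-weight special case.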

C. Geometric Mean (G.M):

The geometric mean is the nth root of the product of n positive values. If X1, X2, ..., Xn are n positive values, then their geometric mean is

$$G.M. = (X_1 X_2 \cdots X_n)^{1/n}$$

The geometric mean is usually used for averaging rates of change, ratios and percentages, and for data with a logarithmic (multiplicative) pattern.

When there are more than two observations, taking the nth root directly can be tedious. In that case, the calculation can be simplified by taking logarithms (for example, common logarithms to base ten):

$$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$

Taking logs on both sides,

$$\log(G.M.) = \frac{1}{n}\log(x_1 x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\sum_{i=1}^{n}\log x_i$$

$$G.M. = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n}\log x_i\right)$$

This shows that the logarithm of the G.M. is the mean of the logarithms of the individual observations.

Example: The ratios of prices in 1999 to those in 2000 for 4 commodities were 0.9, 1.25, 1.75 and 0.85. Find the average price ratio by means of the geometric mean.

Solution:

$$G.M. = \text{antilog}\left(\frac{\sum \log x_i}{n}\right) = \text{antilog}\left(\frac{\log 0.9 + \log 1.25 + \log 1.75 + \log 0.85}{4}\right) = 1.14$$
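The logarithm route can be sketched in Python and cross-checked against the standard library's `statistics.geometric_mean` (available since Python 3.8). The helper name `gm_by_logs` is illustrative.

```python
import math
from statistics import geometric_mean

def gm_by_logs(values):
    """Geometric mean as the antilog of the mean of base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log          # antilog (base 10)

ratios = [0.9, 1.25, 1.75, 0.85]
print(round(gm_by_logs(ratios), 2))       # 1.14
print(round(geometric_mean(ratios), 2))   # 1.14 (standard-library check)
```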

Geometric mean for ungrouped and grouped frequency distributions:

In the case of ungrouped (discrete) frequency data, the geometric mean is obtained by

$$G.M. = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}}, \qquad \log(G.M.) = \frac{1}{n}\sum_{i=1}^{k} f_i \log x_i$$

where $n = \sum f_i$.

For a continuous frequency distribution,

$$G.M. = \sqrt[n]{m_1^{f_1} m_2^{f_2} \cdots m_k^{f_k}}, \qquad \log(G.M.) = \frac{1}{n}\sum_{i=1}^{k} f_i \log m_i$$

where $n = \sum f_i$ and $m_i$ is the class mark (midpoint) of the ith class.

Properties of geometric mean:

- Its calculation is not as easy as that of the arithmetic mean.
- It involves all observations during computation.
- It may not be defined if even a single observation is negative.
- If the value of one observation is zero, its value becomes zero.

D. Harmonic mean (H.M):

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the individual values. If X1, X2, X3, ..., Xn are n values, then their harmonic mean is

$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$

Example: Find the harmonic mean of the values 2, 3 and 6.

$$H.M. = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{6}} = \frac{3}{1} = 3$$

The harmonic mean is used to average rates rather than simple values. It is usually appropriate for averaging speeds in kilometers per hour.

Exercise: A man travels 200 km on each of three days at speeds of 60, 50 and 40 km per hour respectively. Find his average speed.

Harmonic mean for ungrouped and grouped frequency distributions:

For simple (discrete) frequency data, the harmonic mean is calculated using the following formula:

$$H.M. = \text{Reciprocal}\left(\frac{\sum (f_i / x_i)}{n}\right) = \frac{n}{\sum (f_i / x_i)}$$

where n is the total number of observations.

For a continuous frequency distribution:

$$H.M. = \text{Reciprocal}\left(\frac{\sum (f_i / m_i)}{n}\right) = \frac{n}{\sum (f_i / m_i)}$$

where n is the total number of observations and $m_i$ is the class mark of the ith class interval.
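A sketch of the harmonic-mean formulas in which the frequencies default to 1 for raw data; for grouped data one would pass the class marks as `values`. The name `hm` is illustrative, and restricting to positive values is an assumption of this sketch.

```python
from statistics import harmonic_mean

def hm(values, freqs=None):
    """Harmonic mean n / sum(f_i / x_i); with freqs=None every f_i = 1 (raw data)."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only handles positive values")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

print(round(hm([2, 3, 6]), 6))        # 3.0
print(round(hm([2, 3], [2, 1]), 6))   # 2.25, same as hm([2, 2, 3])
print(harmonic_mean([2, 3, 6]))       # standard-library cross-check
```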

Properties of harmonic mean:

- It is based on all observations in a distribution.
- It gives small weight to larger observations and larger weight to smaller observations.
- It is difficult to calculate and understand.
- It is an appropriate measure of central tendency when the data are ratios, speeds or rates.

Relationship among A.M, G.M, and H.M:

For any set of positive observations, the A.M., G.M. and H.M. are related as follows:

$$A.M. \ge G.M. \ge H.M.$$

Note:

- The equality sign holds if and only if all the observations are identical.
- If the observations take the values $a, ar, ar^2, ar^3, \ldots, ar^{n-1}$, each with frequency one, then

$$(G.M.)^2 = A.M. \times H.M.$$
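Both facts can be checked on a small example. The progression below (a = 2, r = 3, four terms) is a made-up illustration; `fmean`, `geometric_mean` and `harmonic_mean` are from Python's `statistics` module (3.8+).

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Hypothetical geometric progression a, ar, ar^2, ar^3 with a = 2, r = 3
a, r, n = 2, 3, 4
gp = [a * r**k for k in range(n)]    # [2, 6, 18, 54]

am = fmean(gp)            # 20.0
gm = geometric_mean(gp)   # about 10.392
hm = harmonic_mean(gp)    # about 5.4

print(am >= gm >= hm)                 # True
print(abs(gm**2 - am * hm) < 1e-9)    # True: (G.M.)^2 = A.M. x H.M.
```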

Median:

Median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in ascending or descending order of magnitude. If there is an odd number of items in the array, the median is the middle number; if there is an even number of items, the median is the average of the two middle numbers.

Median for ungrouped data:

If the n values are arranged in ascending or descending order of magnitude, the median is the middle value when n is odd. When n is even, the median is the mean of the two middle values:

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ element, if } n \text{ is odd}$$

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ element} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ element}}{2}, \text{ if } n \text{ is even}$$
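A sketch of the two positional rules, converting the 1-based positions in the formulas to Python's 0-based indices; the data are hypothetical and `median_by_position` is an illustrative name. The standard library's `statistics.median` follows the same convention and is used as a check.

```python
from statistics import median

def median_by_position(data):
    """Median of raw data: the ((n+1)/2)-th ordered value for odd n, or the
    mean of the (n/2)-th and (n/2 + 1)-th ordered values for even n."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # 1-based position -> 0-based index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(median_by_position([7, 1, 5, 3, 9]))       # 5
print(median_by_position([7, 1, 5, 3, 9, 11]))   # 6.0
```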

Determination of Median in a Continuous Frequency Distribution

In the case of a continuous frequency distribution, we first locate the median class by cumulating the frequencies until the $(N/2)^{\text{th}}$ point is reached. The median is then calculated with the help of the following formula:

$$\text{Median} = LCb + \frac{\frac{N}{2} - Cf}{f} \times w$$

where Cf is the less-than cumulative frequency of the class preceding (one before) the median class, f is the frequency of the median class, LCb is the lower class boundary of the median class, w is the class width and $N = \sum_{i=1}^{k} f_i$.

Remark: The median class is the class with the smallest less-than cumulative frequency greater than or equal to $(N/2)^{\text{th}}$.
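As a sketch of the locate-then-interpolate procedure, the helper below (hypothetical name `grouped_median`) assumes equal class widths and contiguous class boundaries. It is applied to the monthly-wage table of the example that follows, where the interpolation gives 1390 birr.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Median of a continuous frequency distribution:
    LCb + (N/2 - Cf) / f * w, assuming equal class widths."""
    N = sum(freqs)
    half = N / 2
    cum = list(accumulate(freqs))     # less-than cumulative frequencies
    # median class: first class whose cumulative frequency is >= N/2
    k = next(i for i, c in enumerate(cum) if c >= half)
    cf_before = cum[k - 1] if k > 0 else 0
    return lower_boundaries[k] + (half - cf_before) * width / freqs[k]

lower = [800, 1000, 1200, 1400, 1600, 1800]   # monthly wages (birr)
workers = [18, 25, 30, 34, 26, 10]
print(round(grouped_median(lower, 200, workers), 2))   # 1390.0
```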

Example: Calculate the median for the following frequency distribution.

Monthly wages (in birr):  800-1000  1000-1200  1200-1400  1400-1600  1600-1800  1800-2000  Total
No. of workers:                 18         25         30         34         26         10    143
LCF:                            18         43         73        107        133        143
Solution:

Based on the cumulative frequencies, the median is the value of the $(N/2)^{\text{th}} = (143/2)^{\text{th}} = 71.5^{\text{th}}$ item, which lies in the class 1,200-1,400 (the first class whose cumulative frequency, 73, is at least 71.5). Thus 1,200-1,400 is the median class. To determine the median within this class, we use the interpolation formula:

$$\text{Median} = LCb + \frac{\frac{N}{2} - Cf}{f_{mc}} \times w = 1200 + \frac{71.5 - 43}{30} \times 200 = 1200 + 190 = 1390 \text{ birr}$$

Advantages of median:

- The value of the median is easy to understand and may be calculated for quantitative as well as ranked (ordinal) data.
- Extreme values in the data set do not affect the calculation of the median.
- The median may be calculated for an open-ended frequency distribution.
- It is unique; that is, like the mean, there is only one median for a given set of data.

Disadvantages of median:

- The median is affected more by sampling variation; that is, it is affected by the number of observations rather than by their values.
- Since the median is an average of position, arranging the data in ascending or descending order of magnitude is time consuming when there is a large number of observations.

Mode ($\hat{X}$):

The mode is another measure of central tendency. It is the value at the point around which the items are most heavily concentrated; that is, the mode is the value that occurs most frequently in the data set.

NB: In the case of a discrete frequency distribution or raw data, the mode is the value of the variable corresponding to the maximum frequency. This method can be used conveniently if there is only one value with the highest concentration of observations.

In the case of grouped data, the mode is determined by the following formula:

$$\text{Mode} = \hat{X} = l_o + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times w$$

where $l_o$ is the lower boundary of the class in which the mode lies, $f_1$ is the frequency of the class in which the mode lies, $f_0$ is the frequency of the class preceding the modal class, $f_2$ is the frequency of the class succeeding the modal class and w is the class width of the modal class.

Example: Consider the following frequency distribution:

Class intervals:  30-40  40-50  50-60  60-70  70-80  80-90  90-100
Frequency:            4      6      8     12      9      7       4

Solution: From the frequency row of the table, the maximum frequency of 12 lies in the class interval 60-70, which suggests that the mode lies in this class interval. Applying the formula given earlier, we get:

$$\text{Mode} = 60 + \frac{12 - 8}{(12 - 8) + (12 - 9)} \times 10 = 60 + \frac{4}{4 + 3} \times 10 = 65.7$$

Advantages of mode:

- The mode is not affected by extreme values in the distribution.
- The mode can be calculated for an open-ended frequency distribution.
- It is the only measure of central tendency that can be used for qualitative (nominal) data, for example in describing people's opinions about a certain phenomenon.

Disadvantages of mode:

- The mode is not a rigidly defined measure, as there are several methods for calculating its value.
- It is difficult to locate the modal class in the case of a multi-modal frequency distribution.
- The mode is not suitable for algebraic manipulation.
- When a data set contains more than one mode, such values are difficult to interpret and compare.

Measures of location (positional measures): These tell where a specific data value falls within the data set, or its position relative to other data values. Quantiles are measures that divide a given set of data into equal subdivisions; they are obtained by the same procedure as the median, but the data must be arranged in increasing order. The most commonly used ones are quartiles, deciles and percentiles. Because they depend on position within the distribution, quartiles, deciles and percentiles are collectively called quantiles.

Quartiles: Quartiles are measures that divide the ordered data into four equal parts. They are denoted by Q1, Q2 and Q3 and are obtained after arranging the data in increasing order. They are known respectively as the first (lower) quartile, the value below which 25% of the observations lie; the second quartile, the value below (and above) which 50% of the observations lie; and the third (upper) quartile, the value below which 75% of the arranged items lie (or above which 25% lie).

For ungrouped data, the ith quartile is the value of the item at the $\left[\frac{i(n+1)}{4}\right]^{\text{th}}$ position, i.e.

$$Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}} \text{ ordered observation}, \quad i = 1, 2, 3$$

- $Q_1$ is the value corresponding to the $\left(\frac{n+1}{4}\right)^{\text{th}}$ ordered observation.
- $Q_2$ is the value of the $\left[2\left(\frac{n+1}{4}\right)\right]^{\text{th}}$ ordered observation.
- $Q_3$ is the value of the $\left[3\left(\frac{n+1}{4}\right)\right]^{\text{th}}$ ordered observation.
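The notes do not say what to do when $i(n+1)/4$ is not a whole number. The sketch below assumes linear interpolation between the two neighbouring ordered values (and clamps positions outside 1..n), which for quartiles matches Python's `statistics.quantiles` with its default "exclusive" method. The same helper (hypothetical name `positional_quantile`) covers deciles and percentiles with `parts=10` or `parts=100`.

```python
from statistics import quantiles

def positional_quantile(data, i, parts):
    """Value at position i*(n+1)/parts of the sorted data (1-based).

    Assumption: fractional positions are linearly interpolated, and
    positions outside 1..n are clamped to the extreme values.
    """
    xs = sorted(data)
    n = len(xs)
    pos = i * (n + 1) / parts
    if pos <= 1:
        return float(xs[0])
    if pos >= n:
        return float(xs[-1])
    lower = int(pos)                 # 1-based position of the lower neighbour
    frac = pos - lower
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

data = [2, 4, 4, 5, 7, 8, 9, 10]     # hypothetical data, n = 8
print([positional_quantile(data, i, 4) for i in (1, 2, 3)])   # [4.0, 6.0, 8.75]
print(quantiles(data, n=4))          # standard-library cross-check
```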

In the case of a continuous frequency distribution, quartiles are obtained by applying the formula

$$Q_i = L_o + \frac{\frac{in}{4} - Cf}{f_{Q_i}} \times w, \quad i = 1, 2, 3$$

where n is the sum of the frequencies of all classes ($n = \sum f_i$), $L_o$ is the lower class boundary of the ith quartile class, Cf is the cumulative frequency of the class before the ith quartile class, $f_{Q_i}$ is the frequency of the ith quartile class and w is the class width.

Note: To find the ith quartile class, compute $\frac{in}{4}$ and search for the smallest less-than cumulative frequency greater than or equal to this value; the class corresponding to this cumulative frequency is the ith quartile class.
Deciles: Deciles are measures that divide given ordered data into ten equal parts, each part containing an equal number of elements. There are nine deciles, denoted by D1, D2, D3, ..., D9 and called the first, the second, ..., the ninth decile respectively.

For ungrouped data, the ith decile is the value of the item at the $\left[\frac{i(n+1)}{10}\right]^{\text{th}}$ position:

$$D_i = \left[\frac{i(n+1)}{10}\right]^{\text{th}} \text{ ordered observation}, \quad i = 1, 2, 3, \ldots, 9$$

For grouped data (a continuous frequency distribution), deciles can be obtained by using

$$D_i = L_o + \frac{\frac{in}{10} - Cf}{f_{D_i}} \times w, \quad i = 1, 2, 3, \ldots, 9$$

where n is the sum of the frequencies of all classes ($n = \sum f_i$), $L_o$ is the lower class boundary of the ith decile class, Cf is the cumulative frequency of the class before the ith decile class, $f_{D_i}$ is the frequency of the ith decile class and w is the class width.

Note: To find the ith decile class, compute $\frac{in}{10}$ and search for the smallest less-than cumulative frequency greater than or equal to this value.

Percentiles: Percentiles are measures having 99 points, which divide given ordered data into 100 equal parts, each part consisting of an equal number of elements. They are denoted by P1, P2, ..., P99 and known as the 1st, 2nd, ..., 99th percentiles respectively.

For ungrouped data, the ith percentile is the value of the item at the $\left[\frac{i(n+1)}{100}\right]^{\text{th}}$ position:

$$P_i = \left[\frac{i(n+1)}{100}\right]^{\text{th}} \text{ ordered observation}, \quad i = 1, 2, 3, \ldots, 99$$

so $P_1$ is the $\left(\frac{n+1}{100}\right)^{\text{th}}$ ordered observation, $P_2$ is the $\left[2\left(\frac{n+1}{100}\right)\right]^{\text{th}}$ ordered observation and $P_{99}$ is the $\left[99\left(\frac{n+1}{100}\right)\right]^{\text{th}}$ ordered observation.

For grouped (continuous) data, percentiles can be obtained by using

$$P_i = L_o + \frac{\frac{in}{100} - Cf}{f_{P_i}} \times w, \quad i = 1, 2, 3, \ldots, 99$$

Note: To find the ith percentile class, compute $\frac{in}{100}$ and search for the smallest less-than cumulative frequency greater than or equal to this value; the class corresponding to this cumulative frequency is the ith percentile class.

Example: The following frequency distribution is the distribution of profit (in thousands of birr) earned by 75 companies during 2003-2004.

Class interval:  <5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
Frequency:        2     5      7     13     21     16      8      3

Compute:

A) The median, and verify that it is equal to Q2.
B) The 72nd percentile.
C) The second decile.
D) The value above which 75% of the profits earned by the companies lie.

Solution:

A) The median is calculated using the formula

$$\text{Median} = LCb + \frac{\frac{n}{2} - Cf}{f_{mc}} \times w$$

To find the median class, compute $\frac{n}{2} = \frac{75}{2} = 37.5$. The median class is 20-25, with LCb = 20, Cf = 27, w = 5 and $f_{mc} = 21$. Then

$$\text{Median} = 20 + \frac{37.5 - 27}{21} \times 5 = 22.5$$

Thus 50% of the companies earned an annual profit of 22.5 thousand birr or less.

Since $\frac{2n}{4} = 37.5 = \frac{n}{2}$, the second quartile falls in the same class and is computed by the same expression, so $Q_2 = 22.5$, which equals the median.

B) The 72nd percentile of the profit is computed as follows:

$$P_{72} = L_o + \frac{\frac{72n}{100} - Cf}{f_{P_{72}}} \times w$$

To find the 72nd percentile class, compute $\frac{72 \times 75}{100} = 54$. The class that contains $P_{72}$ is 25-30, with $L_o = 25$, Cf = 48 and $f_{P_{72}} = 16$:

$$P_{72} = 25 + \frac{54 - 48}{16} \times 5 = 26.875$$

This shows that 72% of the companies earn a profit of 26.875 thousand birr or less.

C) The second decile is computed as follows:

$$D_2 = L_o + \frac{\frac{2n}{10} - Cf}{f_{D_2}} \times w$$

To find the 2nd decile class, compute $\frac{2n}{10} = \frac{2 \times 75}{10} = 15$. The class for $D_2$ is 15-20, with $L_o = 15$, Cf = 14 and $f_{D_2} = 13$:

$$D_2 = 15 + \frac{15 - 14}{13} \times 5 = 15.385$$

D) The value above which 75% of the profits lie is the first quartile. To find the 1st quartile class, compute $\frac{n}{4} = \frac{75}{4} = 18.75$. Then $Q_1$ lies in the class 15-20, with $L_o = 15$, Cf = 14, $f_{Q_1} = 13$ and w = 5:

$$Q_1 = L_o + \frac{\frac{n}{4} - Cf}{f_{Q_1}} \times w = 15 + \frac{18.75 - 14}{13} \times 5 = 16.827$$

This shows that only 25% of the companies earn an annual profit of 16.827 thousand birr or less.
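All four parts of this example use one interpolation rule, so they can be sketched with a single hypothetical helper, `grouped_quantile(lower_boundaries, width, freqs, i, parts)`, where `parts` is 2, 4, 10 or 100. Treating the open class "<5" as 0-5 is an assumption; it would only matter if a quantile fell in that class.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """i-th of `parts` quantiles of a continuous frequency distribution:
    Lo + (i*n/parts - Cf) / f * w, assuming equal class widths."""
    n = sum(freqs)
    target = i * n / parts
    cum = list(accumulate(freqs))                           # less-than cumulative
    k = next(j for j, c in enumerate(cum) if c >= target)   # quantile class
    cf_before = cum[k - 1] if k > 0 else 0
    return lower_boundaries[k] + (target - cf_before) * width / freqs[k]

# Profit data; the open class "<5" is treated as 0-5 (assumption).
lower = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [2, 5, 7, 13, 21, 16, 8, 3]

print(round(grouped_quantile(lower, 5, freqs, 1, 2), 3))     # median 22.5
print(round(grouped_quantile(lower, 5, freqs, 2, 4), 3))     # Q2 22.5
print(round(grouped_quantile(lower, 5, freqs, 72, 100), 3))  # P72 26.875
print(round(grouped_quantile(lower, 5, freqs, 2, 10), 3))    # D2 15.385
print(round(grouped_quantile(lower, 5, freqs, 1, 4), 3))     # Q1 16.827
```

The median is simply the case `i=1, parts=2`, which is why it coincides with Q2 (`i=2, parts=4`).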

CHAPTER FOUR

Measures of Dispersion (Variation)


We have seen that averages are representative of a frequency distribution. However, they fail to give a complete picture of the distribution, and knowing the average of a data set is not enough to describe the data set entirely.

In addition to knowing the average, you must know how the data values are dispersed. That is:
Do the data values cluster around the mean?
Are they spread more evenly throughout the distribution?

The scatter or variation of observations from their average is called dispersion. The measures that determine the spread of the data values are called measures of variation, or measures of dispersion. These measures include the range, variance, and standard deviation.

Objectives of measure of Variation:

- To judge the reliability of measures of central tendency.
- To control variability itself.
- To compare two or more groups of numbers in terms of their variability.
- To make further statistical analysis possible.

Characteristics of good Measures of Dispersion:

- It should be based on all observations.
- It should be easy to compute and to understand.
- It should not be affected much by extreme values.
- It should not be affected much by sampling fluctuation.

Absolute and Relative Measures of Dispersion:

The measures of dispersion that are expressed in the original units of a series are termed absolute measures. Such measures are not suitable for comparing the variability of two distributions that are expressed in different units of measurement or that have averages of very different size.

Relative measures of dispersion are the ratio or percentage of a measure of absolute dispersion to an appropriate measure of central tendency, and are thus pure numbers independent of the units of measurement. If the means of two distributions are widely different, or if they are expressed in different units of measurement, we cannot use the standard deviation as such for comparing their variability; a relative measure is needed instead.

The most commonly used measures of variation are the range, variance, and standard deviation.

Range

Range is the simplest measure of dispersion: it is the highest value minus the lowest value. The symbol R is used for the range. The range takes only the maximum and minimum values into account, not all the values; hence it is a very unstable or unreliable indicator of the amount of deviation.

Range for ungrouped data: R = highest value − lowest value

Range for a grouped frequency distribution:

Range = largest class limit minus smallest class limit; or the midpoint of the last class interval minus the midpoint of the first class interval; or the upper class boundary of the last class minus the lower class boundary of the first class.

Advantages of range:

- It is useful when only the extent of extreme dispersion under ordinary conditions is required.
- It is easy to calculate and simple to understand.
- It can be used to measure dispersion in symmetric and nearly continuous series.

Disadvantages of range:

- It can be affected by extreme values.
- It cannot be computed when the distribution has open-ended classes.
- It does not take the entire data into account.
- It does not tell anything about the distribution of values in the series.

Generally, the range is an absolute measure and cannot be used to compare variability expressed in different units. So we need a measure of relative dispersion, called the relative range:

$$\text{Relative range} = \frac{\text{range}}{\text{sum of extreme values}} = \frac{\text{maximum value} - \text{minimum value}}{\text{maximum value} + \text{minimum value}}$$
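A minimal sketch of the range and the relative range for raw data. The two data sets are made up to show why a unit-free measure helps when the units differ; `range_and_relative_range` is an illustrative name.

```python
def range_and_relative_range(data):
    """Return (R, relative range) with R = max - min and
    relative range = (max - min) / (max + min)."""
    hi, lo = max(data), min(data)
    if hi + lo == 0:
        raise ValueError("relative range undefined when max + min == 0")
    return hi - lo, (hi - lo) / (hi + lo)

heights_cm = [150, 162, 171, 158, 180]   # hypothetical heights
weights_kg = [48, 55, 70, 62, 81]        # hypothetical weights

print(range_and_relative_range(heights_cm))   # R = 30, relative range about 0.091
print(range_and_relative_range(weights_kg))   # R = 33, relative range about 0.256
```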

Quartile deviation:
Quartiles are the points that divide the array into four equal parts.

Inter Quartile Range: The interquartile range is the difference between the 3rd and 1st quartiles, and it is a better indicator of absolute variability than the range.

$$I.Q.R. = Q_3 - Q_1$$

The Quartile Deviation (semi-interquartile range) is half of the interquartile range:

$$Q.D. = \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2}$$

Properties of Quartile Deviations:

- The size of the quartile deviation gives an indication of uniformity: a small Q.D. denotes large uniformity. A coefficient of quartile deviation is used for comparing uniformity or variation in different distributions.
- Compared with the range, it is considered a superior measure of dispersion.
- Like the range, it fails to cover all items in the distribution.
- It is not influenced by extreme values in a distribution.

$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
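A sketch of the three quartile-based measures for raw data. It assumes the quartiles are found with Python's `statistics.quantiles(..., method="exclusive")`, which uses the $i(n+1)/4$ position rule with linear interpolation; the data are hypothetical.

```python
from statistics import quantiles

def quartile_deviation_summary(data):
    """Return (I.Q.R., Q.D., coefficient of quartile deviation)."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    qd = iqr / 2
    coeff = (q3 - q1) / (q3 + q1)
    return iqr, qd, coeff

data = [3, 5, 7, 9, 11, 13, 15]      # hypothetical; Q1 = 5, Q3 = 13
print(quartile_deviation_summary(data))
```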

The Variance, the Standard Deviation and the Coefficient of Variation:


In order to have a more meaningful statistic to measure variability, statisticians use measures called the variance and the standard deviation.
The variance is the average of the squares of the distances of each value from the mean. Therefore, if the values are near the mean, the variance will be small. In contrast, if the values are far from the mean, the variance will be large.
The population variance (variance obtained from the entire data) is denoted by $\sigma^2$:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2 \qquad \text{(population variance)}$$

where X = individual value, µ = population mean and N = population size.

The sample variance for raw data is obtained as

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

which is an unbiased estimator of the population variance. The computing formula for the variance can be simplified as

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
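The definitional and computing formulas can be compared side by side, with Python's `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor N, treating the list as a whole population) as cross-checks. The data list and function names are illustrative.

```python
from statistics import pvariance, variance

def sample_variance_definition(xs):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [4, 8, 6, 5, 3, 7]    # hypothetical sample
print(sample_variance_definition(data), sample_variance_shortcut(data))  # 3.5 3.5
print(variance(data))        # standard library, divisor n - 1
print(pvariance(data))       # population formula, divisor N
```

The shortcut avoids a second pass over the data, but for large values with small spread it can lose precision through cancellation; the definitional form is numerically safer.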

Variance for ungrouped frequency distribution:

The variance for an ungrouped frequency distribution is determined by

$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}, \quad \text{where } n = \sum_{i=1}^{k} f_i$$

Variance for grouped frequency distribution:

The variance for a grouped frequency distribution is determined by

$$s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1}$$

where $m_i$ is the midpoint of the ith class interval and $n = \sum f_i$.

The simplified formula used for computation is

$$S^2 = \frac{\sum f_i m_i^2 - \frac{\left(\sum m_i f_i\right)^2}{n}}{\sum_{i=1}^{k} f_i - 1}$$

Properties of Variance

- The variance and standard deviation of a data set can never be negative.
- If every element in the distribution is multiplied by a constant C, the new variance is $S^2_{new} = C^2 S^2_{old}$.
- When a constant c is added to every measurement in the distribution, the variance does not change.
- The variance of a constant measured n times is zero.
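These properties can be checked numerically with `statistics.variance`; the sample and the constants C and c are made-up values for illustration.

```python
from statistics import variance

data = [4, 8, 6, 5, 3, 7]          # hypothetical sample
C, c = 3, 10                       # hypothetical constants

v = variance(data)
v_scaled = variance([C * x for x in data])
v_shifted = variance([x + c for x in data])
v_constant = variance([5.0] * 6)   # the same value observed 6 times

print(abs(v_scaled - C**2 * v) < 1e-9)   # True: multiplying by C scales by C^2
print(abs(v_shifted - v) < 1e-9)         # True: adding c changes nothing
print(v_constant)                        # zero
```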
Standard Deviation

The standard deviation is defined as the square root of the mean of the squared deviations of individual values from their mean. Taking the square root of the variance puts the standard deviation in the same units as the raw data. That is,

$$S.D. = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

The standard deviation of a sample (denoted by s) is

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where X = individual value, $\bar{X}$ = sample mean and n = sample size.

- Its advantage over the variance is that it is in the same unit as the variable under consideration.
- It is a measure of average variation in the set of data.
Example: Calculate the S.D. for the following grouped frequency distribution.

Class intervals:  1-3  3-5  5-7  7-9  9-11  11-13  13-15  Total
Frequency (fi):     1    9   25   35    17     10      3    100

Solution:

$$S^2 = \frac{\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}}{\sum f_i - 1} = \frac{7016 - \frac{800^2}{100}}{99} = \frac{616}{99} = 6.22$$

$$S = \sqrt{S^2} = \sqrt{6.22} = 2.49$$
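A sketch of the simplified grouped-data formula, using class midpoints $m_i$; it reproduces the intermediate sums 800 and 7016 from the example. The helper name `grouped_sd` is illustrative.

```python
import math

def grouped_sd(lower_limits, upper_limits, freqs):
    """Sample variance and S.D. of grouped data via
    S^2 = (sum f m^2 - (sum f m)^2 / n) / (n - 1), m = class midpoint."""
    mids = [(a + b) / 2 for a, b in zip(lower_limits, upper_limits)]
    n = sum(freqs)
    sum_fm = sum(f * m for f, m in zip(freqs, mids))        # 800 here
    sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))   # 7016 here
    s2 = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return s2, math.sqrt(s2)

lower = [1, 3, 5, 7, 9, 11, 13]
upper = [3, 5, 7, 9, 11, 13, 15]
freqs = [1, 9, 25, 35, 17, 10, 3]
s2, s = grouped_sd(lower, upper, freqs)
print(round(s2, 2), round(s, 2))    # 6.22 2.49
```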

Uses of the Variance and Standard Deviation:

As previously stated, variances and standard deviations can be used to determine the spread
of the data. If the variance or standard deviation is large, the data are more dispersed. This
information is useful in comparing two (or more) data sets to determine which is more
(most) variable.
The measures of variance and standard deviation are used to determine the consistency of
a variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation
in the diameters must be small, or the parts will not fit together.
The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution, as in Chebyshev's theorem.
Finally, the variance and standard deviation are used quite often in inferential statistics.

Coefficient of variation (CV): The CV is a unit-free measure and is always expressed as a percentage:

$$CV = \frac{SD}{\text{Mean}} \times 100\%$$

The CV will be small if the variation is small. Of two groups, the one with the smaller CV is said to be more consistent.

Example: The mean for the number of pages of a sample of women’s fitness magazines is 132,
with a variance of 23; the mean for the number of advertisements of a sample of women’s fitness
magazines is 182, with a variance of 62. Compare the variations.

Solution: The coefficients of variation are

$$CV_{\text{pages}} = \frac{\sqrt{23}}{132} \times 100\% = 3.6\% \quad \text{and} \quad CV_{\text{advertisements}} = \frac{\sqrt{62}}{182} \times 100\% = 4.3\%$$

The number of advertisements is more variable than the number of pages since the coefficient of
variation is larger for advertisements.
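The comparison can be reproduced with a one-line helper (illustrative name `cv_percent`); note that the example gives variances, so the square root is taken first.

```python
import math

def cv_percent(mean, variance):
    """Coefficient of variation in percent, from a mean and a variance."""
    return math.sqrt(variance) / mean * 100

cv_pages = cv_percent(132, 23)
cv_ads = cv_percent(182, 62)
print(round(cv_pages, 1), round(cv_ads, 1))   # 3.6 4.3
```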

The Standard Scores: A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. The formula is

$$Z = \frac{x_i - \bar{x}}{S}$$

where S is the standard deviation of the distribution and $x_i$ is each observation value.

- This measures the deviation of an individual observation from the mean of all the observations in units of standard deviation, and is termed the Z-score.
- A standard score or z score tells how many standard deviations a data value is above or below the mean for a specific distribution of values. If a standard score is zero, then the data value is the same as the mean.
- Standard scores allow observations from different distributions to be compared on a common relative scale.

Example: Compare the performance of the following two students.

Candidate   Marks in Economics   Marks in Accounting   Total
A                   84                    75              159
B                   74                    85              159

The average mark for Accounting is 50 with a standard deviation of 11, and the average mark for Economics is 60 with a standard deviation of 13. Whose performance is better, A's or B's?

Z scores for A:
Economics: $Z = \frac{84 - 60}{13} = 1.846$
Accounting: $Z = \frac{75 - 50}{11} = 2.273$

Total Z score for A = 1.846 + 2.273 = 4.119

Z scores for B:
Economics: $Z = \frac{74 - 60}{13} = 1.077$
Accounting: $Z = \frac{85 - 50}{11} = 3.182$

Total Z score for B = 1.077 + 3.182 = 4.259

Since B's total Z score is higher, student B performed better than student A.
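A sketch of the comparison: each mark is standardised with its own subject's mean and standard deviation before the totals are compared. The dictionary layout and the name `z_score` are illustrative choices.

```python
def z_score(x, mean, sd):
    """Standard score: (x - mean) / sd."""
    return (x - mean) / sd

# (mean, sd) for each subject, as given in the example
subject_stats = {"economics": (60, 13), "accounting": (50, 11)}
marks = {
    "A": {"economics": 84, "accounting": 75},
    "B": {"economics": 74, "accounting": 85},
}

totals = {
    student: sum(z_score(m, *subject_stats[subj]) for subj, m in subj_marks.items())
    for student, subj_marks in marks.items()
}
print({k: round(v, 3) for k, v in totals.items()})   # {'A': 4.119, 'B': 4.259}
```

Both students have the same raw total (159), so only the standardised totals separate them.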

Chebyshev's Theorem: The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1 (k is not necessarily an integer).

The theorem specifies the minimum proportion of values lying within a given spread, where the spread is measured in units of the standard deviation.


Chebyshev’s theorem applies to any distribution regardless of its shape.
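A minimal Python sketch of the bound $1 - 1/k^2$, followed by a check on a hypothetical, strongly skewed data set to illustrate that the bound does not depend on shape. The function name and the data are assumptions for illustration; the check uses the population standard deviation (`statistics.pstdev`), treating the data set as the whole distribution.

```python
import statistics

def chebyshev_min_proportion(k):
    """Lower bound on the proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%

# Hypothetical, strongly right-skewed data: the bound still holds.
data = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # standard deviation of the data treated as a whole distribution
within = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
print(f"Observed within 2 SD: {within:.0%} (bound: {chebyshev_min_proportion(2):.0%})")
```

The bound is a guarantee, not an estimate: real data usually exceed it, as here, where 90% of the values fall within two standard deviations.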

Example: The mean price of houses in a certain neighborhood is $50,000, and the standard
deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Solution: With k = 2, Chebyshev's theorem states that at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data values will fall within 2 standard deviations of the mean. Thus,

$50,000 +2($10,000) = $50,000 +$20,000 = $70,000 and

$50,000 -2($10,000) = $50,000 -$20,000 = $30,000

Hence, at least 75% of all homes sold in the area will have a price between $30,000 and $70,000.
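A small Python sketch of the same calculation, this time working backwards from the required proportion to k by solving $1 - 1/k^2 = p$. The function names are illustrative, not from the notes.

```python
import math

def k_for_proportion(p):
    """Smallest k for which Chebyshev guarantees at least proportion p (0 < p < 1)."""
    return 1 / math.sqrt(1 - p)  # solves 1 - 1/k**2 = p for k

def chebyshev_interval(mean, sd, k):
    """Interval from k standard deviations below the mean to k above it."""
    return mean - k * sd, mean + k * sd

k = k_for_proportion(0.75)
low, high = chebyshev_interval(50_000, 10_000, k)
print(f"k = {k}; at least 75% of prices lie between ${low:,.0f} and ${high:,.0f}")
# k = 2.0; at least 75% of prices lie between $30,000 and $70,000
```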

Measures of Shape

We have seen that averages and measures of dispersion can help in describing a frequency distribution. However, they are not sufficient to describe the nature of the distribution. For this purpose, we use skewness and kurtosis, commonly known as measures of shape. Frequency distributions can assume many shapes. The three most important shapes are positively skewed, symmetric, and negatively skewed.

Skewness:

When a frequency distribution is not symmetrical, it is said to be asymmetrical or skewed. Skewness means lack of symmetry. A distribution is said to be symmetrical when the values are distributed evenly on both sides of the mean. In a symmetrical (unimodal) distribution, the mean, median and mode are equal and lie at the center of the distribution.

We have two types of skewed distribution. These are:

• Negatively skewed distribution
• Positively skewed distribution

In a positively skewed or right-skewed distribution, the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; the "tail" is to the right. The mean is to the right of the median and the mode is to the left of the median, i.e., mean > median > mode. When the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed or left-skewed. In this case the mean is to the left of the median and the mode is to the right of the median.

• If the distribution is skewed to the left, then mode > median > mean.
• When a distribution is extremely skewed, the value of the mean is pulled toward the tail, while the majority of the data values will be greater than the mean (left skew) or less than the mean (right skew); hence, the median rather than the mean is a more appropriate measure of central tendency.
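A short Python illustration of the right-skewed case, using a hypothetical data set in which one large value forms a long right tail. It shows the ordering mean > median > mode and that most values lie below the mean.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one large value forms a long right tail.
data = [20, 22, 22, 22, 25, 26, 28, 30, 35, 90]

mean = statistics.mean(data)      # pulled toward the tail by the value 90
median = statistics.median(data)  # middle of the sorted values, barely affected by 90
mode = statistics.mode(data)      # most frequent value
below_mean = sum(x < mean for x in data)

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"{below_mean} of {len(data)} values lie below the mean")
# mean = 32, median = 25.5, mode = 22
# 8 of 10 values lie below the mean
```

Here the median (25.5) describes a typical value better than the mean (32), which is the reason the median is preferred for strongly skewed data.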

Karl Pearson's Measure of Skewness: If the distribution is symmetric, we have Arithmetic Mean = Median = Mode; if the distribution is skewed, they are not equal. Therefore the distance between the A.M. and the Mode (A.M. − Mode) can be used as a measure of skewness. However, since a measure of skewness should be a pure (unit-free) number, we define

$$SK = \frac{A.M. - \text{Mode}}{\delta}$$

where δ is the standard deviation of the distribution.

For distributions that are bell-shaped and moderately skewed, there is an approximate relationship between the A.M., Median and Mode:

$$A.M. - \text{Mode} \approx 3\,(A.M. - \text{Median})$$


Accordingly, we may define skewness as follows:

$$SK = \frac{3\,(A.M. - \text{Median})}{\delta}$$

For a symmetrical distribution SK = 0. If the distribution is negatively skewed, the value of SK is negative, and if it is positively skewed, SK is positive. The values of SK range from −3 to 3.
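A minimal Python sketch of both forms of Pearson's coefficient. It assumes δ is the sample standard deviation (`statistics.stdev`, divisor n − 1); the function names and both data sets are hypothetical.

```python
import statistics

def pearson_sk_mode(data):
    """Pearson's coefficient based on the mode: (mean - mode) / standard deviation."""
    return (statistics.mean(data) - statistics.mode(data)) / statistics.stdev(data)

def pearson_sk_median(data):
    """Pearson's coefficient based on the median: 3 (mean - median) / standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

right_skewed = [20, 22, 22, 22, 25, 26, 28, 30, 35, 90]  # hypothetical data
symmetric = [1, 2, 2, 3]                                 # hypothetical data

for name, data in (("right-skewed", right_skewed), ("symmetric", symmetric)):
    print(f"{name}: SK(mode) = {pearson_sk_mode(data):.3f}, "
          f"SK(median) = {pearson_sk_median(data):.3f}")
```

Both coefficients come out positive for the right-skewed data and exactly zero for the symmetric data, matching the sign rules stated above. The two forms give different values because the relation A.M. − Mode ≈ 3(A.M. − Median) is only approximate.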

Kurtosis: A measure of the peakedness or convexity of a curve is known as kurtosis. It is measured by Pearson's coefficient $\beta_2$, given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\delta^4}$$

The sample estimate of this coefficient is $b_2 = m_4 / m_2^2$, where $m_4$ is the fourth central moment and $m_2$ is the second central moment:

$$m_4 = \frac{\sum (X - \bar{X})^4}{n}, \qquad m_2 = \frac{\sum (X - \bar{X})^2}{n}$$

The distribution is called mesokurtic if $b_2 = 3$. When $b_2$ is more than 3, the distribution is said to be leptokurtic, and if $b_2$ is less than 3, the distribution is said to be platykurtic.
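A minimal Python sketch of $b_2$ and the three-way classification. It assumes the usual sample central moments with divisor n in both $m_2$ and $m_4$; the function names and the two data sets are hypothetical, and a small tolerance is used when testing for exactly 3 because of floating-point rounding.

```python
def sample_kurtosis_b2(data):
    """Sample estimate b2 = m4 / m2**2 using central moments with divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(b2, tol=1e-9):
    """Label a distribution by comparing b2 with 3 (the value for a normal curve)."""
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread, no tails
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # tall centre with extreme tails
for name, data in (("flat", flat), ("peaked", peaked)):
    b2 = sample_kurtosis_b2(data)
    print(f"{name}: b2 = {b2:.2f} -> {classify_kurtosis(b2)}")
# flat: b2 = 1.70 -> platykurtic
# peaked: b2 = 5.00 -> leptokurtic
```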
