Measures of Central Tendency Project +2
Paper Coordinator: Prof. Surendra Singh (Retd), North-Eastern Hill University, Shillong
Content Writer/Author (CW): Prof. Surendra Singh (Retd), North-Eastern Hill University, Shillong
Content Reviewer (CR): Prof. Surendra Singh (Retd), North-Eastern Hill University, Shillong
Module ID:
Pre-requisites: Knowledge of basic statistics and data compilation
Objectives: (i) To provide the background of central tendency and its measurement, and
(ii) to apply central tendency techniques in geographical studies
Key words: Mean, median, mode, weighted mean, frequency distribution, squared deviation,
mid position, and mid value
to be the concentration of n items in Xi variable. The second is dispersion, which shows the
width of the distribution. It is an equally important dimension in statistical analysis because
two variables sometimes have the same value of central tendency but different dispersion,
or the same magnitude of dispersion but different central values. For example, the two
sets of daily rainfall for April and May 2004 at Cherrapunji have the same
average rainfall (the mean, the most familiar measure of central tendency),
while their ranges (the difference between the maximum and minimum values of a data
series) differ: rainfall varies from 0 to 492.2 mm in April and from 0 to
327.6 mm in May (Table-1). The two months therefore show different degrees of dispersion
despite having the same mean, so the nature of the two distributions is different
(see Fig.-1). Thus, central value and dispersion are both essential dimensions in
understanding the distributional characteristics of a series/variable.
Table-1: Daily Rainfall (mm) in April and May 2004 at Cherrapunji
Day April 2004 May 2004
1 15.2 10.1
2 0 119.1
3 25.8 8.6
4 0 0
5 1.4 0
6 1.4 1.4
7 2.7 0
8 0 0
9 60.6 6.2
10 66.8 12.6
11 11.2 327.6
12 29.4 90
13 89.2 172
14 135.1 175.8
15 98.3 66.8
16 492.2 36.3
17 39.5 0
18 76 0
19 87.3 18.6
20 0 0
21 0 16.4
22 70.3 4.4
23 9.2 104.6
24 20.2 2.8
25 0 0
26 0 10.2
27 0 19
28 1.5 0
29 0 73.4
30 1.6 59
Total 1334.9 1334.9
Mean 44.50 44.50
Source: Automatic weather Station installed by the Department of Geography, North Eastern Hill University,
Shillong under the project sponsored by Ministry of Science and Technology, New Delhi
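The comparison drawn from Table-1 can be checked with a short script. The sketch below copies the daily values from the table and computes the total, mean and range for each month; the function name `summarize` is only illustrative.

```python
# Daily rainfall (mm) at Cherrapunji, copied from Table-1.
april = [15.2, 0, 25.8, 0, 1.4, 1.4, 2.7, 0, 60.6, 66.8,
         11.2, 29.4, 89.2, 135.1, 98.3, 492.2, 39.5, 76, 87.3, 0,
         0, 70.3, 9.2, 20.2, 0, 0, 0, 1.5, 0, 1.6]
may = [10.1, 119.1, 8.6, 0, 0, 1.4, 0, 0, 6.2, 12.6,
       327.6, 90, 172, 175.8, 66.8, 36.3, 0, 0, 18.6, 0,
       16.4, 4.4, 104.6, 2.8, 0, 10.2, 19, 0, 73.4, 59]


def summarize(series):
    """Return (total, mean, range) of a list of rainfall values."""
    total = sum(series)
    mean = total / len(series)
    value_range = max(series) - min(series)
    return total, mean, value_range


for name, series in (("April", april), ("May", may)):
    total, mean, value_range = summarize(series)
    print(f"{name}: total = {total:.1f} mm, mean = {mean:.2f} mm, "
          f"range = {value_range:.1f} mm")
```

Both months give the same total and mean, while the range is wider in April than in May, which is the point made in the text.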
Fig.-1 (panels A and B): Comparison of Daily Rainfall Pattern of April and May 2004 at Cherrapunji
(the totals and means of the rainfall of these months are the same, 1334.9 mm and
44.50 mm respectively)
This module gives the details of the measures and applications of central tendency,
while the dispersion of a data series is described separately in another
module.
Note that the skewness of a distribution, which is described in a separate module,
determines which measure of central tendency is the best to use.
These characteristics of the mean are exemplified in Table – 3 and are useful for its
application in statistical analysis.
Table-3: Mean and Absolute and Squared Deviations of Daily Rainfall at
Cherrapunji (April 2004)
The sum of absolute deviations and the sum of squared deviations about the mean
(52.29 mm) are 924.27 mm and 70949.94 mm² respectively (columns 4 and 5 in Table-3).
The sum of squared deviations is the smallest possible: if the rainfall series
is deviated from any other value, either lower or higher than 52.29 mm, the
sum of squared deviations will always be higher. (The sum of absolute deviations,
by contrast, is smallest about the median rather than the mean.) The positive and
negative deviations from the mean also cancel exactly. That is why the mean
is the central value of the series, showing a value balance.
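The minimising property can be illustrated numerically. Since the series behind Table-3 is not reproduced in this text, the sketch below uses the April 2004 values from Table-1 instead, so its sums differ from the figures quoted above. It compares deviations taken about the mean with deviations taken about two other trial values (mean minus 10 and mean plus 10, chosen arbitrarily) and about the median.

```python
import statistics

# April 2004 daily rainfall (mm) from Table-1 (not the Table-3 series).
april = [15.2, 0, 25.8, 0, 1.4, 1.4, 2.7, 0, 60.6, 66.8,
         11.2, 29.4, 89.2, 135.1, 98.3, 492.2, 39.5, 76, 87.3, 0,
         0, 70.3, 9.2, 20.2, 0, 0, 0, 1.5, 0, 1.6]


def sum_abs_dev(series, center):
    """Sum of absolute deviations of the series about `center`."""
    return sum(abs(x - center) for x in series)


def sum_sq_dev(series, center):
    """Sum of squared deviations of the series about `center`."""
    return sum((x - center) ** 2 for x in series)


mean = statistics.mean(april)
median = statistics.median(april)

# Squared deviations are smallest about the mean.
for center in (mean - 10, mean, mean + 10):
    print(f"center = {center:7.2f}: "
          f"sum of squared deviations = {sum_sq_dev(april, center):.2f}")

# Absolute deviations are smallest about the median.
print(f"sum |d| about mean   = {sum_abs_dev(april, mean):.2f}")
print(f"sum |d| about median = {sum_abs_dev(april, median):.2f}")

# Signed deviations about the mean cancel (up to rounding error).
print(f"sum of signed deviations = {sum(x - mean for x in april):.6f}")
```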
2.3 Extension of the Concept of Mean
In addition, if we add one observation to any series of n observations, the mean
of the new series containing n+1 observations is
$$X^{*}_{n+1} = \frac{X_1 + X_2 + X_3 + \dots + X_n + X_{n+1}}{n+1} \qquad (4)$$
This simplifies to
$$X^{*}_{n+1} = \frac{n}{n+1}\,X^{*}_{n} + \frac{1}{n+1}\,X_{n+1} \qquad (5)$$
where $X^{*}_{n}$ is the mean of the first n observations and $X_{n+1}$ is the value of the newly added observation.
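A minimal sketch of Eqn-5 follows. It updates an existing mean when one observation is added and checks the result against a direct recomputation. The function name `updated_mean` and the sample values are hypothetical and chosen only for illustration.

```python
def updated_mean(old_mean, n, new_value):
    """Mean of n+1 observations, given the mean of the first n (Eqn-5)."""
    return (n / (n + 1)) * old_mean + (1 / (n + 1)) * new_value


values = [4.0, 8.0, 6.0]          # hypothetical series of n = 3 observations
new_value = 10.0                  # hypothetical newly added observation

mean_n = sum(values) / len(values)
via_update = updated_mean(mean_n, len(values), new_value)
via_direct = sum(values + [new_value]) / (len(values) + 1)

print(via_update, via_direct)     # both routes give the same mean
```

The update needs only the old mean and n, so the full series does not have to be stored.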
Similarly, if a large number of observations are added to a series, it becomes difficult to
arrange the data as a series of individual observations. A frequency distribution is then made,
in which the mid value of each class, $m_i$, is weighted by its frequency, $f_i$. The sum of the
weighted products is divided by the size of the data to get the mean.
$$X^{*} = \frac{f_1 m_1 + f_2 m_2 + f_3 m_3 + \dots + f_c m_c}{f_1 + f_2 + f_3 + \dots + f_c} = \frac{\sum f_i m_i}{\sum f_i} \qquad (6)$$
where c is the number of classes in the frequency distribution, $m_i$ is the mid value of class i
and $f_i$ is the frequency of class i.
This concept of assigning a weight, $w_i$, either to each individual observation ($w_i X_i$) or
to each class in a frequency distribution ($f_i m_i$), is called the weighted mean. It is calculated
with a formula of the same form as Eqn-6:
$$X^{*} = \frac{w_1 m_1 + w_2 m_2 + w_3 m_3 + \dots + w_c m_c}{w_1 + w_2 + w_3 + \dots + w_c} = \frac{\sum w_i m_i}{\sum w_i}$$
Example:
Compute the mean of the daily rainfall data of Cherrapunji by the three methods given above
(simple, mid-point value and coding methods). The following table converts the
ungrouped rainfall data of Cherrapunji into a frequency distribution.
Table-4: Rainfall data of 170 days of the Rainy Season 2004 at Cherrapunji
Rainfall Class (mm)   Frequency (fi)   Class Mid-point (m)   fm (col 2 x col 3)   Assumed Mean (Xa)   Deviation (di = m - Xa)   fidi (col 2 x col 6)
(1)                   (2)              (3)                   (4)                  (5)                 (6)                       (7)
0-100                 130              50                    6500                 350                 -300                      -39000
101-200               17               150                   2550                 350                 -200                      -3400
201-300               9                250                   2250                 350                 -100                      -900
301-400               7                350                   2450                 350                 0                         0
401-500               4                450                   1800                 350                 100                       400
501-600               1                550                   550                  350                 200                       200
601-700               1                650                   650                  350                 300                       300
701-800               1                750                   750                  350                 400                       400
Total                 ∑f = N = 170     -                     ∑fm = 17500          -                   -                         ∑fidi = -42000
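The two grouped-data calculations in Table-4 can be sketched as follows. The mid-point method follows Eqn-6. The formula for the assumed-mean (coding) method is not shown in this text, so the standard form X* = Xa + ∑fidi/∑fi is assumed here; the function names are illustrative.

```python
# Class frequencies and mid-points from Table-4.
frequencies = [130, 17, 9, 7, 4, 1, 1, 1]
midpoints = [50, 150, 250, 350, 450, 550, 650, 750]
assumed_mean = 350  # Xa, as used in column 5 of Table-4


def midpoint_mean(f, m):
    """Grouped mean by the mid-point method: sum(f*m) / sum(f)."""
    return sum(fi * mi for fi, mi in zip(f, m)) / sum(f)


def assumed_mean_method(f, m, xa):
    """Grouped mean from deviations about an assumed mean (assumed form):
    Xa + sum(f*d) / sum(f), where d = m - Xa."""
    sum_fd = sum(fi * (mi - xa) for fi, mi in zip(f, m))
    return xa + sum_fd / sum(f)


print(f"mid-point method:    {midpoint_mean(frequencies, midpoints):.2f} mm")
print(f"assumed-mean method: "
      f"{assumed_mean_method(frequencies, midpoints, assumed_mean):.2f} mm")
```

Both routes use the column totals of Table-4 (∑fm = 17500 and ∑fidi = -42000 with N = 170) and give the same mean.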
Moving Average:
In a long data series with large fluctuations, a moving average is used to suppress
temporary variations and to improve the fit of the data to a line, a process called ‘smoothing’.
The average moves because, at each step, the value of the next observation is added and the
earliest observation is removed (n-1+1=n). The value of the simple moving average therefore
changes from step to step, while the number of observations n used in its calculation
remains the same. However, the values of Xi in calculation will move forward as {(X i-1 + Xi +
Two examples show the moving trends: first, a time series of peak flow
data of the Kaljani river, a tributary of the Brahmaputra, for 19 years (n=19); and, secondly, the
trend of total pulse production of India over the last 55 years and its fluctuation. The
following discussion shows how the moving average is used to reduce the fluctuation of
time series data.
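A minimal sketch of a simple moving average is given below, assuming a window of 3 consecutive values as in the examples. The peak flow and pulse production series are not reproduced in this text, so the input values are hypothetical and serve only to show the mechanics.

```python
def moving_average(series, window=3):
    """Simple moving average: the mean of each run of `window` consecutive
    values. The result has len(series) - window + 1 entries."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical annual values, used only for illustration.
annual_values = [120.0, 150.0, 90.0, 180.0, 210.0, 160.0]

smoothed = moving_average(annual_values, window=3)
for value in smoothed:
    print(f"{value:.2f}")
```

Each smoothed value drops the earliest observation of the previous window and takes in the next one, so the window size stays fixed while the average moves along the series.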
Example-1: Draw a line of the 3-year moving average for the given data series of peak
discharge of the Kaljani River, a tributary of the Brahmaputra
Interpretation: Using the peak flow data collected from the Irrigation Department,
Government of Assam, Guwahati, a 3-year moving average was calculated (Table-3
and Fig.-2). The figure shows that the moving average smooths the fluctuation. The
moving average curve also indicates that the years from 1990 to 1995 were a period of
large fluctuation in the peak flow of the river Kaljani.
Fig.-2: 3-years Moving Average for annual data of peak flow
Example-2: Show the trend of total pulse production in India and its smoothed
curve.
The trend of total pulse production and its smoothed curve, computed as a 3-year moving
average, are shown in Fig.-3 given below. The moving average was calculated as per the
given equation. The smoothed curve (black dotted) shows that the fluctuation in pulse
production persists, with increasing and decreasing trends around the years 1961-2, 1982-3 and
2003-4. This may be due to the impact of weather and the fluctuation of pulse prices in India.
Sources: Agricultural Statistics- At A Glance 2007, Directorate of Economics and Statistics, Ministry of
Agriculture, Government of India, New Delhi
Fig.- 3: Trends of Total Pulse Production and its Fluctuation (3-years Moving
Average)
Combined Mean
If the mean of a distribution X1, X2, X3, ..., Xn and the mean of another distribution Y1, Y2,
Y3, ..., Ym are expressed as $X^{*}_{n}$ and $Y^{*}_{m}$, with n and m observations
respectively, the combined mean of these two distributions ($Z^{*}_{n+m}$) can be calculated as
$$Z^{*}_{n+m} = \frac{n\,X^{*}_{n} + m\,Y^{*}_{m}}{n+m} \qquad (7)$$
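Eqn-7 can be sketched in a few lines. The two group means and sizes below are hypothetical and are used only to show that the combined mean is a size-weighted average of the two means.

```python
def combined_mean(mean_x, n, mean_y, m):
    """Combined mean of two series with means mean_x, mean_y and
    sizes n, m (Eqn-7)."""
    return (n * mean_x + m * mean_y) / (n + m)


# Hypothetical values: 4 observations with mean 10, 6 with mean 20.
z = combined_mean(10.0, 4, 20.0, 6)
print(z)
```

Because the second series is larger, the combined mean lies closer to its mean than to that of the first series.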
For grouped data arranged in a frequency distribution, computing the median involves
the cumulative frequency distribution, which is used to determine the median class and
the exact mid position within it as a frequency ratio. This ratio is then converted into a value
by multiplying it by the class interval (h) of the distribution. Thus, the median of a
frequency distribution is the lower limit of the median class (L) plus the ratio value.
It is written as
$$X_{med} = L + \frac{h\,\{(n/2) - c\}}{f} \qquad (12)$$
The formula for calculating these measures remains the same, except for the change
in the position value, as given below:
1st Quartile: $Q_1 = L + \frac{h\,\{(n/4) - c\}}{f} \qquad (13)$
2nd Quartile: $Q_2$ = median, as in equation-12
3rd Quartile: $Q_3 = L + \frac{h\,\{(3n/4) - c\}}{f}$
Deciles: $D_j = L + \frac{h\,\{(j\,n/10) - c\}}{f}$
Percentiles: $P_r = L + \frac{h\,\{(r\,n/100) - c\}}{f}$
Example:
Find out Q1, Q2 (that is median), Q3 , D6 and P87 for the daily rainfall data of
Cherrapunji containing 170 observations.
Table-4: Rainfall data of 170 days of the rainy season 2004 at Cherrapunji
Rainfall Class (mm) Frequency (f) Cumulative frequency (cf)
(1) (2) (3)
0-100 130 130
101-200 17 147
201-300 9 156
301-400 7 163
401-500 4 167
501-600 1 168
601-700 1 169
701-800 1 170
∑f=N=170
As per equation-12, Q2 = X med = L+[h{(n/2) – c}/f], where L is the lower limit
of the median class, c is the cumulative frequency of the class preceding the median class,
f is the frequency of the median class, n is the number of observations and h is the
class interval.
In the rainfall distribution given above, n/2 = 170/2 = 85. This position falls in the first
class of the distribution, so the median lies in this class. Since there is no class before
the median class, c = 0, f = 130, h = 100 and L = 0.
Q2 = X med = 0+[100{(170/2) – 0}/130] = 100(85/130) = 65.38 mm
Q1 = 0+[100{(170/4) – 0}/130] = 100(42.5/130) = 32.69 mm
Q3 = 0+[100{(170x3/4) – 0}/130] = 100(127.5/130) = 98.08 mm
D6 = 0+[100{(170x6/10) – 0}/130] = 100(102/130) = 78.46 mm
P87 = 200+[100{(170x87/100) – 147}/9] = 200+[100{(147.9 – 147.0)/9}] = 210 mm
(The P87 position, 147.9, lies in the class covering 200-300 mm of rainfall, so L = 200,
c = 147 and f = 9.)
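The worked values above can be reproduced with one general function, since quartiles, deciles and percentiles differ only in the position value. The sketch below assumes, as the worked example does, lower class limits of 0, 100, 200, ... and a class interval of h = 100; the function name `grouped_quantile` is illustrative.

```python
# Frequency distribution from Table-4, with the lower limits used in the
# worked example (L = 0 for the first class, L = 200 for the third).
lower_limits = [0, 100, 200, 300, 400, 500, 600, 700]
frequencies = [130, 17, 9, 7, 4, 1, 1, 1]
class_interval = 100  # h


def grouped_quantile(fraction):
    """Value below which `fraction` of the observations lie:
    L + h * (fraction * n - c) / f, with c the cumulative frequency
    before the class that contains the position."""
    n = sum(frequencies)
    position = fraction * n
    cumulative = 0  # c: cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= position:
            return lower + class_interval * (position - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


for label, fraction in (("Q1", 1 / 4), ("Q2", 1 / 2), ("Q3", 3 / 4),
                        ("D6", 6 / 10), ("P87", 87 / 100)):
    print(f"{label} = {grouped_quantile(fraction):.2f} mm")
```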
4.2 Computation of Mode
and d2 = (fm – f2) is the difference between the frequencies of the modal and post-modal classes.
The mode can also be found graphically by preparing a histogram of the frequency distribution
and joining the top of the modal-class bar with those of the pre-modal and post-modal classes.
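The beginning of the mode formula is not present in this text, so the sketch below assumes the standard grouped-data form, Mode = L + h·d1/(d1 + d2), with d1 = (fm – f1) and d2 = (fm – f2). It also assumes that a missing pre-modal or post-modal class has a frequency of 0. It is applied to the Table-4 rainfall distribution purely as an illustration.

```python
# Frequency distribution from Table-4 (lower limits as used in the
# quartile example above, class interval h = 100).
lower_limits = [0, 100, 200, 300, 400, 500, 600, 700]
frequencies = [130, 17, 9, 7, 4, 1, 1, 1]
class_interval = 100  # h


def grouped_mode(lower, freqs, h):
    """Mode of grouped data, assuming Mode = L + h * d1 / (d1 + d2)."""
    k = freqs.index(max(freqs))            # index of the modal class
    fm = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0      # pre-modal frequency (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # post-modal frequency
    d1 = fm - f1
    d2 = fm - f2
    return lower[k] + h * d1 / (d1 + d2)


print(f"Mode = {grouped_mode(lower_limits, frequencies, class_interval):.2f} mm")
```

Here the modal class is 0-100 mm (fm = 130), so d1 = 130 and d2 = 130 – 17 = 113.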