
Chapter 3: Measures of central tendency or location

A measure of central tendency is a value around which most of the data tend to concentrate;
hence it acts as a representative of the whole data.

The important measures of central tendency are:

(i) Arithmetic Mean: This is also called the simple average and is obtained by dividing the sum
of all the values by the number of values.

If X is a variable having values x1, x2, x3, …, xn, then the arithmetic mean X bar = ∑X / n

For example, if there are 5 values 12, 15, 14, 16 and 18 then arithmetic mean = (12 + 15 + 14+
16+ 18)/ 5 = 75/5 = 15.

If the values repeat, i.e. if frequencies are given, each value x is multiplied by its frequency f:

X f fx

2 3 6

4 2 8

6 5 30

X bar = ∑ fx/∑f = 44/10 = 4.4

For class interval data, we first find the mid-points of each class and multiply these with the
frequencies. The arithmetic mean can be calculated as

Class f m fm

0-10 5 5 25

10-20 2 15 30

20-30 3 25 75

Arithmetic mean = ∑fm/∑f = 130/10 = 13
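
The three cases above (raw values, values with frequencies, class intervals) can be sketched in Python as follows. This is only an illustrative sketch; the function names are assumptions made for this example and do not come from the text.

```python
def mean_raw(values):
    """Arithmetic mean of raw values: sum of values / number of values."""
    return sum(values) / len(values)


def mean_from_frequencies(values, freqs):
    """Arithmetic mean when each value x has a frequency f: sum(fx) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_grouped(classes, freqs):
    """Arithmetic mean for class intervals, using mid-points m: sum(fm) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return mean_from_frequencies(midpoints, freqs)


print(mean_raw([12, 15, 14, 16, 18]))                             # 75/5
print(mean_from_frequencies([2, 4, 6], [3, 2, 5]))                # 44/10
print(mean_grouped([(0, 10), (10, 20), (20, 30)], [5, 2, 3]))     # 130/10
```

The grouped version reuses the frequency version because the mid-point of each class simply plays the role of x.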


Arithmetic mean is commonly used because it is well defined and does not depend on the order of
the values. Also, the arithmetic means of two or more data sets can be combined to find the
combined mean, which is not possible in case of other measures of central tendency.

For example, we have two batches, the first batch having 60 students and having average marks
of 75 and the second batch of 30 students and having average marks of 65.

So we can have n1 = 60, X1 bar = 75, n2 = 30 and X2 bar = 65

Combined mean = (n1* X1 bar + n2* X2 bar)/ (n1+ n2) = (60 x 75 + 30 x 65)/ (60+30) = 71.67
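
The combined mean generalises to any number of batches. A short Python sketch, with an illustrative function name that is not part of the text:

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)


# Two batches: 60 students averaging 75 and 30 students averaging 65.
result = combined_mean([60, 30], [75, 65])
print(round(result, 2))  # 71.67
```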

Arithmetic mean has one major limitation. If there are extreme values present in the data, then
the arithmetic mean does not act as a good representative of the data.

For example, if we consider values like 0, 10, 50 and 1000 then arithmetic mean = 1060/4 = 265.
This value of 265 is not close to any of the four values and is not able to represent the data
properly. Hence, in this case, arithmetic mean does not act as a good measure of central tendency
and we can use some other measure like the median. One example of extreme values is the data
on income of people of the whole country, where there are huge variations.

(ii) Weighted mean: This is similar to arithmetic mean, the only difference being that different
values have a different weightage or significance attached to them.

For example, the overall evaluation of students may be divided into a number of parameters
like assignments, tests, presentations and final exam. Each of these parameters has a certain
weightage attached to it and, based on that, we can find the weighted average marks of a
particular student. Consider the following example, where marks in each criterion are taken
out of 100, to have a common base.

Criteria Weightage w Marks X wX

1. Test 10% 75 7.5

2. Assignments 20% 70 14

3. Presentations 10% 68 6.8

4. Final exam 60% 60 36


Weighted average marks = ∑ wX / ∑w = 64.3 / 1 = 64.3

The concept of weighted mean can also be used to find the weighted average cost of capital or
the weighted score of an employee, to be used for performance appraisal.
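
The weighted average in the table above can be reproduced with a short Python sketch. The function name is illustrative, and the weights are entered as percentages (10, 20, 10, 60) rather than fractions, which is a choice made here only to keep the arithmetic exact.

```python
def weighted_mean(weights, values):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


weights = [10, 20, 10, 60]   # test, assignments, presentations, final exam (in %)
marks = [75, 70, 68, 60]     # marks out of 100 for each criterion

print(weighted_mean(weights, marks))  # 6430/100
```

Because the formula divides by the sum of the weights, it gives the same answer whether the weights are written as percentages or as fractions adding up to 1.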

(iii) Median: This is the middle value of the data i.e. it divides the data into two equal parts. So
half the data is less than the median and the remaining half is more than the median. This is a
positional measure which means that it is based on the order of the values. Hence we first
arrange the values in ascending or descending order and then find the middle term.

If the number of terms n is an odd number, then there is only one middle term and the median is
given by the (n+1)/2 term. For example, if there are 5 terms, we first arrange the values in
ascending order and the median = (5+1)/2 term = 3rd term.

If the number of terms n is an even number, then there are two middle terms i.e. n/2 term and
(n/2 + 1) term and the median will be the average of these 2 terms. For example, if there are 6
terms, then the middle terms are 3rd term and 4th term and the median is the average of these 2
terms.

For example, consider the terms 12, 6, 25, 8, 19

First, we arrange the values in ascending order. So we get 6, 8, 12, 19, 25. Here n = 5, which is
an odd number. So median = 3rd term = 12.

We take another example. Consider the terms 16, 6, 28, 18, 29, 35. First, arrange the terms in
ascending order. We get 6, 16, 18, 28, 29, 35. Here n = 6, an even number. So there are 2 middle
terms i.e. 18 and 28. Median will be the average of these 2 terms, i.e. (18 + 28)/2 = 23.
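
The odd and even cases described above can be sketched in Python as follows; the function name is illustrative.

```python
def median(values):
    """Median of raw values: middle term if n is odd, average of the two middle terms if n is even."""
    ordered = sorted(values)          # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the (n+1)/2-th term (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2   # n/2-th and (n/2 + 1)-th terms


print(median([12, 6, 25, 8, 19]))        # 12
print(median([16, 6, 28, 18, 29, 35]))   # 23.0
```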

Now, we take the case when values are repeating, i.e. when frequencies are given.
X f
3 8
6 5
4 4
5 2
Here we have to first arrange the X values in ascending order and write the corresponding
frequencies.
X f C.f
3 8 8
4 4 12
5 2 14
6 5 19
Here n = ∑f = 19, an odd number. So median = (19+1)/2 term = 10th term. For getting this term,
we find the cumulative frequencies. For this, we add the successive values of frequencies.
The 10th term lies between 8th term and 12th term in the C.f. column. Hence the 10th term= 4.
We take another example.
X f
6 10
2 4
9 5
4 3
First, we arrange the X values in ascending order and write the corresponding frequencies.
X f C.f
2 4 4
4 3 7
6 10 17
9 5 22
Since n = ∑f = 22, an even number, there are 2 middle terms i.e. the 11th and 12th terms. From the
C.f. column, the 11th and 12th terms lie between the 7th and 17th terms. Hence the 11th and 12th
terms are both 6. Hence, median = (6+6)/2 = 6.
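
The cumulative-frequency lookup used in the two examples above can be sketched in Python as follows. The helper names are illustrative and not taken from the text.

```python
def term_at(table, k):
    """Return the k-th term (1-based) of the ordered data given as (value, frequency) pairs."""
    cumulative = 0
    for value, freq in sorted(table):      # arrange X values in ascending order
        cumulative += freq                 # running cumulative frequency
        if k <= cumulative:
            return value
    raise ValueError("k is larger than the total frequency")


def median_from_frequencies(table):
    """Median of a frequency table of (value, frequency) pairs."""
    n = sum(freq for _, freq in table)
    if n % 2 == 1:
        return term_at(table, (n + 1) // 2)
    return (term_at(table, n // 2) + term_at(table, n // 2 + 1)) / 2


print(median_from_frequencies([(3, 8), (6, 5), (4, 4), (5, 2)]))    # 10th term -> 4
print(median_from_frequencies([(6, 10), (2, 4), (9, 5), (4, 3)]))   # (6 + 6) / 2 -> 6.0
```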
For class interval data, we find out the n/2 term, irrespective of whether n is odd or even and by
using a formula, we can find the exact value of the median.
Class f C.f
0-10 5 5
10-20 12 17
20-30 10 27
30-40 4 31
Here n = ∑f = 31. We find the n/2 term = 31/2 = 15.5th term. Then we find the cumulative
frequencies to find out which interval contains the 15.5th term. The 15.5th term lies between the
5th and 17th terms in the C.f. column. Hence the class interval 10-20 contains the median.
Then we use the following formula to find the exact value of the median.
Median = L1 + ((n/2 – C.f)(L2-L1)/f)
Here L1 and L2 are the lower and upper limits of the class interval containing the median. C.f is
the cumulative frequency up to the previous class interval and f is the frequency of the class
interval containing the median.
Here L1= 10, L2 = 20, C.f = 5, f = 12
Median = 10 + ((15.5-5)(20-10)/12)= 18.75
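
The formula above can be sketched in Python as follows. The function name is illustrative, and the sketch assumes that the classes are listed in ascending order and that the median class is the first one whose cumulative frequency reaches n/2.

```python
def grouped_median(classes, freqs):
    """Median for class-interval data: L1 + (n/2 - C.f) * (L2 - L1) / f."""
    n = sum(freqs)
    target = n / 2                     # the n/2 term, whether n is odd or even
    cumulative = 0                     # C.f up to the previous class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("frequency table is empty")


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 12, 10, 4]
print(grouped_median(classes, freqs))  # 10 + (15.5 - 5) * 10 / 12 = 18.75
```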
Similar to the median, we have some more positional measures like quartiles, deciles and
percentiles. We will discuss them now.

(iv) Quartiles: These are points which divide the data into 4 equal parts. Hence we have three
quartiles Q1, Q2 and Q3. Q1 divides the data in the ratio of 1:3, i.e. the lowest 25% of the data is
less than Q1. Q2 divides the data in the ratio 2:2 i.e. 1:1. Hence Q2 is same as the median. Q3
divides the data in the ratio 3:1. Hence 75% of the data is less than Q3 or we can say that the top
25% of the data is more than Q3.
We will find the values of Q1 and Q3 for the example shown.
Class f C.f
0-10 5 5
10-20 12 17
20-30 10 27
30-40 4 31
Here n = 31. For Q1, we find the n/4 term i.e. 31/4 = 7.75 term. From the C.f. column, it is seen
that the 7.75 term lies between 5th term and 17th term. Hence the class interval 10-20 will contain
the value of Q1.
Q1 = L1 + ((n/4-C.f)(L2-L1)/f)
Here L1, L2, C.f and f are similar to the case of the median.
Q1 = 10+ ((7.75-5)(20-10)/12)= 12.29
For finding Q3, we find the 3n/4 term = 3 x 7.75 = 23.25 term. This term lies between the 17th
term and the 27th term. Hence the class interval 20-30 contains Q3.
Q3= L1 + ((3n/4-C.f.)(L2-L1)/f) = 20+ ((23.25-17)(30-20)/10)= 26.25

(v) Deciles: These are points which divide the data into 10 equal parts. There are 9 deciles D1 to
D9. 10% of the data is less than D1, 20% is less than D2 and so on. D5 is same as the median.
For finding D1, we find n/10 = 31/10 = 3.1 term. This falls within the first 5 terms, i.e. in the
class interval 0-10. Since there is no previous class, C.f = 0.
D1= L1 + ((n/10-C.f.)(L2-L1)/f)= 0 + ((3.1-0)(10-0)/5) = 6.2
For getting D6, we find 6n/10 = 6 x 3.1 = 18.6 term. This lies between the 17th and 27th terms in
the class interval 20-30.
D6 = L1 + ((6n/10-C.f)(L2-L1)/f) = 20+ ((18.6-17)(30-20)/10)= 21.6

(vi) Percentiles: These are points which divide the data into 100 equal parts. There are 99
percentiles P1 to P99. 1% of the data is less than P1, 2% is less than P2 and so on. P50 is same
as the median.
For finding P1, we find the n/100 term.
P1= L1 + ((n/100-C.f)(L2-L1)/f)
n/100 = 31/100 = 0.31 term, which falls within the first 5 terms, i.e. in the class interval 0-10.
P1 = 0 + ((0.31-0)(10-0)/5)= 0.62
For getting P97, we find 97n/100 = 97 x 0.31 = 30.07 term. This lies between the 27th and 31st
terms in the class interval 30-40.
P97= L1 + ((97n/100-C.f)(L2-L1)/f) = 30+ ((30.07-27)(40-30)/4) = 37.675
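
Quartiles, deciles and percentiles all use the same interpolation, differing only in the position k·n/parts that is looked up. A Python sketch under that observation is given below; the function name and its parameters k and parts are assumptions introduced for this example.

```python
def positional_value(classes, freqs, k, parts):
    """k-th of `parts` dividing points for class-interval data.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    Uses L1 + (k*n/parts - C.f) * (L2 - L1) / f.
    """
    n = sum(freqs)
    target = k * n / parts             # e.g. n/4 for Q1, 6n/10 for D6
    cumulative = 0                     # C.f up to the previous class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("frequency table is empty")


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 12, 10, 4]

print(round(positional_value(classes, freqs, 1, 4), 2))      # Q1  -> 12.29
print(round(positional_value(classes, freqs, 3, 4), 2))      # Q3  -> 26.25
print(round(positional_value(classes, freqs, 1, 10), 2))     # D1  -> 6.2
print(round(positional_value(classes, freqs, 6, 10), 2))     # D6  -> 21.6
print(round(positional_value(classes, freqs, 97, 100), 3))   # P97 -> 37.675
```

Calling the function with k=2 and parts=4 (or k=5, parts=10, or k=50, parts=100) gives the median, in line with the statement that Q2, D5 and P50 are the same as the median.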

(vii) Mode: It is that value which occurs the most often or has the highest frequency. This is
often used in the shoe industry to find the most common size of shoes.
We take an example. Suppose X is a variable having values 3, 4, 6, 4, 8, 4, 10 then mode=4
If frequencies are given, then mode can be found as follows:
X f
5 6
8 10
10 3
4 2
Here the value 8 has the highest frequency, so the mode is 8.
Sometimes more than one value has the highest frequency. Such data is called multimodal data.
For example,
X f
5 10
6 4
7 10
4 10
Here there are 3 modes: 5, 7 and 4.
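
Finding the mode or modes, for raw values and for a frequency table, can be sketched in Python as follows. The function names are illustrative; both return a sorted list so that multimodal data is handled in the same way as data with a single mode.

```python
from collections import Counter


def modes(values):
    """All values that occur most often in a list of raw values."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def modes_from_frequencies(table):
    """All values with the highest frequency in a list of (value, frequency) pairs."""
    highest = max(freq for _, freq in table)
    return sorted(value for value, freq in table if freq == highest)


print(modes([3, 4, 6, 4, 8, 4, 10]))                                  # [4]
print(modes_from_frequencies([(5, 6), (8, 10), (10, 3), (4, 2)]))     # [8]
print(modes_from_frequencies([(5, 10), (6, 4), (7, 10), (4, 10)]))    # [4, 5, 7]
```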
For grouped data, we first find the class interval having the highest frequency and then use a
formula to find the value of the mode.

Class f
0-10 4
10-20 8
20-30 3
30-40 6
Here the highest frequency is 8, so the class interval 10-20 contains the mode.
Mode = L1 + (delta 1)(L2-L1)/(delta 1 + delta 2)
Here L1 and L2 are the lower and upper limit of the class interval containing the mode. We get
delta 1= maximum frequency – previous frequency and delta 2 = maximum frequency – next
frequency
For the above example, delta 1 = 8-4 = 4 and delta 2 = 8-3 = 5
Mode = 10 + (4/ (4+ 5))(20-10) = 14.44
The highest frequency can be for the first class interval or for the last class interval. If the first
class interval has the highest frequency, then the previous frequency is taken as 0 and if the last
class interval has the highest frequency, then the next frequency is taken as 0.
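
The grouped-mode formula, including the rule of taking a missing previous or next frequency as 0, can be sketched in Python as follows. The function name is illustrative, and the sketch assumes that if several classes share the highest frequency, the first one is used.

```python
def grouped_mode(classes, freqs):
    """Mode for class-interval data: L1 + delta1 * (L2 - L1) / (delta1 + delta2)."""
    i = freqs.index(max(freqs))                              # modal class (first one if tied)
    previous_f = freqs[i - 1] if i > 0 else 0                # 0 if the modal class is the first class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0       # 0 if the modal class is the last class
    delta1 = freqs[i] - previous_f
    delta2 = freqs[i] - next_f
    lower, upper = classes[i]
    return lower + delta1 * (upper - lower) / (delta1 + delta2)


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 8, 3, 6]
print(round(grouped_mode(classes, freqs), 2))  # 10 + (4 / 9) * 10 -> 14.44
```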

(viii) Geometric mean: Here we take the nth root of the product of n values.
For example, if there are n values x1, x2,...xn then
G.M. = (x1 · x2 · x3 · … · xn)^(1/n), i.e. the nth root of the product x1 · x2 · … · xn
This is used in cases where the data is about rate of change of prices or growth rates of a
company or of the country or rate of change of population.

Example: The growth rate of a company in the first year of operation is 6%, in the second year it
is 7% and in the third year it is 9%. Find the average growth rate of the company.
Solution: Here, instead of finding the simple average, we find the geometric mean.

G.M. = (106 x 107 x 109)^(1/3) = 107.326


Hence the average growth rate of the company is 107.326-100 = 7.326%
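
The growth-rate example can be checked with a short Python sketch. The function name is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """Geometric mean: the nth root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


# Growth of 6%, 7% and 9% expressed as index values 106, 107 and 109.
gm = geometric_mean([106, 107, 109])
print(round(gm, 3))         # 107.326
print(round(gm - 100, 3))   # average growth rate in percent: 7.326
```

The simple average of 6, 7 and 9 is 7.333, slightly higher than 7.326. The geometric mean is preferred here because growth compounds from year to year.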

Symmetric data: If the values of mean, median and mode are equal, then the data is called
symmetric data and can be represented in the form of a bell-shaped curve.
Here X is a variable which follows a symmetric distribution and the mean is denoted as µ. There
are more values near the average value and fewer values at the two extremes. For example,
heights, weights, marks and productivities are variables which are often approximately symmetric.
Incomes, by contrast, usually contain extreme values, as noted earlier, and tend to be skewed.
If the values of mean, median and mode are not equal then the data is called skewed data and
this property is called skewness. A simple measure of skewness is mean – mode. If
skewness is 0, it means that the data is symmetric.
Skewness can be either positive or negative depending on whether the mean is greater than or
less than the mode. If mean > mode, then skewness is positive and if mean < mode, then
skewness is negative.
Coefficient of skewness = Skewness / Standard deviation
We will discuss standard deviation in the next chapter.
For moderately skewed data, the mode can be approximated as mode = 3 median – 2 mean.
Hence, Skewness = Mean – mode = mean – (3 median – 2 mean) = 3 (mean – median)

Kurtosis: Kurtosis is the relative length of the tails and the degree of concentration in the center.
There are three prototype shapes, distinguished by the coefficient of kurtosis:

If coefficient of kurtosis = 3, it is a normal curve, called a mesokurtic curve.

If coefficient of kurtosis > 3, the curve is more peaked than the normal curve and is called a
leptokurtic curve.
If coefficient of kurtosis < 3, the curve is less peaked than the normal curve and is called a
platykurtic curve.

The merits and demerits of the different measures of central tendency are:
(i) Arithmetic mean
Merits: 1. It is easy to determine and understand arithmetic mean.
2. It is based on all the values or observations of the distribution.
3. It can be used for further algebraic treatment. For example, if we know the arithmetic means
of 2 or more distributions and the number of observations in each, then the arithmetic mean of
the combined distribution can be obtained.
4. The formula for arithmetic mean is rigidly defined implying that for a given series, arithmetic
mean is unique.
5. It provides a good basis for comparison.
6. For obtaining arithmetic mean of a series, its values need not be arranged in a given order.
7. If the arithmetic mean and the number of observations of a distribution are known, then the
sum of the observations of the distribution can be known.
Demerits: 1. It is unduly affected by extreme values.
2. In case even a single observation of a series is missing, we cannot determine the arithmetic
mean of the series.
3. The determination of arithmetic mean of a grouped frequency distribution is based on the
unrealistic assumption that the observations of each class are concentrated at the centre of that
class.
(ii) Median
Merits: 1. Extreme values do not affect the median.
2. Median is easy to understand. It is also easy to determine.
3. Median can also be determined graphically.
4. Medians of individual distributions and ungrouped frequency distributions can be determined
simply by observation.
Demerits: 1. In order to determine the median of a distribution, the distribution must be arranged
in a definite order. This is not needed in other measures of central tendency.
2. Median of a distribution is not based on all the observations of the distribution.

(iii) Mode
Merits: 1. The mode/modes of an ungrouped frequency distribution can be determined simply
by observation.
2. Mode is not affected by extreme values.
3. Mode is easy to understand.
4. Mode can be determined graphically.
Demerits: 1. It is not based on all the observations.
2. It is not suitable for further mathematical treatment.
3. We cannot know the sum of observations of a distribution if we know the mode and the
number of observations of the distribution.

Characteristics of a good measure of central tendency


(i) It should be rigidly defined.

(ii) It should be easy to understand and easy to interpret.

(iii) It should be easy to compute.

(iv) It should be based on all the observations.

(v) It should be least affected by fluctuations of sampling.

(vi) It should not be unduly affected by extreme observations.

(vii) It should be capable of further mathematical treatment.

(viii) It should not be affected by the method of grouping of observations.

(ix) It should represent the central tendency of the data.

Examples (with answers)

Q1. Find the mean and median for the following data

Class F m fm C.f

5-10 7 7.5 52.5 7

10-15 8 12.5 100 15

15-20 5 17.5 87.5 20

20-25 4 22.5 90 24
25-30 6 27.5 165 30

Solution: Mean= ∑fm/∑f = 495/30 = 16.5

Here n is the sum of the frequencies i.e. 30. n/2 = 15th term. From the C.f column, we see that the
15th term lies in the interval 10-15.

Median = L1 + (n/2 –C.f)(L2-L1)/f = 10 + (15-7)(15-10)/8 = 15.

Q2. Find the mode

Class F

4-6 3

6-8 5

8-10 10

10-12 6

Solution: The highest frequency is 10 which is for the interval 8-10.

Here delta 1 = 10 - 5 = 5 and delta 2 = 10 - 6 = 4.

Mode = L1 + (delta 1/(delta 1 + delta 2))(L2-L1) = 8 + (5/(5 + 4))(10-8) = 9.11

Q3. Find the mean and median for the following data:

Class interval Freq. m fm C.f

4–8 5 6 30 5

8-12 6 10 60 11

12- 16 7 14 98 18

16-20 2 18 36 20
Solution: Mean= ∑fm/∑f = 224/20 = 11.2

Here n is the sum of the frequencies i.e. 20, n/2 = 10th term. From the C.f column, we see that the
10th term lies in the interval 8-12.

Median = L1 + (n/2 –C.f)(L2-L1)/f = 8 + (10-5)(12-8)/6 = 11.33

Q4. Find the mean and median for the following data:

Age F m fm C.f.

20-30 12 25 300 12

30-40 8 35 280 20

40-50 20 45 900 40

50-60 10 55 550 50

Solution: Mean= ∑fm/∑f = 2030/50 = 40.6

Here n is the sum of the frequencies i.e. 50. n/2 = 25th term. From the C.f column, we see that the
25th term lies in the interval 40-50

Median = L1 + (n/2 –C.f)(L2-L1)/f = 40 + (25-20)(50-40)/20 = 42.5

Q5. Find the mean and median for the following data:

Marks No. of students (f) m fm C.f

10-20 10 15 150 10

20-30 5 25 125 15

30-40 20 35 700 35
40-50 15 45 675 50

Solution: Mean= ∑fm/∑f = 1650/50 = 33

Here n is the sum of the frequencies i.e. 50. n/2 = 25th term. From the C.f column, we see that the
25th term lies in the interval 30-40

Median = L1 + (n/2 –C.f)(L2-L1)/f = 30 + (25-15)(40-30)/20 = 35

Q6. Find the median and mode.

X F

2 5

6 4

10 7

8 3

Solution: First arrange the X values in ascending order and write the corresponding frequencies.

X F C.f

2 5 5

6 4 9

8 3 12

10 7 19

Here n = 19, an odd number. So median = (19+1)/2 term = 10th term = 8

Mode=10 since 10 has the highest frequency.

Q7. Find the mean and median for the following data
Marks No. of students (f) m fm C.f

4-6 5 5 25 5

6-8 8 7 56 13

8-10 10 9 90 23

10-12 2 11 22 25

Solution: Mean= ∑fm/∑f = 193/25 = 7.72

Here n is the sum of the frequencies i.e. 25. n/2 = 12.5th term. From the C.f column, we see that
the 12.5th term lies in the interval 6-8

Median = L1 + (n/2 –C.f)(L2-L1)/f = 6 + (12.5-5)(8-6)/8 = 7.875
