Central Tendency - Note On Selected Topics

Measures of central tendency
Comparison of Mean (A.M.), Median and Mode:
The arithmetic mean (or simply the mean) is the most widely used of the three measures. The three
measures are compared below on several criteria:
1. Rigidity of definition: The mean is rigidly defined. The median is also rigidly defined, but it is
not unique when the number of observations is even. The mode is rigidly defined too, unless more
than one value has the highest frequency.
2. Comprehensibility: All three measures are easily comprehensible.
3. Calculability: All three measures are more or less easy to calculate.
4. Dependence on all observations: The mean is based on all observations; the median and mode are not.
Even if some values are unavailable or altered, the median or mode may still be determined.
5. Effect of extreme values: Mean is affected by the presence of even a few extremely high or low
values, but median or mode is not.
6. Effect of open end classes: Mean cannot be determined unless we make some assumption
about the open end classes. But the presence of open end classes has no effect on median or
mode determination.
7. Effect of unequal class widths: If the classes are not of equal width, the mean or median can be
computed without any difficulty, but not the mode. For the mode, at least the modal class and its two
adjacent classes have to be of the same width.
8. Algebraic treatment: The mean can be treated algebraically (e.g., the mean of a combined group
may be determined from the group means). The median and mode do not admit this type of
algebraic treatment.
9. Reliability and sampling fluctuations: The sampling fluctuation of the mean (the variation expected
in its value from sample to sample) is the least of the three, so it is the most
reliable. The mode has greater sampling fluctuation than the median.
10. Qualitative/non-numerical data: If the observations cannot be measured numerically but
can be ranked in order (i.e., ordinal data), the median is suitable: the middlemost value is easy to
locate, whereas the mean is meaningless in such situations. The mode is appropriate
if we are looking for the most "usual value". For this reason the mode can be located even for
non-numerical data which cannot be ranked in order (i.e., nominal data).
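Points 5 and 8 above can be illustrated with a small sketch using Python's standard `statistics` module. The data values and the group split are invented purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical observations, and the same observations plus one extreme value.
data = [12, 15, 15, 18, 20]
with_outlier = data + [200]

# Point 5: the mean moves a lot, the median only slightly, the mode not at all.
print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
print("mode:  ", mode(data), "->", mode(with_outlier))

# Point 8: the mean of the combined group follows from the group means
# and group sizes alone (the split into two groups is illustrative).
group_a, group_b = [12, 15, 15], [18, 20]
combined_mean = (len(group_a) * mean(group_a) + len(group_b) * mean(group_b)) / (
    len(group_a) + len(group_b)
)
print("combined mean:", combined_mean)
```

No comparable shortcut exists for the median or mode: they have to be recomputed from the pooled observations.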
Use of Mean, Median and Mode:
To analyse data using the mean, median and mode, we need to use the most appropriate measure of
central tendency. The following points should be remembered:
• The mean is useful for predicting future results when there are no extreme values in the data
set. Of the three measures, it is the most sensitive, because its value always
reflects the contribution of each of the data values in the group. The mean can be used where the
distribution is more or less symmetrical. The mean is applicable only to quantitative data.
• The median may be more useful than the mean when there are extreme values in the data set,
as it is not affected by them. Use the median when there are large outliers in a data
set. The median may also be useful for finding a representative value from ordinal data.
• The mode is useful when the most common item, characteristic or value of a data set is
required. The mode is applicable to nominal, ordinal or any quantitative data.
Here is a summary:
Type of variable             Best measure of central tendency
Nominal                      Mode
Ordinal                      Median
Quantitative (not skewed)    Mean
Quantitative (skewed)        Median
Quantiles:
The quantile of order p, or pth quantile (0 < p < 1), is a value of the variable which divides the
whole frequency distribution into two parts such that a proportion p of the total number of
observations is less than or equal to it and a proportion (1-p) of the total number of
observations is greater than it. p = 0.5 gives the median.
Quantiles that divide the distribution into q equal parts are called q-quantiles. Some of them have
special names:
• The 2-quantile is called the median
• The 3-quantiles are called tertiles or terciles
• The 4-quantiles are called quartiles
• The 5-quantiles are called quintiles
• The 6-quantiles are called sextiles
• The 10-quantiles are called deciles
• The 12-quantiles are called duodeciles
• The 20-quantiles are called vigintiles
• The 100-quantiles are called percentiles
• The 1000-quantiles are called permilles
Calculation of quantiles:
A. Quantiles for series data:

Let n be the total number of observations, and suppose we want to find the pth quantile, Zp. The steps
for calculating Zp are:
1) Arrange the observations in ascending order, x(1) ≤ x(2) ≤ ... ≤ x(n).
2) Calculate Q = (n+1)p. Let r be the largest integer not exceeding Q, so that r ≤ Q < r+1.
3) Let x(r) and x(r+1) be the rth and (r+1)th order observations. Then, by linear interpolation,
\[ \frac{Z_p - x_{(r)}}{x_{(r+1)} - x_{(r)}} = \frac{(n+1)p - r}{(r+1) - r} \]
\[ \Rightarrow\; Z_p = x_{(r)} + \{(n+1)p - r\}\,\{x_{(r+1)} - x_{(r)}\} \]
When (n+1)p is an integer r, the second term above vanishes and Zp = x(r).
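The three steps can be sketched in Python as follows. The function name `quantile_series` and the sample data are illustrative, and the clamping of positions that fall outside 1..n is an assumption of this sketch, since the text does not cover that case.

```python
import math

def quantile_series(values, p):
    """p-th quantile (0 < p < 1) by the (n+1)p rule with linear interpolation."""
    xs = sorted(values)              # step 1: x(1) <= x(2) <= ... <= x(n)
    n = len(xs)
    q = (n + 1) * p                  # step 2: Q = (n+1)p
    r = math.floor(q)                # largest integer not exceeding Q
    # Assumption (not in the text): clamp to the extremes when Q is outside 1..n.
    if r < 1:
        return xs[0]
    if r >= n:
        return xs[-1]
    # step 3: Zp = x(r) + {(n+1)p - r}{x(r+1) - x(r)}; lists are 0-indexed
    return xs[r - 1] + (q - r) * (xs[r] - xs[r - 1])

data = [3, 7, 8, 12, 13, 14, 18, 21, 24]   # hypothetical observations, n = 9
print(quantile_series(data, 0.25))  # Q = 2.5, halfway between x(2)=7 and x(3)=8
print(quantile_series(data, 0.50))  # Q = 5, so the median is x(5)
print(quantile_series(data, 0.75))  # Q = 7.5, halfway between x(7)=18 and x(8)=21
```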
B. Quantiles for ungrouped frequency distribution:

Step 1: Prepare the cumulative frequency (less than type) distribution table.
Step 2: Calculate Np, where N is the total frequency.
Step 3: Identify the value of the variable whose cumulative frequency is the smallest one
greater than or equal to Np, i.e., F(k-1) < Np <= F(k), where F(i) is the cumulative frequency
of the ith value. The value of the variable in this (kth) position is the pth quantile.
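A minimal sketch of these three steps, assuming the distribution is given as two parallel lists of distinct values and their frequencies (the function name and the data are hypothetical):

```python
from itertools import accumulate

def quantile_ungrouped(values, freqs, p):
    """p-th quantile of an ungrouped frequency distribution."""
    pairs = sorted(zip(values, freqs))           # order by value of the variable
    total = sum(f for _, f in pairs)             # N
    target = total * p                           # step 2: Np
    cumulative = accumulate(f for _, f in pairs) # step 1: less-than type F(i)
    for (x, _), cum in zip(pairs, cumulative):
        if cum >= target:                        # step 3: first F(k) >= Np
            return x
    return pairs[-1][0]                          # guard against rounding in Np

# Hypothetical distribution: value -> frequency, N = 40,
# cumulative frequencies 4, 14, 26, 35, 40.
values = [0, 1, 2, 3, 4]
freqs = [4, 10, 12, 9, 5]
print(quantile_ungrouped(values, freqs, 0.25))  # Np = 10, first F >= 10 is 14
print(quantile_ungrouped(values, freqs, 0.50))  # Np = 20, first F >= 20 is 26
print(quantile_ungrouped(values, freqs, 0.75))  # Np = 30, first F >= 30 is 35
```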
C. Quantiles for grouped frequency distribution:

Here first identify the pth quantile class as the class for which the cumulative frequency
(less than type) just exceeds Np, N=Total frequency. The pth quantile, Zp is given by,
\[ Z_p = l + \frac{h\,(Np - F)}{f} \]
where,
l = lower class boundary of the pth quantile class
h = width of the pth quantile class
f = frequency of the pth quantile class
F = cumulative frequency (less than type) of the class previous to the pth quantile class
Special cases of quantiles:
Quartiles:
Quartiles are the points which divide the whole distribution into four equal parts. There are 3
quartiles, viz. the 1st quartile, 2nd quartile and 3rd quartile.
The 1st quartile divides the whole frequency distribution in a 1:3 ratio. It is a value of the variable such that
25% (i.e., one-fourth) of the total observations fall below it and 75% (i.e., three-fourths) above. The 2nd
quartile is nothing but the median.
The 3rd quartile divides the whole frequency distribution in a 3:1 ratio, i.e., 75% of the total observations fall
below it and 25% above.
Calculation of quartile:
For series data:
Step 1: Arrange the data in ascending order.
Step 2: For 1st quartile, obtain (n+1)/4, n being the total no. of observations. If (n+1)/4 is an
integer then (n+1)/4th ordered value gives Q1, otherwise we have to interpolate.
In that case, say, (n+1)/4 = I+F (I is the integral part and F is the fractional part).
Then, Q1=Ith value + F.( (I+1)th value - Ith value)
Similarly, for 3rd quartile we have to obtain 3(n+1)/4 and proceed as above.
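The following sketch applies the I + F rule to a hypothetical series and compares the result with `statistics.quantiles` from the Python standard library, whose default `method="exclusive"` uses the same (n+1)-based positions. The helper name `quartile_series` is illustrative, and the sketch assumes n >= 3 so that the position lies between 1 and n.

```python
from statistics import quantiles

def quartile_series(values, k):
    """k-th quartile (k = 1, 2, 3) of series data by the k(n+1)/4 rule."""
    xs = sorted(values)                   # step 1: ascending order
    pos = k * (len(xs) + 1) / 4           # step 2: k(n+1)/4 = I + F
    i = int(pos)                          # integral part I
    frac = pos - i                        # fractional part F
    if frac == 0:
        return xs[i - 1]                  # the I-th ordered value itself
    # I-th value + F * ((I+1)-th value - I-th value); lists are 0-indexed
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18]    # hypothetical series, n = 10
by_hand = [quartile_series(data, k) for k in (1, 2, 3)]
print(by_hand)                  # positions 2.75, 5.5 and 8.25
print(quantiles(data, n=4))     # standard-library result for comparison
```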
For ungrouped frequency distribution:
In a cumulative frequency (less than type) distribution table, the variate value corresponding to the
smallest of the cumulative frequencies >= N/4 gives the 1st quartile Q1, and that corresponding to the
smallest of the cumulative frequencies >= 3N/4 gives the 3rd quartile Q3, where N is the total frequency.
For grouped frequency distribution:
Step 1: Obtain cumulative frequency distribution (less than type).
Step 2: Identify the classes containing quartiles as follows:
The class corresponding to the cumulative frequency just >= N/4 contains Q1 and the class
corresponding to the cumulative frequency just >= 3N/4 contains Q3, N being the total
frequency.
Step 3: Now Q1 and Q3 are given by,
\[ Q_1 = l + \frac{h\left(\frac{N}{4} - F\right)}{f} \]
\[ Q_3 = l' + \frac{h'\left(\frac{3N}{4} - F'\right)}{f'} \]
where,
l (l') = lower class boundary of the class containing Q1 (Q3)
h (h') = width of the class containing Q1 (Q3)
f (f') = frequency of the class containing Q1 (Q3)
F (F') = cumulative frequency (less than type) of the class previous to the class containing Q1 (Q3)
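As a worked illustration of Steps 1 to 3, consider a hypothetical grouped distribution (the figures are invented for this example): classes 10-20, 20-30, 30-40, 40-50, 50-60 with frequencies 6, 14, 20, 12, 8. Then N = 60 and the less-than cumulative frequencies are 6, 20, 40, 52, 60. Since N/4 = 15, the first cumulative frequency >= 15 is 20, so Q1 lies in the class 20-30 (l = 20, h = 10, f = 14, F = 6). Since 3N/4 = 45, the first cumulative frequency >= 45 is 52, so Q3 lies in the class 40-50 (l' = 40, h' = 10, f' = 12, F' = 40).

```latex
\[ Q_1 = 20 + \frac{10\,(15 - 6)}{14} = 20 + \frac{90}{14} \approx 26.43 \]
\[ Q_3 = 40 + \frac{10\,(45 - 40)}{12} = 40 + \frac{50}{12} \approx 44.17 \]
```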
Deciles:
Deciles are the points which divide the whole distribution into 10 equal parts. There are 9 deciles, of
which the 5th decile is nothing but the median. Deciles are usually denoted by D1, D2, ..., D9.
Percentiles:
Percentiles are the summary measures that divide a ranked dataset into 100 equal parts. Each
ranked dataset has 99 percentiles. Percentiles are usually denoted by P1, P2, ..., P99. Clearly the 25th
percentile P25 = Q1, the 1st quartile; the 50th percentile P50 = Q2, the median; and the 75th percentile
P75 = Q3, the 3rd quartile.
Calculation of percentiles:
For series data:
Let n be the total number of observations. We want to obtain the kth percentile Pk.
Step 1: Arrange the observations in ascending order.
Step 2: Obtain (n+1)k/100. If this is an integer, then Pk is the (n+1)k/100th order observation.
If (n+1)k/100 is not an integer, suppose (n+1)k/100 = I + F (I being the integral part and F
the fractional part). Then, using interpolation,
Pk = Ith observation + F.((I+1)th observation - Ith observation)
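A short sketch of the two steps for integer k, using integer division so that I and F are obtained exactly. The function name and data are hypothetical, and raising an error when the position falls outside the data is a choice made for this sketch; the text does not say how to handle that case.

```python
def percentile_series(values, k):
    """k-th percentile (integer k, 1..99) by the (n+1)k/100 rule."""
    xs = sorted(values)                          # step 1: ascending order
    n = len(xs)
    i, rem = divmod((n + 1) * k, 100)            # (n+1)k/100 = I + rem/100
    if i < 1 or i > n or (i == n and rem > 0):
        # Assumption of this sketch: refuse positions outside the data.
        raise ValueError("percentile position falls outside the data")
    if rem == 0:
        return xs[i - 1]                         # integer position: I-th observation
    frac = rem / 100                             # fractional part F
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18]       # hypothetical series, n = 10
print(percentile_series(data, 40))   # position 4.4: between the 4th and 5th values
print(percentile_series(data, 25))   # position 2.75: same as Q1
print(percentile_series(data, 50))   # position 5.5: same as the median
```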
For frequency distribution:
The calculation is the same as for the pth quantile, with p = k/100 for the kth percentile. So, replacing Np by
Nk/100 in the formula for the pth quantile, the kth percentile is given by,
\[ P_k = l + \frac{h\left(\frac{Nk}{100} - F\right)}{f} \]
Percentile rank:
Percentile rank of an observation is the percentage of observations lying below or equal to it. It
is obtained from the above formula where Pk is known and k is to be obtained.
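For a grouped frequency distribution, this inversion can be written out explicitly. Here l, h, f and F refer to the class that contains the known value P_k; the rearrangement is plain algebra on the percentile formula.

```latex
\[ P_k = l + \frac{h\left(\frac{Nk}{100} - F\right)}{f}
   \;\Longrightarrow\;
   \frac{f\,(P_k - l)}{h} = \frac{Nk}{100} - F
   \;\Longrightarrow\;
   k = \frac{100}{N}\left[F + \frac{f\,(P_k - l)}{h}\right] \]
```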
The percentile rank for series data can be obtained by the following simple formula:
\[ \text{Percentile rank of } x_i = \frac{\text{Number of data values less than } x_i}{N} \times 100 \]
where the numerator is the cumulative frequency (less than type) of x_i and N is the total number of
values in the dataset.
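A minimal sketch of the series formula follows. Because the definition speaks of observations "below or equal to" the value while the formula counts values "less than" it, the sketch exposes both readings through an `inclusive` flag; that flag, the function name and the scores are all illustrative and not part of the original notes.

```python
def percentile_rank(values, x, inclusive=False):
    """Percentile rank of x: percentage of values below x (or below-or-equal)."""
    if inclusive:
        count = sum(1 for v in values if v <= x)   # "below or equal to" reading
    else:
        count = sum(1 for v in values if v < x)    # "less than" reading
    return 100 * count / len(values)

scores = [40, 55, 60, 60, 72, 85, 90, 95]          # hypothetical data, N = 8
print(percentile_rank(scores, 72))                 # 4 of 8 values are below 72
print(percentile_rank(scores, 72, inclusive=True)) # 5 of 8 values are <= 72
```

With tied or discrete data the two readings can differ noticeably, so it is worth stating which one is in use.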
Determination of quantile using graphical method:
Determination of quantile by using less than type ogive is similar to the determination of
median. The pth quantile would be the value of the variable corresponding to the cumulative
frequency Np in y-axis of less than type ogive. In case of more than type ogive, pth quantile is
the variate value corresponding to N(1-p) in y-axis.
For 1st quartile the variate value (along x-axis) corresponding to N/4 (on y-axis) gives Q1 and
3N/4 gives Q3 (from less than type ogive).
For the kth percentile, Pk would be the value of the variable corresponding to the cumulative
frequency Nk/100 from the less than type ogive.
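Reading a value off a less than type ogive can be imitated numerically. The sketch below assumes the ogive is drawn with straight line segments joining the points (class boundary, cumulative frequency); the function name and the data are hypothetical.

```python
def read_ogive(boundaries, cum_freqs, target):
    """Variate value at which a straight-line less-than ogive reaches `target`.

    boundaries: class boundaries b0 < b1 < ... < bk
    cum_freqs:  less-than cumulative frequency at each boundary (starts at 0)
    """
    points = list(zip(boundaries, cum_freqs))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            # linear interpolation along the segment that crosses `target`
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target lies outside the range of the ogive")

# Hypothetical ogive for classes 0-10, ..., 40-50 with N = 40.
boundaries = [0, 10, 20, 30, 40, 50]
cum_freqs = [0, 5, 13, 25, 35, 40]
N = cum_freqs[-1]
print(read_ogive(boundaries, cum_freqs, N / 4))        # Q1 at height N/4
print(read_ogive(boundaries, cum_freqs, N / 2))        # median at height N/2
print(read_ogive(boundaries, cum_freqs, 3 * N / 4))    # Q3 at height 3N/4
print(read_ogive(boundaries, cum_freqs, N * 90 / 100)) # P90 at height Nk/100
```

With straight segments, this reading coincides with the interpolation formula for grouped data, since both interpolate linearly within the quantile class.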