The arithmetic mean (or simply the mean) is the most widely used of the three measures (mean, median
and mode). Here is a comparison of these three measures on various issues:
1. Rigidity of definition: The mean is rigidly defined. The median is also rigidly defined, but it is
not unique if the number of observations is even. The mode is rigidly defined as well, unless more
than one value has the highest frequency.
2. Comprehensibility: All three measures are easily comprehensible.
3. Calculability: All three measures are more or less easy to calculate.
4. Dependence on all observations: The mean is based on all observations; the median and the mode
are not. Even if some values are unavailable or altered, the median or the mode may still be determined.
5. Effect of extreme values: Mean is affected by the presence of even a few extremely high or low
values, but median or mode is not.
6. Effect of open end classes: Mean cannot be determined unless we make some assumption
about the open end classes. But the presence of open end classes has no effect on median or
mode determination.
7. Effect of unequal class widths: If the classes are not of equal width, the mean or the median can
be computed without any difficulty, but not the mode. For the mode, at least the modal class and its
two adjacent classes have to be of the same width.
8. Algebraic treatment: The mean can be treated algebraically (e.g., the mean of combined group
may be determined from the group means). But the median or mode does not have this type of
algebraic treatment.
9. Reliability and sampling fluctuations: The sampling fluctuation of the mean (the variation expected
in its value from sample to sample) is the least of the three measures, so the mean is the most
reliable. The mode has greater sampling fluctuation than the median.
10. Qualitative/Non-numerical data: If the observations cannot be measured numerically but can be
ranked in order (i.e., ordinal data), the median is suitable: the middlemost value is not hard to
locate, whereas the mean is meaningless in such situations. The mode is appropriate if we are
looking for the most “usual value”. For this reason the mode can be located even for non-numerical
data which cannot be ranked in order (i.e., nominal data).
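Points 5 and 8 of the list above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the data values and the names `incomes`, `group_a` and `group_b` are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data (not from the text): seven incomes, in thousands.
incomes = [25, 28, 30, 30, 32, 35, 38]
with_outlier = incomes + [400]  # add one extremely high value


def summary(data):
    """Return (mean, median, mode) of a list of numbers."""
    return mean(data), median(data), mode(data)


before = summary(incomes)       # mean is about 31.14, median 30, mode 30
after = summary(with_outlier)   # mean jumps to 77.25, median 31, mode 30
print(before)
print(after)

# Point 8: the mean of the combined group follows from the group sizes
# and group means alone, without going back to the raw observations.
group_a = [10, 12, 14]
group_b = [20, 22, 24, 26, 28]
n_a, n_b = len(group_a), len(group_b)
combined_mean = (n_a * mean(group_a) + n_b * mean(group_b)) / (n_a + n_b)
print(combined_mean)            # 19.5, same as mean(group_a + group_b)
```

In this example a single extreme value more than doubles the mean, while the median moves by one unit and the mode does not move at all.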
To analyse data using the mean, median and mode, we need to use the most appropriate measure of
central tendency. The following points should be remembered:
The mean is useful for predicting future results when there are no extreme values in the data
set. Of the three measures, it is the most sensitive measurement, because its value always
reflects the contributions of each of the data values in the group. Mean can be used where the
distribution is more or less symmetrical. The mean is applicable only to quantitative data.
The median may be more useful than the mean when there are extreme values (large outliers) in the
data set, as it is not affected by them. The median is also useful for finding a representative
value from ordinal data.
The mode is useful when the most common item, characteristic or value of a data set is
required. The mode is applicable to nominal, ordinal or any quantitative data.
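As an illustration of matching the measure to the type of data, here is a minimal sketch using Python's `statistics` module. The category labels, the rating scale and the variable names are hypothetical; the ordinal data are mapped to their positions on the scale only so that the middlemost value can be located.

```python
from statistics import mode, median_low

# Nominal data (hypothetical): labels with no natural order.
# Only the mode is meaningful here.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
most_common = mode(blood_groups)

# Ordinal data (hypothetical): labels that can be ranked.
# The median is located through each label's position on the scale.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "poor", "good", "fair"]
positions = [scale.index(r) for r in ratings]
middle_rating = scale[median_low(positions)]

print(most_common)    # O
print(middle_rating)  # good
```

`median_low` is used so that the result is always one of the observed ranks, which matters when the number of observations is even and averaging two labels would make no sense.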
Here is a summary:
Quantiles:
The quantile of order p, or pth quantile (0 < p < 1), is a value of the variable which divides the
whole frequency distribution into two parts such that a proportion p of the total number of
observations is less than or equal to it and a proportion (1-p) of the total number of
observations is greater than it. p = 0.5 gives the median.
Calculation of quantiles:
Quartiles:
Quartiles are the points which divide the whole distribution into four equal parts. There are 3
quartiles, viz. the 1st quartile, the 2nd quartile and the 3rd quartile.
The 1st quartile divides the whole frequency distribution in a 1:3 ratio. It is a value of the variable
such that 25% (i.e., one-fourth) of the total observations fall below it and 75% (i.e., three-fourths)
above. The 2nd quartile is nothing but the median.
The 3rd quartile divides the whole frequency distribution in a 3:1 ratio, i.e., 75% of the total
observations fall below it and 25% above.
Calculation of quartile:
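For series data:
Step 1: Arrange the n observations in ascending order.
Step 2: For the 1st quartile, obtain (n+1)/4, n being the total number of observations. If (n+1)/4 is
an integer, then the (n+1)/4th ordered value gives Q1; otherwise we have to interpolate.
In that case, say, (n+1)/4 = I+F (I is the integral part and F is the fractional part). Then Q1 lies
between the Ith and (I+1)th ordered values:
$$Q_1 = x_{(I)} + F\,\bigl(x_{(I+1)} - x_{(I)}\bigr)$$
where x_(I) denotes the Ith ordered value.
Similarly, for the 3rd quartile we have to obtain 3(n+1)/4 and proceed as above.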
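The procedure for series data can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the interpolation step is taken to be linear interpolation between the Ith and (I+1)th ordered values, positions falling outside 1..n are clamped to the smallest or largest observation, and the function name `quantile_series` and the data are hypothetical.

```python
import math


def quantile_series(values, p):
    """p-th quantile of raw (series) data using the position (n + 1) * p."""
    ordered = sorted(values)          # ordered values x(1) <= ... <= x(n)
    n = len(ordered)
    position = (n + 1) * p            # e.g. (n + 1) / 4 for Q1
    i = math.floor(position)          # integral part I
    f = position - i                  # fractional part F
    if i < 1:                         # assumption: clamp at the minimum
        return ordered[0]
    if i >= n:                        # assumption: clamp at the maximum
        return ordered[-1]
    # x(I) + F * (x(I+1) - x(I)); lists are 0-based, hence the index shift.
    return ordered[i - 1] + f * (ordered[i] - ordered[i - 1])


# Hypothetical data with n = 10, so (n + 1) / 4 = 2.75 (I = 2, F = 0.75).
data = [35, 12, 20, 40, 15, 28, 17, 30, 22, 25]
q1 = quantile_series(data, 0.25)   # 15 + 0.75 * (17 - 15) = 16.5
q3 = quantile_series(data, 0.75)   # 30 + 0.25 * (35 - 30) = 31.25
print(q1, q3)
```

The same function gives the median with p = 0.5 and the kth percentile with p = k/100.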
In a cumulative frequency (less than type) distribution table, the variate value corresponding to
the smallest of the cumulative frequencies >= N/4 gives the 1st quartile Q1, and the variate value
corresponding to the smallest of the cumulative frequencies >= 3N/4 gives the 3rd quartile Q3,
where N is the total frequency.
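The cumulative-frequency rule just described can be sketched as below. The function name `quantile_discrete` and the frequency table (number of children per family) are hypothetical and serve only as an illustration.

```python
from itertools import accumulate


def quantile_discrete(values, freqs, p):
    """Variate value whose less-than cumulative frequency is the
    smallest one that is >= N * p (N = total frequency)."""
    pairs = sorted(zip(values, freqs))              # order by variate value
    cumulative = list(accumulate(f for _, f in pairs))
    total = cumulative[-1]                          # N
    for (value, _), cum in zip(pairs, cumulative):
        if cum >= total * p:
            return value


# Hypothetical frequency table: N = 50 families.
children = [0, 1, 2, 3, 4]
families = [5, 12, 18, 10, 5]      # cumulative: 5, 17, 35, 45, 50

q1 = quantile_discrete(children, families, 0.25)   # N/4 = 12.5  -> 1
q3 = quantile_discrete(children, families, 0.75)   # 3N/4 = 37.5 -> 3
print(q1, q3)
```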
For a grouped frequency distribution, the class corresponding to the cumulative frequency just
>= N/4 contains Q1 and the class corresponding to the cumulative frequency just >= 3N/4 contains
Q3, N being the total frequency. Then,
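$$Q_1 = l + \frac{h\left(\frac{N}{4} - F\right)}{f}$$
$$Q_3 = l' + \frac{h'\left(\frac{3N}{4} - F'\right)}{f'}$$
where,
l, l' = lower class boundaries of the classes containing Q1 and Q3 respectively,
F, F' = cumulative frequencies of the classes preceding those classes,
f, f' = frequencies of the classes containing Q1 and Q3,
h, h' = widths of the classes containing Q1 and Q3,
N = total frequency.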
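A minimal sketch of the grouped-data calculation follows. It assumes the usual reading of the symbols (l = lower boundary of the class containing the quartile, h = its width, f = its frequency, F = cumulative frequency of the preceding classes); the function name `quantile_grouped` and the frequency table are hypothetical.

```python
def quantile_grouped(boundaries, freqs, p):
    """p-th quantile of a grouped frequency distribution.

    boundaries: class boundaries, one more entry than freqs
    freqs:      class frequencies
    Computes l + h * (N*p - F) / f for the class that contains the quantile.
    """
    total = sum(freqs)                 # N
    target = total * p                 # N/4 for Q1, 3N/4 for Q3
    cum_before = 0                     # F: cumulative frequency so far
    for k, f in enumerate(freqs):
        if cum_before + f >= target:   # first class with cum. freq. >= N*p
            l = boundaries[k]
            h = boundaries[k + 1] - boundaries[k]
            return l + h * (target - cum_before) / f
        cum_before += f


# Hypothetical table: classes 0-10, 10-20, ..., 40-50 with N = 40.
bounds = [0, 10, 20, 30, 40, 50]
freqs = [5, 8, 12, 10, 5]              # cumulative: 5, 13, 25, 35, 40

q1 = quantile_grouped(bounds, freqs, 0.25)   # 10 + 10 * (10 - 5) / 8  = 16.25
q3 = quantile_grouped(bounds, freqs, 0.75)   # 30 + 10 * (30 - 25) / 10 = 35.0
print(q1, q3)
```

Because the width h is taken from the boundaries of the class that contains the quantile, the sketch also works when the classes are not of equal width.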
Deciles:
Deciles are the points which divide the whole distribution into 10 equal parts. There are 9 deciles,
of which the 5th decile is nothing but the median. Deciles are usually denoted by D1, D2, ..., D9.
Percentiles:
Percentiles are the summary measures that divide a ranked dataset into 100 equal parts. Each
ranked dataset has 99 percentiles. Percentiles are usually denoted by P1, P2, ..., P99. Clearly, the
25th percentile P25 = Q1, the 1st quartile; the 50th percentile P50 = Q2, the median; and the 75th
percentile P75 = Q3, the 3rd quartile.
Calculation of percentiles:
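If (n+1)k/100 is not an integer, then suppose (n+1)k/100 = I + F (I being the integral part and F
the fractional part). Then, using linear interpolation between the Ith and (I+1)th ordered values,
$$P_k = x_{(I)} + F\,\bigl(x_{(I+1)} - x_{(I)}\bigr)$$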
The calculation is the same as that of the pth quantile, with p = k/100 for the kth percentile. So,
replacing Np by Nk/100 (in the formula of the pth quantile), the kth percentile is given by
$$P_k = l + \frac{h\left(\frac{Nk}{100} - F\right)}{f}$$
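As a worked illustration of this formula, take hypothetical figures that are not from the text: N = 40 and k = 60, so that Nk/100 = 24, and suppose the class containing P60 is 20-30 with l = 20, h = 10, f = 12 and F = 13 (the cumulative frequency of the preceding classes). Substituting:

```latex
P_{60} = l + \frac{h\left(\frac{Nk}{100} - F\right)}{f}
       = 20 + \frac{10\,(24 - 13)}{12}
       = 20 + \frac{110}{12}
       \approx 29.17
```

The result lies inside the class 20-30, as it must, since the interpolated term can never exceed the class width h.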
Percentile rank:
Percentile rank of an observation is the percentage of observations lying below or equal to it. It
is obtained from the above formula where Pk is known and k is to be obtained.
Percentile rank for a series data can be obtained by the following simple formula:
$$\text{Percentile rank of } x_i = \frac{\text{Number of data values less than or equal to } x_i}{N} \times 100$$
where the numerator is the cumulative frequency (less than type) of x_i and N is the total number
of values in the dataset.
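The percentile rank for series data can be sketched in a few lines. This sketch follows the definition given above (observations lying below or equal to the value) and expresses the result as a percentage; the function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of observations that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)   # less-than-type cum. freq.
    return 100 * at_or_below / len(values)


# Hypothetical test scores, N = 10.
scores = [45, 52, 58, 60, 63, 67, 70, 74, 81, 90]
rank_67 = percentile_rank(scores, 67)   # 6 of 10 values are <= 67
print(rank_67)                          # 60.0
```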
Determination of a quantile from the less than type ogive is similar to the determination of the
median. The pth quantile is the value of the variable corresponding to the cumulative
frequency Np on the y-axis of the less than type ogive. In the case of the more than type ogive,
the pth quantile is the variate value corresponding to N(1-p) on the y-axis.
For the quartiles, the variate value (along the x-axis) corresponding to N/4 (on the y-axis) gives
Q1 and that corresponding to 3N/4 gives Q3 (from the less than type ogive).
For the kth percentile, Pk is the value of the variable corresponding to the cumulative
frequency Nk/100 on the less than type ogive.
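Reading a quantile off an ogive amounts to linear interpolation along the plotted points. The sketch below assumes the ogive is drawn as straight segments joining (class boundary, cumulative frequency) points; the function names and the data are hypothetical.

```python
def read_less_than_ogive(boundaries, cum_less_than, target):
    """x-value at which the less-than ogive reaches `target` on the y-axis."""
    for i in range(1, len(boundaries)):
        y0, y1 = cum_less_than[i - 1], cum_less_than[i]
        if y0 <= target <= y1 and y1 > y0:
            x0, x1 = boundaries[i - 1], boundaries[i]
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("target lies outside the range of the ogive")


def quantile_from_more_than_ogive(boundaries, cum_more_than, p):
    """p-th quantile from a more-than ogive.

    The height N(1 - p) on the more-than ogive is the same point as the
    height N * p on the less-than ogive, because the two cumulative
    frequencies at any boundary add up to N.
    """
    total = cum_more_than[0]                        # N
    cum_less_than = [total - c for c in cum_more_than]
    return read_less_than_ogive(boundaries, cum_less_than, total * p)


# Hypothetical ogive points for classes 0-10, ..., 40-50 with N = 40.
bounds = [0, 10, 20, 30, 40, 50]
less_than = [0, 5, 13, 25, 35, 40]
more_than = [40, 35, 27, 15, 5, 0]

q1 = read_less_than_ogive(bounds, less_than, 40 / 4)          # 16.25
q3 = read_less_than_ogive(bounds, less_than, 3 * 40 / 4)      # 35.0
q1_alt = quantile_from_more_than_ogive(bounds, more_than, 0.25)
print(q1, q3, q1_alt)
```

Both ogives give the same first quartile here, which is a quick consistency check on a hand-drawn graph.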