Statistics is the science dealing with the collection, organization, analysis, and
interpretation of numerical data.

Two main topics are covered under statistics:-

 Measures of central tendency (average)

 Measures of variation (dispersion)

Measures of central tendency

Measures of central tendency are measures of the location of the middle or the
center of a distribution. The definition of "middle" or "center" is purposely left
somewhat vague so that the term "central tendency" can refer to a wide variety of
measures. The mean is the most commonly used measure of central tendency. The
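following measures of central tendency are discussed: the mean, the median, and
the mode.

Mean

In mathematics and statistics, the arithmetic mean, often referred to as
simply the mean or average when the context is clear, is a measure of
the central tendency of a set of observations. It is computed by dividing
the sum of the observations by the number of observations.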

Merits and demerits of mean

Merits

 The mean is well understood by most people.
 It is easy to compute.
 It gives an accurate summary of a set of numbers and is best used when there
are no extreme values.
 It is unique: there is only one answer, which is useful when comparing sets of
data.
 It is based on all observations and can be regarded as representative of the data.
 It can be computed even if the detailed distribution is not known, provided the
sum of the observations and the number of observations are known.
 It is least affected by fluctuations of sampling.

Demerits

 It is sensitive to extreme values.
 It can be determined neither by inspection nor by graphical location.
 It cannot be computed for qualitative data such as data on intelligence,
honesty, or smoking habits.
 It is strongly affected by extreme observations, and hence it does not
adequately represent data containing some extreme points.
 It cannot be computed when class intervals have open ends.
 For example, X = {1, 1, 1, 1, 2, 9} gives mean(X) = 15/6 = 2.5, which does
not reflect the actual central tendency of this set of numbers, since five of
the six values are at most 2.
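
The effect of an extreme value on the mean can be checked directly. The following is a minimal Python sketch using the standard-library statistics module on the example set above; the variable names are illustrative and not part of the original text.

```python
from statistics import mean

# The example set: five small values and one extreme value (9).
x = [1, 1, 1, 1, 2, 9]

mean_with_extreme = mean(x)          # (1 + 1 + 1 + 1 + 2 + 9) / 6
mean_without_extreme = mean(x[:-1])  # same data with the 9 dropped

print(mean_with_extreme)     # 2.5
print(mean_without_extreme)  # 1.2
```

Removing a single observation roughly halves the mean here, which is the sensitivity the demerits above describe.
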

Median

In probability theory and statistics, the median is the numeric value
separating the higher half of a sample, a population, or a probability
distribution from the lower half. The median of a finite list of numbers
can be found by arranging all the observations from lowest value to
highest value and picking the middle one. If there is an even number of
observations, then there is no single middle value; the median is then
usually defined to be the mean of the two middle values.
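
The procedure just described (sort, then take the middle value or the mean of the two middle values) can be sketched as follows. This is an illustrative Python sketch; the helper name median_by_sorting and the sample lists are assumptions introduced here, and the standard-library statistics.median is shown only as a cross-check.

```python
from statistics import median

def median_by_sorting(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[middle]
    # Even count: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

print(median_by_sorting(odd))   # 5
print(median_by_sorting(even))  # 4.0
print(median(odd), median(even))
```
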


Merits and demerits of median

Merits

 Simplicity: It is a very simple measure of the central tendency of the
series. In the case of a simple statistical series, just a glance at the
data is enough to locate the median value, and it is easy to find.
 It can be determined even when class intervals have open ends.
 It can also be located graphically.
 It is not much affected by extreme observations and is also
independent of the range or dispersion of the data.
 It is the only suitable average when the data are qualitative and the
items can be ranked according to a qualitative characteristic.
 It is useful when comparing sets of data.
 It is unique: there is only one answer.

Demerits

 It is not as popular as the mean, as it does not take all the data into account.
 If the gaps between some numbers are large while the gaps between
others are small, the median can be a very inaccurate indication of the
middle of a set of values.
 Lack of representative character: The median fails to be a representative
measure for a series whose values are wide apart from each other. It is
also of limited representative character because it is not based on all
the items in the series.
 Unrealistic: When the median is located somewhere between the two
middle values, it remains only an approximate measure, not a precise
value.
 Lack of algebraic treatment: The arithmetic mean is capable of further
algebraic treatment, but the median is not. For example, multiplying the
median by the number of items in the series will not, in general, give
the sum total of the values of the series.
 When the number of items is small, the median may not be
representative, because it is a positional average.
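
The point about algebraic treatment can be illustrated with a small Python sketch. The data set below is hypothetical and chosen only for illustration: multiplying the mean by the number of items recovers the total, whereas doing the same with the median does not.

```python
from statistics import mean, median

# Hypothetical series, used only to illustrate the point.
values = [2, 3, 4, 10, 31]
n = len(values)
total = sum(values)                     # 50

total_from_mean = mean(values) * n      # 10 * 5
total_from_median = median(values) * n  # 4 * 5

print(total, total_from_mean, total_from_median)  # 50 50 20
```
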

Mode

In statistics, the mode is the value that occurs most frequently in a data
set or a probability distribution. In some fields, notably education, sample
data are often called scores, and the sample mode is known as the modal
score.

Merits and demerits of mode

Merits

 Simple and popular: The mode is a very simple measure of central tendency.
Sometimes just a glance at the series is enough to locate the modal value.
Because of its simplicity, it is a very popular measure of central tendency.
 Less effect of marginal values: Compared to the mean, the mode is less affected
by marginal values in the series. It is determined only by the value with
the highest frequency.
 Graphic presentation: The mode can be located graphically.
 Best representative: The mode is the value that occurs most frequently in the
series. Accordingly, it is the best representative value of the series.
 No need of knowing all the items or frequencies: The calculation of the mode
does not require knowledge of all the items and frequencies of a distribution.
In a simple series, it is enough to know the items with the highest frequencies
in the distribution.
 Extreme values (outliers) do not affect the mode.
 It represents the most typical value in the distribution.
 It shows which value occurred most often in a set of data, and it is quick
and easy to find.
 It can be determined even if the distribution has open-end classes.
 It is the value around which observations are most concentrated and hence the
best representative of the data.

Demerits

 Uncertain and vague: The mode is an uncertain and vague measure of
central tendency. It may not be uniquely defined.
 Example: X = {1, 1, 2, 2}, Mode(X) = 1 or 2.
 Not capable of algebraic treatment: Unlike the mean, the mode is not capable of
further algebraic or mathematical treatment.
 Difficult: When the frequencies of all items are identical, it is difficult to
identify the modal value.
 Complex procedure of grouping: For grouped data, calculation of the mode
involves a cumbersome procedure of grouping the data. If the extent of
grouping changes, the modal value changes as well.
 Ignores extreme marginal frequencies: It ignores extreme marginal
frequencies. To that extent the modal value is not a representative value of all
the items in a series.
 It is not based on all observations.
 It is much affected by fluctuations of sampling.
 It is not suitable when different items of data are of unequal importance.
 When no value repeats in the data set, every value is a mode, so the mode
is uninformative.
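
The cases mentioned in the lists above (a single mode, two tied modes, and no repeated value) can be reproduced with a short Python sketch. The helper name modes is an assumption made for illustration; the standard-library statistics.multimode (Python 3.8 and later) is shown as a cross-check.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

single = modes([1, 1, 1, 1, 2, 9])  # one clear mode
tied = modes([1, 1, 2, 2])          # mode is not unique
no_repeat = modes([4, 7, 9])        # every value occurs once

print(single)     # [1]
print(tied)       # [1, 2]
print(no_repeat)  # [4, 7, 9]
print(multimode([1, 1, 2, 2]))  # [1, 2]
```
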

Variance

In probability theory and statistics, the variance is used as one of several
descriptors of a distribution. It describes how far values lie from the mean:
it is the average of the squared deviations from the mean. In particular, the
variance is one of the moments of a distribution. In that context, it forms
part of a systematic approach to distinguishing between probability
distributions. While other such approaches have been developed, those based on
moments are advantageous in terms of mathematical and computational
simplicity.
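
As a rough illustration of how the variance measures how far values lie from the mean, the following Python sketch computes the population variance as the average squared deviation. The data set and the helper name population_variance are assumptions introduced here; statistics.pvariance from the standard library is used as a cross-check.

```python
from statistics import mean, pvariance

def population_variance(values):
    """Average of the squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # 4.0
print(pvariance(data))
```
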

Merits and demerits of variance analysis

Merits

 Variance analysis can lead to the identification of certain types of task
that frequently overrun their budget, whilst other tasks may be seen to
regularly come in under budget.
 It is intrinsically connected with planned and actual results and with the
effects of the difference between the two on the performance of the
entity or company.
 It is an excellent measure of manufacturing efficiency and productivity.
 It helps you understand the root causes of deviations affecting your
profit and loss.
 Opportunities for improvement lie on both sides of the variance equation.
 It is a very important management tool for understanding cost behavior and
taking appropriate control actions where necessary.
 It is used to monitor the areas where cost overruns are frequent, to assess
whether the standard costs fixed for the activity or materials are reasonable
or whether the process itself needs to be monitored more closely.
 It also identifies areas of efficiency within the processes and can be used
to reward efficiency, just as it can be used to take corrective action in
areas with adverse variance. This exercise must, however, be properly carried
out and analyzed before arriving at any conclusions.

Demerits

 The monitoring cycle can be so long that it renders the application of control
impossible. Typically, by the time a problem has been identified through
variance analysis, it is too late to take corrective action. This is a major
shortcoming of variance analysis and highlights the need for a monitoring
system that depicts the current status of the project more effectively.
 Variance analysis is a post-facto exercise: it generally takes a long time to
assess the impact of a variance, which results in delayed corrective action.
 This drawback can be effectively addressed by maintaining monitoring and
data-collection systems that work closer to real time, as opposed to
traditional systems that provide the requisite data only after a delay.
Besides, the benefits of this exercise far outweigh the disadvantages
associated with it.


Standard deviation

In probability theory and statistics, the standard deviation of a statistical
population, a data set, or a probability distribution is the square root of its
variance. Standard deviation is a widely used measure of variability or
dispersion, being algebraically more tractable though practically less robust
than the expected deviation or average absolute deviation. It shows how
much variation there is from the "average" (mean or expected/budgeted
value). A low standard deviation indicates that the data points tend to be
very close to the mean, whereas a high standard deviation indicates that the
data are spread out over a large range of values. The standard deviation of a
set of statistical data is defined as the positive square root of the
arithmetic mean (A.M.) of the squared deviations of the items from the A.M. of
the series under consideration.
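
The verbal definition above can be written in symbols. This is a sketch under assumed notation that does not appear in the original text: a series of N items x_1, ..., x_N with arithmetic mean \bar{x} and standard deviation \sigma.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}
```

The quantity under the square root is the variance, so \sigma is its positive square root.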

Merits and demerits of standard deviation

Merits

 It is a rigidly defined measure of dispersion.
 It is based on all the observations.
 It is capable of being treated mathematically. For example, if the sizes,
means, and standard deviations of a number of groups are known, their combined
standard deviation can be computed.
 It is not much affected by fluctuations of sampling and is therefore
widely used in sampling theory and tests of significance.
 It takes care of both positive and negative deviations.
 It can be used to compare the dispersions of two or more distributions when
their units of measurement and arithmetic means are the same.
 It is used to test the reliability of the mean.
 It is amenable to algebraic treatment and possesses many mathematical
properties, on account of which it is used in many advanced studies.
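
The remark that a combined standard deviation can be computed from group summaries can be sketched in Python. This is an illustrative sketch, not a method taken from the original text: it assumes each group is summarized by its size, mean, and population standard deviation, and it uses the standard identity that the combined variance is the size-weighted average of each group's variance plus the squared distance of its mean from the combined mean. The function name combined_pstdev and the two groups are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def combined_pstdev(groups):
    """Combine (size, mean, population SD) summaries into one population SD."""
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total_n
    combined_var = sum(
        n * (sd ** 2 + (m - combined_mean) ** 2) for n, m, sd in groups
    ) / total_n
    return sqrt(combined_var)

# Hypothetical groups, used only to check the formula against the raw data.
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [1, 3, 5, 7]

summaries = [(len(g), mean(g), pstdev(g)) for g in (group_a, group_b)]
combined = combined_pstdev(summaries)
direct = pstdev(group_a + group_b)  # SD computed from the pooled raw data

print(abs(combined - direct) < 1e-9)  # True
```
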

Squaring the difference has at least three advantages:

1. Squaring makes each term positive so that values above the mean do not
cancel values below the mean.
2. Squaring adds more weighting to the larger differences, and in many cases
this extra weighting is appropriate since points further from the mean may
be more significant.
3. The mathematics are comparatively convenient when using this measure in
subsequent statistical calculations.

The standard deviation is reported as the square root of the variance, so its
units correspond to those of the data set. The computation and notation of the
variance and standard deviation depend on whether we are considering the entire
population or a sample.
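
The population/sample distinction can be seen in a short Python sketch. It relies on the standard-library statistics module, where pvariance and pstdev divide by the number of observations n (population) and variance and stdev divide by n - 1 (sample). The divisor n - 1 is the usual convention for samples and is stated here as background, not taken from the original text; the data set is hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data: mean is 5, squared deviations sum to 32, n = 8.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_var = pvariance(data)  # 32 / 8
population_sd = pstdev(data)      # square root of 32 / 8
sample_var = variance(data)       # 32 / 7
sample_sd = stdev(data)           # square root of 32 / 7

print(population_var, population_sd)
print(sample_var, sample_sd)
```

For the same data, the sample figures are slightly larger than the population figures because of the smaller divisor.
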


Demerits

 Compared with the quartile deviation, the mean deviation, and the range, it
is difficult to understand and more cumbersome to calculate.
 Since it depends on the units of measurement of the observations, it
cannot be used to compare the dispersion of distributions expressed in
different units.
 It gives more weight to extreme items and less to those near the mean,
because the squares of large deviations are proportionately greater than the
squares of small ones. Thus, deviations of 2 and 8 are in the ratio 1:4, but
their squares, 4 and 64, are in the ratio 1:16.


Despite the drawbacks mentioned above, the standard deviation is generally
regarded as the best measure of dispersion and should be used wherever possible.
Just as the mean is the best measure of central tendency (leaving aside
exceptional cases), the standard deviation is the best measure of dispersion,
except in a few cases where the mean deviation or quartile deviation may give
better results.

However, since the standard deviation gives greater weight to extreme items, it
does not find much favor with economists and businessmen, who are more
interested in the results of the modal class.