
DESCRIBING DATA USING NUMERICAL MEASURES
After data have been collected and presented in tables and graphs, we
often need more exact measures. For this purpose we use single numbers
that describe characteristics of a data set.
To summarize the various characteristics of a data set, we use measures of
 Central Tendency,
 Dispersion,
 Skewness and
 Kurtosis.
Central Tendency
 A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within
that set of data.
 As such, measures of central tendency are sometimes called
measures of central location.
 The mean (often called the average) is probably the measure of
central tendency you are most familiar with, but there are
others, such as the median and the mode.
 The mean, median and mode are all valid measures of central
tendency but, under different conditions, some measures of
central tendency become more appropriate to use than others.
MEASURES OF CENTRAL TENDENCY
The Arithmetic Mean
1. Arithmetic mean for ungrouped data (individual series)

$$\bar{X} = \frac{\sum X}{n}$$

2. Arithmetic mean for grouped data (discrete and continuous
distribution)

$$\bar{X} = \frac{\sum fX}{n}$$

Where,
n = ∑f = number of observations = total frequency
f = frequency of each value or class
X = value of the variable for a discrete frequency distribution, or the
midpoint of each class for a grouped/continuous frequency distribution
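As a quick check of the two cases above, the following Python sketch computes the mean of individual observations and the mean of a frequency distribution from class midpoints. The data and the function names `mean_ungrouped` and `mean_grouped` are hypothetical, chosen only for illustration.

```python
def mean_ungrouped(xs):
    """Arithmetic mean of individual observations: sum(X) / n."""
    return sum(xs) / len(xs)


def mean_grouped(midpoints, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * X) / sum(f)."""
    n = sum(freqs)  # total frequency
    return sum(f * x for x, f in zip(midpoints, freqs)) / n


# Hypothetical individual series.
values = [4, 8, 15, 16, 23, 42]

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, represented by midpoints.
midpoints = [5, 15, 25, 35]
freqs = [3, 7, 8, 2]

print(mean_ungrouped(values))          # 108 / 6
print(mean_grouped(midpoints, freqs))  # 390 / 20
```

For a continuous distribution the result is an approximation, because every observation in a class is treated as if it sat at the class midpoint.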
Coding Data
Change of origin (Short-cut method)

$$\bar{X} = a + \frac{\sum fd}{n}$$

Change of origin & scale (Step-deviation method)

$$\bar{X} = a + \frac{\sum fd'}{n} \times h$$

Where,
d = X - a
d' = (X - a)/h
a = assumed mean
h = width of the class interval
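Coding the data changes only the arithmetic, not the answer. The sketch below, on a hypothetical frequency distribution, computes the mean directly and then with the usual textbook forms of the short-cut method (a + ∑fd/n) and the step-deviation method (a + (∑fd'/n) × h). The choice a = 25 is arbitrary.

```python
# Hypothetical frequency distribution: classes 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
freqs = [3, 7, 8, 2]

a = 25  # assumed mean (any convenient value works)
h = 10  # common class width
n = sum(freqs)

# Direct method: sum(f * X) / n
direct = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Short-cut method: a + sum(f * d) / n, with d = X - a
shortcut = a + sum(f * (x - a) for x, f in zip(midpoints, freqs)) / n

# Step-deviation method: a + (sum(f * d') / n) * h, with d' = (X - a) / h
step_dev = a + sum(f * (x - a) / h for x, f in zip(midpoints, freqs)) / n * h

print(direct, shortcut, step_dev)
```

All three values agree up to floating-point rounding. The coded methods exist mainly to keep hand calculation manageable.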
The Weighted Mean
The weighted mean enables us to calculate an average that takes into
account the importance of each value to the overall total. The weighted
mean is given by

$$\bar{X}_w = \frac{\sum wX}{\sum w}$$

Where,
w = weight assigned to each value X
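A minimal Python sketch of the weighted mean, using the standard form ∑wX / ∑w. The scores and weights are hypothetical, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * X) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: three test scores, the last one counting three times
# as much as the first.
scores = [70, 80, 90]
weights = [1, 2, 3]

print(weighted_mean(scores, weights))    # (70 + 160 + 270) / 6
print(weighted_mean(scores, [1, 1, 1]))  # equal weights give the ordinary mean
```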
Advantages of A.M.
 Its concept is familiar to most people and intuitively clear.
 It is easy to compute and simple to understand.
 It is based on all the observations.
 It is useful for performing statistical procedures such as comparing the means from
several data sets.
Disadvantages of A.M.
 It is affected by extreme values that are not representative of the rest of the data.
 It is tedious to compute when the data set contains a large number of observations.
 It cannot be computed for a data set that has open-end classes at either the high or
low end of the scale.
 It is not suitable for measuring qualitative characteristics like beauty, honesty,
intelligence, etc.
 It is also not recommended as a measure of location if the data is highly skewed.
Properties of A.M.
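 The algebraic sum of the deviations of the given set of
observations from their arithmetic mean is always zero:
$$\sum (X - \bar{X}) = 0$$
 Combined mean of two groups with sizes $n_1$, $n_2$ and means
$\bar{X}_1$, $\bar{X}_2$:
$$\bar{X}_{12} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$
 The sum of the squared deviations, $\sum (X - A)^2$, is minimum
when the deviations are taken about the arithmetic mean, i.e. when $A = \bar{X}$.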
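These properties can be checked numerically. The sketch below uses two small hypothetical groups to verify three things. First, deviations from the mean sum to zero. Second, the standard combined-mean formula (n1·mean1 + n2·mean2) / (n1 + n2) matches the mean of the pooled data. Third, the sum of squared deviations is smaller about the mean than about a few other trial values.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs, a):
    """Sum of squared deviations of xs about the value a."""
    return sum((x - a) ** 2 for x in xs)


# Hypothetical data.
group1 = [2, 4, 6, 8]
group2 = [10, 20, 30]
m1, m2 = mean(group1), mean(group2)
n1, n2 = len(group1), len(group2)

# Property 1: deviations from the mean sum to zero.
dev_sum = sum(x - m1 for x in group1)

# Property 2: combined mean from group sizes and group means.
combined = (n1 * m1 + n2 * m2) / (n1 + n2)
pooled = mean(group1 + group2)

# Property 3: squared deviations are smallest about the mean.
at_mean = sum_sq_dev(group1, m1)
off_mean = [sum_sq_dev(group1, m1 + shift) for shift in (-1, 0.5, 2)]

print(dev_sum, combined, pooled, at_mean, off_mean)
```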
Geometric Mean
 Geometric mean is especially useful when we are dealing with
quantities that change over a period of time, such as when averaging ratios,
percentages, and rates of increase over a period of several years.
 The average rate of growth of population, or average increase in the
rate of profits, sales, production, or rate of money, or construction of
index numbers are some important examples of uses of geometric
mean in social, economic and management sciences.

$$GM = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n} = \operatorname{antilog}\left(\frac{\sum \log X}{n}\right)$$
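To illustrate averaging rates of change, the sketch below computes the geometric mean as the n-th root of the product of the values, evaluated through logarithms. The growth factors are hypothetical. The restriction to positive values is a property of the log form used here.

```python
import math


def geometric_mean(values):
    """n-th root of the product of the values, computed via logarithms."""
    if not values:
        raise ValueError("geometric mean of empty data is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
avg_factor = geometric_mean(growth_factors)

print(f"average growth per year: {(avg_factor - 1) * 100:.2f}%")
```

Applying `avg_factor` three years in a row reproduces the same overall growth as the three actual factors. The arithmetic mean of the factors does not have this property.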
Median
Median is the value which divides a set of data into two equal parts.
The values of the variable in the first part are less than the median and
those in the second part are greater than the median.
As the median always lies in the central position of a distribution,
it is also called a positional average.
Median of Ungrouped Data (Individual data)
First arrange the data in ascending or descending order of magnitude.
When the number of observations is odd, the median is the middle value of
the arranged series.
When the number of observations is even, the median is the arithmetic mean
of the two middle observations of the arranged series.
The median value is calculated by the formula:

$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item}$$
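The odd and even cases can be sketched in Python as follows. The function name `median_ungrouped` and the sample data are hypothetical. For real work, the standard library's `statistics.median` applies the same rule.

```python
import statistics


def median_ungrouped(xs):
    """Median of individual observations: sort, then take the middle."""
    if not xs:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value, i.e. item (n + 1) / 2.
        return ordered[mid]
    # Even n: arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 1, 5]       # hypothetical
even_data = [7, 1, 5, 3]   # hypothetical

print(median_ungrouped(odd_data), statistics.median(odd_data))
print(median_ungrouped(even_data), statistics.median(even_data))
```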
Median of Grouped Data
 Calculate N/2.
 Find the c.f. just greater than N/2.
 The class corresponding to the c.f. just greater than N/2 is the median class.

$$\text{Median} = l + \frac{\frac{N}{2} - c.f.}{f} \times h$$

Where,
f = frequency of the median class
l = lower limit of the median class
h = class width of the median class
N = total number of observations/total frequency
c.f. = cumulative frequency of the class preceding the median class
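The procedure for grouped data can be sketched as below, using the usual interpolation formula l + ((N/2 - c.f.) / f) × h. The class limits and frequencies are hypothetical. The function assumes contiguous classes of equal width listed in ascending order.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a grouped distribution with equal class widths."""
    total = sum(freqs)  # N
    half = total / 2    # N / 2
    cum = 0             # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        # First class whose cumulative frequency exceeds N/2 is the median class.
        if cum + f > half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("no median class found (are all frequencies zero?)")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lower_limits = [0, 10, 20, 30]
freqs = [4, 6, 10, 5]

# N = 25, N/2 = 12.5; cumulative frequencies are 4, 10, 20, 25,
# so the median class is 20-30 with c.f. = 10 and f = 10.
print(grouped_median(lower_limits, 10, freqs))
```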
Advantages of Median
 Median is not affected by the extreme values.
 Median can be used for the qualitative descriptions.
 Median can be calculated for the open-end classes.
 When the distribution is skewed positively or negatively, the
median is often the best measure of location because it generally
lies between the mean and the mode.

Disadvantages of Median
 The data must first be arranged in order, which is time consuming
for a data set with a large number of observations.
 Median is not used for further statistical treatments.
Partition Values
Partition values are the values which divide the given distribution into
a number of equal parts.
Quartiles/Deciles/Percentiles
 Quartiles, deciles and percentiles divide a distribution into 4, 10 and
100 equal parts respectively.
 Quartiles: Q1, Q2 and Q3
 Deciles: D1, D2, …, D9
 Percentiles: P1, P2, …, P99
Ungrouped data (individual series)

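$Q_i$ = value of the $\left(\frac{i(n+1)}{4}\right)^{\text{th}}$ item, where i = 1, 2, 3

$D_i$ = value of the $\left(\frac{i(n+1)}{10}\right)^{\text{th}}$ item, where i = 1, 2, …, 9

$P_i$ = value of the $\left(\frac{i(n+1)}{100}\right)^{\text{th}}$ item, where i = 1, 2, …, 99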
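A minimal Python sketch of partition values for individual data, using the common textbook position rule i(n + 1)/k with k = 4, 10 or 100. Two details are assumptions not stated in the text: fractional positions are resolved by linear interpolation between neighbouring items, and positions outside 1..n are clamped to the ends of the data. Software packages often use other conventions, so results can differ slightly.

```python
def partition_value(xs, i, parts):
    """i-th partition value when the data are split into `parts` equal parts.

    Uses the 1-based position i * (n + 1) / parts in the sorted data.
    """
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("partition value of empty data is undefined")
    pos = i * (n + 1) / parts
    pos = min(max(pos, 1), n)  # assumption: clamp to the range of the data
    lower = int(pos)           # whole part of the position (pos >= 1)
    frac = pos - lower         # fractional part of the position
    if lower >= n:
        return ordered[-1]
    # Assumption: interpolate linearly between the two neighbouring items.
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


data = [12, 15, 18, 20, 25, 30, 35]  # hypothetical, n = 7

q1 = partition_value(data, 1, 4)     # position 2
q2 = partition_value(data, 2, 4)     # position 4 (the median)
q3 = partition_value(data, 3, 4)     # position 6
d5 = partition_value(data, 5, 10)    # position 4
p50 = partition_value(data, 50, 100) # position 4
print(q1, q2, q3, d5, p50)
```

Q2, D5 and P50 all point to the same position, which is the median.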


Quartiles/Deciles/Percentiles of Grouped Data
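 Calculate iN/4, iN/10 or iN/100 for the i-th quartile, decile or percentile.
 Find the c.f. just greater than that value.
 The class corresponding to that c.f. is the required
quartile/decile/percentile class.

$$Q_i = l + \frac{\frac{iN}{4} - c.f.}{f} \times h, \qquad D_i = l + \frac{\frac{iN}{10} - c.f.}{f} \times h, \qquad P_i = l + \frac{\frac{iN}{100} - c.f.}{f} \times h$$
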
Where,
f = frequency of the quartile/decile/percentile class
l = lower limit of the quartile/decile/percentile class
h = class width of the quartile/decile/percentile class
N = total number of observations/total frequency
c.f. = cumulative frequency of the class preceding the
quartile/decile/percentile class
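The same interpolation used for the grouped median extends to any partition value. The sketch below uses hypothetical data and assumes equal-width contiguous classes. For the i-th quartile, decile or percentile it locates the class whose cumulative frequency first exceeds i·N/k (k = 4, 10 or 100), then interpolates within it. For i = 1 this target reduces to N/4, N/10 or N/100.

```python
def grouped_partition(lower_limits, width, freqs, i, parts):
    """i-th partition value of a grouped distribution split into `parts` parts."""
    total = sum(freqs)          # N
    target = i * total / parts  # i * N / 4, i * N / 10 or i * N / 100
    cum = 0                     # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f > target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("no class found for the requested partition value")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with N = 25.
lower_limits = [0, 10, 20, 30]
freqs = [4, 6, 10, 5]

q1 = grouped_partition(lower_limits, 10, freqs, 1, 4)   # target 6.25, class 10-20
q2 = grouped_partition(lower_limits, 10, freqs, 2, 4)   # target 12.5, class 20-30
q3 = grouped_partition(lower_limits, 10, freqs, 3, 4)   # target 18.75, class 20-30
d5 = grouped_partition(lower_limits, 10, freqs, 5, 10)  # same target as Q2
print(q1, q2, q3, d5)
```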
Mode
 Mode is the value of a variable which occurs most
frequently in a set of observations.
 A distribution with one mode, two modes, or more than two modes
is called unimodal, bimodal, or multimodal respectively.
 In a bimodal or multimodal distribution, the mode is
not a representative measure of location.
 For a bimodal or multimodal distribution, whenever we use
the mode as a measure of the central tendency of a data
set, we should calculate the mode from grouped data.
Mode for Grouped Data
 First, the modal class, the class with the highest frequency, is
identified.
 Mode is given by

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

Where,
l = lower limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class
h = width of the modal class interval
Advantages of Mode
 It can be used as a central location for qualitative as well as
quantitative data.
 It can be calculated for data with open-end classes.
 It is not affected by extreme values.

Disadvantages of Mode
 Data sets that contain two, three or many modes are
difficult to interpret and compare.
 It is not based on all the observations.
 It is not used for further statistical treatments.
 It cannot be calculated for a data set in which no value occurs
more than once or every value occurs the same number of times.
