Measure of central tendency

Competency to be gained
from this lecture
Calculate a measure of central tendency
that is adapted to the sample studied
Key issues
Measures of central tendency
Mode
Median
Mean
Appropriate applications
Quick definitions of
measures of central tendency
Mode
The most frequently occurring observation
Median
The mid-point of a set of ordered observations
Mean
The arithmetic sum of the observations divided
by the number of observations
The mode
Definition
The mode of a distribution is the value that is
observed most frequently in a given set of data
How to obtain it?
Arrange the data in sequence from low to high
Count the number of times each value occurs
The most frequently occurring value is the mode
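The three steps above can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: the function name `modes` is our own, and it returns a list so that a tie between several equally frequent values stays visible.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    if not values:
        return []
    counts = Counter(values)               # step 2: count how often each value occurs
    top = max(counts.values())             # highest frequency observed
    # steps 1 and 3: list the most frequent value(s) from low to high
    return sorted(v for v, c in counts.items() if c == top)


# Hepatitis incubation periods (days) from the example in this section
print(modes([29, 31, 24, 29, 30, 25]))  # [29]
```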
Mode
[Figure: The mode (illustrative frequency distribution, y-axis 0 to 20, with the peak labelled "Mode")]
Examples of mode (2/2):
Incubation period (in days) of persons
affected by hepatitis
29, 31, 24, 29, 30, 25
Arranging the values in order:
24, 25, 29, 29, 30, 31
Mode is 29
The mode is the only measure of location
that can be used when the characteristic
itself cannot be measured (e.g., a category such as colour)
Colour preference of people for their cars
Colour preference Number of people
Green 354
Blue 852
Gray 310
Red 474
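For a non-measurable characteristic, only the frequencies can be compared. A minimal Python sketch, assuming the counts are stored in a dictionary (the variable names are our own), picks the category with the largest count; ties are not handled here.

```python
# Counts copied from the colour-preference table above
colour_counts = {"Green": 354, "Blue": 852, "Gray": 310, "Red": 474}

# The mode is the category with the highest count
mode_colour = max(colour_counts, key=colour_counts.get)
print(mode_colour)  # Blue
```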
Specific features of the mode
There may be no mode
When each value is unique
There may be more than one mode
When more than one peak occurs
Bimodal distribution
The mode can be misinterpreted
Is a distribution skewed or bimodal?
The mode is not amenable to statistical tests
The mode is not based upon all observations
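The "no mode" and "more than one mode" cases can be illustrated with Python's `statistics.multimode` (Python 3.8+). This is a sketch under two assumptions of our own: "no mode" is taken to mean that every value occurs exactly once, and only ties for the single highest frequency are detected, so two separate peaks of unequal height (which the slide would also call bimodal) are not flagged.

```python
from statistics import multimode


def describe_mode(values):
    """Classify data as having no mode, one mode, or several tied modes."""
    if not values:
        return "no data", []
    peaks = multimode(values)            # all values sharing the highest count
    if values.count(peaks[0]) == 1:      # every value is unique -> no mode
        return "no mode", []
    if len(peaks) == 1:
        return "unimodal", peaks
    return ("bimodal" if len(peaks) == 2 else "multimodal"), peaks


print(describe_mode([1, 2, 3]))                 # ('no mode', [])
print(describe_mode([24, 25, 29, 29, 30, 31]))  # ('unimodal', [29])
print(describe_mode([1, 1, 2, 3, 3]))           # ('bimodal', [1, 3])
```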
The median
The median is, literally, the middle
value of the data
It is defined as the value above and below
which half (50%) of the observations fall
Median
Computing the median
Arrange the observations in order from
smallest to largest (ascending order) or vice versa
Count the number of observations n
If n is an odd number
Median = value of the ((n + 1) / 2)th observation
If n is an even number
Median = the average of the (n / 2)th and (n / 2 + 1)th
observations
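The odd/even rule above translates directly into code. A minimal Python sketch (the function name is our own); note that the slide counts positions from 1 while Python indexes from 0, hence the `- 1` offsets.

```python
def median(values):
    """Median using the odd/even rule (positions counted from 1 on the slide)."""
    ordered = sorted(values)                  # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([10, 20, 12, 3, 18, 16, 14, 25, 2]))     # 14
print(median([10, 20, 12, 3, 18, 16, 14, 25, 2, 8]))  # 13.0
```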
Computing the Median, Example
Example of median calculation
What is the median of the following values:
10, 20, 12, 3, 18, 16, 14, 25, 2
Arrange the numbers in increasing order
2, 3, 10, 12, 14, 16, 18, 20, 25
Median = 14
Suppose there is one more observation (8)
2, 3, 8, 10, 12, 14, 16, 18, 20, 25
Median = Mean of 12 & 14 = 13
Advantages and disadvantages
of the median
Advantages
The median is unaffected by extreme values
Disadvantages
The median does not contain information on the
other values of the distribution
It is selected only by its rank
You can change 50% of the values (e.g., all values above the median)
without affecting the median
The median is less amenable to statistical tests
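The contrast between "unaffected by extreme values" and "does not use the other values" can be checked numerically. In this sketch the modified data sets are hypothetical: we replace the largest value of the earlier median example with 250, and then replace every value above the median with large numbers.

```python
from statistics import mean, median

values = [2, 3, 10, 12, 14, 16, 18, 20, 25]   # ordered data from the example
changed = values[:-1] + [250]                   # hypothetical extreme value
changed_upper = values[:5] + [100, 200, 300, 400]  # all values above 14 replaced

print(median(values), median(changed), median(changed_upper))  # 14 14 14
print(round(mean(values), 2), round(mean(changed), 2))         # 13.33 38.33
```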
[Figure: two distributions (x-axis: class of the variable, y-axis 0 to 14) with the same median]
The median is not sensitive to
extreme values
Mean (Arithmetic mean / Average)
Most commonly used measure of location
Definition
Calculated by adding all observed values and
dividing by the total number of observations
Notations
Each observation is denoted $x_1, x_2, \ldots, x_n$
The total number of observations: $n$
Summation (sigma): $\sum$
The mean: $\bar{x}$
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Mean
Computation of the mean
Duration of hospital stay (days)
8, 25, 7, 5, 8, 3, 10, 12, 9
9 observations (n=9)
Sum of all observations = 87
Mean duration of stay = 87 / 9 = 9.67
Incubation period of a disease (days)
8, 45, 7, 5, 8, 3, 10, 12, 9
9 observations (n=9)
Sum of all observations =107
Mean incubation period = 107 / 9 = 11.89
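The two computations above differ in a single value (25 versus 45). A short Python check, a sketch with variable names of our own, reproduces both means and also prints the medians: one larger value raises the mean by about 2.2 days while the median stays at 8.

```python
from statistics import median

stay = [8, 25, 7, 5, 8, 3, 10, 12, 9]         # hospital stay (days)
incubation = [8, 45, 7, 5, 8, 3, 10, 12, 9]   # incubation period (days)

for label, data in (("stay", stay), ("incubation", incubation)):
    # sum, n, mean = sum / n (rounded as on the slide), and median
    print(label, sum(data), len(data), round(sum(data) / len(data), 2), median(data))
# stay 87 9 9.67 8
# incubation 107 9 11.89 8
```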
Advantages and disadvantages
of the mean
Advantages
Has many good theoretical properties
Used as the basis of many statistical tests
Good summary statistic for a symmetrical
distribution
Disadvantages
Less useful for an asymmetric distribution
Can be distorted by outliers, giving a less
typical value
Mean of several groups combined

Group (i)   Size ($n_i$)   Mean ($\bar{x}_i$)   Sum ($n_i \bar{x}_i$)
1           10             41                   410
2           15             36                   540
3           25             42                   1050
Total       50             --                   2000

Mean of all groups = 2000 / 50 = 40
Crude average of the three group means (ignoring group sizes) = (41 + 36 + 42) / 3 = 39.7
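The combined mean weights each group mean by its group size, whereas the crude average treats all groups equally. A minimal Python sketch of both, using the table values (variable names are our own):

```python
# (size, mean) for each group, copied from the table above
groups = [(10, 41), (15, 36), (25, 42)]

total_n = sum(n for n, _ in groups)                       # 50
total_sum = sum(n * m for n, m in groups)                 # 2000
pooled_mean = total_sum / total_n                         # weighted by group size
crude_average = sum(m for _, m in groups) / len(groups)   # ignores group sizes

print(pooled_mean, round(crude_average, 1))  # 40.0 39.7
```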
The geometric mean
Background
Some distributions appear symmetric after log
transformation
(e.g., neutrophil counts)
A log transformation may help to describe the
central tendency
Definition
The geometric mean is the antilog of the mean of
the log values
Geometric mean
Calculating a geometric mean
Consider the set of observations
5, 10, 20, 25, 40
Take the base-10 logarithm of these values
0.70, 1.00, 1.30, 1.40 & 1.60.
Calculate the mean of the log values
0.70 + 1.00 + 1.30 + 1.40 + 1.60 = 6.00
6.00/ 5 = 1.20
Take the antilog of the mean of the log values
Antilog (1.20) = 15.85
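The same steps (log, mean of the logs, antilog) in a minimal Python sketch; the function name is our own and all values must be positive. Python 3.8+ also provides `statistics.geometric_mean`, which should give the same result.

```python
import math


def geometric_mean_log10(values):
    """Antilog (base 10) of the mean of the base-10 logs; values must be > 0."""
    logs = [math.log10(v) for v in values]   # take the logarithms
    mean_log = sum(logs) / len(logs)         # mean of the log values
    return 10 ** mean_log                    # antilog


obs = [5, 10, 20, 25, 40]
print(round(geometric_mean_log10(obs), 2))  # 15.85
```

Here the product of the five values is 1,000,000, so the geometric mean is exactly $10^{6/5} = 10^{1.2}$, matching the slide.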
Geometric mean of several groups combined

Group (i)   Number of patients ($n_i$)   Geometric mean (GM)   log GM   $n_i \times$ log GM
A           20                           8.5                   0.93     18.60
B           18                           10.2                  1.01     18.18
C           12                           9.4                   0.97     11.64
Total       50                           --                    --       48.42

Overall GM = antilog (48.42 / 50) = antilog (0.9684) = 9.3

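The overall geometric mean is a size-weighted mean on the log scale, converted back with the antilog. A Python sketch of the table's computation (variable names are our own); it uses unrounded logarithms, so intermediate values differ slightly from the two-decimal table, but the results agree after rounding.

```python
import math

# (number of patients, geometric mean) per group, from the table above
groups = {"A": (20, 8.5), "B": (18, 10.2), "C": (12, 9.4)}

total_n = sum(n for n, _ in groups.values())
# weight each group's log GM by its size (last column of the table)
weighted_log_sum = sum(n * math.log10(gm) for n, gm in groups.values())
overall_gm = 10 ** (weighted_log_sum / total_n)   # antilog of the weighted mean log

print(round(weighted_log_sum, 2), round(overall_gm, 1))  # 48.42 9.3
```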
[Figure: frequency distribution (x-axis 0 to 19, y-axis 0 to 14) marked with Mean = 10.8, Median = 10 and Mode = 13.5]
Choosing
What measure of location to use?
Consider the duration (days) of absence from
work of 21 labourers owing to sickness
1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10,
10, 59, 80
Mean = 234 / 21 ≈ 11 days
Not typical of the series as 19 of the 21 labourers
were absent for less than 11 days
Distorted by extreme values
Median = 5 days (the 11th of the 21 ordered values)
Better measure
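The three measures for the sickness-absence data can be checked with Python's standard `statistics` module; a short sketch (the variable name is our own):

```python
from statistics import mean, median, mode

absence = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80]

print(len(absence), sum(absence))    # 21 234
print(round(mean(absence), 1))       # 11.1, pulled up by 59 and 80
print(median(absence))               # 5
print(mode(absence))                 # 4 (occurs four times)
print(sum(v < 11 for v in absence))  # 19 labourers absent for less than 11 days
```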
Choice of measure of central tendency
for symmetric distributions
Any one of the central/location measures
can be used
The mean has definite advantages if
subsequent computations are needed
Choice of measure of central tendency
for asymmetric distributions
For skewed distributions, the mean is not
suitable
Positively skewed: the mean gives a higher value
(it is pulled towards the long right tail)
Negatively skewed: the mean gives a lower value
(it is pulled towards the long left tail)
If some observations deviate much more
than others in the series, then the median is the
appropriate measure
If the log-transformed distribution is
symmetric, the geometric mean may be used
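The direction rules above suggest a rough check: compare the mean with the median. The sketch below is a rule-of-thumb heuristic of our own, not a formal skewness measure; the function name and the `tolerance` parameter are hypothetical, and the comparison can mislead for discrete or multimodal data.

```python
from statistics import mean, median


def skew_hint(values, tolerance=0.0):
    """Rough direction of skew from the mean-median gap (heuristic only)."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "positive"            # mean pulled above the median
    if gap < -tolerance:
        return "negative"            # mean pulled below the median
    return "roughly symmetric"


absence = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80]
print(skew_hint(absence))  # positive
```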
Key messages
The mode is the most common value
The median is suited to data with
extreme values
The mean is suited to symmetric
distributions
The geometric mean may be useful when log
transformed data are symmetric
The type of the distribution determines the
measure of central tendency to use
