
CHAPTER 3

Measures of Central Tendency


Chapter Contents
3.1. Definition of center
3.2. Review of Algebra
3.3. Describing center
3.4. Quantiles
3.5. Review questions

In Unit Two we learned how to gain useful information from raw data by organizing (grouping) it into a frequency distribution table and then presenting it with statistical tools such as graphs and diagrams.

In this unit (Unit Three) we will learn about data analysis, namely description and summarization, the fourth stage in conducting any statistical study or investigation.

Unit objectives
1. Define what we mean by measures of central tendency.
2. Summarize data using measures of central tendency such as the mean, median, and mode.
3. Identify the position of a data value in a data set using measures of position such as percentiles, deciles, and quartiles.

Learning outcomes
After completing this section, successful students will be able to:
• Define what is meant by a measure of central tendency.
• State the objectives of measuring the center of a data set.
• Understand the properties of a good measure of center.

Key words
Deciles, Mean, Median, Mode, Percentiles, Quartiles

3.1 Definition of Central Tendency

Definition 3.1. Measures of central tendency are numerical values that locate, in some sense, the center of a data set. For this reason, they are sometimes called measures of location, measures of averaging, or summary statistics.

Objectives of averaging (or MCT)

Why is it necessary to measure the center of a data set?
• To get one single value that describes the characteristics of the entire group.
• To facilitate comparisons between two or more data sets.
• To summarize data or reduce data size.

Properties of a good measure of center

We say a measure of central tendency is good if it possesses most of the following properties. It should:
• be simple to understand and easy to calculate and interpret;
• exist and be unique, i.e., be rigidly defined by a mathematical formula;
• be based on all observations;
• not be seriously affected by extreme observations (outliers);
• be capable of further statistical analysis and/or manipulation.

3.2 Review of algebra

Definition 3.2. Statistical notation refers to the standardized code for symbolizing the mathematical operations performed in the formulas and the answers we obtain.

A. Summation notation (Σ)

A sum of terms such as $X_1 + X_2 + X_3 + X_4 + X_5$ is often designated by the symbol $\sum_{i=1}^{5} X_i$:

$$X_1 + X_2 + X_3 + X_4 + X_5 = \sum_{i=1}^{5} X_i \qquad (3.2.1)$$

Σ is the capital Greek letter read as "sigma"; in this connection it is read as "the sum of ..." and is called the summation sign. The letter $i$ is called the summation index. The term following Σ is called the summand; in our example, $X_i$ is the summand.

Lecture notes (set by: Z)

The "$i = 1$" below Σ indicates that the first term of the sum is obtained by putting $i = 1$ in the summand. The 5 above Σ indicates that the final term of the sum is obtained by substituting $i = 5$ in the summand. The other terms of the sum are obtained by giving $i$ the integral values between the limits 1 and 5.
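To connect the notation with a computation, here is a minimal Python sketch (the values in `X` are hypothetical) that lets the index $i$ run from 1 to 5 exactly as the limits of Σ prescribe:

```python
# Hypothetical observations X_1, ..., X_5 (values chosen only for illustration).
X = [4, 7, 1, 9, 3]

# Literal reading of the notation: let i run from 1 to 5 and add up X_i.
total = 0
for i in range(1, 6):      # i = 1, 2, 3, 4, 5
    total += X[i - 1]      # Python lists are 0-based, so X_i is X[i - 1]

# The built-in sum() gives the same result.
assert total == sum(X)
print(total)  # 24
```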

B. Product notation (Π)

An analogous notation for a product is obtained by replacing Σ with the Greek capital letter Π (read as "pi"). In this case the terms obtained by substituting the integers for the index are multiplied instead of added. A product of terms such as $X_1 \cdot X_2 \cdot X_3 \cdot X_4 \cdot X_5$ is often denoted by the shorthand notation $\prod_{i=1}^{5} X_i$:

$$X_1 \cdot X_2 \cdot X_3 \cdot X_4 \cdot X_5 = \prod_{i=1}^{5} X_i \qquad (3.2.2)$$
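The same idea for Π, again with hypothetical values; `math.prod` (available from Python 3.8) is the standard-library counterpart of the explicit loop:

```python
import math

# Hypothetical observations X_1, ..., X_5.
X = [2, 3, 1, 5, 4]

# Multiply X_i for i = 1, ..., 5 instead of adding.
product = 1
for i in range(1, 6):
    product *= X[i - 1]

assert product == math.prod(X)
print(product)  # 120
```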

C. Algebra of summations (AoS)

The following rules are commonly called the algebra of summations. If $X$ and $Y$ are two variables and $c$ is a constant, then

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n \qquad (3.2.3)$$

$$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \cdots + X_n^2 \qquad (3.2.4)$$

$$\sum_{i=1}^{n} c = \underbrace{c + c + \cdots + c}_{n \text{ times}} = nc \qquad (3.2.5)$$

$$\sum_{i=1}^{n} c X_i = cX_1 + \cdots + cX_n = c \sum_{i=1}^{n} X_i \qquad (3.2.6)$$

$$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i \qquad (3.2.7)$$

$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n \qquad (3.2.8)$$
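A quick numerical spot-check of rules (3.2.4) to (3.2.8) on small hypothetical data sets; this checks one example and is not a proof:

```python
# Hypothetical data (not from the notes), used only to exercise the rules.
X = [2, 5, 1, 4]
Y = [3, 0, 6, 2]
c = 7
n = len(X)

sum_sq = sum(x ** 2 for x in X)                  # (3.2.4)
sum_const = sum(c for _ in range(n))             # (3.2.5)
sum_scaled = sum(c * x for x in X)               # (3.2.6)
sum_pairs = sum(x + y for x, y in zip(X, Y))     # (3.2.7)
sum_prod = sum(x * y for x, y in zip(X, Y))      # (3.2.8)

assert sum_const == n * c
assert sum_scaled == c * sum(X)
assert sum_pairs == sum(X) + sum(Y)
# (3.2.8) is a sum of products, which in general differs from the product of sums.
assert sum_prod != sum(X) * sum(Y)
print(sum_sq, sum_prod)  # 46 20
```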



3.3 Describing the center of the distribution or the data set

The most common summary measures used to describe the center of a distribution are the mean, the median, and the mode.

Of these, the mean is the best known and most frequently used measure. Each measure can be computed both for raw data and for grouped data. We now deal with the formulas and properties of each measure in turn.

3.3.1 The mean or arithmetic mean


Figure 3.1. Center measures
Definition 3.3. The mean is the sum of the observations divided by the total number of observations. It is denoted by µ (read as "mu") for a population and by X̄ (read as "X-bar") for a sample.

Computing the mean for raw data

Data in list form: given $n$ observations $X_1, X_2, \ldots, X_n$, the sample mean is computed by

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (3.3.1)$$
ILLUSTRATION 3.1. Calculating the mean for raw data. The heights (in inches) of six female police officer candidates are 62, 64, 63, 61, 62, and 66. Compute the mean height.
Solution: Let $X_i$ represent the height of the $i$th candidate.
Step 1 Count the number of values: $n = 6$ candidates.
Step 2 Find the sum by adding all values:
$$\sum_{i=1}^{6} X_i = 62 + 64 + 63 + 61 + 62 + 66 = 378 \text{ inches}$$
Step 3 Use formula (3.3.1):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{378}{6} = 63 \text{ inches}$$
Step 4 Interpret the value you computed: on average, the female police officer candidates were 63 inches tall.
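Formula (3.3.1) translates directly into code. A minimal Python sketch using the heights from Illustration 3.1 (the variable names are illustrative):

```python
heights = [62, 64, 63, 61, 62, 66]   # inches, from Illustration 3.1

n = len(heights)                     # Step 1: count the values
total = sum(heights)                 # Step 2: find the sum
mean_height = total / n              # Step 3: formula (3.3.1)
print(n, total, mean_height)         # 6 378 63.0
```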

Computing the mean from ungrouped and grouped frequency distributions

Data in the form:

Value       X_1   X_2   ...   X_k
Frequency   f_1   f_2   ...   f_k

Mean = (sum of the products of the ith class value X_i and its frequency f_i) / (total frequency n)

Symbolically,

$$\bar{X} = \frac{X_1 f_1 + X_2 f_2 + \cdots + X_k f_k}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} X_i f_i}{\sum_{i=1}^{k} f_i} \qquad (3.3.2)$$

where
$X_i$ = data value of the $i$th class, $i = 1, 2, \ldots, k$
$f_i$ = frequency (number of repetitions) of $X_i$
$k$ = number of classes
$n = \sum_{i=1}^{k} f_i$ = total frequency.

Note 3.1. For a grouped frequency distribution (G.f.d), replace $X_i$ in (3.3.2) by $X_{mi}$, the midpoint of the $i$th class.

ILLUSTRATION 3.2 (Calculating the mean for discrete data (u.f.g)). The following numbers of books were read by each of the 28 students in a literature class. Calculate the mean.

Number of books   0   1   2    3   4   Total
Frequency         2   6   12   5   3   28

Solution: Let $X_i$ be the $i$th value of the number of books read and $f_i$ its frequency.
Step 1 Count the number of values: $n = \sum_{i=1}^{k} f_i = 28$ students.
Step 2 Find the sum: $\sum_{i=1}^{k} X_i f_i = 0 \cdot 2 + 1 \cdot 6 + 2 \cdot 12 + 3 \cdot 5 + 4 \cdot 3 = 57$.
Step 3 Use formula (3.3.2):
$$\bar{X} = \frac{\sum_{i=1}^{k} X_i f_i}{\sum_{i=1}^{k} f_i} = \frac{57}{28} \approx 2.04 \text{ books}$$
Step 4 Interpret the value you computed: on average, each student read about 2 books.
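Formula (3.3.2) is a frequency-weighted mean. A minimal sketch reproducing Illustration 3.2 (variable names are illustrative):

```python
books = [0, 1, 2, 3, 4]      # values X_i
freq = [2, 6, 12, 5, 3]      # frequencies f_i

n = sum(freq)                                              # total frequency
weighted_total = sum(x * f for x, f in zip(books, freq))   # sum of X_i * f_i
mean_books = weighted_total / n                            # formula (3.3.2)
print(n, weighted_total, round(mean_books, 2))             # 28 57 2.04
```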
- Let’s look at another example.

ILLUSTRATION 3.3. Calculating the mean for continuous data (ufg). Thirty automobiles were tested for fuel efficiency (in miles per gallon, mpg). The results are summarized in the frequency distribution table below. Find the mean.

Table 3.1. Fuel efficiency of 30 automobiles

Fuel efficiency (mpg)   Number of automobiles (f)
7.5–12.5                3
12.5–17.5               5
17.5–22.5               15
22.5–27.5               5
27.5–32.5               2

Solution: The data are given as a grouped frequency distribution, so each class covers an interval of values; there are $k = 5$ classes. Let $X_{mi}$ represent the midpoint of the $i$th fuel-efficiency class.
Step 1 Construct the table as follows:

Table 3.2. Fuel efficiency of 30 automobiles

Fuel efficiency (mpg)   f_i (I)   X_mi (II)   X_mi f_i (III)
7.5–12.5                3         10          10*3 = 30
12.5–17.5               5         15          15*5 = 75
17.5–22.5               15        20          20*15 = 300
22.5–27.5               5         25          25*5 = 125
27.5–32.5               2         30          30*2 = 60
Total                   30                    590

Step 2 Use formula (3.3.2) with class midpoints:
$$\bar{X} = \frac{\sum_{i=1}^{k} X_{mi} f_i}{\sum_{i=1}^{k} f_i} = \frac{590}{30} \approx 19.67 \text{ mpg}$$
Step 3 Interpret the value you computed: the average fuel efficiency of the tested automobiles was about 19.67 miles per gallon.
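For grouped data, Note 3.1 replaces each value by its class midpoint. A sketch that reproduces Illustration 3.3, assuming each class is given by its lower and upper limits; `grouped_mean` is a hypothetical helper name:

```python
# (lower limit, upper limit, frequency) for each class of Table 3.1
classes = [(7.5, 12.5, 3), (12.5, 17.5, 5), (17.5, 22.5, 15),
           (22.5, 27.5, 5), (27.5, 32.5, 2)]

def grouped_mean(classes):
    """Mean of a grouped frequency distribution using class midpoints (Note 3.1)."""
    total_freq = sum(f for _, _, f in classes)
    weighted = sum((lo + hi) / 2 * f for lo, hi, f in classes)   # sum of X_mi * f_i
    return weighted / total_freq

print(round(grouped_mean(classes), 2))  # 19.67
```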

Properties of the mean (arithmetic mean)

Before proceeding with this section, first do the following activity.

Activity 3.1. Properties of the mean (4 min.) Consider the observations 10, 12, 26, 14, and 28.
1. Calculate the mean.
2. Add 2 to each observation and calculate the new mean.
3. Multiply each observation by 2 and compute the new mean.
4. What do you conclude from your work?
Solution:

The following are properties of the mean.

Property 3.1. The sum of the deviations of the observations from their arithmetic mean is zero:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0 \qquad (3.3.3)$$

Proof. $\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X} = \sum_{i=1}^{n} X_i - n\bar{X}$ by (3.2.5). But $\sum_{i=1}^{n} X_i = n\bar{X}$ by (3.3.1); substituting gives $n\bar{X} - n\bar{X} = 0$.

Property 3.2. The sum of the squares of the deviations of a set of observations from any number $A$ is least when, and only when, $A = \bar{X}$:

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2 \qquad (3.3.4)$$

with equality only when $A = \bar{X}$.

Proof.
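One standard way to complete this proof, sketched here using Property 3.1 and the summation rules (3.2.5) and (3.2.6), is to add and subtract $\bar{X}$ inside each deviation:

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - A)^2
  &= \sum_{i=1}^{n} \big[(X_i - \bar{X}) + (\bar{X} - A)\big]^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2 + 2(\bar{X} - A)\sum_{i=1}^{n} (X_i - \bar{X}) + n(\bar{X} - A)^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - A)^2 \qquad \text{(the middle sum is 0 by (3.3.3))} \\
  &\ge \sum_{i=1}^{n} (X_i - \bar{X})^2 .
\end{aligned}
$$

Equality holds exactly when $n(\bar{X} - A)^2 = 0$, that is, when $A = \bar{X}$.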

Property 3.3. The combined mean of $k$ different data sets or groups is calculated as

$$\bar{X}_{12\ldots k} = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \cdots + \bar{X}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i} \qquad (3.3.5)$$

where
$\bar{X}_1$ = the mean of data set 1, having $n_1$ observations
$\bar{X}_2$ = the mean of data set 2, having $n_2$ observations
...
$\bar{X}_k$ = the mean of data set $k$, having $n_k$ observations
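Property 3.3 can be checked by comparing formula (3.3.5) with the mean of all observations pooled together. The three groups below are hypothetical:

```python
from statistics import mean

# Hypothetical groups (not from the notes).
groups = [[62, 64, 63], [61, 62, 66, 70], [58, 60]]

means = [mean(g) for g in groups]   # group means X̄_i
sizes = [len(g) for g in groups]    # group sizes n_i

# Formula (3.3.5): weight each group mean by its group size.
combined = sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Same value as pooling every observation and taking a single mean.
pooled = mean([x for g in groups for x in g])
print(combined, pooled)
```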

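Property 3.4. If a wrong figure has been used in calculating the mean, the mean can be corrected once the correct figure is known:

$$\bar{X}_c = \frac{n\bar{X}_w + X_c - X_w}{n} \qquad (3.3.6)$$

where the subscript c denotes "correct" and w denotes "wrong": $\bar{X}_w$ is the mean computed with the wrong figure $X_w$, $X_c$ is the figure that should have been used, and $\bar{X}_c$ is the corrected mean.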
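As a hypothetical illustration of (3.3.6), suppose the observations of Activity 3.1 (10, 12, 26, 14, 28) were entered with 28 mistyped as 82:

```python
n = 5
wrong_data = [10, 12, 26, 14, 82]    # 28 mistyped as 82 (hypothetical scenario)
wrong_mean = sum(wrong_data) / n     # mean computed with the wrong figure

x_wrong, x_correct = 82, 28
# Formula (3.3.6): remove the wrong figure from the total, add the correct one.
corrected_mean = (n * wrong_mean + x_correct - x_wrong) / n

print(wrong_mean, corrected_mean)    # 28.8 18.0
```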

Property 3.5. If the mean of $n$ observations $X_1, X_2, \ldots, X_n$ is $\bar{X}$, then
• If a constant $k$ is added to every observation, the new mean is the old mean plus $k$: $\bar{X}_{new} = \bar{X} + k$.
• If a constant $k$ is subtracted from every observation, the new mean is the old mean minus $k$: $\bar{X}_{new} = \bar{X} - k$.
• If every observation is multiplied by $k$, the new mean is $\bar{X}_{new} = k\bar{X}$.
• If every observation is divided by $k$ ($k \ne 0$), the new mean is $\bar{X}_{new} = \bar{X}/k$.
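Property 3.5 can be verified on the observations of Activity 3.1 (this also serves as a check on that activity), with $k = 2$ as in the activity:

```python
from statistics import mean

data = [10, 12, 26, 14, 28]   # observations from Activity 3.1
k = 2

old = mean(data)
shifted = mean([x + k for x in data])   # add k to every observation
scaled = mean([x * k for x in data])    # multiply every observation by k

print(old, shifted, scaled)   # 18 20 36
```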

Merits and demerits of the arithmetic mean

Merits of the arithmetic mean
• It has a definite (unique) value.
• It is calculated from all observations.
• It is simple to calculate and easy to understand.
• It can be used in further calculations, e.g. to calculate the variance.
• It can be used to compare two data sets.
• It is used for data measured on an interval or ratio scale.

Demerits of the mean
• It is highly affected by extreme values (outliers).
• It cannot be calculated for a frequency distribution with open-ended classes.
• It sometimes gives absurd results, such as a fractional number of books per student in Illustration 3.2.

Types of mean
1. Arithmetic mean, or simply the mean. The formula is
$$A.M = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
This is the mean discussed earlier; see (3.3.1) and (3.3.2).
2. Geometric mean (G.M). It is defined as the $n$th root of the product of $n$ values (observations). The formula is
$$G.M = \sqrt[n]{\prod_{i=1}^{n} X_i} \qquad (3.3.7)$$
Note 3.2. The geometric mean is useful for finding the average of percentages, ratios, indexes, or growth rates.
3. Harmonic mean (H.M). It is defined as the number of values divided by the sum of the reciprocals of the values. The formula is
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \qquad (3.3.8)$$
Note 3.3. The harmonic mean is used to find an average speed, for example when equal distances are covered at different speeds.
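A sketch computing the three means of (3.3.1), (3.3.7), and (3.3.8) directly and comparing them with Python's `statistics` module (`fmean`, `geometric_mean`, and `harmonic_mean` exist from Python 3.8). The values 1, 3, 9 are the same ones used in Activity 3.2 below; the speeds of 30 and 45 mph over two equal-length legs are a hypothetical illustration of Note 3.3:

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

x = [1, 3, 9]    # positive observations

am = sum(x) / len(x)                        # (3.3.1)
gm = math.prod(x) ** (1 / len(x))           # (3.3.7): nth root of the product
hm = len(x) / sum(1 / xi for xi in x)       # (3.3.8)

# The standard-library versions agree up to floating-point rounding.
assert math.isclose(am, fmean(x))
assert math.isclose(gm, geometric_mean(x))
assert math.isclose(hm, harmonic_mean(x))

print(round(am, 3), round(gm, 3), round(hm, 3))   # 4.333 3.0 2.077

# Note 3.3: two equal legs driven at 30 and 45 mph.
print(round(harmonic_mean([30, 45]), 2))          # 36.0
```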


Relation between the three types of means

Before we discuss this subsection, first do the following class activity.

Activity 3.2. Relation between means (5 min.)
1. For the data values $X_1 = 1$, $X_2 = 3$, and $X_3 = 9$: (i) find the G.M, A.M, and H.M; (ii) compare the results.
2. The increases in the cost of food in a specific geographic region over the past three years were 1%, 3%, and 5%. Find the average increase.
3. A salesperson drives a 300-mile round trip, at 30 miles per hour going to Chicago and at 45 miles per hour returning home. Find the average speed in miles per hour.
Solution:

Lemma 3.3.1. If $X_1$ and $X_2$ are two positive observed values, then the G.M of their A.M and H.M equals the geometric mean of $X_1$ and $X_2$, i.e., $G.M^2 = A.M \times H.M$.
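Proof. Substituting the values into formulas (3.3.1), (3.3.7), and (3.3.8), we have

$$A.M = \frac{x_1 + x_2}{2}, \qquad G.M = \sqrt{x_1 x_2}, \qquad H.M = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2 x_1 x_2}{x_1 + x_2}.$$

Hence $G.M^2 = x_1 x_2$, and

$$A.M \times H.M = \left(\frac{x_1 + x_2}{2}\right)\left(\frac{2 x_1 x_2}{x_1 + x_2}\right) = x_1 x_2 .$$

$$\therefore \; G.M^2 = A.M \times H.M$$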
Lemma 3.3.2. If A, G, and H stand for the arithmetic mean, geometric mean, and harmonic mean respectively, then the relation $A \ge G \ge H$ holds.
Proof. Left as an exercise. [Hint: consider two observations $x_1, x_2 > 0$ and use $(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$.]
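The two lemmas can be spot-checked numerically for a few hypothetical pairs of positive values; this is a sanity check, not a substitute for the proofs. `three_means` is an illustrative helper name:

```python
import math

def three_means(x1, x2):
    """A.M, G.M and H.M of two positive values, from (3.3.1), (3.3.7), (3.3.8)."""
    am = (x1 + x2) / 2
    gm = math.sqrt(x1 * x2)
    hm = 2 * x1 * x2 / (x1 + x2)
    return am, gm, hm

# Hypothetical pairs of positive values.
for x1, x2 in [(4, 9), (1, 1), (2.5, 10)]:
    am, gm, hm = three_means(x1, x2)
    assert math.isclose(gm ** 2, am * hm)   # Lemma 3.3.1
    assert am >= gm >= hm                   # Lemma 3.3.2
    print(x1, x2, am, gm, hm)
```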

3.3.2 The median

This is the second measure of central tendency.

Definition 3.4. The median is the middle observation of the values after they have been ordered from smallest to largest (or from largest to smallest). Equivalently, the median is the middle point of the data array, the halfway point of the data set. The symbol for the sample median is X̃ (read as "X-tilde").

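Computing the median from ungrouped and grouped data

For ungrouped data (with the values arranged in order),

$$\tilde{X} = \begin{cases} \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}, & \text{if } n \text{ is odd} \\[6pt] \dfrac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n+2}{2}\right)^{\text{th}} \text{ value}}{2}, & \text{if } n \text{ is even} \end{cases} \qquad (3.3.9)$$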

ILLUSTRATION 3.4. Calculating the median for ungrouped data

Consider the observations 1, 5, 9, 3, and 11. Find the median.
Solution: Follow the steps below.
1st Arrange the data in increasing order of magnitude: 1, 3, 5, 9, 11.
2nd Decide whether n is even or odd: n = 5 (odd).
3rd Select the appropriate case of (3.3.9) and calculate:
$$\tilde{X} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value} = \left(\frac{5+1}{2}\right)^{\text{th}} \text{ value} = 3^{\text{rd}} \text{ value} = 5$$
4th Interpret: the remaining values are split evenly, with half of them below 5 and half above it.
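Formula (3.3.9) as a minimal Python function (the name `median` is illustrative; Python's own `statistics.median` follows the same odd/even rule):

```python
def median(values):
    """Median of ungrouped data following formula (3.3.9)."""
    ordered = sorted(values)           # arrange in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:                     # n odd: the ((n + 1)/2)th value
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n/2)th and ((n + 2)/2)th values (1-based positions)
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

print(median([1, 5, 9, 3, 11]))   # 5   (Illustration 3.4)
print(median([1, 5, 9, 3]))       # 4.0 (hypothetical even-sized data)
```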


For grouped data,

$$\tilde{X} = L_m + \frac{w}{f_m}\left(\frac{n}{2} - F_{pm}\right) \qquad (3.3.10)$$

where
• $L_m$ = lower boundary of the median class
• $F_{pm}$ = less-than cumulative frequency of the class immediately preceding the median class
• $f_m$ = frequency of the median class
• $w$ = class width
• $n/2$ is the key to finding the median class and should be calculated first.

Note 3.4. The median class is the class with the smallest less-than cumulative frequency that is ≥ n/2.

ILLUSTRATION 3.5. Calculating the median from grouped data. Table 3.3 below gives the distribution of the weekly wages of the employees of a small firm.

Table 3.3. Distribution of wages of employees in a small firm

Wages (in birr)   No. of employees (f_i)
126 & below       3
127–135           5
136–144           9
145–153           12
154–162           5
163–171           4
172 & above       2

a. Find the median weekly wage.

b. Why is the median a more suitable measure of central tendency in this case?

Solution:
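A minimal sketch applying formula (3.3.10) to Table 3.3 for part (a). It assumes class boundaries are the class limits minus 0.5 (e.g. 144.5 for the class 145–153) and uses `None` for the undefined lower boundary of the open-ended first class; `grouped_median` and the data layout are illustrative. For part (b), note that the open-ended first and last classes prevent computing the mean (see the demerits of the mean), whereas the median only needs the median class.

```python
def grouped_median(classes, width):
    """Median of grouped data by formula (3.3.10).

    classes: (lower_boundary, frequency) pairs in increasing order; the lower
             boundary may be None for an open-ended first class.
    width:   common class width w.
    """
    n = sum(f for _, f in classes)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2                   # n/2 locates the median class
    cum = 0                        # less-than cumulative frequency so far
    for lower, f in classes:
        if cum + f >= half:        # smallest cumulative frequency >= n/2
            if lower is None:
                raise ValueError("the median class is open-ended")
            # L_m + (w / f_m) * (n/2 - F_pm), with cum playing the role of F_pm
            return lower + width / f * (half - cum)
        cum += f
    raise ValueError("no median class found")

# Table 3.3, with boundaries = lower class limit - 0.5 (assumption stated above)
wages = [(None, 3), (126.5, 5), (135.5, 9), (144.5, 12),
         (153.5, 5), (162.5, 4), (171.5, 2)]
print(grouped_median(wages, width=9))   # 146.75
```

Here n = 40, so n/2 = 20; the cumulative frequencies 3, 8, 17, 29, ... make 145–153 the median class, giving 144.5 + (9/12)(20 − 17) = 146.75 birr.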

Merits and demerits of the median

Merits of the median
• It is used when one is interested in the center or middle value of a data set.
• It is unique.
• It is less affected than the mean by extremely low or high values, because it is a positional average.
• It can be computed for a frequency distribution with open-ended classes.
• It can be determined for all levels of measurement except nominal.

Demerits of the median
• It is not capable of further algebraic treatment or statistical analysis.
• It is not a good representative of the data when the number of observations is small.
• When the number of items is very large, sorting is cumbersome and time consuming.
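To see the merit about extreme values concretely, the sketch below takes the heights of Illustration 3.1 and replaces 66 with a hypothetical extreme value of 99:

```python
from statistics import fmean, median

heights = [62, 64, 63, 61, 62, 66]        # Illustration 3.1
with_outlier = [62, 64, 63, 61, 62, 99]   # 66 replaced by a hypothetical outlier

print(fmean(heights), median(heights))             # 63.0 62.5
print(fmean(with_outlier), median(with_outlier))   # 68.5 62.5
```

The single extreme value pulls the mean up by 5.5 inches, while the median does not move.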

3.3.3 The mode


–This is the third measure of central tendency.

Definition 3.5. The mode is the value of the observation that appears
most frequently. The symbol for sample mode is X̂ (read as X-hat).


Computing the mode from grouped data

For grouped data,

$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \qquad (3.3.11)$$

where
• $L_{mo}$ = lower boundary of the modal class
• $f_{mo}$ = frequency of the modal class
• $f_{pmo}$ = frequency of the class immediately preceding the modal class
• $f_{smo}$ = frequency of the class immediately succeeding (following) the modal class
• $w$ = class width
• $\Delta_1 = f_{mo} - f_{pmo}$ and $\Delta_2 = f_{mo} - f_{smo}$

Note 3.5. The modal class is the class with the largest frequency.

Merits and demerits of the mode

Merits of the mode
• It is the easiest average to compute.
• It is not affected by extreme values.
• It can be calculated for distributions with open-ended intervals.
• It is the only measure of center that can be used to find the most typical case when the data are nominal or categorical.

Demerits of the mode
• It may not exist, and if it exists it may not be unique.
• It may be unrepresentative in many cases.

Note 3.6. In a moderately asymmetrical (skewed) distribution, the following relation holds approximately:

$$\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median}) \qquad (3.3.12)$$

ILLUSTRATION 3.6. Calculating X̂ for continuous data (ufg). Find the mode of the following frequency distribution of the birth weights (in kg) of 30 children.

Weight (in kg)    1.9–2.3   2.3–2.7   2.7–3.1   3.1–3.5   3.5–3.9   3.9–4.3
No. of children   5         5         9         4         4         3

Solution: Follow the steps below.
Step 1 Find the modal class. The modal class is the 3rd class, 2.7–3.1, because its frequency (9) is higher than that of any other class; 2.7 and 3.1 are also its class boundaries. So $L_{mo} = 2.7$, $\Delta_1 = f_{mo} - f_{pmo} = 9 - 5 = 4$, $\Delta_2 = f_{mo} - f_{smo} = 9 - 4 = 5$, and $w = 2.3 - 1.9 = 0.4$.
Step 2 Substitute the values into formula (3.3.11):
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 2.7 + 0.4\left(\frac{4}{4 + 5}\right) \approx 2.88 \text{ kg}$$
Step 3 Interpret: the most typical birth weight among these children is about 2.88 kg.
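A minimal sketch of formula (3.3.11) that reproduces Illustration 3.6. `grouped_mode` is a hypothetical helper; it assumes equal class widths, takes the first class if several share the largest frequency, and (an assumption not stated in the notes) treats a missing neighbouring class, when the modal class is first or last, as having frequency 0:

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data by formula (3.3.11).

    boundaries: [b0, b1, ..., bk], the boundaries of k equal-width classes.
    freqs:      the k class frequencies.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class (first if tied)
    w = boundaries[1] - boundaries[0]                    # class width
    f_pre = freqs[i - 1] if i > 0 else 0                 # preceding class (0 if none)
    f_suc = freqs[i + 1] if i + 1 < len(freqs) else 0    # succeeding class (0 if none)
    d1 = freqs[i] - f_pre                                # Delta_1
    d2 = freqs[i] - f_suc                                # Delta_2
    if d1 + d2 == 0:
        raise ValueError("formula undefined: Delta_1 + Delta_2 is zero")
    return boundaries[i] + w * d1 / (d1 + d2)

weights = [1.9, 2.3, 2.7, 3.1, 3.5, 3.9, 4.3]   # class boundaries (kg), Illustration 3.6
children = [5, 5, 9, 4, 4, 3]
print(round(grouped_mode(weights, children), 2))   # 2.88
```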

Example 3.1. Find the mode of the weekly wages data in Table 3.3.

3.4 Quantiles

These are further measures, used to describe the position of a data value.

Definition 3.6. Quantiles are values that divide a given data set into a number of equal parts. They are also called measures of position (MoP).

Example 3.2. Quartiles, deciles, and percentiles:

(i) Quartiles are numerical values that divide a given data set into 4 equal parts.
• Notation: $Q_1, Q_2, Q_3$.
• Meaning: $Q_i$ is the value below which $25i$% and above which $(100 - 25i)$% of the values lie.
(ii) Deciles are numerical values that divide a given data set into 10 equal parts.
• Notation: $D_1, D_2, \ldots, D_5, \ldots, D_9$.
• Meaning: $D_i$ is the value below which $10i$% and above which $(100 - 10i)$% of the values lie.
(iii) Percentiles are numerical values that divide a given data set into 100 equal parts.
• Notation: $P_1, P_2, \ldots, P_{50}, \ldots, P_{99}$.
• Meaning: $P_i$ is the value below which $i$% and above which $(100 - i)$% of the values lie.
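This section gives no computing formula for quantiles, and textbooks use several slightly different conventions. As one possible convention, Python's `statistics.quantiles` with `method="exclusive"` (its default) places the $p$-th quantile at position $p(n + 1)$ in the ordered data and interpolates linearly between neighbouring values. The sketch applies it to the data of Illustration 3.4; another textbook rule may give slightly different numbers:

```python
from statistics import quantiles

data = [1, 5, 9, 3, 11]                           # Illustration 3.4

q = quantiles(data, n=4, method="exclusive")      # [Q1, Q2, Q3]
d = quantiles(data, n=10, method="exclusive")     # [D1, ..., D9]
p = quantiles(data, n=100, method="exclusive")    # [P1, ..., P99]

print(q)            # [2.0, 5.0, 10.0]
print(d[3 - 1])     # D3 = 2.6
print(p[80 - 1])    # P80 = 10.6
```

Note that $Q_2$ equals the median found in Illustration 3.4. With only five observations, extreme percentiles such as $P_1$ are extrapolated beyond the smallest value under this convention.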

Example 3.3. Find $Q_1$, X̃, $D_3$, and $P_{80}$ for the data given in Illustrations 3.1–3.6.

3.5 Review questions
