In this unit (unit three) we will learn about data analysis, specifically
description and summarization, the 4th stage in conducting any statistical
study or investigation.
Unit objectives
1. Define what we mean by measures of central tendency.
2. Summarize data using measures of central tendency such as the mean,
median and mode.
3. Identify the position of a data value in a data set using measures of
position such as percentiles, deciles and quartiles.
Learning outcomes
After completing this section, successful students will be able to:
• Define what is meant by a measure of central tendency.
• State the objectives of measures of center.
• Understand the properties of a good measure.
Key words
Deciles, Mean, Median, Mode, Percentiles, Quartiles
Review of algebra
\[
X_1 + X_2 + X_3 + X_4 + X_5 = \sum_{i=1}^{5} X_i \tag{3.2.1}
\]
The symbol $\sum$ is read as "the sum of..." and is called the summation sign.
The letter "i" is called the summation index. The term following the
summation sign is called the summand.
B. Product notation ($\prod$)
In product notation, the terms obtained by substituting the integers for
the index are multiplied instead of added. A product of terms such as
$X_1 \cdot X_2 \cdot X_3 \cdot X_4 \cdot X_5$ is often denoted by the
shorthand notation $\prod_{i=1}^{5} X_i$.
\[
X_1 \cdot X_2 \cdot X_3 \cdot X_4 \cdot X_5 = \prod_{i=1}^{5} X_i \tag{3.2.2}
\]
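\[
\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n \tag{3.2.3}
\]
\[
\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2 \tag{3.2.4}
\]
\[
\sum_{i=1}^{n} c = \underbrace{c + c + \dots + c}_{n \text{ times}} = nc \tag{3.2.5}
\]
\[
\sum_{i=1}^{n} c X_i = cX_1 + \dots + cX_n = c \sum_{i=1}^{n} X_i \tag{3.2.6}
\]
\[
\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i \tag{3.2.7}
\]
\[
\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_n Y_n \tag{3.2.8}
\]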
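The summation and product rules (3.2.2) to (3.2.8) can be checked numerically. The following is a minimal sketch in Python; the lists `X` and `Y` and the constant `c` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical sample values, chosen only for illustration.
X = [3, 5, 2, 8]
Y = [1, 4, 6, 2]
c = 10
n = len(X)

# (3.2.5): adding a constant c to itself n times gives n*c.
assert sum(c for _ in range(n)) == n * c

# (3.2.6): a constant factor can be pulled out of the sum.
assert sum(c * x for x in X) == c * sum(X)

# (3.2.7): the sum of (X_i + Y_i) splits into two separate sums.
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# (3.2.4): sum of squares, X_1^2 + ... + X_n^2.
sum_sq = sum(x ** 2 for x in X)

# (3.2.8): sum of products, X_1*Y_1 + ... + X_n*Y_n.
sum_xy = sum(x * y for x, y in zip(X, Y))

# (3.2.2): product notation multiplies the terms instead of adding them.
prod_x = math.prod(X)

print(sum(X), sum_sq, sum_xy, prod_x)  # 18 102 51 240
```

Note that the sum of squares (102) differs from the square of the sum (18² = 324), so (3.2.4) should not be confused with $(\sum X_i)^2$.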
Of these, the mean is the best known and most frequently used measure. The
formula for each measure is defined for raw data and for grouped data as
well. We now turn to the properties and formulas of each measure, one by one.
\[
\sum X_i = 62 + 64 + 63 + 61 + 62 + 66 = 378 \text{ inches}
\]
Step 3 Use formula (3.3.1).
\[
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{378}{6} = 63 \text{ inches}
\]
Step 4 Interpret the value you compute: On average, the female police
candidates were 63 inches tall.
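The same calculation can be reproduced in a few lines of Python. This is only a sketch of formula (3.3.1) applied to the six heights listed above; the variable names are illustrative.

```python
# Heights in inches, taken from the worked example above.
heights = [62, 64, 63, 61, 62, 66]

total = sum(heights)        # sum of the observations
n = len(heights)            # number of observations
mean_height = total / n     # formula (3.3.1)

print(total, n, mean_height)  # 378 6 63.0
```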
\[
\text{Mean} = \frac{\text{sum of the products of the } i\text{th class value } (X_i) \text{ and its frequency } (f_i)}{\text{total frequency } (n)}
\]
Symbolically,
\[
\bar{X} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_k f_k}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} X_i f_i}{\sum_{i=1}^{k} f_i} \tag{3.3.2}
\]
Where
$X_i$ = data value of the ith class, i = 1, 2, ..., k
$f_i$ = frequency (repetition) of $X_i$
k = number of classes
$n = \sum_{i=1}^{k} f_i$ = total frequency.
Step 4 Interpret the value you compute: "Therefore, on average each
student read 2 books."
- Let’s look at another example.
Step 2 Use formula (3.3.2).
\[
\bar{X} = \frac{\sum_{i=1}^{k} X_{m_i} f_i}{\sum_{i=1}^{k} f_i} = \frac{590}{30} \approx 19.7 \text{ mpg}
\]
Step 3 Interpret the value you get: "The average fuel efficiency of the
tested automobiles was about 19.7 miles per gallon."
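Formula (3.3.2) translates directly into code. The sketch below is one possible implementation; the function name `grouped_mean` and the small frequency table are hypothetical and are not the data used in the examples above.

```python
def grouped_mean(values, freqs):
    """Mean of grouped data: sum(X_i * f_i) / sum(f_i), as in (3.3.2)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)  # total frequency
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(x * f for x, f in zip(values, freqs)) / n


# Hypothetical table: X_i = number of books read, f_i = number of students.
books = [0, 1, 2, 3, 4]
students = [2, 5, 8, 4, 1]

print(grouped_mean(books, students))  # 37 / 20 = 1.85
```

For a frequency distribution with class intervals, the same function can be called with the class midpoints in place of `values`.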
Activity 3.1. Properties of mean (4 min.) Consider the observations: 10, 12, 26, 14 and 28.
1. Calculate the mean.
2. Add 2 to each observation and calculate the new mean.
3. Multiply each observation by 2 and compute the new mean.
4. What do you conclude from your work?
Solution:
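The four steps of the activity can be checked numerically. This is a minimal sketch using `statistics.mean` from the Python standard library; the variable names are illustrative.

```python
from statistics import mean

obs = [10, 12, 26, 14, 28]

m = mean(obs)                              # step 1: 90 / 5 = 18
m_plus_2 = mean([x + 2 for x in obs])      # step 2: add 2 to each observation
m_times_2 = mean([2 * x for x in obs])     # step 3: multiply each observation by 2

print(m, m_plus_2, m_times_2)

# Step 4: shifting every observation by a constant shifts the mean by that
# constant, and scaling every observation by a constant scales the mean.
assert m_plus_2 == m + 2
assert m_times_2 == 2 * m
```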
Property 3.1. The sum of the deviations of the observations from their
arithmetic mean is zero, i.e.,
\[
\sum_{i=1}^{n} (X_i - \bar{X}) = 0 \tag{3.3.3}
\]
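Property 3.1 is easy to verify on a small data set. The sketch below reuses the six heights from the earlier worked example and uses exact fractions so that rounding cannot hide a non-zero remainder; this choice of data is only for illustration.

```python
from fractions import Fraction

data = [62, 64, 63, 61, 62, 66]

# Exact arithmetic mean (a Fraction, so no floating-point rounding).
mean = Fraction(sum(data), len(data))

# Deviations of each observation from the mean.
deviations = [x - mean for x in data]

print(deviations)       # six deviations: -1, 1, 0, -2, -1, 3 (as Fractions)
print(sum(deviations))  # 0
```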
The mean or arithmetic mean Lecture notes (set by: Z)
Property 3.2. The sum of the squares of the deviations of a set of
observations from any number, say A, is least only when $A = \bar{X}$;
that is, for any $A \neq \bar{X}$,
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2 \tag{3.3.4}
\]
Proof.
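A sketch of the standard argument, offered as one possible way to complete the proof rather than as the author's own wording: write $X_i - A = (X_i - \bar{X}) + (\bar{X} - A)$, expand the square, and use Property 3.1 to eliminate the cross term.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - A)^2
  &= \sum_{i=1}^{n} \bigl[(X_i - \bar{X}) + (\bar{X} - A)\bigr]^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2
     + 2(\bar{X} - A)\sum_{i=1}^{n} (X_i - \bar{X})
     + n(\bar{X} - A)^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - A)^2
     \qquad \text{(by Property 3.1, the middle sum is } 0\text{)}
\end{align*}
```

Since $n(\bar{X} - A)^2 \geq 0$, with equality only when $A = \bar{X}$, the sum of squared deviations about any other number A is strictly larger than the sum about the mean.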
Where
$\bar{X}_1$ = the mean of data set 1 having $n_1$ observations
$\bar{X}_2$ = the mean of data set 2 having $n_2$ observations
...
$\bar{X}_k$ = the mean of data set k having $n_k$ observations
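Property 3.4. If a wrong figure has been used in calculating the mean, we
can correct the mean if we know the correct figure that should have been used:
\[
\bar{X}_c = \frac{n\bar{X}_w + X_c - X_w}{n} \tag{3.3.6}
\]
where the subscript c = correct and w = wrong; that is, $\bar{X}_w$ is the
mean computed with the wrong figure $X_w$, and $X_c$ is the correct figure.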
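Formula (3.3.6) avoids recomputing the whole sum when a single figure was recorded wrongly. The following is a minimal sketch; the function name `corrected_mean` and the numbers in the example are hypothetical.

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Corrected mean as in (3.3.6): (n * mean_w + X_c - X_w) / n."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return (n * wrong_mean + correct_value - wrong_value) / n


# Hypothetical case: the mean of 10 observations was reported as 50,
# but one value was recorded as 45 when it should have been 54.
print(corrected_mean(n=10, wrong_mean=50, wrong_value=45, correct_value=54))
# (500 + 54 - 45) / 10 = 50.9
```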
Demerits of mean
• Highly affected by extreme values/outliers.
• It cannot be calculated for a frequency distribution having open-ended
classes.
• It sometimes gives absurd results.
Types of mean
1. Arithmetic Mean or simply Mean: The formula is $A.M = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$.
This is the mean discussed earlier. See (3.3.1), (3.3.2).
2. The Geometric Mean (G.M): It is defined as the nth root of the product
of n values (observations). The formula is
\[
G.M = \sqrt[n]{\prod_{i=1}^{n} X_i} \tag{3.3.7}
\]
Note 3.2. The geometric mean is useful in finding the average of
percentages, ratios, indexes, or growth rates.
3. The Harmonic Mean (H.M): It is defined as the number of values divided
by the sum of the reciprocals of each value. The formula is
\[
H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \tag{3.3.8}
\]
Note 3.3. The harmonic mean is used in finding the average speed.
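Formulas (3.3.7) and (3.3.8) can be sketched in Python as follows. The function names and the sample values (two speeds over equal distances, and the pair 2 and 8) are hypothetical; both functions assume strictly positive inputs. The standard library functions `statistics.geometric_mean` and `statistics.harmonic_mean` are used only as a cross-check.

```python
import math
from statistics import geometric_mean, harmonic_mean


def geometric_mean_direct(values):
    """nth root of the product of n positive values, as in (3.3.7)."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean_direct(values):
    """n divided by the sum of reciprocals, as in (3.3.8)."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical trip: the same distance driven at 40 km/h and back at 60 km/h.
speeds = [40, 60]
print(harmonic_mean_direct(speeds))    # average speed, about 48 km/h

# Hypothetical pair of values.
print(geometric_mean_direct([2, 8]))   # square root of 16, about 4

# Cross-check against the standard library implementations.
assert math.isclose(harmonic_mean_direct(speeds), harmonic_mean(speeds))
assert math.isclose(geometric_mean_direct([2, 8]), geometric_mean([2, 8]))
```

The average speed is 48 km/h rather than the arithmetic mean of 50 km/h because more time is spent travelling at the slower speed.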
Before we talk about this sub–session, first do the following class activity
and
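\[
H.m = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2 \cdot x_1 \cdot x_2}{x_1 + x_2}
\]
\[
G.m = \sqrt{x_1 \cdot x_2} \;\Rightarrow\; G.m^2 = x_1 \cdot x_2
\]
\[
A.m \cdot H.m = \left(\frac{x_1 + x_2}{2}\right) \cdot \left(\frac{2 \cdot x_1 \cdot x_2}{x_1 + x_2}\right) = x_1 \cdot x_2
\]
\[
\therefore\; G.m^2 = A.m \cdot H.m
\]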
Lemma 3.3.2. If A, G and H stand for the arithmetic mean, geometric
mean and harmonic mean respectively, then the relation $A \geq G \geq H$
holds true.
Proof. Left as an exercise. [Hint: Consider two observations $x_1, x_2$
and $(\sqrt{x_1} - \sqrt{x_2})^2 \geq 0 \;\; \forall x_1, x_2 \geq 0$.]
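Both the identity $G.m^2 = A.m \cdot H.m$ for two observations and the ordering in Lemma 3.3.2 can be checked numerically. This is only an illustration for one hypothetical pair of positive values, not a proof; the function name `two_value_means` is made up for the example.

```python
import math


def two_value_means(x1, x2):
    """Arithmetic, geometric and harmonic means of two positive values."""
    a = (x1 + x2) / 2
    g = math.sqrt(x1 * x2)
    h = 2 * x1 * x2 / (x1 + x2)
    return a, g, h


# Hypothetical pair of observations.
a, g, h = two_value_means(4, 9)
print(a, g, h)  # 6.5, 6.0, and 72/13 (about 5.54)

assert math.isclose(g ** 2, a * h)  # G^2 = A * H for two observations
assert a >= g >= h                  # Lemma 3.3.2
```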
array. Alternatively, the median is the halfway point in a data set. The
symbol for the sample median is X̃ (read as X-tilde).
\[
\tilde{X} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{value} = \left(\frac{5+1}{2}\right)^{\text{th}} \text{value} = 3^{\text{rd}} \text{ value} = 5
\]
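The positional rule can be sketched as a short function. The data below are hypothetical (chosen so that the 3rd ordered value is 5); for an even number of observations the sketch averages the two middle values, which is the usual convention and is assumed here.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # halfway between the two middle values


print(median([7, 2, 5, 9, 3]))            # sorted: 2, 3, 5, 7, 9 -> 3rd value = 5
print(median([62, 64, 63, 61, 62, 66]))   # sorted: 61, 62, 62, 63, 64, 66 -> 62.5
```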
Where
• $L_m$ = lower boundary of the median class
• $F_{pm}$ = the less-than cumulative frequency immediately preceding the
median class
• $f_m$ = frequency of the median class
• $w$ = the class width
• $n/2$ is the key to finding the median class and should be calculated first
Note 3.4. The median class is the class with the smallest less-than
cumulative frequency $\geq n/2$.
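The symbols defined above match the usual interpolation formula for the median of grouped data, $\tilde{X} = L_m + \frac{n/2 - F_{pm}}{f_m} \cdot w$. Treating that as the intended formula is an assumption, since the formula itself does not appear in this passage. Under that assumption, a minimal sketch with a hypothetical frequency table looks like this:

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data, assuming L_m + ((n/2 - F_pm) / f_m) * w."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2          # n/2 locates the median class
    cumulative = 0        # less-than cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n/2: the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical classes with lower boundaries 9.5, 19.5, 29.5, 39.5 and width 10.
lower = [9.5, 19.5, 29.5, 39.5]
freq = [4, 6, 10, 5]      # n = 25, so n/2 = 12.5

print(grouped_median(lower, freq, width=10))
# median class starts at 29.5 (F_pm = 10, f_m = 10): 29.5 + (2.5 / 10) * 10 = 32.0
```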
Solution:
Merits of median
• Used when one is interested to find the center or middle value of a
data set.
• It is unique.
• It is affected less than the mean by extremely low or high values
because it is a positional average.
• It can be computed for a frequency distribution with an open ended
class.
• It can be determined for all levels of data except nominal.
Demerits of median
• It is not capable of further algebraic treatment/ statistical analysis.
• It is not a good representative of the data when the number of
observations is small.
• When the number of items is very large, sorting is cumbersome and
time consuming.
Definition 3.5. The mode is the value of the observation that appears
most frequently. The symbol for sample mode is X̂ (read as X-hat).
Note 3.5. The modal class is the class having the largest frequency.
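For raw data, the definition can be sketched with a frequency count. The function name `modes` and the sample data are hypothetical. The sketch returns every value tied for the highest frequency and, as an assumed convention, reports no mode when each value occurs only once.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so no value stands out
    return sorted(v for v, c in counts.items() if c == top)


print(modes([62, 64, 63, 61, 62, 66]))     # [62]
print(modes(["A", "B", "B", "O", "O"]))    # ['B', 'O'] (two modes, nominal data)
print(modes([1, 2, 3]))                    # []
```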
Merits of mode
• It is the easiest average to compute.
• It is not affected by extreme values.
• It can be calculated in case of the open ended intervals.
• It is the only measure of center that can be used in finding the most
typical case when the data are nominal or categorical.
Demerits of mode
• It may not exist; if it exists it may not be unique.
• It may be unrepresentative in many cases.
Step 2 Substitute the values into formula (3.18) to get the answer:
\[
\hat{X} = L_{mo} + w \, \frac{\Delta_1}{\Delta_1 + \Delta_2} = 2.7 + \left(\frac{4}{4+5}\right) \cdot 0.4 \approx 2.88 \text{ kg}
\]
Step 3 Interpretation: most of the children weigh about 2.88 kg at birth.
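The substitution in Step 2 can be re-evaluated with a small helper. This is a sketch of the grouped-mode formula as written above; the function name `grouped_mode` and its parameter names are hypothetical.

```python
def grouped_mode(lower_boundary, width, delta1, delta2):
    """Mode of grouped data: L_mo + w * delta1 / (delta1 + delta2)."""
    if delta1 + delta2 == 0:
        raise ValueError("delta1 + delta2 must be non-zero")
    return lower_boundary + width * delta1 / (delta1 + delta2)


# Values substituted in the worked example: L_mo = 2.7, w = 0.4, delta1 = 4, delta2 = 5.
mode_weight = grouped_mode(2.7, 0.4, 4, 5)
print(f"{mode_weight:.4f} kg")  # 2.8778 kg
```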
Example 3.1. Find the mode of the weekly wages data in Table 3.3
3.4 Quantile
- These are other measures used to describe the position of a data value.
Definition 3.6. Quantiles are values that divide a given data set into
equal parts. They are also called measures of position (MoP).
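Quartiles, deciles and percentiles split the ordered data into 4, 10 and 100 equal parts respectively. The sketch below uses `statistics.quantiles` from the Python standard library with its default "exclusive" method, which interpolates at position $i(n+1)/k$; whether this matches the exact rule used in these notes is an assumption, since textbooks differ. The data are the six heights from the earlier example.

```python
import math
from statistics import median, quantiles

data = [62, 64, 63, 61, 62, 66]

quartiles = quantiles(data, n=4)      # cut points Q1, Q2, Q3
deciles = quantiles(data, n=10)       # cut points D1, ..., D9
percentiles = quantiles(data, n=100)  # cut points P1, ..., P99

q1 = quartiles[0]
d3 = deciles[2]
p80 = percentiles[79]
print(q1, d3, p80)

# The second quartile, fifth decile and 50th percentile all equal the median.
assert quartiles[1] == median(data)
assert math.isclose(deciles[4], median(data))
assert math.isclose(percentiles[49], median(data))
```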
Example 3.2. Find $Q_1$, $\tilde{X}$, $D_3$, $P_{80}$ for the data given in
Illustrations (3.3.1, 3.3.1, 3.3.1, 3.4, 3.5, 3.3.3)