
Stat 2161 SB

3 Measures of Central Tendency

3.1 Objectives of Measuring Central Tendency

Earlier, we saw how raw data can be organized and presented. In this lesson we learn about data summarization: how to summarize the bulk of a data set by using statistical methods. One of these methods is the measure of central tendency, or measure of “averages”.

Measures of central tendency indicate the “center”, “middle” or “most typical” value of a set of data, around which the largest number of values tend to cluster.

The objectives of measuring central location are, therefore:
• To understand the data easily.
• To facilitate comparison, since it is a single number that represents a set of data.
• To make further statistical analysis possible.

3.2 The Summation Notation: Sigma operator

What does $\Sigma$ (sigma) stand for?
• In mathematics the symbol $\Sigma$ (sigma) means to add, or to find the sum.
• $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \dots + X_N$.
• $\sum_{i=1}^{5} X_i$ means the sum of the variable $X_i$, where $i$ varies from 1 to 5.
• $Y = \sum x^2 - \sum x$: the value of $Y$ equals the result of adding together each value of $x$ multiplied by itself and then subtracting the total produced by adding up all the values of $x$.

Properties of summation

• $\sum_{i=1}^{n} k = nk$, where $k$ is any constant.
• $\sum_{i=1}^{n} kX_i = k\sum_{i=1}^{n} X_i$, where $k$ is any constant.
• $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where $a$ and $b$ are any constants.
• $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$

The sum of the product of the two variables could be written as:
$\sum_{i=1}^{n} X_iY_i = X_1Y_1 + X_2Y_2 + X_3Y_3 + \dots + X_nY_n$

Example: Consider the following data and find the required values.

X    5   7   7   6   8
Y    6   7   8   7   8

a) $\sum_{i=1}^{5} X_i = 5+7+7+6+8 = 33$
b) $\sum_{i=1}^{5} Y_i = 6+7+8+7+8 = 36$
c) $\sum_{i=1}^{5} 10 = 5 \times 10 = 50$
d) $\left(\sum_{i=1}^{5} X_i\right)^2 = 33^2 = 1089$
e) $\sum_{i=1}^{5} X_i^2 = 5^2+7^2+7^2+6^2+8^2 = 223$ (note that d ≠ e)
f) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right) = 33 \times 36 = 1188$
g) $\sum_{i=1}^{5} X_iY_i = (5\times6)+(7\times7)+(7\times8)+(6\times7)+(8\times8) = 241$ (note that f ≠ g)
h) $\sum_{i=1}^{5} (X_i + Y_i) = (5+6)+(7+7)+(7+8)+(6+7)+(8+8) = 33+36 = 69$
i) $\sum_{i=1}^{5} (X_i - Y_i) = (5-6)+(7-7)+(7-8)+(6-7)+(8-8) = 33-36 = -3$
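The worked summation example can be checked with a short Python sketch. The variable names are illustrative choices, not part of the original notes; the data are the X and Y values from the example.

```python
# Data from the worked summation example.
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

sum_x = sum(X)                                        # a) sum of X_i
sum_y = sum(Y)                                        # b) sum of Y_i
sum_const = sum(10 for _ in X)                        # c) sum of the constant 10, five times
square_of_sum = sum_x ** 2                            # d) (sum of X_i) squared
sum_of_squares = sum(x ** 2 for x in X)               # e) sum of X_i squared
product_of_sums = sum_x * sum_y                       # f) (sum of X_i)(sum of Y_i)
sum_of_products = sum(x * y for x, y in zip(X, Y))    # g) sum of X_i * Y_i
sum_of_sums = sum(x + y for x, y in zip(X, Y))        # h) sum of (X_i + Y_i)
sum_of_diffs = sum(x - y for x, y in zip(X, Y))       # i) sum of (X_i - Y_i)

print(sum_x, sum_y, sum_const, square_of_sum, sum_of_squares)
print(product_of_sums, sum_of_products, sum_of_sums, sum_of_diffs)
```

Comparing `square_of_sum` with `sum_of_squares`, and `product_of_sums` with `sum_of_products`, shows why the order of squaring (or multiplying) and summing matters.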

3.3 Types of measures of central tendency

There are several different measures of central tendency; some of the common ones are:
• The Mean
• The Median
• The Mode

• In statistics, Greek letters are used to denote parameters and Roman letters are used to denote statistics.
• We represent population units by N and the sample units by n.
• The symbol $\bar{X}$ represents the sample mean.
• The Greek letter $\mu$ (mu) is used for the population mean.

3.3.1 Mean

a) The arithmetic mean
The arithmetic mean, also known as the arithmetic average, is the most widely used and most understood of all averages. It can be computed for ungrouped and grouped data.

i) For ungrouped data
The mean is the sum of the values divided by the total number of values.
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$

Example
The mean of 3, 2, 6, 5 and 4 is found by adding 3+2+6+5+4 = 20 and dividing by 5. Hence the mean of the data is 20/5 = 4.


ii) For ungrouped frequency distribution

If $X_1$ occurs $f_1$ times, $X_2$ occurs $f_2$ times, ..., and $X_n$ occurs $f_n$ times, then the mean will be:
$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$, where $\sum f_i = n$

Example
Consider the data on the typing speed of 25 secretaries.

Minutes (Xi)    Frequency (fi)    fi·Xi
     4                2              8
     5                3             15
     6                1              6
     7                5             35
     8                7             56
     9                4             36
    10                2             20
    11                1             11
  Total              25            187

$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{187}{25} = 7.48 \approx 7.5$

Conclusion
On the average, it takes seven and a half minutes to type the test paragraphs.
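The frequency-distribution mean for the typing-speed data can be sketched in Python as follows. The function name `frequency_mean` is an illustrative choice, not something defined in the notes.

```python
def frequency_mean(values, freqs):
    """Mean of an ungrouped frequency distribution: sum(f*x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f

# Typing-speed data for 25 secretaries (minutes and their frequencies).
minutes = [4, 5, 6, 7, 8, 9, 10, 11]
freqs = [2, 3, 1, 5, 7, 4, 2, 1]

typing_mean = frequency_mean(minutes, freqs)
print(sum(freqs), typing_mean)  # number of secretaries and the exact mean
```

The same function applies to a grouped frequency distribution if the class marks (midpoints) are passed as `values`.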

iii) For grouped frequency distribution

The procedure for finding the mean for grouped data is similar to that for ungrouped data, except that the midpoints of the classes are used for the X values.

Consider the data on temperature for 50 places in a country.

Class marks (midpoints) (Xi)    Frequency (fi)    fi·Xi
          102                         2             204
          107                         8             856
          112                        18            2016
          117                        13            1521
          122                         7             854
          127                         1             127
          132                         1             132
         Total                       50            5710

$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{5710}{50} = 114.2$

Conclusion
The average recorded high temperature is 114.2 °F.

The arithmetic mean, in most cases, is not an actual value. The mean should be rounded to one more decimal place than occurs in the raw data.

b) The weighted mean

Sometimes one must find the mean of a data set in which not all values are equally represented. The type of mean that considers this additional factor is called the weighted mean.

The weighted mean can be used to compute a grade point average. Since courses vary in their credit value, the number of credits must be used as weights.

$\bar{X} = \frac{\sum W X}{\sum W}$

where $w_1, w_2, \dots, w_n$ are the weights and $X_1, X_2, \dots, X_n$ are the values (grades).

Example: Abebe scored the following grades.

Course                 Credit (w)    Grade (X)
Statistics                 3             A
Mathematics                3             C
Operating Systems          4             B
Introduction to AI         2             D

Taking the grade points as A = 4, B = 3, C = 2 and D = 1:

$\bar{X} = \frac{\sum W X}{\sum W} = \frac{3 \times 4 + 3 \times 2 + 4 \times 3 + 2 \times 1}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.7$

2.7 is the Grade Point Average (GPA) for Abebe.

Exercise 3.1
A construction company has 80 employees. Sixty of them earn 240 Birr per day and the remaining 20 earn 160 Birr per day. Determine the average daily earnings.
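The GPA calculation in Abebe's example can be sketched in Python. The grade-point mapping (A = 4, B = 3, C = 2, D = 1) is inferred from the numbers used in that calculation, and the names `GRADE_POINTS` and `weighted_mean` are illustrative.

```python
# Grade points inferred from the worked GPA calculation (an assumption).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

def weighted_mean(values, weights):
    """Weighted mean: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# (course, credit hours, letter grade) from the example.
courses = [
    ("Statistics", 3, "A"),
    ("Mathematics", 3, "C"),
    ("Operating Systems", 4, "B"),
    ("Introduction to AI", 2, "D"),
]

credits = [credit for _, credit, _ in courses]
points = [GRADE_POINTS[grade] for _, _, grade in courses]

gpa = weighted_mean(points, credits)
print(round(gpa, 1))
```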


c) The geometric mean

When the observed values are measured as ratios, proportions or percentages, the geometric mean gives a better measure of central tendency than any other mean.

The geometric mean of a set of n positive values $x_1, x_2, \dots, x_n$ is defined as the nth root of their product. That is,

$G.M. = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$

Note that the G.M. cannot be used if any of the values are zero or negative.

Examples

1. The G.M. of 4, 8 and 6 is
$\sqrt[3]{4 \times 8 \times 6} = \sqrt[3]{192} = 5.77$

2. Suppose that the profits earned by a certain construction company in four projects were 3%, 2%, 4% and 6%, respectively. What is the mean profit?
$G.M. = \sqrt[4]{3 \times 2 \times 4 \times 6} = \sqrt[4]{144} = 3.46$
The mean profit is 3.46 percent.
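The two geometric mean examples can be reproduced with a small Python sketch. The function name `geometric_mean` is an illustrative choice (the standard library also offers `statistics.geometric_mean`), and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        # The G.M. is not defined for zero or negative values.
        raise ValueError("geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))

gm_example_1 = geometric_mean([4, 8, 6])     # cube root of 192
gm_example_2 = geometric_mean([3, 2, 4, 6])  # fourth root of 144

print(round(gm_example_1, 2), round(gm_example_2, 2))
```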

d) The harmonic mean (HM)

HM is the number of values divided by the sum of the reciprocals of each value:

$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

HM is useful when the data values are defined in relation to some unit. Common examples are speed, rates and prices.

Example
A person drove 100 km at 40 km per hour and returned driving at 50 km per hour. What is the average speed?
• Speed = Distance/Time, so Time = Distance/Speed.
• Time 1 = 100/40 = 2.5 hours to make the trip.
• Time 2 = 100/50 = 2 hours to return.
• Hence the total time is 4.5 hours and the total distance driven is 200 km.
• We can say that, on the average, his speed was 200/4.5 = 44.44 km per hour.
• Using the harmonic mean: $HM = \frac{2}{\frac{1}{40} + \frac{1}{50}} = \frac{2}{0.045} = 44.44$ km per hour.

3.3.2 The median

The median is the halfway point in a data set. Before one can find this point, the data must be arranged in order. When the data set is ordered, it is called a data array.

The median is the midpoint of the data array. The symbol for the median is MD or $\tilde{X}$.

Steps in computing the median for ungrouped data
• Step 1: Arrange the data in order.
• Step 2: Select the middle point.

When the data set has an odd number of values, the median is an actual value that falls in the middle.
When there is an even number of values in the data set, the median falls between two given values and is determined by averaging these two values.
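The two calculations above, the harmonic-mean speed example and the odd/even median rule, can be sketched in Python. The function names are illustrative; the seven ages are taken from the preschool-children example in this section, while the four-value list is a hypothetical data set used only to show the even case.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2                      # Step 2: locate the middle point
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Round trip: 100 km at 40 km/h and 100 km back at 50 km/h.
average_speed = harmonic_mean([40, 50])
direct_speed = 200 / (100 / 40 + 100 / 50)   # total distance / total time

ages = [1, 3, 4, 2, 3, 5, 1]                 # odd number of values
even_case = [2, 4, 6, 8]                     # hypothetical even-sized data set

print(round(average_speed, 2), round(direct_speed, 2))
print(median(ages), median(even_case))
```

The harmonic mean agrees with total distance divided by total time because both legs cover the same distance.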

Examples
• Find the median for the ages of seven preschool children. The ages are: 1, 3, 4, 2, 3, 5 and 1.
• Soln. Arranged in order, the ages are 1, 1, 2, 3, 3, 4, 5. The value at the middle is 3. Hence the median age is 3.

Exercise 3.2
• The ages of 10 college students are given below. Find the median.
• 18, 24, 20, 35, 19, 23, 26, 23, 19, 20

The median for an ungrouped frequency distribution

To locate the middle value, we examine the cumulative frequency (CF). Consider the following sample.

Class    Frequency    CF
  1          3         3
  2          8        11
  3          5        16
  4          4        20
  5          2        22
  6          1        23
  7          1        24
                   n = 24

To locate the middle point, compute 24/2 = 12. Then locate the point where 12 values would fall below and 12 values would fall above. To do this, look at the cumulative frequency. The 12th and 13th values both fall in class 3. Hence, the median is 3.
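Locating the median through the cumulative frequency can be sketched in Python as follows. The helper `value_at_position` is a hypothetical name, and the sketch handles only the even-n case used in the sample (n = 24), where the two middle values are averaged.

```python
from itertools import accumulate

def value_at_position(classes, cum_freqs, position):
    """Return the class that holds the given 1-based position in the ordered data."""
    for value, cf in zip(classes, cum_freqs):
        if position <= cf:
            return value
    raise ValueError("position exceeds the number of observations")

classes = [1, 2, 3, 4, 5, 6, 7]
freqs = [3, 8, 5, 4, 2, 1, 1]

cum_freqs = list(accumulate(freqs))   # running totals of the frequencies
n = cum_freqs[-1]                     # total number of observations

# n is even here, so the median averages the (n/2)th and (n/2 + 1)th values.
lower_middle = value_at_position(classes, cum_freqs, n // 2)
upper_middle = value_at_position(classes, cum_freqs, n // 2 + 1)
freq_median = (lower_middle + upper_middle) / 2

print(cum_freqs, n, freq_median)
```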


The median for a grouped frequency distribution can be calculated by using

$MD = L_m + \left(\frac{\frac{N}{2} - cf}{f_m}\right) w$

where
• $L_m$ = lower boundary of the median class
• $N$ = the total number of observations
• $cf$ = cumulative frequency of the class immediately preceding the median class
• $f_m$ = frequency of the median class
• $w$ = class width

Consider the record high temperature data.

Step 1: Construct the cumulative frequency distribution.

Class boundaries    Frequency (f)    Cumulative frequency
  99.5–104.5              2                   2
 104.5–109.5              8                  10   (cf)
 109.5–114.5             18                  28   (median class: $L_m$ = 109.5, $f_m$ = 18)
 114.5–119.5             13                  41
 119.5–124.5              7                  48
 124.5–129.5              1                  49
 129.5–134.5              1                  50
                                          N = 50

Step 2: Find the halfway point, N/2 = 50/2 = 25.

Step 3: Locate the median class by using the cumulative frequency distribution. The class 109.5–114.5 contains the 25th value (the median).

Step 4: Identify the values to be inserted in the formula $L_m + \left(\frac{\frac{N}{2} - cf}{f_m}\right) w$: here $L_m$ = 109.5, $N$ = 50, $cf$ = 10, $f_m$ = 18 and $w$ = 5. The width is the difference between the upper and lower boundaries of any class, e.g. 114.5 − 109.5 = 5.
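Steps 1 to 4, together with the final substitution into the formula, can be sketched in Python. The function name `grouped_median` is an illustrative choice; the data are the record high temperature classes from the table.

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped distribution: L_m + ((N/2 - cf) / f_m) * w."""
    n = sum(freqs)
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cf + f >= half:
            # This is the median class: it contains the (N/2)th value.
            width = upper - lower
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("empty frequency distribution")

boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]
freqs = [2, 8, 18, 13, 7, 1, 1]

temperature_median = grouped_median(boundaries, freqs)
print(round(temperature_median, 1))
```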

Measure of central tendency … Measure of central tendency …

Step 5 3.3.3 The mode


Solve for the median  N  cf  • The mode is the value that occurs most often in the data set. It is
MD  L m   2 w

also called the most typical case.

 fm  • The symbol for the mode is X
• A data set can have more than one mode or no mode at all.
• A data set that has only one value that occurs with the greatest
frequency is said to be unimodal.
 50
 10 
MD  109 . 5   2 5 • If a data set has two values that occur with the same greatest
 18  frequency, both values are considered to be the mode and the
data set is said to be bimodal.
=109.5+(0.83)x5=109.5+4.2 • If a data set has more than two values that occur the same
greatest frequency, each value is used as the mode, and the data
=113.7 set is said to be multimodal.
• When no data value occurs more than once, the data set is said to
have no mode.
21 22

21 22

Measure of central tendency … Measure of central tendency …

3) Eleven cleaners were tested how long they take to clean a lecture hall. The
• Examples
data, in minutes, are shown below. Find the mode.
1) Find the modes for the following data sets.
18.0, 14.0, 34.5, 10.0,11.3,10.0,12.4,10.0 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26
Soln
Soln Since 18 and 24 both occur three times, the modes are 18 and 24. This data
It is helpful to arrange the data in order although it is not necessary. set is said to be bimodal.
10.0, 10.0, 10.0, 11.3, 12.4, 14.0, 18.0, 34.5
Since 10.0 occurs three times—a frequency larger than any other number—the 4) Find the modal class for the frequency distribution of our temperature data.
mode is 10.0.
Class boundaries Frequency (f)
2) Six children were tested to see how long they could remain silent. The time, 99.5–104.5 2
in minutes, is recorded below. Find the mode. 104.5–109.5 8 Soln
109.5–114.5 18
2, 3, 5, 7, 8, 10 114.5–119.5 13 The modal class is 109.5–114.5,
Soln 119.5–124.5 7 since it has the largest frequency.
124.5–129.5 1
Since each value occurs only once, there is no mode. 129.5–134.5 1
Note: Do not say that the mode is zero. That would be incorrect, because in The mode is the only measure of central tendency
some data zero can be an actual value. 23
when the data are nominal or categorical. 24

23 24
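The first three mode examples can be reproduced with a short Python sketch. The function `modes` is an illustrative helper that follows the convention used in these notes: it returns an empty list, rather than zero, when no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return all values sharing the greatest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: the data set has no mode
    return sorted(v for v, c in counts.items() if c == highest)

example_1 = modes([18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0])
example_2 = modes([2, 3, 5, 7, 8, 10])
example_3 = modes([15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26])

print(example_1, example_2, example_3)
```

A result with one entry corresponds to a unimodal data set, two entries to a bimodal one, and more than two to a multimodal one.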


3.4 Properties and uses of measures of central tendency

Researchers and statisticians must know which measure of central tendency (CT) is being used and when to use each measure. These are summarized below.

The Mean
• The mean is found by using all the values of the data.
• The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.
• The mean is used in computing other statistics, such as the variance.
• The mean for the data set is unique and not necessarily one of the data values.
• The mean cannot be computed for the data in a frequency distribution that has an open-ended class.
• The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.

The Median
• The median is used to find the center or middle value of a data set.
• The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
• The median is used for an open-ended distribution.
• The median is affected less than the mean by extremely high or extremely low values.

The Mode
• The mode is used when the most typical case is desired.
• The mode is the easiest average to compute.
• The mode can be used when the data are nominal, such as religious preference, gender or political affiliation.
• The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

3.5 Distribution shapes

In describing data, it is important to be able to recognize the shapes of the distribution values. In later lessons, you will see that the shape of a distribution also determines the appropriate statistical methods used to analyze data.

A distribution can have many shapes, and one method of analyzing a distribution is to draw a histogram or frequency polygon for the distribution.

Frequency distributions can assume many shapes that can be identified by an overall pattern. Some of them are the bell-shaped, the uniform-shaped, the J-shaped, the reverse J-shaped, the U-shaped, etc. The three most important shapes are positively skewed, symmetric and negatively skewed.

[Figure: three distribution curves. Positively skewed: Mode, Median, Mean from left to right. Negatively skewed: Mean, Median, Mode from left to right. Symmetric: Mean, Median and Mode coincide at the center.]

• In a positively skewed or right-skewed distribution, the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; the “tail” is to the right. Also, the mean is to the right of the median, and the mode is to the left of the median.
Example: In an examination, if most of the students did poorly, their scores would tend to cluster on the left side of the distribution. A few high scores would constitute the tail of the distribution, which would be on the right side.

• In a symmetric distribution, the data values are evenly distributed on both sides of the mean. In addition, when the distribution is unimodal, the mean, median and mode are the same and are all at the center of the distribution. Examples: IQ scores and heights of mothers.

• When the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed or left-skewed. Also, the mean is to the left of the median and the mode is to the right of the median.
Example: If the majority of students score very high on an examination, the scores will tend to cluster on the right side of the distribution.

• When a distribution is extremely skewed, the median rather than the mean is a more appropriate measure of CT.
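The ordering of the mean, median and mode in skewed distributions can be illustrated with a small Python sketch. The exam scores below are hypothetical values invented for this illustration (they do not come from the notes); the second list mirrors the first so that its tail points the other way.

```python
from statistics import mean, median, mode

# Hypothetical exam scores: most students did poorly, a few scored high
# (positively skewed, tail to the right).
low_scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 10]

# Mirror image of the same scores: most students scored high
# (negatively skewed, tail to the left).
high_scores = [12 - s for s in low_scores]

for label, scores in (("positively skewed", low_scores),
                      ("negatively skewed", high_scores)):
    print(label, "mode:", mode(scores),
          "median:", median(scores), "mean:", mean(scores))
```

In the first list the mode lies to the left of the median and the mean to the right; in the mirrored list the order is reversed.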
