
Measures of Central Tendency

(Mean, Median and Mode)

and
Measures of Dispersion or Variability
(Range, Mean Deviation, Variance, and Standard Deviation)
Measures of central tendency are among the simplest forms of data analysis.
Students are often presented with information about national
averages, class averages, and grade point averages.
Statisticians refer to this statistic as the mean.

Two additional measures of central tendency are commonly employed by those working with statistics.
These measures are the mode and the median.
The Mean
Most widely used measure of central tendency.
The mean of a distribution of values is obtained by adding all of the values and dividing the sum
by the number of values.

Mean for Ungrouped Data


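When obtaining the mean for a simple distribution of sample
data, the formula for the mean is:

$$\bar{x} = \frac{\sum x}{n}$$

where ∑x is the sum of all the values of X and n is the number of values.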
Example:
Suppose a researcher obtained a simple distribution or set of scores on a civil service exam as
follows: X = 70, 80, 50, 30, 90, 80, 75, 95, 100, 105.

The mean would be determined by following two simple steps.


1. Sum all the values of X
2. Then divide by the number of values in the distribution

The mean civil service exam score is 77.5. The mean score is the typical
performance level of all the candidates taking the exam.
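The two steps above can be sketched in Python. This is only an illustration of the arithmetic; the variable names are chosen here and are not part of the original example.

```python
# Scores from the civil service exam example above.
scores = [70, 80, 50, 30, 90, 80, 75, 95, 100, 105]

total = sum(scores)   # Step 1: sum all the values of X
n = len(scores)       # number of values in the distribution
mean = total / n      # Step 2: divide the sum by the number of values

print(total, n, mean)  # 775 10 77.5
```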
In some data sets, each observation carries its own weight. When the weights are positive
integers, they can be treated as frequencies. The weighted mean of a data set with values x_i
and corresponding weights w_i is given by:

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
Example
Calculate the General Weighted Average (GWA) of Julius Garde for the first semester of
school year 2019-2020 as shown in the following table.

Solution. To solve for the GWA, we first consider the entries in the second column of
the table as the points x_i and the entries in the third column as the corresponding
weights w_i. By constructing a fourth column consisting of the products w_i x_i and
finding the column totals, we get the table below.
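A minimal sketch of the weighted-mean computation follows. The grades and unit loads in it are hypothetical values made up for illustration; they are not the entries of the GWA table in the example, and the function name weighted_mean is also an assumption.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Column of products w_i * x_i, then its total divided by the total weight.
    products = [w * x for x, w in zip(values, weights)]
    return sum(products) / total_weight

# Hypothetical grades (x_i) and units (w_i), NOT the data from the example table.
grades = [1.50, 2.00, 1.75, 2.50]
units = [3, 3, 4, 2]

gwa = weighted_mean(grades, units)
print(gwa)  # 1.875
```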
Mean for Grouped Data
-In actual research situations, the data may not always be ungrouped or organized in a
simple frequency distribution.

-Suppose a researcher had data grouped into class intervals with a width of ten. In such
cases, the mean is calculated in a slightly different manner.

The process begins with a determination of a midpoint for each class interval.
The midpoint is calculated by adding the highest value included in the interval to the
lowest value included in the interval and dividing the sum by two.

The value of the midpoint, denoted x (or m), is then used to calculate the mean.

$$x = \frac{\text{lowest value in the interval} + \text{highest value in the interval}}{2}$$

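1. Long Method Formula

$$\bar{x} = \frac{\sum fx}{n}$$

where f is the frequency of each class, x is the midpoint of each class, and n = ∑f is the
total number of items.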
The calculation of the arithmetic mean may be shortened by using the coded-deviation
method. The steps are:
1. Take the midpoint of one of the intervals as an arbitrary reference point, called the
assumed mean. As far as the result is concerned, it makes no difference which
interval midpoint is used.
2. After the frequency and midpoint columns, f and x, make another column to
represent unit deviations from the arbitrary reference point. Label it as d’. Since
there is no deviation at the reference point, place zero at that point and
complete the column; positive deviations above and negative deviations below
it.
3. Make another column and label it as fd’. Multiply each f by its d’ and enter
the product in this column.
4. Get the sum of fd’ to obtain ∑fd’.

• To find the mean of grouped data using the coded-deviation method, use
the formula:

2. Coded-Deviation Method Formula

$$\bar{x} = x' + \left(\frac{\sum fd'}{n}\right) i$$

Where,
x’ – assumed mean of the given set
d’ – unit deviation from the reference point
i – size of the class interval
n – number of items grouped
Example:
Calculate the average daily income for 40 days on cross-stitch thread of The Arts and Crafts Shop.

Amount in Pesos   f (days)   x (midpoint)   d’     fx     fd’
172-180               3          176          3     528      9
163-171               5          167          2     835     10
154-162               9          158          1    1422      9
145-153              12          149          0    1788      0
136-144               5          140         -1     700     -5
127-135               4          131         -2     524     -8
118-126               2          122         -3     244     -6
Total              n=40                          ∑fx=6041   ∑fd’=9
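Using the two methods:

Long Method:
$$\bar{x} = \frac{\sum fx}{n} = \frac{6041}{40} = 151.025$$

Coded-Deviation Method (assumed mean x’ = 149, class size i = 9):
$$\bar{x} = x' + \left(\frac{\sum fd'}{n}\right) i = 149 + \left(\frac{9}{40}\right)(9) = 151.025$$

Both methods give an average daily income of about 151.03 pesos.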
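As a check on the table, here is a short sketch that computes the grouped mean both ways from the class limits and frequencies. The variable names are illustrative, and the assumed mean 149 is the midpoint of the 145-153 class, as in the table.

```python
# (lower limit, upper limit, frequency) for each class in the example table.
classes = [
    (172, 180, 3),
    (163, 171, 5),
    (154, 162, 9),
    (145, 153, 12),
    (136, 144, 5),
    (127, 135, 4),
    (118, 126, 2),
]

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
n = sum(freqs)

# Long method: mean = sum(f * x) / n
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
mean_long = sum_fx / n

# Coded-deviation method: mean = x' + (sum(f * d') / n) * i
assumed_mean = 149   # x': midpoint of the 145-153 class
width = 9            # i: size of the class interval
unit_devs = [round((x - assumed_mean) / width) for x in midpoints]  # d'
sum_fd = sum(f * d for f, d in zip(freqs, unit_devs))
mean_coded = assumed_mean + (sum_fd / n) * width

print(n, sum_fx, sum_fd)
print(round(mean_long, 3), round(mean_coded, 3))
```

Choosing a different class midpoint as the assumed mean changes the d’ column and ∑fd’, but not the resulting mean.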


The Median
The median represents the middle point of a distribution of data.
It is the point at which exactly half of the observed values in the distribution are
higher and half of the observed values are lower.

Median for Ungrouped Data

In computing the median, it is important to remember the following:

1. Arrange the data in ascending or descending order.
2. Take note of the items in the middle position. If there is an odd number of
items, the middle item is the median. If there is an even number of items,
the median is taken as the arithmetic mean of the two values falling in the
middle.
Examples:
1. The numbers of books loaned from the library on each day of the week were 36, 31, 24, 45, and 50.
What is the median?
Solution: Arrange the numbers as 24, 31, 36, 45, and 50. Since there are 5 items, the middle item is 36.
Thus, the median is 36.
2. The numbers of books loaned during another week from Monday to Saturday were 36, 31, 24, 25, 50,
and 47. What is the median?
Solution: Arrange the numbers as 24, 25, 31, 36, 47, and 50. In this case, there are two middle numbers:
31 and 36. The median is the average of the middle numbers, that is, (31 + 36) ÷ 2 = 33.5.

The position of the median is the {(n+1)÷2}th value,
where n is the number of values in a set of data.

Example 3: n=5
Position of Md = (n+1)÷2
= (5+1)÷2
= 6÷2
= 3
With n=5, the median is the 3rd value.

Example 4: n=6
Position of Md = (n+1)÷2
= (6+1)÷2
= 7÷2
= 3.5
With n=6, the median is the 3.5th value, that is, halfway between the 3rd and 4th values.
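The rule for ungrouped data can be sketched as a small Python function. The function name is chosen here for illustration; the two calls use the library-loan examples above.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle item(s)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle item
    # even n: arithmetic mean of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([36, 31, 24, 45, 50]))       # 36
print(median([36, 31, 24, 25, 50, 47]))   # 33.5
```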
Median for Grouped Data

To find the median of grouped data, use the formula:

$$Md = L + \left(\frac{\frac{n}{2} - F}{f}\right) i$$

Where:
L – exact lower limit of the median class
n – total number of items
F – “less than or equal to” cumulative frequency of the class interval
preceding the class interval containing the median
f – frequency of the median class
i – size of the class interval
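Example: Compute the median for the grouped data below.

Scores   Frequency (f)   Exact Lower Limit or    Cumulative
                         Lower Boundary (L)      Frequency (F)
95-99         5               94.5                  100
90-94        11               89.5                   95
85-89        17               84.5                   84
80-84        25               79.5                   67
75-79        20               74.5                   42
70-74        12               69.5                   22
65-69         7               64.5                   10
60-64         3               59.5                    3
i=5      n=100

Solution: Since n=100, n/2 = 50, and the median class is 80-84, the first class whose
cumulative frequency reaches 50. Thus n=100, L=79.5, F=42, f=25, i=5, and

$$Md = 79.5 + \left(\frac{50 - 42}{25}\right)(5) = 79.5 + 1.6 = 81.1$$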
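The grouped-median computation can be sketched as follows. This assumes integer class limits, so that the exact lower limit is the stated lower limit minus 0.5, and it assumes the classes are listed from lowest to highest; the function name is illustrative.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward.
classes = [
    (60, 64, 3), (65, 69, 7), (70, 74, 12), (75, 79, 20),
    (80, 84, 25), (85, 89, 17), (90, 94, 11), (95, 99, 5),
]

def grouped_median(classes):
    """Md = L + ((n/2 - F) / f) * i for classes sorted in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            exact_lower = lower - 0.5      # L (assumes integer class limits)
            width = upper - lower + 1      # i
            return exact_lower + ((half - cumulative) / f) * width
        cumulative += f

md = grouped_median(classes)
print(round(md, 2))  # 81.1
```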
The Mode
The mode is the simplest measure of central tendency and is easy to derive.
The mode is observed rather than computed.

Mode for Ungrouped Data

-The mode of a distribution of values is the value which occurs most often.
-When every value in a simple distribution occurs only once, there is no mode
for that distribution.
-For a frequency distribution, the mode is the value that occurs more times than
any of the other values.
If two values in a distribution occur equally often, and more often than the
other values, the distribution is referred to as bimodal.
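A possible way to observe the mode with Python's standard library is sketched below. The function name modes is chosen here; the first call reuses the civil service exam scores from the mean example, and the other two data sets are made up to show the bimodal and no-mode cases.

```python
from collections import Counter

def modes(values):
    """Return the value(s) that occur most often, or [] when there is no mode."""
    if not values:
        return []
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        return []                         # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([70, 80, 50, 30, 90, 80, 75, 95, 100, 105]))  # [80]
print(modes([1, 2, 2, 3, 3, 4]))                          # [2, 3], bimodal
print(modes([1, 2, 3]))                                   # [], no mode
```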
Mode for Grouped Data

For a grouped frequency distribution, the mode is in the interval having the
greatest frequency and is called the modal interval.

The information provided in the frequency column indicates that the modal interval for this
distribution is the 14-16 interval.
The mode represents the midpoint of the modal interval.
The mode of a distribution is not always at the middle or center of a
distribution.
It can be located at any point within the range of observations that make up
the distribution examined.

To find the mode of grouped data, use the formula:

$$Mo = L + \left(\frac{d_1}{d_1 + d_2}\right) i$$

Where:

L - the exact lower limit of the modal class
d_1 - the difference between the frequency of the modal class and the
frequency of the class below the modal class (the next lower class)
d_2 - the difference between the frequency of the modal class and the
frequency of the class above the modal class (the next higher class)
i - the size of the class interval
Example:
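Consider the distribution of the weekly wages of the factory
workers in Sofia’s Garments Factory.

Weekly Wages (in Pesos)   No. of Workers
1,380-1,399                     4
1,360-1,379                     6
1,340-1,359                    12
1,320-1,339                    31   (modal class/interval)
1,300-1,319                    24
1,280-1,299                    15
1,260-1,279                    11
1,240-1,259                     8

Solution: The modal class is 1,320-1,339, so its exact lower limit is L = 1,319.5 and the
class size is i = 20. The difference from the frequency of the class below is
d_1 = 31 − 24 = 7, and the difference from the frequency of the class above is
d_2 = 31 − 12 = 19.

$$Mo = 1{,}319.5 + \left(\frac{7}{7 + 19}\right)(20) \approx 1{,}324.88$$

Thus, the modal weekly wage of the factory workers is
approximately P1,324.88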
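The same computation can be sketched in Python. The sketch assumes integer class limits (exact lower limit = lower limit minus 0.5), classes listed from lowest to highest, and a single modal class; a missing neighbour at either end is treated as frequency 0, which is an assumption of this sketch rather than something stated above.

```python
# (lower limit, upper limit, number of workers), from the lowest class upward.
classes = [
    (1240, 1259, 8), (1260, 1279, 11), (1280, 1299, 15), (1300, 1319, 24),
    (1320, 1339, 31), (1340, 1359, 12), (1360, 1379, 6), (1380, 1399, 4),
]

def grouped_mode(classes):
    """Mo = L + (d1 / (d1 + d2)) * i for classes sorted in ascending order."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                    # position of the modal class
    lower, upper, f = classes[k]
    f_below = freqs[k - 1] if k > 0 else 0         # next lower class
    f_above = freqs[k + 1] if k < len(freqs) - 1 else 0  # next higher class
    d1 = f - f_below
    d2 = f - f_above
    if d1 + d2 == 0:
        return (lower + upper) / 2                 # flat neighbourhood: use midpoint
    exact_lower = lower - 0.5                      # L
    width = upper - lower + 1                      # i
    return exact_lower + (d1 / (d1 + d2)) * width

mo = grouped_mode(classes)
print(round(mo, 2))  # 1324.88
```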
Measures of Dispersion or Variability
(Range, Mean Deviation, Variance, and Standard Deviation)
In order to understand measures of dispersion, let us consider an example.

The daily incomes of the workers in two factories are:


Factory A: 35, 45, 50, 65, 70, 90, 100
Factory B: 60, 65, 65, 65, 65, 65, 70

• Here we observe that in both factories the mean of the data is the same, namely, 65.
Mean A = (35+45+50+65+70+90+100)/7 = 455/7 = 65
Mean B = (60+65+65+65+65+65+70)/7 = 455/7 = 65

-In Factory A, the observations are much more scattered about the mean.
-In Factory B, almost all the observations are concentrated around the mean.
Clearly, the two groups differ even though they have the same mean. Thus, there is a
need to differentiate between the groups. We need other measures that describe
the scatter (or spread) of the data.
To do this, we study what are known as measures of dispersion.
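To see the difference in scatter numerically, one option is to list each observation's deviation from the mean for both factories. The helper name below is illustrative only.

```python
factory_a = [35, 45, 50, 65, 70, 90, 100]
factory_b = [60, 65, 65, 65, 65, 65, 70]

def deviations_from_mean(values):
    """Return the list of x - mean for every observation x."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

for name, incomes in (("A", factory_a), ("B", factory_b)):
    mean = sum(incomes) / len(incomes)
    print(name, mean, deviations_from_mean(incomes))
```

Both factories print a mean of 65.0, but the deviations for Factory A run from -30 to 35 while those for Factory B stay between -5 and 5.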
Example:
Two sections of 10 students each in class X in a certain school were given a
common test in Mathematics (40 maximum marks). The scores of the students are
given below:

The average score in section A is 19.


The average score in section B is 19.

Clearly, the extent of spread or dispersion of the data in section A differs from
that in section B.
The measurement of the scatter of the given data about the average is said to be
a measure of dispersion or scatter.

The position of the mean is marked by an arrow in the dot diagram.


Range

The range is the difference between the highest and the lowest values in a data set.
For section 1, the highest score is 43, while the lowest score is 38. Thus,

range = 43 − 38 = 5

On the other hand, for section 2, the highest score is 47, while the lowest
score is 33. Thus,

range = 47 − 33 = 14

Therefore, the scores of the students surveyed from section 2 have a wider range
than those of the students surveyed from section 1.
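A short sketch of the range computation follows. The score lists are the section 1 and section 2 samples listed in the standard deviation example of this handout, whose highest and lowest scores match the ones quoted here; the function name is illustrative.

```python
section_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
section_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

print(value_range(section_1))  # 5
print(value_range(section_2))  # 14
```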
Mean Deviation from the Mean
The mean deviation is the sum of the absolute values of the deviations from the
mean divided by the number of items.

The following steps are employed to calculate the mean deviation from the mean:
1. Make a column of deviations from the mean.
2. Take the absolute value of each deviation.
For calculating the mean deviation from the mean of raw data, use:

$$MD = \frac{\sum |x - \bar{x}|}{n}$$
Example:

Section A
Observations (x)   Deviations from Mean (x − x̄)   │x − x̄│
       6                    -13                      13
       9                    -10                      10
      11                     -8                       8
      13                     -6                       6
      15                     -4                       4
      21                      2                       2
      23                      4                       4
      28                      9                       9
      29                     10                      10
      35                     16                      16
Total 190                     0              ∑│x − x̄│=82

Thus, 82÷10 or 8.2 is the mean deviation from the mean of Section A.

Section B
Observations (x)   Deviations from Mean (x − x̄)   │x − x̄│
      15                     -4                       4
      16                     -3                       3
      16                     -3                       3
      17                     -2                       2
      18                     -1                       1
      19                      0                       0
      20                      1                       1
      21                      2                       2
      23                      4                       4
      25                      6                       6
Total 190                     0              ∑│x − x̄│=26

Thus, 26÷10 or 2.6 is the mean deviation from the mean of Section B.
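The two steps can be sketched in Python using the Section A and Section B observations from the table. The function name is chosen here for illustration.

```python
section_a = [6, 9, 11, 13, 15, 21, 23, 28, 29, 35]
section_b = [15, 16, 16, 17, 18, 19, 20, 21, 23, 25]

def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    absolute_deviations = [abs(x - mean) for x in values]  # column |x - mean|
    return sum(absolute_deviations) / n

print(mean_deviation(section_a))  # 8.2
print(mean_deviation(section_b))  # 2.6
```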
Variance and Standard Deviation
Suppose that the center of a population data set {x1, x2,…, xN} is best described by the
arithmetic mean µ and that our goal is to get the average “distance” of each data point xi from
µ. Naturally, we would like to compute

$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)$$

However, using the properties of summations, and the fact that Nµ = x1 + x2 + · · · + xN, we can
check that

$$\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = 0$$

In other words, the sum of the deviations from the mean is 0, and therefore, we cannot have a
meaningful measure of variability this way.
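A common way around this cancellation is to square each deviation before averaging. For reference, the standard textbook definitions of the variance and standard deviation are sketched below; the notation (σ², s², N, n, x̄) is assumed here to match the symbols used in the surrounding text.

```latex
% Population variance and standard deviation (N items, mean \mu)
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}

% Sample variance and standard deviation (n items, sample mean \bar{x})
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
```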
As we may have noticed, the formula for the sample variance differs from the formula for the
population variance mainly in the divisor n − 1. The reason behind this is rather technical
and mathematical in nature. Simply put, the divisor n − 1 removes the “bias” in s² when we
want it to estimate σ² for the purposes of making inferences.
Example: Using the sample data sets in example 37, determine which section exhibits a
greater variability in terms of standard deviations.

Solution. Let x denote the scores of students sampled from section 1 and let y denote the
scores of students sampled from section 2. To calculate the standard deviations of each sample,
we first take note that the sample means from each section are

Mean Section 1: (40+38+42+40+39+39+43+40+39+40)/10= 400/10=40


Mean Section 2: (46+37+40+33+42+36+40+47+34+45)/10= 400/10=40
To calculate the sample standard deviation, we construct the following table.
Therefore, the sample variance for the sample from section 1 is

$$s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{20}{9} \approx 2.2222$$

while the sample variance for the sample from section 2 is

$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{224}{9} \approx 24.8889$$

Taking square roots, we find that the sample standard deviations of section 1 and section 2 respectively are
√2.2222 ≈ 1.49 and √24.8889 ≈ 4.99. We can conclude that for these samples, the one from section 1 exhibits
less variability than the one from section 2. Note that even though the two samples have equal
means, the standard deviations reveal the actual difference between the two data sets.
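The sample variance and standard deviation for the two sections can be checked with a short Python sketch. The function names are illustrative; the divisor n − 1 follows the sample formula discussed above.

```python
import math

section_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
section_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

def sample_variance(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

def sample_std(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(round(sample_variance(section_1), 4), round(sample_std(section_1), 2))  # 2.2222 1.49
print(round(sample_variance(section_2), 4), round(sample_std(section_2), 2))  # 24.8889 4.99
```

Python's standard library functions statistics.variance and statistics.stdev use the same n − 1 divisor and can serve as a cross-check.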
Assessment:
1. A research objective is presented in each item. For each, identify the (a) population and
(b) sample in the study.
a) A polling organization contacts 2141 male university graduates who have a white-
collar job and asks whether or not they had received a raise at work during the past 4
months.
b) A quality-control manager randomly selects 70 bottles of ketchup that were filled on
July 17 to assess the calibration of the filling machine.
c) Every year the PSA releases the Current Population Report based on a survey of 50,000
households. The goal of this report is to learn the demographic characteristics, such as
income, of all households within the Philippines.

2. Determine the level of measurement of each variable.

a) birth order among siblings in a family


b) favorite movie
c) volume of water consumed by a household in a day
d) eye color
e) number of siblings
3. Determine the type of sampling used.

a) A member of Congress wishes to determine her constituents’ opinion regarding estate
taxes. She divides her constituency into three income classes: low-income households,
middle-income households, and upper-income households. She then takes a simple
random sample of households from each income class.
b) A college official divides the student population into five classes: freshman, sophomore,
junior, senior, and graduate student. The official takes a simple random sample from
each class and asks the members for their opinions regarding student services.

c) The presider of a guest-lecture series at a university stands outside the auditorium before
a lecture begins and hands every fifth person who arrives, beginning with the third, a
speaker evaluation survey to be completed and returned at the end of the program.

d) To determine his DSL Internet connection speed, Shawn divides up the day into four parts:
morning, midday, evening, and late night. He then measures his Internet connection speed
at 5 randomly selected times during each part of the day.

e) 24 Hour Fitness wants to administer a satisfaction survey to its current members. Using its
membership roster, the club randomly selects 40 club members and asks them about their
level of satisfaction with the club.
4. Patricia categorized her spending for this month into four categories: Rent, Food, Fun, and
Other. The percents she spent in each category are pictured here. If she spent a total of PhP
26,000 this month, how much did she spend on rent?
5. You recorded the time in seconds it took for 8 participants to solve a puzzle. The times
were: 15.2, 18.8, 19.3, 19.7, 20.2, 21.8, 22.1, and 29.4

a) Calculate the mean and the median time it took for the 8 participants to solve a
puzzle.
b) Calculate the range and standard deviation of the time it took for the 8
participants to solve the puzzle.

6. Make up three data sets with 5 numbers each that have:

a) the same mean but different standard deviations.


b) the same mean but different medians.
c) the same median but different means.
