
MODULE 6: DESCRIPTIVE STATISTICAL MEASURES

UNIT 1: MEASURES OF CENTRAL TENDENCY


(For SEPTEMBER 23-25)

Learning Outcome: Calculate and interpret measures of central tendency given a set of business-related data.

Numerical values that locate, in some sense, the middle of a set of data arranged in
increasing or decreasing order are called measures of central tendency or central location.
The term average is often associated with these measures, which are the mean, median,
mode, and midrange. In this unit, we walk through the steps for computing each of these
measures.

MEAN

a. Arithmetic Mean. It is obtained by adding all the observations and dividing the sum by
the number of observations; thus, it is called a computational average.

1. Population Mean: If $x_1, x_2, \ldots, x_N$ represent the data values from a finite population of
size $N$, the population mean $\mu$ (Greek letter “mu”, not the letter u) is given by

$$\mu = \frac{\sum x_i}{N}$$

2. Sample Mean: If $x_1, x_2, \ldots, x_n$ represent the data values from a finite sample of size $n$,
the sample mean $\bar{x}$ (“x bar”) is given by

$$\bar{x} = \frac{\sum x_i}{n}$$

The symbol $\sum x_i$, read “summation of x sub i,” means that we take the sum of all the
values in the data set. It uses the Greek letter Σ “sigma” (not the letter E!). Note that the
data values do not need to be arranged in any order when the mean is computed. For
data sets with many values, $\sum x_i$ can be computed using the Statistics mode of a
scientific calculator.
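The formula can be checked in R by computing $\sum x_i$ and the number of values separately and then dividing. The sketch below does this for a small hypothetical vector (the values are made up for illustration and do not come from any example in this module); the arithmetic is the same whether the values form a population (divide by $N$) or a sample (divide by $n$).

```r
# Hypothetical data values, for illustration only
x <- c(12, 15, 9, 20, 14)

sum_x <- sum(x)        # summation of x sub i: 70
n <- length(x)         # number of observations: 5

manual_mean <- sum_x / n
manual_mean            # 70 / 5 = 14

# The built-in mean() gives the same value
mean(x)
```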

Example 1:
Suppose you chose ten people who entered the campus and whose ages are as
follows: 15, 25, 18, 20, 25, 18, 18, 20, 20, 25. What is the mean age of this sample?

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited. 77
Solution:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{15 + 25 + 18 + 20 + 25 + 18 + 18 + 20 + 20 + 25}{10} = 20.40$$

The mean age of the sample is 20.40 years.

Note that your solution does not have to include the list of all values being added.
For instance, you can simply write

$$\bar{x} = \frac{\sum x_i}{n} = \frac{204}{10} = 20.40.$$

R Script
# Create the data vector (for small samples)
ages <-c(15, 25, 18, 20, 25, 18, 18, 20, 20, 25)

# Compute the mean of the ages
mean(ages)

[1] 20.4

b. Weighted Mean. If the data values $x_1, x_2, \ldots, x_k$ have assigned weights $w_1, w_2, \ldots, w_k$,
respectively, the weighted mean is given by

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

Example 2:
A student took 5 subjects last semester. Find his average if his final grades and the
corresponding units were as follows:

UNITS   FINAL GRADE
3       1.75
5       2.50
3       2.25
2       1.50
4       3.00

Solution:
The grades serve as the data values $x_i$, and the units serve as the corresponding
weights $w_i$.

$$\bar{x} = \frac{3(1.75) + 5(2.50) + 3(2.25) + 2(1.50) + 4(3.0)}{3 + 5 + 3 + 2 + 4} = \frac{39.5}{17} = 2.32$$

The student's weighted average is 2.32.

R Script
# Create the data vectors
grade <-c(1.75, 2.50, 2.25, 1.50, 3.0)
units <-c(3, 5, 3, 2, 4)
# Compute the weighted mean
weighted.mean(grade, units)

[1] 2.323529
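To see that weighted.mean() is applying the formula $\sum w_i x_i / \sum w_i$, the same value can be built up step by step. This sketch reuses the grades and units from Example 2.

```r
grade <- c(1.75, 2.50, 2.25, 1.50, 3.0)   # data values x_i
units <- c(3, 5, 3, 2, 4)                 # weights w_i

# Numerator: each grade multiplied by its units, then summed
numerator <- sum(units * grade)           # 39.5
# Denominator: total number of units
denominator <- sum(units)                 # 17

wmean <- numerator / denominator
round(wmean, 2)                           # 2.32
```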

Characteristics of the Mean


1. It can be used for interval and ratio measurements.
2. All the scores or measurements are considered in the computation of the mean.
3. Very high or very low scores or measurements affect the mean.

MODE

The mode of a data set is the value in the distribution with the highest frequency. It locates
the point where the observation values occur with the greatest density. It can be used for
quantitative as well as qualitative data. The mode of a population is denoted by $\hat{\mu}$ (“mu
hat”), while that of a sample is denoted by $\hat{x}$ (“x hat”).

A data set can have one mode, more than one mode, or no mode.
• When two data values occur with the same greatest frequency, the data set has
two modes and is called bimodal.
• When more than two data values occur with the same greatest frequency, each of
those values is a mode and the data set is said to be multimodal.
• When no data value is repeated, or if all data values are repeated the same number of
times, we say that there is no mode.

Example 3:
Observe the given ungrouped data below:
a. 1, 2, 3, 4, 5, 6, 7 (No mode)
b. 15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5 (There is one mode, $\hat{x} = 12.3$.)
c. 15, 12, 4, 15, 4, 6, 5 (There are two modes, $\hat{x} = 15$ and $\hat{x} = 4$, so the data set is bimodal.)
d. 3, 4, 5, 1, 3, 2, 4, 5, 7, 10 (There are three modes, $\hat{x} = 3$, $\hat{x} = 4$, and $\hat{x} = 5$, so the data set is
multimodal.)

Characteristics of the Mode


1. It is very easy to compute or determine but is seldom used because it is very unstable.
2. It is used when a rough or quick estimate of a central value is wanted.
3. It is most appropriate for nominal-scale data, as a measure of popularity.

R Script
Here we present hypothetical examples, considering both numeric data and
nonnumeric data.

# Create the data vector


values <-c(3, 4, 5, 1, 3, 2, 4, 5, 7, 10)
labels <-c("A1", "A2", "A1", "A4", "A4", "A3", "A3", "A2", "A2", "A2",
"A3", "A2")

# Determine the mode for numeric vector


sort(table(values), decreasing=TRUE)

values
3 4 5 1 2 7 10
2 2 2 1 1 1 1

# Determine the mode for nonnumeric vector


sort(table(labels), decreasing =TRUE)

labels
A2 A3 A1 A4
5 3 2 2
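The sorted frequency table above still leaves us to read the mode off by eye. Base R has no built-in function for the statistical mode (the base function mode() reports an object's storage mode, such as "numeric", instead). A minimal sketch of a hypothetical helper, stat_mode(), that applies the rules listed earlier in this section: it returns every value tied for the greatest frequency, and returns an empty result when there is no mode.

```r
# Hypothetical helper: returns all modes of a vector, or character(0)
# when there is no mode (no value repeated, or all values repeated equally)
stat_mode <- function(x) {
  freq <- table(x)                          # frequency of each distinct value
  if (length(unique(as.vector(freq))) == 1) {
    return(character(0))                    # all frequencies equal: no mode
  }
  names(freq)[freq == max(freq)]            # values with the greatest frequency
}

# The four data sets of Example 3
stat_mode(c(1, 2, 3, 4, 5, 6, 7))                      # no mode
stat_mode(c(15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5))    # "12.3"
stat_mode(c(15, 12, 4, 15, 4, 6, 5))                   # "4" "15" (bimodal)
stat_mode(c(3, 4, 5, 1, 3, 2, 4, 5, 7, 10))            # "3" "4" "5" (multimodal)
```

Because table() works on character data too, the same helper applies to the nonnumeric labels vector, returning "A2". Note that the modes come back as character strings; wrap the result in as.numeric() if numeric values are needed.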

MEDIAN

The median of a data set is the value that divides the distribution into two equal parts (after
arranging the values in ascending or descending order). As such, it is a positional average.
The median of a population, $\tilde{\mu}$ (“mu curl” or “mu tilde”), or of a sample, $\tilde{x}$ (“x curl” or
“x tilde”), can be determined using the following formulas:

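$$\tilde{\mu} = \begin{cases} x_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\ \dfrac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2} & \text{if } N \text{ is even} \end{cases}$$

or

$$\tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$

where $N$ denotes the population size and $n$ is the sample size. Note that $\frac{N+1}{2}$, $\frac{N}{2}$,
$\frac{N}{2}+1$, $\frac{n+1}{2}$, $\frac{n}{2}$, and $\frac{n}{2}+1$ are all subscripts referring to the position of a data value in
the data set after it has been arranged in increasing (or decreasing) order. For example, $x_7$
refers to the seventh data value in the sequence, while $x_4$ is the fourth value in the data set.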
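The positional formula translates almost line for line into R. A minimal sketch of a hypothetical function, median_by_position(), which sorts the data and then picks the middle value (odd $n$) or averages the two middle values (even $n$); on the data of Examples 4 and 5 it agrees with the built-in median().

```r
# Hypothetical function implementing the positional median formula
median_by_position <- function(x) {
  x <- sort(x)                       # arrange values in increasing order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

median_by_position(c(90, 92, 93, 88, 95, 88, 97, 87, 98))       # 92
median_by_position(c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20))   # 21.5
```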

Example 4:
A retail outlet sold the following numbers of packs of a particular product over the past
nine days: 90, 92, 93, 88, 95, 88, 97, 87, and 98. What is the median number of packs sold?

Solution:
Ordering the data from least to greatest and labeling these values, we get:
𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 𝑥6 𝑥7 𝑥8 𝑥9
87 88 88 90 92 93 95 97 98

Since $n = 9$ (odd),

$$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{9+1}{2}} = x_5 = 92$$

The median number of packs sold is 92. (Four days sold more packs than 92, and four
days sold fewer than 92.)

R Script:
# Create the data vector
packs.sold<-c(90, 92, 93, 88, 95, 88, 97, 87, 98)

# Determine the Median


median(packs.sold)

[1] 92

Example 5:
The ages of 10 college students are listed below. Find the median.
18, 24, 20, 35, 19, 23, 26, 23, 19, 20

Solution:
Ordering the data from least to greatest and labeling these values, we get:
𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 𝑥6 𝑥7 𝑥8 𝑥9 𝑥10
18 19 19 20 20 23 23 24 26 35
Since $n = 10$ (even),

$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} = \frac{x_{\frac{10}{2}} + x_{\frac{10}{2}+1}}{2} = \frac{x_5 + x_6}{2} = \frac{20 + 23}{2} = 21.5$$
The median age of the college students is 21.5 years.

R Script
# Create the data vector
ages <-c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)

# Determine the Median


median(ages)

[1] 21.5

Characteristics of the Median


1. The median is most appropriate for ordinal or ranked measurements.

2. Only the middle scores or measurements are considered in the computation of the
median.
3. Very high or very low scores do not affect the median.
4. When there are extreme values in the data set (interval or ratio data), that is, the
distribution is markedly skewed, it is more appropriate to use the median than the
mean since the extreme values affect the mean.
5. The median is used as a basis for determining whether cases fall within the upper half or
the lower half of a data distribution.
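Characteristics 3 and 4 can be seen directly by changing one value. The sketch below takes the ages from Example 5 and replaces the largest age, 35, with a hypothetical extreme value of 90 (made up for illustration). The mean jumps from 22.7 to 28.2, while the median stays at 21.5.

```r
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)   # data from Example 5
ages_extreme <- replace(ages, ages == 35, 90)       # hypothetical extreme value

c(mean = mean(ages), median = median(ages))                   # 22.7  21.5
c(mean = mean(ages_extreme), median = median(ages_extreme))   # 28.2  21.5
```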

MIDRANGE

Another measure of center is the midrange. Because the midrange uses only the maximum
and minimum values, it is too sensitive to those extremes, so the midrange is rarely
used. However, the midrange does have three redeeming features:
1. It is very easy to compute.
2. It helps to reinforce the important point that there are several different ways to define
the center of a data set.
3. It is sometimes incorrectly used for the median, so confusion can be reduced by
clearly defining the midrange along with the median.

The midrange of a data set is the measure of center that is the value midway between the
maximum and minimum values in the original data set. It is found by adding the maximum
data value to the minimum data value and then dividing the sum by 2, as in the following
formula:
$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2}$$

Example 6:
Find the midrange of these values representing the sales, in pesos, of a restaurant on five
business days:
27,531; 15,684; 5,638; 27,997; and 25,433.

Solution:
The midrange is found as follows:

$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2} = \frac{27{,}997 + 5{,}638}{2} = 16{,}817.50$$

The midrange is ₱16,817.50.

R Script
# Create the data vector
sales <-c(27531, 15684, 5638, 27997, 25433)

# Create vector of maximum and minimum values


x <- c(max(sales), min(sales))  # Vector x contains only the max and min values

# Determine the Midrange


midrange<-mean(x)
midrange

[1] 16817.5
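Base R's range() returns the minimum and maximum of a vector together, so the midrange can also be written in a single line. A minimal sketch using the sales data from Example 6:

```r
sales <- c(27531, 15684, 5638, 27997, 25433)

range(sales)                     # 5638 27997 (minimum, maximum)
mean(range(sales))               # (5638 + 27997) / 2 = 16817.5

# Equivalent, following the midrange formula directly
(max(sales) + min(sales)) / 2
```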

Suppose we wish to determine measures of central tendency for the numeric variables in a
data frame. In this case, we can use the sapply() function in R to compute a measure for all
the variables in a data set simultaneously. For this example, we use the “salaries.csv” file.
In RStudio, open a new file, select R Script, and execute the following script.

# Generating Measures of Central Tendency for a data frame

# We use the "salaries.csv" data


# Load packages in RStudio
library(readr)
library(pander)

# Import "salaries.csv" into RStudio


salaries <-read.csv("salaries.csv")

# Inspect the variables in the data frame


head(salaries)
X rank discipline yrs.since.phd yrs.service sex salary
1 1 Prof B 19 18 Male 139750
2 2 Prof B 20 16 Male 173200
3 3 AsstProf B 4 3 Male 79750
4 4 Prof B 45 39 Male 115000
5 5 Prof B 40 41 Male 141500
6 6 AssocProf B 6 6 Male 97000

# Generate the mean for the numeric variables


# Here, we use the sapply() function to get the mean of each variable in a
# data set simultaneously

averages<-sapply(salaries, mean)
pander(averages)

X rank discipline yrs.since.phd yrs.service sex salary
199 NA NA 22.31 17.61 NA 113706

# Notice that mean() returns NA (with a warning) for the nonnumeric variables,
# and that the mean of the row-index column X is not meaningful.
# To avoid this problem, we exclude these columns by using bracket notation
# containing the negated column number/s of the column/s to drop

averages<-sapply(salaries[c(-1, -2, -3, -6)], mean)


pander(averages)

yrs.since.phd yrs.service salary
22.31 17.61 113706

# If you wish to eliminate only one variable, say "rank" which is in column 2
averages <-sapply(salaries[-2], mean)
pander(averages)

X discipline yrs.since.phd yrs.service sex salary
199 NA 22.31 17.61 NA 113706

# Generate the median for the numeric variables


medians <-sapply(salaries[c(-1, -2, -3, -6)], median)
pander(medians)

yrs.since.phd yrs.service salary
21 16 107300
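Negative column numbers work, but they depend on knowing where the nonnumeric columns sit. An alternative is to let is.numeric() pick the numeric columns automatically. The small data frame below is hypothetical, built only so the sketch runs on its own; with the module's data the same two steps would apply to salaries, although the row-index column X is numeric and would still need to be dropped separately.

```r
# Hypothetical data frame mixing numeric and nonnumeric columns
staff <- data.frame(
  rank   = c("Prof", "AsstProf", "Prof", "AssocProf"),
  sex    = c("Male", "Female", "Male", "Female"),
  years  = c(18, 3, 39, 6),
  salary = c(140000, 80000, 115000, 97000),
  stringsAsFactors = FALSE
)

# TRUE for each column that is numeric
is_num <- sapply(staff, is.numeric)

# Apply mean() and median() only to the numeric columns
sapply(staff[is_num], mean)      # years 16.5, salary 108000
sapply(staff[is_num], median)    # years 12, salary 106000
```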

Suppose we would like to present the mean salary of the teachers grouped according to
rank. We can generate these statistical measures in R by using the tapply() and aggregate()
functions. Check out the following script.

# Statistics by Group

# Using the tapply function, we generate the mean salary for each group of
# teachers based on rank.
output1 <-tapply(salaries$salary, salaries$rank, mean)
pander(output1)

AssocProf AsstProf Prof
93876 80776 126772

# Using the aggregate function, we generate the same statistical measures for
# the same groups.

output2 <-aggregate(salary ~rank, salaries, mean)
pander(output2)

rank salary
AssocProf 93876
AsstProf 80776
Prof 126772
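Both functions compute the same group means but return them in different shapes: tapply() gives a named array with one element per group, while aggregate() gives a data frame with one row per group, which is often easier to save or combine with other tables. A minimal sketch on a small hypothetical data frame (the salaries are made up for illustration):

```r
# Hypothetical data: salaries of six teachers in two ranks
pay <- data.frame(
  rank   = c("Prof", "AsstProf", "Prof", "AsstProf", "Prof", "AsstProf"),
  salary = c(120000, 80000, 130000, 76000, 110000, 84000)
)

by_tapply <- tapply(pay$salary, pay$rank, mean)                   # named array
by_aggregate <- aggregate(salary ~ rank, data = pay, FUN = mean)  # data frame

by_tapply              # AsstProf 80000, Prof 120000
by_aggregate
class(by_aggregate)    # "data.frame"
```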

Practice Exercise 6-1

Find the mean, median, mode, and midrange of the following data set on the total weight,
in kilograms, of ready-to-cook chicken inasal leg quarters sold by a frozen foods retail store
during selected days of June and July. Express your answers up to 2 decimal places.

35.2 7.0 24.0 42.4 33.0 27.5 24.0 21.0 8.0 45.6 25.9 14.8 29.8 21.0
17.5 9.7 40.0 18.8 57.9 21.0 12.0 12.0 19.6 51.5 12.0 36.8 13.7 32.8
12.0 10.5 22.5 19.5 37.5 35.0 10.5 33.6 14.5 36.5 17.9 26.9 12.0 41.5

Learning Reinforcement Activity No. 6-1: MEASURES OF CENTRAL TENDENCY


Accomplish by September 25, 2020

Using RStudio, solve the following problems as directed. Submit a single .docx file
containing the R output for each problem, and also submit the saved R script.
Summarize your answers for each problem with a conclusion. Save your files as
LRA6-1<LASTNAME>.docx and LRA6-1<LASTNAME>.R.

1. Find the mean, median, mode, and midrange for the following data set
representing the number of applications for a fiber internet plan received in a day
by a service provider, over the past 30 working days. (10 points)
45 46 48 53 54 55 56 59 62 63
65 66 66 69 69 70 71 71 73 73
74 75 75 75 77 78 81 82 82 83

2. A BS Accountancy student received the following final grades during the second
semester of his sophomore year. Find his general weighted average. Would he be part
of the Dean’s List for the semester if the cutoff grade is 88? (5 points)

COURSE NO.  DESCRIPTIVE TITLE                                       UNITS  FINAL GRADE
CFE 104     CICM Missionary Identity                                3      89
GSTS        Science, Technology, and Society                        3      84
GMATH       Mathematics in the Modern World                         3      89
FIT AQ      Physical Activity Towards Health & Fitness (Aquatics)   2      91
AE 221      Intermediate Accounting 3                               3      88
AE 222      Accounting Information Systems                          3      95
BLR 221     Business Laws and Regulations 2                         3      87
CMPC 221    Accounting for Business Combinations                    3      89
INCTAXa     Income Taxation                                         6      87

3. Below is the number of units produced by a factory on each of the last 33 days of
production. Assuming the data to be a sample, compute the mean, median, mode, and
midrange. (10 points)
322 343 348 358 361 366 374 376 386 390 396
329 344 349 359 362 366 375 377 389 392 397
333 347 351 360 365 367 376 379 390 395 398

4. The table that follows shows the time (in minutes) that customers waited in line
before being served at a fast food restaurant. Assuming the data to be a sample,
compute the mean, median, mode, and midrange. (10 points)

3.2 3.3 3.5 3.9 4.1 4.4 4.7 4.8 5.2 5.6
5.6 5.7 5.8 6.0 6.2 6.3 6.4 6.5 6.7 6.7
6.9 7.0 7.2 7.5 8.0 8.8 8.9 9.4 9.7 9.9
10.0 11.3 12.4 12.5 14.8 15.0 16.5 16.8 17.2 19.3

5. Using the “salaries.csv” data file, determine the following: (10 points)
a. the mode of each variable;
b. the average salary of male and female teachers; and
c. the average years of service of the teachers grouped according to rank.

Congratulations! You have just completed all the modules and units for the Prelims.
You are now ready to take the examination.
Because you were diligent with your studies, you will surely ace the exam.
