Descriptive Statistics Measures

MODULE 6: DESCRIPTIVE STATISTICAL MEASURES
UNIT 1: MEASURES OF CENTRAL TENDENCY

(For SEPTEMBER 23-25)
Learning Outcome: Calculate and interpret measures of central tendency given a set of
business-related data.
Numerical values that tend to locate in some sense the middle of a set of data when
arranged in increasing or decreasing order are called measures of central tendency or
central location. The term average is often associated with these measures, which are the
mean, median, mode, and midrange. In this unit, we will walk through the simple steps in
computing these measures.
MEAN
a. Arithmetic Mean. It is obtained by adding all the observations and dividing the sum by
the number of observations, thus it is called a computational average.
1. Population Mean: If $x_1, x_2, \dots, x_N$ represent the data values from a finite population of
size $N$, the population mean $\mu$ (Greek letter “mu”, not the letter u) is given by
$$\mu = \frac{\sum x_i}{N}$$
2. Sample Mean: If $x_1, x_2, \dots, x_n$ represent the data values from a finite sample of size $n$,
the sample mean $\bar{x}$ (“$x$ bar”) is given by
$$\bar{x} = \frac{\sum x_i}{n}$$
The symbol $\sum x_i$, read “summation of $x$ sub $i$”, means that we take the sum of all the
values in the data set. It uses the Greek letter Σ “sigma” (not the letter E!). Note that the
data values do not need to be arranged in any order when the mean is computed. For
data sets with many values, $\sum x_i$ can be computed using the Statistics mode of a
scientific calculator.
Example 1:
Suppose you chose ten people who entered the campus and whose ages are as
follows: 15, 25, 18, 20, 25, 18, 18, 20, 20, 25. What is the mean age of this sample?
Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited.
Solution:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{15 + 25 + 18 + 20 + 25 + 18 + 18 + 20 + 20 + 25}{10} = 20.40$$
The mean age of the sample is 20.40 years.
Note that your solution does not have to include the list of all values being added.
For instance, you can simply write
$$\bar{x} = \frac{\sum x_i}{n} = \frac{204}{10} = 20.40.$$
R Script
# Create the data vector (for small samples)
ages <- c(15, 25, 18, 20, 25, 18, 18, 20, 20, 25)
# Compute the mean of the ages
mean(ages)
[1] 20.4
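As a cross-check on the definition, the sample mean can also be computed directly from the sum and the count of the observations. The following is a minimal sketch using the ages from Example 1; the variable names `total`, `n`, and `manual_mean` are chosen only for illustration.

```r
# Ages from Example 1
ages <- c(15, 25, 18, 20, 25, 18, 18, 20, 20, 25)

# Sum of the observations and number of observations
total <- sum(ages)     # 204
n <- length(ages)      # 10

# Sample mean computed from the definition: (sum of x_i) / n
manual_mean <- total / n
manual_mean            # 20.4

# The built-in mean() should agree with the manual computation
all.equal(manual_mean, mean(ages))
```

Both approaches give 20.4, which matches the hand computation in the solution.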
b. Weighted Mean. If the data values $x_1, x_2, \dots, x_k$ have assigned weights $w_1, w_2, \dots, w_k$,
respectively, the weighted mean is given by
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 2:
A student was taking 5 subjects last semester. Find his average if his final grades were as
follows:
Final Grade   Units
1.75          3
2.50          5
2.25          3
1.50          2
3.0           4
Solution:
The grades will serve as the data values $x_i$ and the units will be the corresponding
weights $w_i$.
$$\bar{x} = \frac{3(1.75) + 5(2.50) + 3(2.25) + 2(1.50) + 4(3.0)}{3 + 5 + 3 + 2 + 4} = \frac{39.5}{17} \approx 2.32$$
The weighted average of the student is 2.32.
R Script
# Create the data vectors
grade <- c(1.75, 2.50, 2.25, 1.50, 3.0)
units <- c(3, 5, 3, 2, 4)
# Compute the weighted mean
weighted.mean(grade, units)
[1] 2.323529
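To connect the R output with the weighted mean formula, the numerator and denominator can be computed separately. This is a small sketch using the data of Example 2; the names `numerator`, `denominator`, and `manual_wmean` are illustrative only.

```r
# Grades (data values) and units (weights) from Example 2
grade <- c(1.75, 2.50, 2.25, 1.50, 3.0)
units <- c(3, 5, 3, 2, 4)

# Numerator: sum of each weight times its data value
numerator <- sum(units * grade)    # 39.5

# Denominator: sum of the weights
denominator <- sum(units)          # 17

# Weighted mean from the formula
manual_wmean <- numerator / denominator
round(manual_wmean, 2)             # 2.32

# The built-in weighted.mean() should give the same value
all.equal(manual_wmean, weighted.mean(grade, units))
```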
Characteristics of the Mean

1. It can be used for interval and ratio measurements.
2. All the scores or measurements are considered in the computation of the mean.
3. Very high or very low scores or measurements affect the mean.
MODE
The mode of a data set is the value in the distribution with the highest frequency. It locates
the point where the observation values occur with the greatest density. It can be used for
quantitative as well as qualitative data. The mode of a population is denoted by $\hat{\mu}$ (“mu
hat”) while that of a sample is denoted by $\hat{x}$ (“$x$ hat”).
A data set can have one mode, more than one mode, or no mode.
• When two data values occur with the same greatest frequency, the data set has
two modes and is called bimodal.
• When more than two data values occur with the same greatest frequency, each of
those values is a mode and the data set is said to be multimodal.
• When no data value is repeated, or if all data values are repeated the same number of
times, we say that there is no mode.
Example 3:
Observe the given ungrouped data below:
a. 1, 2, 3, 4, 5, 6, 7 (No mode)
b. 15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5 (There is one mode, $\hat{x} = 12.3$)
c. 15, 12, 4, 15, 4, 6, 5 (There are two modes, $\hat{x} = 15$ and $\hat{x} = 4$, so the data set is bimodal)
d. 3, 4, 5, 1, 3, 2, 4, 5, 7, 10 (There are three modes, $\hat{x} = 3$, $\hat{x} = 4$, and $\hat{x} = 5$, so the data set is
multimodal)
Characteristics of the Mode

1. It is very easy to compute or determine but is seldom used because it is very unstable.
2. It is used when a rough or quick estimate of a central value is wanted.
3. It is most appropriate for nominal scale as a measure of popularity.
R Script
Here we present hypothetical examples, considering both numeric data and
nonnumeric data.
# Create the data vectors
values <- c(3, 4, 5, 1, 3, 2, 4, 5, 7, 10)
labels <- c("A1", "A2", "A1", "A4", "A4", "A3", "A3", "A2", "A2", "A2",
            "A3", "A2")
# Determine the mode for the numeric vector
sort(table(values), decreasing = TRUE)
values
 3  4  5  1  2  7 10
 2  2  2  1  1  1  1
# Determine the mode for the nonnumeric vector
sort(table(labels), decreasing = TRUE)
labels
A2 A3 A1 A4
 5  3  2  2
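Base R has no built-in function that returns the statistical mode, so the frequency table has to be read by eye. A short helper can extract the mode or modes automatically. The sketch below is one possible approach, not a standard function: the name `find_modes` is hypothetical, and it follows the convention of this unit that a data set has no mode when all values occur equally often.

```r
# Hypothetical helper: returns the mode(s) of a vector as character strings,
# or NULL when every distinct value occurs the same number of times (no mode)
find_modes <- function(x) {
  freq <- table(x)
  # If all frequencies are equal, the data set has no mode
  if (length(unique(as.vector(freq))) == 1) {
    return(NULL)
  }
  # Otherwise, return every value that attains the highest frequency
  names(freq)[freq == max(freq)]
}

# Data sets from Example 3
find_modes(c(1, 2, 3, 4, 5, 6, 7))                         # NULL (no mode)
find_modes(c(15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5))       # "12.3"
find_modes(c(15, 12, 4, 15, 4, 6, 5))                      # "4" "15"
find_modes(c(3, 4, 5, 1, 3, 2, 4, 5, 7, 10))               # "3" "4" "5"

# Nonnumeric data from the script above
labels <- c("A1", "A2", "A1", "A4", "A4", "A3", "A3", "A2", "A2", "A2",
            "A3", "A2")
find_modes(labels)                                         # "A2"
```

Because the result is taken from the names of the frequency table, numeric modes come back as character strings; wrap the result in as.numeric() if numbers are needed.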
MEDIAN
The median of a data set is the value that divides the distribution into two equal parts (after
arranging the values in ascending or descending order). As such, it is a positional average.
The median $\tilde{\mu}$ (“mu curl” or “mu tilde”) of the population or $\tilde{x}$ (“$x$ curl” or “$x$ tilde”) of the
sample can be determined using the following formulas:
$$\tilde{\mu} = \begin{cases} x_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\[4pt] \dfrac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2} & \text{if } N \text{ is even} \end{cases} \qquad \text{or} \qquad \tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[4pt] \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$
where $N$ denotes the population size and $n$ is the sample size. Note that $\frac{N+1}{2}$, $\frac{N}{2}$, $\frac{N}{2}+1$,
$\frac{n+1}{2}$, $\frac{n}{2}$, and $\frac{n}{2}+1$ are all subscripts, referring to the position of the data value in the
data set after being arranged in increasing (or decreasing) order. For example, $x_7$ refers to the
seventh data value in the sequence, while $x_4$ is the fourth value in the data set.
Example 4:
A retail outlet selling a particular product sold this many packs in the past few days: 90,
92, 93, 88, 95, 88, 97, 87, and 98. What is the median number of packs sold?
Solution:
Ordering the data from least to greatest and labeling these values, we get:
x_1  x_2  x_3  x_4  x_5  x_6  x_7  x_8  x_9
87   88   88   90   92   93   95   97   98
Since $n = 9$ (odd),
$$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{9+1}{2}} = x_5 = 92$$
The median number of packs sold is 92. (On four days more than 92 packs were sold, and on
four days fewer than 92 packs were sold.)
R Script:
packs.sold <- c(90, 92, 93, 88, 95, 88, 97, 87, 98)
# Determine the median
median(packs.sold)
[1] 92
Example 5:
The ages of 10 college students are listed below. Find the median.
18, 24, 20, 35, 19, 23, 26, 23, 19, 20
Solution:
Ordering the data from least to greatest and labeling these values, we get:
x_1  x_2  x_3  x_4  x_5  x_6  x_7  x_8  x_9  x_10
18   19   19   20   20   23   23   24   26   35
Since $n = 10$ (even),
$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} = \frac{x_{\frac{10}{2}} + x_{\frac{10}{2}+1}}{2} = \frac{x_5 + x_6}{2} = \frac{20 + 23}{2} = 21.5$$
The median age of the college students is 21.5 years.
R Script
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)
# Determine the median
median(ages)
[1] 21.5
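The positional formula for the median can be written out step by step in R to see what median() does for odd and even sample sizes. The function name `median_by_position` below is hypothetical and serves only to illustrate the formula on the data of Examples 4 and 5.

```r
# Hypothetical helper that applies the positional formula for the median
median_by_position <- function(x) {
  sorted <- sort(x)          # arrange the values in increasing order
  n <- length(sorted)
  if (n %% 2 == 1) {
    # n is odd: take the value in position (n + 1) / 2
    sorted[(n + 1) / 2]
  } else {
    # n is even: average the values in positions n / 2 and n / 2 + 1
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

# Example 4 (n = 9, odd)
packs.sold <- c(90, 92, 93, 88, 95, 88, 97, 87, 98)
median_by_position(packs.sold)   # 92

# Example 5 (n = 10, even)
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)
median_by_position(ages)         # 21.5
```

Both results agree with the built-in median() function.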
Characteristics of the Median

1. The median is most appropriate for ordinal or ranked measurements.
2. Only the middle scores or measurements are considered in the computation of the
median.
3. Very high or very low scores do not affect the median.
4. When there are extreme values in the data set (interval or ratio data), that is, the
distribution is markedly skewed, it is more appropriate to use the median than the
mean since the extreme values affect the mean.
5. The median is used as a basis for knowing whether cases fall within the upper half or
the lower half of a data distribution.
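The effect of an extreme value on the mean and the median can be checked numerically. In the sketch below, the ages from Example 5 are used, and the largest age (35) is replaced by 85; this modified data set is a made-up illustration and is not part of the original example.

```r
# Ages from Example 5
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)

# Hypothetical variation: replace the largest age (35) with an extreme value (85)
ages_extreme <- replace(ages, ages == 35, 85)

# Original data
mean(ages)            # 22.7
median(ages)          # 21.5

# With the extreme value, the mean shifts upward but the median is unchanged
mean(ages_extreme)    # 27.7
median(ages_extreme)  # 21.5
```

The mean rises by 5 years because of a single observation, while the median stays at 21.5, which is why the median is preferred for markedly skewed data.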
MIDRANGE
Another measure of center is the midrange. Because the midrange uses only the maximum
and minimum values, it is too sensitive to those extremes, so the midrange is rarely
used. However, the midrange does have three redeeming features:
1. It is very easy to compute.
2. It helps to reinforce the important point that there are several different ways to define
the center of a data set.
3. It is sometimes incorrectly used for the median, so confusion can be reduced by
clearly defining the midrange along with the median.
The midrange of a data set is the measure of center that is the value midway between the
maximum and minimum values in the original data set. It is found by adding the maximum
data value to the minimum data value and then dividing the sum by 2, as in the following
formula:
$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2}$$
Example 6:
Find the midrange of these values representing the sales, in pesos, of a restaurant on five
business days:
27,531; 15,684; 5,638; 27,997; and 25,433.
Solution:
The midrange is found as follows:
$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2} = \frac{27997 + 5638}{2} = 16817.50$$
The midrange is P 16,817.50.
R Script
sales <- c(27531, 15684, 5638, 27997, 25433)
# Create a vector that contains only the maximum and minimum values
x <- c(max(sales), min(sales))
# Determine the midrange
midrange <- mean(x)
midrange
[1] 16817.5
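Since the midrange is needed in several exercises, it may be convenient to wrap the formula in a small reusable function. The following is a minimal sketch; the function name `midrange_of` is hypothetical, and base R does not provide a midrange function of its own.

```r
# Hypothetical helper: midrange = (maximum + minimum) / 2
midrange_of <- function(x) {
  (max(x) + min(x)) / 2
}

# Sales data from Example 6
sales <- c(27531, 15684, 5638, 27997, 25433)
midrange_of(sales)      # 16817.5

# Equivalent shortcut: range() returns c(min, max), so its mean is the midrange
mean(range(sales))      # 16817.5
```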
Suppose we wish to determine central tendency measures for the numeric variables in a data
frame. In this case, the sapply() function in R can be used to compute a measure for several
variables in a data set at once. For this example, we use the “salaries.csv” file. To run the
following script in RStudio, open a new file and select R Script.
# Generating Measures of Central Tendency for a data frame
# We use the "salaries.csv" data

# Load packages in RStudio
# (read.csv() below is a base R function; readr is only needed for read_csv())
library(readr)
library(pander)
# Import "salaries.csv" into RStudio
salaries <- read.csv("salaries.csv")
# Inspect the variables in the data frame
head(salaries)
  X      rank discipline yrs.since.phd yrs.service  sex salary
1 1      Prof          B            19          18 Male 139750
2 2      Prof          B            20          16 Male 173200
3 3  AsstProf          B             4           3 Male  79750
4 4      Prof          B            45          39 Male 115000
5 5      Prof          B            40          41 Male 141500
6 6 AssocProf          B             6           6 Male  97000
# Generate the mean for the numeric variables
# Here, we use the sapply() function to get the mean of each variable in the
# data set simultaneously
averages <- sapply(salaries, mean)
pander(averages)
X     rank   discipline   yrs.since.phd   yrs.service   sex   salary
199   NA     NA           22.31           17.61         NA    113706
# Notice that mean() returns NA (with a warning) for the nonnumeric
# variables in the data set.
# To avoid this problem, we exclude the nonnumeric variables by using the
# bracket notation with the negated column number/s of the nonnumeric variable/s.
# Column 1 (X) appears to be only a row index, so it is excluded as well.
averages <- sapply(salaries[c(-1, -2, -3, -6)], mean)
pander(averages)
yrs.since.phd   yrs.service   salary
22.31           17.61         113706
# If you wish to exclude only one variable, say "rank", which is in column 2
averages <- sapply(salaries[-2], mean)
pander(averages)
X     discipline   yrs.since.phd   yrs.service   sex   salary
199   NA           22.31           17.61         NA    113706
# Generate the median for the numeric variables
medians <- sapply(salaries[c(-1, -2, -3, -6)], median)
pander(medians)
yrs.since.phd   yrs.service   salary
21              16            107300
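Hard-coding column numbers works, but it breaks if the column order changes. An alternative is to let R detect the numeric columns with is.numeric(). The sketch below is only an illustration: because “salaries.csv” is not reproduced here, it rebuilds the six rows shown by head(salaries) into a small data frame named `salaries_head` (a name chosen for this example), so the results differ from those of the full data set.

```r
# Small data frame rebuilt from the six rows printed by head(salaries)
salaries_head <- data.frame(
  X = 1:6,
  rank = c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  discipline = rep("B", 6),
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  yrs.service = c(18, 16, 3, 39, 41, 6),
  sex = rep("Male", 6),
  salary = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Flag the numeric columns, then drop X by name since it is only a row index
is_numeric <- sapply(salaries_head, is.numeric)
is_numeric["X"] <- FALSE
numeric_only <- salaries_head[is_numeric]

# Mean and median of each remaining numeric variable
means <- sapply(numeric_only, mean)
medians <- sapply(numeric_only, median)
means
medians
```

The same selection step can be applied to the full `salaries` data frame after it has been imported.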
Suppose we would like to present the mean salary of the teachers grouped according to
rank. We can generate statistical measures by group in R by using the tapply() and
aggregate() functions, as in the following script.
# Statistics by Group
# Using the tapply function, we generate the mean salary for each group of
# teachers based on rank.
output1 <- tapply(salaries$salary, salaries$rank, mean)
pander(output1)
AssocProf   AsstProf   Prof
93876       80776      126772
# Using the aggregate function, we generate the same statistical measure for
# the same groups.
output2 <- aggregate(salary ~ rank, salaries, mean)
pander(output2)
rank        salary
AssocProf   93876
AsstProf    80776
Prof        126772
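The same grouping idea extends to other measures. As a hedged illustration, the sketch below computes both the mean and the median salary per rank and places them side by side. It uses only the six rows shown by head(salaries), stored in a data frame named `salaries_head` for this example, so the numbers are not those of the full data set.

```r
# Rank and salary of the six rows printed by head(salaries)
salaries_head <- data.frame(
  rank = c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  salary = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Mean and median salary for each rank (groups are ordered alphabetically)
mean_by_rank <- tapply(salaries_head$salary, salaries_head$rank, mean)
median_by_rank <- tapply(salaries_head$salary, salaries_head$rank, median)

# Combine the two measures into one summary table
summary_by_rank <- data.frame(
  rank = names(mean_by_rank),
  mean_salary = as.vector(mean_by_rank),
  median_salary = as.vector(median_by_rank)
)
summary_by_rank
```

For the four professors in these rows, the mean salary is 142362.5 and the median salary is 140625; the other two ranks have one observation each, so their mean and median coincide.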
Practice Exercise 6-1
Find the mean, median, mode and midrange of the following data set on the total weight,
in kilograms, of ready-to-cook chicken inasal leg quarters sold by a frozen foods retail store
during selected days of June and July. Express your answers up to 2 decimal places.
35.2 7.0 24.0 42.4 33.0 27.5 24.0 21.0 8.0 45.6 25.9 14.8 29.8 21.0
17.5 9.7 40.0 18.8 57.9 21.0 12.0 12.0 19.6 51.5 12.0 36.8 13.7 32.8
12.0 10.5 22.5 19.5 37.5 35.0 10.5 33.6 14.5 36.5 17.9 26.9 12.0 41.5
Learning Reinforcement Activity No. 6-1: MEASURES OF CENTRAL TENDENCY

Accomplish by September 25, 2020
Using RStudio, solve the following problems as directed. Submit a single .docx file
containing the output of R for each problem and submit also the saved RStudio script.
Summarize your answers for each problem with a conclusion. Save your files as
LRA6-1<LASTNAME>.docx and LRA6-1<LASTNAME>.R.
1. Find the mean, median, mode, and midrange for the following data set
representing the number of applications for a fiber internet plan received in a day
by a service provider, over the past 30 working days. (10 points)
45 46 48 53 54 55 56 59 62 63
65 66 66 69 69 70 71 71 73 73
74 75 75 75 77 78 81 82 82 83
2. A BS Accountancy student received the following final grades in his course during
the second semester of his sophomore year. Find his general weighted average if his
final grades were as follows. Would he be part of the Dean’s List for the semester if
the cutoff grade is 88? (5 points)
COURSE NO.   DESCRIPTIVE TITLE                                       UNITS   FINAL GRADE
CFE 104      CICM Missionary Identity                                3       89
GSTS         Science, Technology, and Society                        3       84
GMATH        Mathematics in the Modern World                         3       89
FIT AQ       Physical Activity Towards Health & Fitness (Aquatics)   2       91
AE 221       Intermediate Accounting 3                               3       88
AE 222       Accounting Information Systems                          3       95
BLR 221      Business Laws and Regulations 2                         3       87
CMPC 221     Accounting for Business Combinations                    3       89
INCTAXa      Income Taxation                                         6       87
3. Below is the number of units produced by a factory in the last 33 days of production.
Assuming the data to be a sample, compute the mean, median, mode and
midrange. (10 points)
322 343 348 358 361 366 374 376 386 390 396
329 344 349 359 362 366 375 377 389 392 397
333 347 351 360 365 367 376 379 390 395 398
4. The table that follows shows the time (in minutes) it takes for customers to wait in line
before being served at a fast food restaurant. Assuming the data to be a sample,
compute the mean, median, mode and midrange. (10 points)
3.2 3.3 3.5 3.9 4.1 4.4 4.7 4.8 5.2 5.6
5.6 5.7 5.8 6.0 6.2 6.3 6.4 6.5 6.7 6.7
6.9 7.0 7.2 7.5 8.0 8.8 8.9 9.4 9.7 9.9
10.0 11.3 12.4 12.5 14.8 15.0 16.5 16.8 17.2 19.3
5. Using the “salaries.csv” data file, determine (10 points)

a. the mode for all the variables.
b. the average salary of male and female teachers.
c. the average years of service of the teachers grouped according to rank.
Congratulations! You have just completed all the modules and units for the Prelims.
You are now ready to take the examination.
Because you were diligent with your studies, you will surely ace the exam.
