
Republic of the Philippines

NUEVA VIZCAYA STATE UNIVERSITY


Bayombong, Nueva Vizcaya
INSTRUCTIONAL MODULE
IM No.:IM- BE 14- 1STSEM-2021-2022

College: BUSINESS EDUCATION


Campus: BAYOMBONG ___

DEGREE PROGRAM: BSBA | COURSE NO.: BE 14
SPECIALIZATION: | COURSE TITLE: ECONOMIC STATISTICS
YEAR LEVEL: 4 | TIME FRAME: 9 | WK NO.: 10-12 | IM NO.: 3

I. UNIT TITLE/CHAPTER TITLE


Chapter 3: Introduction to Descriptive Statistics

II. LESSON TITLE

Lesson 1: Introduction/Basic Terms and Concepts
Lesson 2: Measures of Central Tendency
Lesson 3: Steps in Sampling
Lesson 4: Measures of Description
Lesson 5: Measure of Skewness
Lesson 6: Measures of Association and Correlation

III. LESSON OVERVIEW

We have seen how frequency distributions organize many observations of an economic variable
into a reduced and ordered form while still presenting the original data in its entirety. It is often useful,
however, to describe economic phenomena using a single measure (...) Reducing many observations to a
single, unique statistic is just one desirable property of a descriptive statistic. We would like our descriptive
statistics to exhibit several other properties, namely: (1) the statistic is easily understood; (2) the statistic
is a single (unique) value; (3) the statistic’s value is not affected by extreme observations; (4) the statistic
is algebraically tractable; (5) the statistic utilizes all values in the dataset; and (6) the statistic utilizes the
frequencies of all values in the dataset, so that the statistics are easy to comprehend and take full advantage
of the information available in the dataset (Lewis, 2012).

IV. DESIRED LEARNING OUTCOMES

After reading this chapter, students will be able to:


1. Demonstrate knowledge of statistical terms;
2. Enumerate the different descriptive/summary measures;
3. Determine the appropriate use of descriptive/summary measures;
4. Identify the different measures of association and correlation; and
5. Interpret correctly the coefficient of association and correlation.

V. LESSON CONTENT
Chapter 3

Introduction to Descriptive Statistics

Descriptive or Summary Measures

Measures of Central Tendency are values used to identify the “center” or the typical value of a data set. Such
a value is regarded as the most representative value of the given data and is located at the point where the
concentration of values is greatest. It is powerful because it can reduce huge arrays of data to a single, easily
understood number. The main purpose is to summarize or reduce data (Araneta, 2020). A measure of
central tendency is used to represent a total set of observations in a single numerical value that signifies
the location around which the variable’s observations tend to cluster, the so-called average value. The
most commonly used measures of central tendency in economics are, in order, the mean (technically, the

In accordance with Section 185, Fair Use of Copyrighted Work of Republic Act 8293, the copyrighted works included in this material may be reproduced for educational
purposes only and not for commercial distribution
NVSU-FR-ICD-05-00 (081220) Page 1 of 5
Prepared by: Aljanet M. Jandoc, PhD., EnP.
arithmetic mean), the median, and the mode (Lewis, 2012). A measure of location is a value that is
calculated for a group of data and that is used to describe the data in some way. Typically, we wish the
value to be representative of all of the values in the group, and thus some kind of average is desired. In
the statistical sense, an average is a measure of central tendency for a collection of
values (Kazmier, 2004).

Measures of Central Tendency or Location:


1. Mean & Weighted Mean - Mean is the average score (Araneta, 2020). The arithmetic mean, or arithmetic
average, is defined as the sum of the values in the data group divided by the number of values. The
weighted mean or weighted average is an arithmetic mean in which each value is weighted according to
its importance in the overall group. (Kazmier, 2004).
2. Median - Median is the middle score (Araneta, 2020). It is the midpoint of the values of the data after being
ranked or ordered from smallest to largest (or lowest to highest) (Cabiles N. , 2013).
3. Mode - Mode is the most common score or observation (Araneta, 2020).

Population Mean vs. Sample Mean

Population Mean is the central value of population data, the value around which the data (or the data of an
attribute) of a certain population tend to converge (Cabiles N. , 2013).

$$\mu = \frac{\Sigma X}{N}$$

where: $\mu$ = Population Mean
       $X$ = value of an attribute of the population
       $N$ = number of values (i.e. observations) in the population

Note: In taking the Population Mean, all observations in the population are included.

Sample Mean is the central value of sample data, the value around which the data (or the data of an attribute)
of a certain sample (from a population) tend to converge (Cabiles N. , 2013).

$$\bar{X} = \frac{\Sigma X}{n}$$

where: $\bar{X}$ = Sample Mean
       $X$ = value of an attribute of the sample
       $n$ = number of values (i.e. observations) in the sample

Note: A Sample is a certain portion representative of the Population.
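The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the ten values are the Section A and Section B exam scores used later in this module, and treating the first five of them as a sample is an assumption made for the example.

```python
# Population: all ten exam scores (Sections A and B of this module)
population = [85, 80, 83, 81, 86, 75, 90, 77, 88, 85]

# Sample: the first five scores, treated as a sample for illustration only
sample = population[:5]

mu = sum(population) / len(population)   # population mean = sum of X / N
x_bar = sum(sample) / len(sample)        # sample mean = sum of X / n

print(f"Population mean: {mu}")
print(f"Sample mean: {x_bar}")
```

The only difference between the two computations is which set of observations enters the sum and the divisor.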

Properties of the Mean


1. Uniqueness - one and only one mean (Araneta, 2020). The mean is unique. Thus, for a set of data,
there is only one mean (Cabiles N. , 2013).
2. Simplicity - easy to calculate (Araneta, 2020). Every set of interval or ratio-level data has a mean. All
values must be included in computing the mean (Cabiles N. , 2013).
3. Affected by extreme values - influenced by each value; extreme values can distort the mean (Araneta,
2020)
4. The sum of the deviations of each value from the mean is zero. Mathematically, $\Sigma(X - \bar{X}) = 0$.
Demonstration (Cabiles N. , 2013):
Consider 3 values: 3, 8, 4
$\bar{X} = (3 + 8 + 4)/3 = 5$ (note that n = 3)
$\Sigma(X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$

Weakness of the Mean as a Measure of Location: because the mean uses all values, it is affected
by the presence of extreme values; in such cases, the Mean is no longer an accurate
Measure of Location (i.e. Central Tendency of Data) (Cabiles N. , 2013).
Demonstration:
Consider the following data on the price of an Economics textbook:
• Php 629, Php 616, Php 625, Php 608, Php 1,200
• Mean Price of Economics Textbook: Php 735.60
• Comparing the data with the mean, none of the observations is actually close
to the mean of Php 735.60. In fact, the majority of the data, the four prices in the 600s,
are far below the Mean. This is due to the extreme value of Php 1,200 pulling up the Mean.
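To see how strongly one extreme value pulls the mean, the demonstration can be reproduced with and without the Php 1,200 observation. This is a minimal sketch; the cutoff of 1000 used to single out the extreme value is an assumption chosen only for this illustration.

```python
# Textbook prices in Php, taken from the demonstration
prices = [629, 616, 625, 608, 1200]

mean_all = sum(prices) / len(prices)

# Exclude the extreme value; the 1000 cutoff is an illustrative assumption
typical = [p for p in prices if p < 1000]
mean_typical = sum(typical) / len(typical)

print(f"Mean with the extreme value:    {mean_all:.2f}")
print(f"Mean without the extreme value: {mean_typical:.2f}")
```

The mean of the four prices in the 600s lies among those prices, while the mean of all five lies above every one of them.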
Weighted Mean is a special case of the mean, used for data whose values may have more than one
frequency (or occur repeatedly) (Cabiles N. , 2013).

$$\bar{X}_w = \frac{\Sigma(wX)}{\Sigma w}$$

where: $\bar{X}_w$ = Weighted Mean
       $X$ = value of an attribute of the sample
       $w$ = weight (or frequency)

Example:
Suppose that Tea Academy, a milk tea shop, has sold 31 Grande (G)-Size Tumblers, 64 Tall (T)
Size Tumblers & 121 Short (S)-Size Tumblers. The prices for the different size tumblers are as follows
(Cabiles N. , 2013):
Grande – Php 150, Tall – Php 125, Short – Php 100.
• What is the Weighted Mean Price ($\bar{P}_w$) for Tea Academy’s Milk Tea?
Solution:
1. Note that the recurring values in this dataset are the respective prices of the tumblers of different sizes
(e.g. Php 150 Grande Tumblers occur 31 times).
2. The Frequencies/Weights (w) of the different size Tumblers are: wG = 31, wT = 64, wS = 121
3. At the same time, note the prices are: PG = 150, PT = 125, PS = 100
4. The Weighted Mean Price ($\bar{P}_w$) is then:

$$\bar{P}_w = \frac{\Sigma w_x P_x}{\Sigma w_x} = \frac{(31 \cdot 150) + (64 \cdot 125) + (121 \cdot 100)}{31 + 64 + 121} = \frac{24750}{216} \approx 114.58$$
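The Tea Academy computation can be checked with a short Python sketch. The dictionary layout and variable names are illustrative choices, not part of the source example.

```python
# (units sold, price in Php) for each tumbler size
sales = {"Grande": (31, 150), "Tall": (64, 125), "Short": (121, 100)}

total_weighted = sum(w * p for w, p in sales.values())  # sum of w * X
total_weight = sum(w for w, _ in sales.values())        # sum of w

weighted_mean = total_weighted / total_weight
print(round(weighted_mean, 2))
```

Because the Short size has the largest weight, the weighted mean sits closer to Php 100 than the simple average of the three prices (Php 125) does.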

Median – is the alternative measure of central tendency in the event that the Mean is compromised due
to the presence of extreme values in the data. It is the midpoint of the values of the data after being ranked
or ordered from smallest to largest (or lowest to highest). For an even number of observations, there are 2
midpoint values; the Median, in such a case, is given by the average of the two midpoint values. The
Median (unlike the Mean) is unaffected by extreme values since the Median has an equal number of
observations “below” it and “above” it (Cabiles N. , 2013).
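The ranking-then-midpoint procedure described above can be sketched as follows. The function name `median_of` is hypothetical, and the data are the textbook prices from the mean demonstration earlier in this module, reused here for illustration.

```python
def median_of(values):
    """Midpoint of the ranked values; average of the two midpoints when n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

prices = [629, 616, 625, 608, 1200]

print(median_of(prices))       # odd number of observations
print(median_of(prices[:4]))   # even number of observations
```

With all five prices the median is one of the observed values; with four prices it is the average of the two middle values, and in both cases the extreme price of Php 1,200 has little influence on the result.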

Example:

Suppose the following data for Smartphone prices (Cabiles N. , 2013):


Source: (Cabiles N. , 2013)


Note that the Median is closer to the values of the data despite the presence of an extreme value.
As pointed out, there is an equal number of observations “below” and “above” the Median (Cabiles N. ,
2013).

Properties of the Median


1. Uniqueness - only one median for each set of data
2. Simplicity - easy to calculate
3. Affected by extreme values - not as drastically affected by extreme values as is the mean

Mode is the measure of central tendency more frequently used in cases of nominal-level data. It is the value
or attribute of the observations that appears most frequently (i.e. the value or attribute with the highest
frequency or the largest class frequency) (Cabiles N. , 2013).

Example:

Recall the example on Tea Academy Sales (Cabiles N. , 2013):

Source: (Cabiles N. , 2013)

Properties of the Mode

1. Uniqueness - does not always exist; if it does, it may not be unique


2. Simplicity - easy to determine
3. Affected by extreme values – not affected by extreme values
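A short sketch of finding the mode, using the Tea Academy sales frequencies from the weighted-mean example. The two small lists at the end are hypothetical data added only to illustrate the uniqueness property.

```python
from collections import Counter
from statistics import multimode

# Tumbler sizes sold, expanded from the Tea Academy frequencies
sold = ["Grande"] * 31 + ["Tall"] * 64 + ["Short"] * 121

counts = Counter(sold)
modal_size, frequency = counts.most_common(1)[0]
print(modal_size, frequency)

# Hypothetical data: two values tie for the highest frequency
print(multimode([1, 1, 2, 2, 3]))

# Hypothetical data: no value repeats, so every value is returned
print(multimode([4, 5, 6]))
```

Note that the mode works directly on nominal labels such as tumbler sizes, where a mean or median would not be meaningful. When `multimode` returns every value, as in the last call, the data are usually described as having no mode.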

Choosing the Most Suitable Measure of Central Tendency


Measures of Central Tendency

| Criteria | Mean | Median | Mode |
|---|---|---|---|
| Definition | Center of mass or balancing point | Center of the array | Typical value |
| Data requirement | At least interval scale and values that are close to each other | At least interval scale | Even if nominal scale only |
| Existence/Uniqueness | Always exists / always unique | Always exists / always unique | Might not exist / not always unique |
| Takes into account every value? | Yes | No | No |
| Affected by outliers? | Yes | No | No |
Source: (Araneta, 2020)

Relative Positions of the Mean, Median and Mode:

Case 1: Symmetric Distribution

Source: (Cabiles N. , 2013)

Case 2: Positively Skewed Distribution

Source: (Cabiles N. , 2013)

Case 3: Negatively Skewed Distribution

Source: (Cabiles N. , 2013)

Measures of Dispersion characterize the data set in terms of how varied the observations are from
each other. The amount of dispersion may be small when the values are close together and large when the
observations are widely spread out from the center. The smallest possible value is 0, indicating absence
of variation (Araneta, 2020). Dispersion is an indication of how close the values of a data set are to each
other, that is, the extent to which the values in a data set are clustered around the Mean (Cabiles N. , 2013).

Why measure dispersion?

1. The mean becomes unreliable in the presence of extreme values (recall the weakness of the mean
discussed earlier). A wide dispersion of the data (i.e. a high Measure of Dispersion) determines when the Median
or the Mode must be used as a Measure of Location or Central Tendency for a data set (instead of the
mean).
2. Given two or more sets of data with equal (or almost equal) Means, dispersion determines how
comparable the means are of these data sets (Cabiles N. , 2013).

Illustration:
Consider the Exam Scores of Two Sections, A & B, with 5 students each:

Section A: 85, 80, 83, 81, 86
Section B: 75, 90, 77, 88, 85

Source: (Cabiles N. , 2013)

From above, both Sections A & B have the same Mean. Thus, looking only at the Mean, it may be
said that students of both Sections A & B tend to have a score of 83. However, note that the scores of the
students vary more in Section B than in Section A. This point is not captured by the Means of Sections A and
B. As such, looking only at the Means of 2 (or more) sets of data can be misleading (Cabiles N. , 2013).

Measures of Dispersion:
1. Range
2. Mean Deviation
3. Variance & Standard Deviation

Some Uses of Measures of Dispersion

Range (R) is the maximum minus the minimum. It uses only the extreme values, so it fails to communicate
any information about the clustering or the lack of clustering of the values between the extremes, and an outlier
can greatly alter its value (Araneta, 2020). It is a measure of dispersion that considers only two values in the
dataset, namely the smallest and largest values: it is the difference between the largest and the
smallest values in a dataset (Cabiles N. , 2013).

𝑅𝑎𝑛𝑔𝑒 = 𝐿𝑎𝑟𝑔𝑒𝑠𝑡 𝑉𝑎𝑙𝑢𝑒 − 𝑆𝑚𝑎𝑙𝑙𝑒𝑠𝑡 𝑉𝑎𝑙𝑢𝑒

Recall the data on Exam Scores for Sections A & B:

Source: (Cabiles N. , 2013)

The Largest Value for Section A (LA) = 86, Smallest Value for Section A (SA) = 80. Range for
Section A (RA) = 6. LB = 90, SB = 75. RB = 15. Since RA < RB, the Exam Scores of Section B are more
widely dispersed than those of Section A. With the Exam Scores of Section B being widely dispersed, the
Mean of Section B may not be that reliable as a value of Central Tendency (Cabiles N. , 2013).
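The range of each section can be computed directly from the scores. This is a small sketch; the helper name `value_range` is hypothetical, and the scores are the Section A and Section B values used in the Mean Deviation and Variance illustrations of this module.

```python
section_a = [85, 80, 83, 81, 86]
section_b = [75, 90, 77, 88, 85]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

range_a = value_range(section_a)
range_b = value_range(section_b)
print(f"Range of Section A: {range_a}")
print(f"Range of Section B: {range_b}")
```

Only two observations per section enter the calculation, which is why the range says nothing about how the remaining scores are spread between the extremes.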

Mean Deviation - gives the average (i.e. mean) amount by which the values in a population (or a sample),
that is, the values in a dataset, vary from the Mean. It is a more accurate Measure of Dispersion than
the Range, as the latter only considers 2 values in the dataset (Cabiles N. , 2013).

$$MD = \frac{\Sigma |X - \bar{X}|}{n}$$

where: $MD$ = Mean Deviation
       $X$ = value of each observation
       $\bar{X}$ = mean of the values of the population/sample/dataset
       $n$ = number of observations in the population/sample/dataset

Note: The absolute values of the deviations are taken (Cabiles N. , 2013).

Illustration:

$$MD_A = \frac{\Sigma |X - \bar{X}|}{n} = \frac{|85 - 83| + |80 - 83| + |83 - 83| + |81 - 83| + |86 - 83|}{5} = \frac{10}{5} = 2$$

$$MD_B = \frac{\Sigma |X - \bar{X}|}{n} = \frac{|75 - 83| + |90 - 83| + |77 - 83| + |88 - 83| + |85 - 83|}{5} = \frac{28}{5} = 5.6$$

Mean Deviations: MDA = 2, MDB = 5.6

With MDA < MDB, the Exam Scores of Section B are more widely dispersed than those of Section A.
Since the Exam Scores of Section B are widely dispersed, the Mean of Section B may not be that reliable
as a value of Central Tendency (Cabiles N. , 2013).
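The Mean Deviation illustration can be reproduced with a short function. This is a sketch only; the function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

section_a = [85, 80, 83, 81, 86]
section_b = [75, 90, 77, 88, 85]

md_a = mean_deviation(section_a)
md_b = mean_deviation(section_b)
print(md_a, md_b)
```

Taking absolute values matters: without them the deviations would sum to zero for any dataset, as shown in the properties of the mean.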

Variance – it is used to measure the dispersion of values relative to the mean. When values are close to
their mean (narrow range), the dispersion is less than when there is scattering over a wide range (Araneta,
2020). It gives the average of the squared deviations from the Mean. It is similar to the Mean Deviation, but
instead of using the absolute deviations from the Mean, it uses the Squared Deviations from the Mean
(Cabiles N. , 2013).

Standard Deviation – it is the most important measure of variation and the most frequently used measure of
dispersion (Araneta, 2020). It is the square root of the Variance, which takes out the tendency of the
Variance to be “bloated” due to the squaring of the deviations.
Note: The Variance & the Standard Deviation are always non-negative and will only assume the value of 0
if all values in the dataset are equal (Cabiles N. , 2013).

Population Variance

$$\sigma^2 = \frac{\Sigma(X - \mu)^2}{N}$$

where: $\sigma^2$ = Variance
       $X$ = value of each observation
       $\mu$ = Population Mean
       $N$ = number of observations in the Population

Population Standard Deviation

$$\sigma = \sqrt{\frac{\Sigma(X - \mu)^2}{N}}$$

where: $\sigma$ = Standard Deviation

Sample Variance

$$s^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1}$$

where: $s^2$ = Variance
       $X$ = value of each observation
       $\bar{X}$ = Sample Mean
       $n$ = number of observations in the Sample

Sample Standard Deviation

$$s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}}$$

where: $s$ = Standard Deviation

The Population Variance and Standard Deviation vs the Sample Variance and the Standard
Deviation

1. The Population Parameters (i.e. Variance & Standard Deviation) use the Population Mean ($\mu$), while the
Sample Statistics use the Sample Mean ($\bar{X}$).
2. The Population Parameters use the total Number of Observations (N), while the Sample Statistics use
the Number of Observations in the Sample less 1 (i.e. n – 1). Since the Sample is only a representative
portion of the entire Population, using n (the Number of Observations in the Sample) as the divisor might
underestimate the Variance & the Standard Deviation. Subtracting 1 from n addresses this issue (Cabiles N.
, 2013).

Illustration:
Suppose that the entire population of students is composed of Sections A & B with the given Scores Data
below (Cabiles N. , 2013):

Source: (Cabiles N. , 2013)

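N = 10

$$\mu = \frac{\Sigma X}{N} = \frac{85 + 80 + 83 + 81 + 86 + 75 + 90 + 77 + 88 + 85}{10} = 83$$

$$\sigma^2 = \frac{\Sigma(X - \mu)^2}{N} = \frac{(85-83)^2 + \cdots + (86-83)^2 + (75-83)^2 + \cdots + (85-83)^2}{10} = 20.4$$

$$\sigma = \sqrt{\frac{\Sigma(X - \mu)^2}{N}} = \sqrt{20.4} \approx 4.517$$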
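The population figures can be verified with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N. This is a sketch for checking the arithmetic, not a required method.

```python
from statistics import pstdev, pvariance

# The entire population: Sections A and B combined
population = [85, 80, 83, 81, 86, 75, 90, 77, 88, 85]

sigma_squared = pvariance(population)  # average squared deviation from mu
sigma = pstdev(population)             # square root of the population variance

print(f"Population variance: {sigma_squared}")
print(f"Population standard deviation: {sigma:.3f}")
```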

Illustration:

Take Sections A & B as respective Samples of the entire Student Population:

nA = 5, $\bar{X}_A$ = 83 (as noted previously)
nB = 5, $\bar{X}_B$ = 83

$$s_A^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1} = \frac{(85-83)^2 + (80-83)^2 + (83-83)^2 + (81-83)^2 + (86-83)^2}{5 - 1} = \frac{26}{4} = 6.5$$

$$s_A = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} = \sqrt{6.5} \approx 2.55$$

For Section B, with nB = 5 and $\bar{X}_B$ = 83:

$$s_B^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1} = \frac{(75-83)^2 + (90-83)^2 + (77-83)^2 + (88-83)^2 + (85-83)^2}{5 - 1} = \frac{178}{4} = 44.5$$

$$s_B = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} = \sqrt{44.5} \approx 6.671$$

Note: Just as before, the Exam Scores of Section B are more dispersed than those of Section A (Cabiles N. ,
2013).
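The sample calculations can be checked with `variance` and `stdev` from Python's `statistics` module, which divide by n - 1. The last line is an added comparison, not part of the source illustration: it shows that dividing by n (via `pvariance`) gives a smaller figure for the same data.

```python
from statistics import pvariance, stdev, variance

section_a = [85, 80, 83, 81, 86]
section_b = [75, 90, 77, 88, 85]

for name, scores in (("A", section_a), ("B", section_b)):
    s_squared = variance(scores)  # squared deviations divided by n - 1
    s = stdev(scores)             # square root of the sample variance
    print(f"Section {name}: s^2 = {s_squared}, s = {s:.3f}")

# Same data, divisor n versus divisor n - 1
print(pvariance(section_a), variance(section_a))
```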

Uses of the Standard Deviation

Empirical Rule/Normal Rule - For a symmetrical, bell-shaped or normal frequency distribution,
approximately 68% of the values will lie within ± 1 standard deviation from the Mean, approximately 95%
of the values will lie within ± 2 standard deviations from the Mean and about 99.7% or approximately all
values will lie within ± 3 standard deviations from the Mean (Cabiles N. , 2013).

Graphical Illustration 1:

Symmetrical Curve/Standard Normal Curve – bell shaped


Let standard deviation = 1 (on the X-axis)

Graphical Illustration 2:

From the Empirical Rule: 68% of the values will lie within ± 1 standard deviation from the Mean.
Graphical Illustration 3:

From the Empirical Rule: 95% of the values will lie within ± 2 standard deviations from the Mean.
Graphical Illustration 4:

From the Empirical Rule: 99.7% of the values will lie within ± 3 standard deviations from the Mean.
Source: (Cabiles N. , 2013)
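The three percentages of the Empirical Rule can be recovered from the standard normal curve itself. The sketch below uses `NormalDist` from Python's `statistics` module (available from Python 3.8) with mean 0 and standard deviation 1, matching the curve described above.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"Within ±{k} standard deviation(s): {shares[k]:.1%}")
```

The computed areas are slightly more precise than the rounded figures quoted in the rule, which is why the rule is stated with "approximately".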

Coefficient of Variation – it is the measure of relative dispersion which expresses the standard deviation
as a percentage of the mean. It is used to compare two or more groups of values when units of
measurement differ or when the means differ markedly, and it is always expressed in
percentage (%) (Araneta, 2020).
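Read literally, the definition above amounts to dividing the standard deviation by the mean and multiplying by 100. The sketch below assumes the sample standard deviation is used and compares two datasets from this module that are measured in different units: the Section A exam scores (points) and the textbook prices (Php). The function name is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

exam_scores = [85, 80, 83, 81, 86]          # Section A, in points
textbook_prices = [629, 616, 625, 608, 1200]  # in Php

cv_scores = coefficient_of_variation(exam_scores)
cv_prices = coefficient_of_variation(textbook_prices)

print(f"CV of exam scores:     {cv_scores:.2f}%")
print(f"CV of textbook prices: {cv_prices:.2f}%")
```

Because each result is a unit-free percentage, the two figures can be compared directly even though points and pesos cannot.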

Measure of Skewness – it describes the degree of departure of the distribution of the data from symmetry.
It indicates not only the amount of skewness but also the direction. The degree of skewness is measured
by the coefficient of skewness, denoted as SK. A distribution is said to be symmetric about the mean if
the distribution to the left of the mean is the “mirror image” of the distribution to the right of the mean (Araneta,
2020).
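The module does not state which formula is used for SK. As an assumption for illustration only, the sketch below uses Pearson's second coefficient of skewness, SK = 3(mean − median)/s, one common textbook definition. It is applied to the textbook prices from the mean demonstration and to a small hypothetical symmetric dataset.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / s.

    Assumed formula; the module itself does not specify how SK is computed.
    """
    return 3 * (mean(values) - median(values)) / stdev(values)

prices = [629, 616, 625, 608, 1200]   # one extreme high value
symmetric = [1, 2, 3, 4, 5]           # hypothetical symmetric data

sk_prices = pearson_skewness(prices)
sk_symmetric = pearson_skewness(symmetric)

print(f"SK of textbook prices: {sk_prices:.2f}")
print(f"SK of symmetric data:  {sk_symmetric}")
```

A positive value indicates that the mean has been pulled above the median (positive skew), a negative value the opposite, and a value of zero indicates symmetry under this measure.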


Types of Skewness

Source: (Araneta, 2020)

Measures of Association and Correlation

Descriptive Study of Bivariate Data (Data on two variables) – it allows us to discover if any relationships
exist between the variables, and how strong the relationships appear to be (Araneta, 2020).

Coefficient of Association - numerical expression of the degree to which variables are in step or fluctuate
in relation to one another (Araneta, 2020).

Measures of association - procedures that yield coefficients of association (Araneta, 2020).

Strength of Association - one way of discussing variables’ association. The strength of the association between
two variables is gauged according to how closely the coefficient of association approaches +1.00 (a perfect
positive association) or −1.00 (a perfect negative association). A coefficient of association of 0 means no
association (Araneta, 2020).

Measures of Association for Two Nominal Variables

Phi Coefficient $\phi$ – it is used when the data can be arranged into a 2 x 2 table and the variables are
naturally dichotomous and measured at the nominal level. The values range from −1 to +1. Example: Sex
(Male or Female); Residence type (Urban or Rural) (Araneta, 2020).

Data layout: 2 x 2 Contingency Table

Source: (Araneta, 2020)

Yule’s Q is another measure of the strength of the association between two dichotomous variables. Like the Phi
coefficient, Q may assume any value between −1 and +1, inclusive (Araneta, 2020).
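Both coefficients can be computed from the four cell counts of a 2 x 2 table. The sketch below uses the standard formulas, φ = (ad − bc)/√((a+b)(c+d)(a+c)(b+d)) and Q = (ad − bc)/(ad + bc), where a and b are the first-row counts and c and d the second-row counts. The cell labels and the counts themselves are hypothetical, chosen only for illustration.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts: rows are one dichotomous variable, columns the other
a, b, c, d = 30, 10, 15, 45

phi = phi_coefficient(a, b, c, d)
q = yules_q(a, b, c, d)
print(f"phi = {phi:.3f}, Q = {q:.3f}")
```

For the same table, Q is typically larger in magnitude than φ, so the two should not be compared against each other on a single scale.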

Cramer’s V provides an appropriate measure of the strength of association between two categorical
variables yielding data that may be displayed in a contingency table of any size (Araneta, 2020).
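A minimal sketch of Cramér's V, assuming the usual definition V = √(χ² / (n · min(r − 1, c − 1))), where χ² is the chi-square statistic of the table, n the total count, and r and c the numbers of rows and columns. The tables below are hypothetical.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square statistic: sum of (observed - expected)^2 / expected
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_square / (n * k))

# Hypothetical 3 x 3 table of counts
v_3x3 = cramers_v([[20, 10, 5], [10, 20, 10], [5, 10, 20]])

# Hypothetical 2 x 2 table
v_2x2 = cramers_v([[30, 10], [15, 45]])

print(f"V (3 x 3) = {v_3x3:.3f}")
print(f"V (2 x 2) = {v_2x2:.3f}")
```

Unlike the Phi coefficient, V is never negative, so it conveys the strength of an association but not its direction. For a 2 x 2 table it equals the absolute value of φ.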

Measures of Association for Two Ordinal Variables

Kendall’s Tau (Tk) is a useful measure of association between two sets of scores that have been measured
according to ordinal scales. It is appropriate for data arranged into ranks and not in cross-tabulated form.

Assumptions:
• Samples are randomly selected
• Scores are measured according to ordinal scales
• n > 10, for proper application of the measure
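The module does not give a computing formula for Kendall's Tau. As an assumption, the sketch below uses the simplest version (often called tau-a), which counts concordant and discordant pairs and assumes there are no tied ranks. The two sets of ranks are hypothetical, with n = 11 to respect the n > 10 condition.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / total number of pairs; assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both rankings order the pair the same way
        elif product < 0:
            discordant += 1   # the rankings order the pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ranks given by two judges to the same 11 items
judge_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
judge_2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 11]

tau = kendall_tau(judge_1, judge_2)
print(f"tau = {tau:.3f}")
```

Identical rankings give a value of +1 and exactly reversed rankings give −1. Data with tied ranks call for an adjusted version of the coefficient, which this sketch does not cover.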

The Spearman Rank Correlation Coefficient Rho (ρ)

Assumptions:
• The data consist of a random sample of n pairs of numeric observations.
• Each pair of observations represents two measurements taken on the same subject or individual,
called the unit of association.
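No formula for rho is given in this module. The sketch below assumes the common shortcut formula ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each unit of association. It is valid only when there are no tied values, and the data are hypothetical.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical pairs: hours studied and exam score for five students
hours = [2, 4, 5, 7, 9]
scores = [70, 75, 72, 88, 90]

rho = spearman_rho(hours, scores)
print(f"rho = {rho:.2f}")
```

Because only ranks enter the formula, any relationship in which one variable consistently rises with the other yields ρ = 1, even when the relationship is not a straight line.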

Measures of Association for at least Two Interval Variables

The Pearson r is one of the best-known measures of association and perhaps the best, provided the assumptions
are satisfied.

Assumptions:
a. random samples are taken
b. variables are measured according to an interval scale
c. Pearson’s r measures linear association
d. two variables are normally distributed
e. homoscedasticity (equality of variances) must be satisfied

Interpretations: Pearson r

• r = +1, perfect positive linear relationship
• r = −1, perfect negative linear relationship
• values of r near 0, weak linear relationship
• r > 0, positively linearly correlated, meaning that y tends to increase linearly as x increases, with
the tendency being greater the closer r is to +1
• r < 0, negatively linearly correlated, meaning that y tends to decrease linearly as x increases, with
the tendency being greater the closer r is to −1
• A correlation coefficient of zero does not mean that the variables are not associated; they may still
be associated, but not linearly (Araneta, 2020).
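The interpretations above can be illustrated with a small sketch of the usual product-moment formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The formula is the standard one rather than one quoted from this module, and all three datasets are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]

r_increasing = pearson_r(x, [2 * v + 1 for v in x])  # y rises linearly with x
r_decreasing = pearson_r(x, [-v for v in x])         # y falls linearly with x
r_curved = pearson_r(x, [v ** 2 for v in x])         # y depends on x, but not linearly

print(r_increasing, r_decreasing, r_curved)
```

The third case is the one to keep in mind: y is completely determined by x, yet r is zero because the relationship is a curve rather than a line.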

Example:
Task: We want to determine whether sex and marital status are associated.

Source: (Araneta, 2020)

Interpretation:
There is a moderately high relationship between marital status and sex: males are more
likely to be married and females are more likely to be single (Araneta, 2020).

VI. LEARNING ACTIVITIES (30 points)

Use the range rule of thumb to solve the problem.

1. Six college buddies bought each other Christmas gifts. They spent:
$236.88 $150.51 $154.55
$299.92 $290.97 $251.46
What was the mean amount spent? Round your answer to the nearest cent.
A) $264.86 B) $230.72 C) $346.07 D) $276.86

2. The number of vehicles passing through a bank drive-up line during each 15-minute period was
recorded. The results are shown below. Find the median number of vehicles going through the line in a
fifteen-minute period.
23 25 23 26
26 23 28 25
33 29 29 27
22 29 23 18
13 25 25 25
A) 26 B) 29 C) 25 D) 24.85

3. Find the mode(s) for the given sample data.


79, 25, 79, 13, 25, 29, 56, 79
A) 79 B) 48.1 C) 42.5 D) 25

VII. ASSIGNMENT (70 points)


Empirical Rule/Normal Rule Application:
1. Suppose the Population Mean Income of Call Center Agents is at Php25,000.00 and that the Population
Standard Deviation of Income among Call Center Agents is Php2,500.
2. Assume that the Frequency Distribution of Call Center Agents’ Income is Normal. What will the Income
Levels of 68% of the Population of Call Center Agents most likely be?

VIII. EVALUATION (Note: Not to be included in the student’s copy of the IM)
IX. REFERENCES
Araneta. (2020). 3 Webinars of Guide to Writing a Health Research Proposal Modules: Statistical Analysis.
Department of Science and Technology VI and Western Visayas Health Research and
Development Consortium. Iloilo: Department of Science and Technology VI and Western Visayas
Health Research and Development Consortium.

Cabiles, N. (2013). Statistics for Economists. Ateneo de Manila University, Quezon City.

Hanneman, R. A., Kposowa, A. J., & Riddle, M. D. (2013). Basic Statistics for Social Research. California,
US: Jossey-Bass A Wiley Imprint.

Kazmier, L. J. (2004). Schaum's Outline of Theory and Problems of Business Statistics (Fourth ed.). New
York, USA: McGraw-Hill.

Lewis, M. (2012). Applied Statistics for Economists. New York, USA: Routledge Taylor & Francis Group.

Mathai, A. M., & Haubold, H. J. (2018). Probability and Statistics: A Course for Physicists and Engineers.
Germany: De Gruyter.

Spiegel, M. R., Schiller, J. J., & Srinivasan, R. A. (n.d.). Schaum's Outline of Probability and Statistics
(Third ed.). New York, USA: McGraw-Hill.
