
Introduction

Statistics is very useful nowadays. It enables researchers to find solutions
to problems, whether personal or societal, to interpret the results, and to
explain the implications of these solutions for our everyday lives. Furthermore,
these results have led to many improvements and inventions that have had a
positive impact on society.

Statistics may be used in education, politics, economics, and the like. It
also gives us information on trends in society and helps us discover problems
that may need immediate solutions.
STATISTICS AND ITS IMPORTANCE
Key Concepts
• Statistics is a science that deals with the collection, organization,
analysis, and interpretation of data.
- Collection means gathering relevant information or data from the
population through survey, test, interview, experiment, etc.
- Organization or presentation refers to the systematic arrangement
of data into textual form, table, graph, or chart.
- Analysis is the careful examination of data, which may involve the use
of statistical tools.
- Interpretation of data is making a generalization or conclusion from
the data that have been analyzed.

• Population - the group from which data are to be collected.


• Sample - a subset of a population.
• Variable - a characteristic of any member of a population that differs
in quality or quantity from one member to another.
• Quantitative variable - a variable differing in quantity. For example, the
weight of a person or the number of people in a car.
• Qualitative variable - a variable differing in quality or attribute. For
example, color or the degree of damage of a car in an accident.
• Discrete variable - a variable that cannot take any value between two
given consecutive values, for example, the number of children in a family.
Its values are whole numbers and are usually counts of objects.
• Continuous variable - a variable that may take any value between two
given values, for example, the length and width of a rectangular table
measuring 3.5 meters by 1.75 meters.

Two Divisions of Statistics:


1. Descriptive Statistics:
Descriptive statistics deals with the collection of data, its presentation in
various forms such as tables, graphs, and diagrams, and the finding of averages
and other measures that describe the data.
Example:
Industrial statistics, population statistics, and trade statistics are
descriptive. Businessmen also use descriptive statistics in presenting their
annual reports, final accounts, and bank statements.

2. Inferential Statistics:
Inferential statistics deals with techniques used for analysis of data,
making predictions, comparisons, and drawing conclusions about a
population using information gathered about a representative portion or
sample of that population.

Worktext: Mathematics in the Modern World 83


Example:
Suppose we want to have an idea about the percentage of indigents
in our country. We take a sample from the population and find the
proportion of indigents in the sample. This sample proportion with the help
of probability enables us to make some inferences about the population
proportion.

Importance of Statistics
Statistics plays a vital role in every field of human activity. Statistical
methods are useful tools in aiding research and studies in different fields such
as education, economics, social sciences, business, health, and many others. They
help provide more critical analyses of information. Examples:
(1) In Economics: Economics largely depends upon statistics. National income
accounts are multipurpose indicators for economists and administrators.
Statistical methods are used in the preparation of these accounts.
(2) In Natural and Social Sciences: Statistical methods are commonly used for
analyzing experimental results and testing their significance in Biology,
Physics, Chemistry, Mathematics, Meteorology, research chambers of commerce,
Sociology, Business, Public Administration, Communication and Information
Technology, etc.

MEASURES OF CENTRAL TENDENCY


A measure of central tendency or measure of central location is a summary
measure that describes a whole set of data. It is a single number that indicates
the center of a collection of data. The most commonly used measures of central
tendency are the mean, median, and mode.

A. Mean, Median and Mode of Ungrouped Data


MEAN (𝑥̅ )
The mean, also called the “average” or “arithmetic mean/average”, is
the most commonly used measure of central tendency. It is often regarded as
the most reliable measure of central tendency. To calculate the mean, add all
the numbers in a set and then divide the sum by the total count of numbers.

Properties of Mean
1. A set of data has only one mean.
2. All values in the data set are included in computing the mean.
3. It is very useful in comparing two or more data sets.
4. It is affected by extremely small or large values in a data set.
5. It is appropriate for symmetrical data.

Mean: $\bar{x} = \dfrac{\text{Sum of all values}}{\text{Number of values}}$

Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$

Population Mean: $\mu = \dfrac{\sum x}{N}$
Where:
𝑥̅ -sample mean (read as “x bar”)
𝜇 -population mean (read as “mu”)
𝑥 -the value of any particular observation or measurement
𝛴𝑥 -sum of all values
𝑛 -total number of values in the sample
𝑁 -total number of values in the population

Illustrative Examples:
1. Jean has been working part-time at a fast-food company. The following
numbers represent the number of hours Jean has worked at this company
for each of the past 8 months: 30, 45, 43, 60, 71, 82, 71, 83. What
is the mean (average) number of hours that Jean worked at this company?
Solution:
Step 1: Add the numbers to determine the total number of hours Jean
worked.
30 + 45 + 43 + 60 + 71 + 82 + 71 + 83 = 485
Step 2: Divide the total by the number of months.
$\dfrac{485}{8} = 60.625 \approx 60.63$ hours/month

The average number of hours Jean worked at the company is
60.63 hours/month.

2. Joseph operates Technology Giant, a Website service that employs 8
people. Find the mean age (in years) of his workers if the ages of the
employees are as follows:
26, 23, 30, 25, 29, 33, 38, 35

Solution:
Step 1: Add the numbers to determine the total age of the workers.
26 + 23 + 30 + 25 + 29 + 33 + 38 + 35 = 239
Step 2: Divide the total by the number of workers.
$\dfrac{239}{8} = 29.875$, or about 30 years

The average age of Joseph’s workers is 30 years old.
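The two computations above can also be checked with a short program. The sketch below is only an illustration in Python; the function name `arithmetic_mean` and the variable names are assumptions made for this example and are not part of the worktext.

```python
def arithmetic_mean(values):
    """Return the sum of all values divided by the number of values."""
    return sum(values) / len(values)

hours = [30, 45, 43, 60, 71, 82, 71, 83]   # Jean's hours for the past 8 months
ages = [26, 23, 30, 25, 29, 33, 38, 35]    # ages of Joseph's 8 workers

mean_hours = arithmetic_mean(hours)   # 485 / 8
mean_age = arithmetic_mean(ages)      # 239 / 8

print(f"Mean hours per month: {mean_hours:.3f}")
print(f"Mean age in years: {mean_age:.3f}")
```

The unrounded results are 60.625 hours/month and 29.875 years, which the examples report as 60.63 and about 30.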

Weighted Mean/Average
The weighted mean/average is particularly useful when various classes or
groups contribute differently to the total.



The weighted mean/average may be calculated by using the following
three-step procedure:
1. multiply each value by its corresponding weight;
2. find the sum of those products; and
3. divide that sum by the sum of the weights.

The following formula expresses the procedure:

$\bar{x} = \dfrac{\sum wx}{\sum w} = \dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

where $w$ represents the weight and $x$ represents the data value.

Illustrative Example:
Rena, a fourth year student majoring in mathematics took the following
courses with the corresponding units and grade during the first semester of the
school year. What is her average grade?

Course Title                                 Unit    Grade
The Teaching Profession                        3      1.4
Field Study 5                                  1      1.3
Field Study 6                                  1      1.5
Special Topic 3                                1      1.8
Calculus 1                                     3      1.7
Calculus 11                                    3      1.8
Seminar on Technology in Mathematics           3      1.2
Abstract Algebra                               3      1.8
Mathematical Investigation and Modelling       3      1.7
TOTAL                                         21

Solution:
$\bar{x} = \dfrac{3(1.4) + 1(1.3) + 1(1.5) + 1(1.8) + 3(1.7) + 3(1.8) + 3(1.2) + 3(1.8) + 3(1.7)}{21}$
$\bar{x} = \dfrac{33.4}{21}$
$\bar{x} \approx 1.59$

The weighted average grade of Rena is 1.59.
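The three-step procedure for the weighted mean translates directly into code. The following Python sketch uses Rena's units as weights and her grades as values; the function name `weighted_mean` is an assumption made for this illustration.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add the products,
    then divide by the sum of the weights."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

# Units (weights) and grades (values), in the order listed in the table.
units = [3, 1, 1, 1, 3, 3, 3, 3, 3]
grades = [1.4, 1.3, 1.5, 1.8, 1.7, 1.8, 1.2, 1.8, 1.7]

average_grade = weighted_mean(grades, units)
print(f"Weighted average grade: {average_grade:.2f}")
```

Because the courses carry different numbers of units, a plain average of the nine grades would generally give a different result than the weighted one.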

MEDIAN (𝑥̃)
The median is the value that falls in the middle position after the data
have been arranged in an array, in either ascending or descending order.



Properties of Median
1. It is unique; there is only one median for a set of data.
2. It is found by arranging the set of data in ascending or descending
order and getting the value of the middle observation.
3. It is not affected by extreme values.

To determine the value of the median for ungrouped data, consider two rules.
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked
values.

Position of the median in the array: $\dfrac{n + 1}{2}$

Illustrative Examples:
1. Find the median of the following data: 12, 3, 17, 8, 14, 10, 6
Solution:
Step 1: Organize the data in an array.
3, 6, 8, 10, 12, 14, 17

Step 2: Since the number of data values is odd (n = 7), the median is the
value in the middle position, $\frac{7 + 1}{2} = 4$. In this case, the
median is the value found in the fourth position of the array.
3, 6, 8, [10], 12, 14, 17
The median is 10.

2. Find the median of the following data: 7, 9, 3, 4, 15, 2, 8, 6, 2, 4

Solution:
Step 1: Arrange the data in an array.
2, 2, 3, 4, 4, 6, 7, 8, 9, 15

Step 2: Since the number of data values is even, the median will be
the mean value of the numbers found before and after the
$\frac{n + 1}{2}$ position.
$\dfrac{n + 1}{2} = \dfrac{10 + 1}{2} = \dfrac{11}{2} = 5.5$

Step 3: The number before the 5.5 position is 4 and the number after the
5.5 position is 6. Now, find their mean value.
2, 2, 3, 4, [4, 6], 7, 8, 9, 15
$\dfrac{4 + 6}{2} = 5$
The median is 5.
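The two rules for the median of ungrouped data (odd n and even n) can be sketched in Python as follows. The function name `median_ungrouped` is an assumption made for this illustration.

```python
def median_ungrouped(values):
    """Median of ungrouped data: the middle value when n is odd,
    the mean of the two middle values when n is even."""
    ordered = sorted(values)          # arrange the data in an array
    n = len(ordered)
    middle = n // 2                   # zero-based index of the upper middle value
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_data = [12, 3, 17, 8, 14, 10, 6]
even_data = [7, 9, 3, 4, 15, 2, 8, 6, 2, 4]

print(median_ungrouped(odd_data))
print(median_ungrouped(even_data))
```

For the two illustrative examples, the function returns 10 and 5.0, matching the solutions above.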



MODE (𝑥̂)
The mode (𝑥̂) is the value in a data set that appears most frequently.

Illustrative Examples:
1. Find the mode of the following data: 76, 81, 76, 80, 76, 83, 77, 79, 82, 76

Solution:
There is no need to organize the data in an array, unless you think
it would be easier to locate the mode that way. In the above data set,
the number 76 appears four times, while all the other numbers appear
only once. Since 76 appears with the greatest frequency, it is the
mode of the data set.

2. The ages of 12 randomly selected customers at a local 7-Eleven are
listed below:
21, 21, 29, 24, 31, 21, 27, 24, 24, 32, 33, 19
What is the mode of the above ages?

Solution:
The above data set has two values that each occur with a frequency of
3: 21 and 24. A data set with two modes is called bimodal. All
other values occur only once.

3. The coach of a sports team begins to observe the color of t-shirts his
athletes wear. His goal is to find out what color is worn most frequently so
that he can offer a common color or uniform shirts to his athletes.
Monday: Green, Blue, Pink, White, Blue, and Blue
Tuesday: Blue, Red, Black, Pink, Green, and Blue
Wednesday: Orange, White, White, Blue, Blue, and Red
Thursday: Brown, Black, Brown, Blue, White, and Blue
Friday: Black, Blue, Red, Blue, Red, and Pink
What is the mode of the colors above?

Solution:
The color blue was worn 11 times during the week. All other colors
were worn much less frequently than blue. The mode is blue.

The coach can offer blue shirts to his athletes.
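Counting frequencies by hand becomes tedious for longer lists, so a short program can help. The Python sketch below uses `collections.Counter` from the standard library to count each value and return every value that has the highest frequency; the function name `modes` is an assumption made for this illustration. It works for numbers and for qualitative data such as colors.

```python
from collections import Counter

def modes(values):
    """Return a list of all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

scores = [76, 81, 76, 80, 76, 83, 77, 79, 82, 76]
ages = [21, 21, 29, 24, 31, 21, 27, 24, 24, 32, 33, 19]
shirt_colors = [
    "Green", "Blue", "Pink", "White", "Blue", "Blue",      # Monday
    "Blue", "Red", "Black", "Pink", "Green", "Blue",       # Tuesday
    "Orange", "White", "White", "Blue", "Blue", "Red",     # Wednesday
    "Brown", "Black", "Brown", "Blue", "White", "Blue",    # Thursday
    "Black", "Blue", "Red", "Blue", "Red", "Pink",         # Friday
]

print(modes(scores))
print(modes(ages))
print(modes(shirt_colors), Counter(shirt_colors)["Blue"])
```

A list with one element indicates a unimodal data set, while a list with two elements, as for the ages, indicates a bimodal one.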



Name: Date:
Program and Section: Score:

Try this!

Direction: Answer the following.

A. Solve for the mean, median, and mode of each of the following data sets
and interpret the results.

1. 54, 50, 54, 55, 56, 57, 57, 58, 58, 60, 68
2. 45, 48, 52, 46, 41, 26, 36, 34, 38, 41, 39, 38, 30, 49, 46, 55
3. 154, 133, 232, 267, 289, 274, 321, 348, 188, 439

B. Ben and his friends are comparing the number of times they have been to
the movies in the past year. The table below illustrates how many times
each person went to the movie theatre in each month.

Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec

Ben 3 3 2 5 2 3 2 4 2 3 2 2

John 3 2 1 1 1 3 3 3 2 4 1 2

Matthew 1 3 3 2 1 4 5 3 2 2 2 3

Rose 2 2 2 1 3 2 4 1 3 2 3 3

1. By comparing modes, who among the friends has gone to the movies
the least per month?

2. By comparing medians, who among the friends has gone to the movies
the most per month?

3. Rank the friends in the order of most movies seen to least movies
seen by comparing their means.

4. By comparing the means of movies seen in each month, what month
is the most popular movie-watching month?

5. By comparing medians, which month is the least popular
movie-watching month?

6. What is the mean of the medians for each month (the arithmetic
average of the medians of the number of movies seen in each
month)?



Definition of Terms
Raw Data is data collected in its original form.

Range is the difference between the highest value and the lowest value in the
distribution.

Frequency Distribution Table is the organization of data in tabular form,
using mutually exclusive classes and showing the frequency or count of
the occurrences of values in the sample.

Class Interval/width/size (i) is the distance between the lower class
boundary and the upper class boundary of a class. It equals the
difference between the lower limits (or the upper limits) of two
succeeding classes.

Class Limit is the smallest or largest observation (data, events, etc.) that
can belong to a class. Hence, each class has two limits: a lower and
an upper limit.

Class Boundary or True Class Limit is found by adding 0.5 to an upper class
limit and subtracting 0.5 from a lower class limit (for classes with
whole-number limits). Therefore, each class has an upper and a lower
class boundary, or true upper and true lower class limit.

Midpoint or Class Mark (X) is found by adding the upper and lower class
limits of any class and dividing the sum by 2.

Frequency (𝑓) is the number of values in a specific class of a frequency
distribution table.

Cumulative Frequency (𝑐𝑓) is the sum of the frequencies accumulated up
to the upper boundary of a class in a frequency distribution table.

Frequency Distribution Table


Illustrative Example:
1. Construct a frequency distribution table for the following total scores in the
1st Quarter Quizzes in a Mathematics class.
118, 123, 128, 129, 130, 130, 133, 124, 125, 127, 136, 138,
141, 141, 149, 154, 150
Solution:
The following steps are involved in the construction of a frequency
distribution.

1. Decide the approximate number of classes into which the data are to be
grouped. There are no hard and fast rules for the number of classes. In
most cases we have 5 to 20 classes. H.A. Sturges provides a formula
for determining the approximate number of classes:
$K = 1 + 3.322 \log N$, where $K$ = number of classes and
$N$ = the total number of observations (the logarithm is to base 10).

$K = 1 + 3.322 \log 17$
$K = 5.09$
$K \approx 5$

2. Find the range of the data. The range is the difference between the
largest and the smallest value.
Range (R) = 154 – 118 = 36

3. Determine the approximate class interval/width/size (i). The class
interval is obtained by dividing the range by the number of classes.
$i = \dfrac{\text{Range}}{\text{Number of Classes}}$
Class size, $i = \dfrac{36}{5} = 7.2$

In the case of fractional results, the next higher whole number is
taken as the size of the class interval.
Class size (i) = 7.2 becomes 8

4. Decide the starting point. The lower limit of the first class should
cover the smallest value in the raw data. Use the lowest data value as
the lower limit of the first class. The lowest value is 118.

5. Determine the remaining class limits. Once the lowest class limit has
been determined, add the class width/size to it to find the next lower
class limit (118 + 8 = 126). The remaining lower class limits may be
determined by adding the class size repeatedly until the largest value
of the data falls within a class. To compute the upper class limit of a
class, subtract one from the class width and add the result to the lower
class limit of that class. For example: 118 + (8 – 1) = 125. The classes
may be listed in ascending or descending order:

Ascending           Descending
118 – 125     or    150 – 157
126 – 133           142 – 149
134 – 141           134 – 141
142 – 149           126 – 133
150 – 157           118 – 125

6. Tally the observations or scores in each class, and determine the
frequency. The total of the frequencies must be equal to the number of
observations. The scores are:
118, 123, 128, 129, 130, 130, 133, 124, 125, 127, 136, 138, 141, 141,
149, 154, 150



Frequency Distribution Table
Score        Tally       Frequency (f)
118-125      IIII              4
126-133      IIII – I          6
134-141      IIII              4
142-149      I                 1
150-157      II                2
Total                         17
By using the frequency distribution table above:
a) What are the lower and the upper class limits of the first two classes?
For the first class 118 – 125 the lower class limit is 118 and the
upper class limit is 125. For the second class, 126-133, the lower
class limit is 126 and the upper class limit is 133.

b) What are the true class limits/class boundaries of the first two
classes?
For the first class, 118 – 125, the lower class boundary is
118 – 0.5 = 117.5, and for the second class, 126 – 133, it is
126 – 0.5 = 125.5.

For the first class, 118 – 125, the true upper limit or upper class
boundary is 125 + 0.5 = 125.5, and for the second class, 126 – 133,
it is 133 + 0.5 = 133.5.

c) What is the class interval/width/size?
The class interval/width/size is the distance between the lower class
boundary and the upper class boundary of a class. It can be obtained by
taking the difference between the lower limits or the upper limits of
two succeeding classes.
For the two succeeding classes 118-125 and 126-133, the class width is
126 – 118 = 8 or 133 – 125 = 8.

d) Find the class midpoint or class mark of the first class.
For the first class, 118-125, the class midpoint is
$X = \dfrac{\text{Lower class limit} + \text{Upper class limit}}{2}$
$X = \dfrac{118 + 125}{2}$
$X = 121.5$
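The construction steps above can be sketched as a small Python program. This is only an illustration under stated assumptions: the function name `frequency_table` is hypothetical, the data are whole numbers, the number of classes from Sturges' formula is rounded to the nearest whole number, the class width is rounded up, and the first class starts at the lowest value.

```python
import math

def frequency_table(data):
    """Build (lower limit, upper limit, midpoint, frequency) rows
    following the construction steps for whole-number data."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))        # Sturges' approximation
    data_range = max(data) - min(data)          # highest minus lowest value
    width = math.ceil(data_range / k)           # next higher whole number
    rows = []
    lower = min(data)                           # starting point
    while lower <= max(data):
        upper = lower + (width - 1)             # upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        midpoint = (lower + upper) / 2          # class mark
        rows.append((lower, upper, midpoint, frequency))
        lower += width
    return k, width, rows

scores = [118, 123, 128, 129, 130, 130, 133, 124, 125, 127,
          136, 138, 141, 141, 149, 154, 150]

k, width, rows = frequency_table(scores)
print(f"K = {k}, class width = {width}")
for lower, upper, midpoint, frequency in rows:
    print(f"{lower}-{upper}  midpoint {midpoint}  f = {frequency}")
```

For the quiz scores, this reproduces the five classes 118-125 through 150-157 with frequencies 4, 6, 4, 1, and 2. Other textbooks choose the starting point or the rounding differently, so the classes may vary with those choices.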



B. Mean, Median, Mode of Grouped Data
MEAN (𝑥̅ ) OF GROUPED DATA
Steps in computing the mean of grouped data:
a. Find the midpoint/class mark (𝑋) of each class.
b. Multiply the frequency (f) of each class by its midpoint (𝑋) to get 𝑓𝑋.
c. Find the sum of 𝑓𝑋
d. Find the sum of all the frequencies (𝑛)
e. Divide the sum 𝑓𝑋 by the sum of the frequencies.

Formula for the Mean (𝑥̅ ) of Grouped Data

$\bar{x} = \dfrac{\sum fX}{n}$

Illustrative Example:
Nowadays, many people spend their leisure time on Facebook. Fifty-eight
students in a class recorded the time they spent on Facebook during their free
time. The frequency distribution table below shows the number of minutes
they spent on Facebook.

Classes       f       X       fX
5 – 9         1       7        7
10 – 14       2      12       24
15 – 19       2      17       34
20 – 24       6      22      132
25 – 29       7      27      189
30 – 34      10      32      320
35 – 39      30      37    1 110
        Σf = n = 58       ΣfX = 1 816
Solution: Applying the formula,
$\bar{x} = \dfrac{\sum fX}{n} = \dfrac{1\,816}{58} = 31.31$

Therefore, the mean time spent on Facebook by the 58 students is
31.31 minutes.
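The steps for the mean of grouped data can be sketched in Python as shown below. Each class is written as a (lower limit, upper limit, frequency) triple; this representation and the function name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(table):
    """Mean of grouped data: sum of f times X divided by the sum of f,
    where X is the midpoint (class mark) of each class."""
    n = sum(f for _, _, f in table)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return sum_fx / n

# (lower limit, upper limit, frequency) for the Facebook example
facebook_minutes = [
    (5, 9, 1), (10, 14, 2), (15, 19, 2), (20, 24, 6),
    (25, 29, 7), (30, 34, 10), (35, 39, 30),
]

mean_minutes = grouped_mean(facebook_minutes)
print(f"Mean time: {mean_minutes:.2f} minutes")
```

Note that the grouped mean is an approximation of the mean of the raw data, since every value in a class is treated as if it were equal to the class midpoint.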

MEDIAN (𝑥̃) OF GROUPED DATA

Steps in computing the median for grouped data:
a. Compute the less than cumulative frequency (<cf) of the data. The
less than cumulative frequency (<cf) is obtained by adding the
frequencies successively starting from the lowest class.
b. Determine the median class by computing the value of $\frac{n}{2}$.
c. Determine the value of the cumulative frequency before the median
class ($cf_{\tilde{x}}$).
d. Determine the true lower class limit $L_{\tilde{x}}$ of the median class.
e. Determine the class width.
f. Apply the formula.

Formula of Median of Grouped Data:

$\tilde{x} = L_{\tilde{x}} + \left( \dfrac{\frac{n}{2} - {<}cf}{f_m} \right) i$

where <cf is the cumulative frequency before the median class and $f_m$
is the frequency of the median class.
Illustrative Example:
The data show the time spent by 43 students in studying during
examinations in their math course. Find the median.
Classes       f     <cf
25 – 29       3      3
30 – 34       2      5
35 – 39       5     10
40 – 44       8     18
45 – 49       8     26    (median class)
50 – 54       8     34
55 – 59       9     43

Solution: Follow the steps in determining the median for grouped data.
a) $\dfrac{n}{2} = \dfrac{43}{2} = 21.5$, so the median class is 45 – 49, the first
class whose less than cumulative frequency (26) reaches 21.5.
b) The cumulative frequency before the median class ($cf_{\tilde{x}}$) is 18. (If
the classes are arranged in ascending order, “before” refers to the
cumulative frequency of the class immediately preceding the median
class.)
c) The frequency of the median class ($f_m$) is 8.
d) The true lower class limit of the median class is
$L_{\tilde{x}} = 45 - 0.5 = 44.5$.
e) The class width is $i = 5$.
f) Apply the formula:

$\tilde{x} = L_{\tilde{x}} + \left( \dfrac{\frac{n}{2} - {<}cf}{f_m} \right) i$

$\tilde{x} = 44.5 + \left( \dfrac{\frac{43}{2} - 18}{8} \right) 5$

$\tilde{x} = 44.5 + \left( \dfrac{21.5 - 18}{8} \right) 5$
$\tilde{x} = 44.5 + 2.1875$
$\tilde{x} = 46.69$

The median time spent by the students in studying is 46.69 minutes.
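The steps for the median of grouped data can be sketched in Python as follows. The function name `grouped_median` and the (lower limit, upper limit, frequency) representation of each class are assumptions made for this illustration; the sketch also assumes whole-number class limits listed in ascending order, so that the lower boundary is the lower limit minus 0.5.

```python
def grouped_median(table):
    """Median of grouped data using
    L + ((n/2 - cf_before) / f_m) * i for the median class."""
    n = sum(f for _, _, f in table)
    target = n / 2
    cf_before = 0                           # cumulative frequency before the class
    for lower, upper, f in table:
        if cf_before + f >= target:         # first class whose <cf reaches n/2
            boundary = lower - 0.5          # true lower class limit
            width = upper - lower + 1       # class width i
            return boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("the table has no observations")

study_minutes = [
    (25, 29, 3), (30, 34, 2), (35, 39, 5), (40, 44, 8),
    (45, 49, 8), (50, 54, 8), (55, 59, 9),
]

median_minutes = grouped_median(study_minutes)
print(median_minutes)
```

For the study-time data, the function returns 46.6875, which rounds to the 46.69 minutes obtained in the solution.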



MODE (𝑥̂) OF GROUPED DATA
Steps in computing the mode for grouped data:
a. Identify the modal class by determining the class with the highest
frequency.
b. Determine the true lower limit or class boundary ($L_{\hat{x}}$) of the modal
class.
c. Calculate $d_1$, the difference between the frequency of the modal class
and the frequency of the class preceding the modal class (1 class
lower in value than the modal class).
d. Calculate $d_2$, the difference between the frequency of the modal class
and the frequency of the class succeeding the modal class (1 class
higher in value than the modal class).
e. Determine the class width/size ($i$).
f. Substitute the values in the formula.
Formula of Mode of Grouped Data:
$\hat{x} = L_{\hat{x}} + \left( \dfrac{d_1}{d_1 + d_2} \right) i$
Illustrative Example:
The data show the time spent by 43 students in studying during
examinations in their math course. Find the mode and interpret the result.
Classes       f
25 – 29       3
30 – 34       2
35 – 39       5
40 – 44       8
45 – 49       8
50 – 54       8
55 – 59       9    (modal class)
Solution:
a) 55-59 is the modal class.
b) The lower boundary of the modal class is $L_{\hat{x}} = 55 - 0.5 = 54.5$.
c) $d_1 = 9 - 8 = 1$
d) $d_2 = 9 - 0 = 9$ (no class follows the modal class, so the succeeding
frequency is taken as 0)
e) $i = 5$
f) Substitute the values in the formula.
$\hat{x} = L_{\hat{x}} + \left( \dfrac{d_1}{d_1 + d_2} \right) i$
$\hat{x} = 54.5 + \left( \dfrac{1}{1 + 9} \right) 5$
$\hat{x} = 54.5 + 0.5 = 55$
Therefore, the most common time spent by the students in studying during
their math exam is about 55 minutes.
Note: The above formula for finding the exact mode for grouped data applies only to
unimodal distributions.
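The substitution in the mode formula can be sketched in Python as shown below. The function name `grouped_mode` and its parameter names are assumptions made for this illustration; the function takes only the quantities the formula needs, namely the lower limit and width of the modal class and the frequencies of the modal class and its two neighbors.

```python
def grouped_mode(lower_limit, width, f_modal, f_preceding, f_succeeding):
    """Mode of grouped data using L + (d1 / (d1 + d2)) * i,
    assuming whole-number class limits and a unimodal distribution."""
    boundary = lower_limit - 0.5            # true lower limit of the modal class
    d1 = f_modal - f_preceding              # difference with the class before
    d2 = f_modal - f_succeeding             # difference with the class after
    return boundary + d1 / (d1 + d2) * width

# Modal class 55-59 with frequency 9; the class before it has frequency 8
# and there is no class after it, so the succeeding frequency is 0.
mode_minutes = grouped_mode(lower_limit=55, width=5,
                            f_modal=9, f_preceding=8, f_succeeding=0)
print(mode_minutes)
```

The formula is undefined when $d_1 + d_2 = 0$, that is, when the modal class and both neighbors have the same frequency; the sketch does not handle that case.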



OTHER MEASURES OF RELATIVE POSITION
Measures of relative position or location, also called quantiles, are used to
partition or divide an ordered data set (array) into equal parts, as the median
does. The common measures of relative position are the quartiles, deciles, and
percentiles.

The median divides the ordered data into 2 equal parts, while quartiles divide
a data set into four equal parts. There are three quartiles: Quartile 1 (Q1),
also called the lower quartile, is the value below which 25% of the data lie;
Quartile 2 (Q2), which is equivalent to the median, is the value below which
50% of the data lie; and Quartile 3 (Q3), also called the upper quartile, is the
value below which 75% or three-fourths of the data lie.

Deciles divide the ordered data set into ten equal parts. There are 9
deciles, denoted by D1, D2, …, D9. Decile 1 or D1 is the value below
which 10% of the data lie.

Percentiles divide the ordered data set into one hundred equal parts. There
are 99 percentiles, denoted by P1, P2, …, P99. Percentile 1 or P1 is the value
below which 1% of the data lie.

Interquartile range (IQR) = Upper quartile – Lower quartile

Note: A quantile is a number or cut-off, and not a range of values.

The quantiles in a given distribution are related as follows (figure not reproduced):

Q1 = P25
Q2 = P50 = D5 = 𝑥̃
Q3 = P75

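The formulas are as follows:

Quartile
Ungrouped data: $Q_k = \dfrac{k(N + 1)}{4}$ (this gives the position of the $k$th quartile in the array)
Grouped data: $Q_k = LB_{Q_k} + \left( \dfrac{\frac{kN}{4} - {<}CF_b}{cf_{Q_k}} \right) i$
or, in the notation of the median formula, $Q_k = L + \left( \dfrac{\frac{kn}{4} - {<}cf}{f_m} \right) i$
Where:
$Q_k$ - $k$th quartile
$LB_{Q_k}$ - lower class boundary of the $k$th quartile class
${<}CF_b$ - less than cumulative frequency below the $k$th quartile class
$cf_{Q_k}$ - frequency of the $k$th quartile class

Decile
Ungrouped data: $D_k = \dfrac{k(N + 1)}{10}$ (position of the $k$th decile in the array)
Grouped data: $D_k = LB_{D_k} + \left( \dfrac{\frac{kN}{10} - {<}CF_b}{cf_{D_k}} \right) i$
Where:
$D_k$ - $k$th decile
$LB_{D_k}$ - lower class boundary of the $k$th decile class
${<}CF_b$ - less than cumulative frequency below the $k$th decile class
$cf_{D_k}$ - frequency of the $k$th decile class

Percentile
Ungrouped data: $P_k = \dfrac{k(N + 1)}{100}$ (position of the $k$th percentile in the array)
Grouped data: $P_k = LB_{P_k} + \left( \dfrac{\frac{kN}{100} - {<}CF_b}{cf_{P_k}} \right) i$
Where:
$P_k$ - $k$th percentile
$LB_{P_k}$ - lower class boundary of the $k$th percentile class
${<}CF_b$ - less than cumulative frequency below the $k$th percentile class
$cf_{P_k}$ - frequency of the $k$th percentile class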

Illustrative Example:
The monthly salaries in pesos of 16 DepEd Elementary teachers are
as follows:
Teacher Salary Teacher Salary
1 30,531 9 22,216
2 32,469 10 32,072
3 36,942 11 21,038
4 20,754 12 21,038
5 23,222 13 21,038
6 21,327 14 20,754
7 37,400 15 20,754
8 45,269 16 20,754

Find the a) lower quartile (Q1), b) 7th decile, and c) 30th percentile.



Solution:
First, arrange the observations in an array (rank 1 is the lowest salary).
16 45,269
15 37,400
14 36,942
13 32,469
12 32,072
11 30,531 D7
10 23,222
9 22,216
8 21,327
7 21,038
6 21,038
5 21,038 P30
4 20,754 Q1
3 20,754
2 20,754
1 20,754

a) lower quartile (Q1)

Substitute the values in the formula:
$Q_k = \dfrac{k(N + 1)}{4}$
$Q_1 = \dfrac{1(16 + 1)}{4}$
$Q_1 = \dfrac{17}{4} = 4.25$

The 4th observation or item in the array is Php 20,754. Therefore, 25%
of the 16 DepEd Elementary teachers have salaries that are at or below
Php 20,754.

b) 7th decile
Using the formula:
$D_k = \dfrac{k(N + 1)}{10}$
$D_7 = \dfrac{7(16 + 1)}{10} = \dfrac{7(17)}{10} = \dfrac{119}{10}$
$D_7 = 11.9$

The 11th observation or item, which is Php 30,531, shows that about 70%
of the 16 DepEd Elementary teachers have salaries that are at or below
Php 30,531.



c) 30th percentile
Using the formula:
$P_k = \dfrac{k(N + 1)}{100}$
$P_{30} = \dfrac{30(16 + 1)}{100}$
$P_{30} = \dfrac{30(17)}{100} = \dfrac{510}{100}$
$P_{30} = 5.10$

The 5th observation or item, which is Php 21,038, implies that about 30%
of the 16 DepEd Elementary teachers have salaries that are at or below
Php 21,038.
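The three computations above follow the same pattern: compute the position $k(N + 1)$ divided by the number of parts, then read the observation at that position in the array. The Python sketch below mirrors the worked examples by dropping the fractional part of the position (4.25 becomes the 4th item, 11.9 the 11th, 5.10 the 5th). The function name `quantile_ungrouped` is an assumption made for this illustration, and other references interpolate between neighboring observations instead of dropping the fraction.

```python
def quantile_ungrouped(data, k, parts):
    """Return (position, value) of the k-th quantile, where parts is
    4 for quartiles, 10 for deciles, and 100 for percentiles."""
    ordered = sorted(data)                       # arrange in an array
    n = len(ordered)
    position = k * (n + 1) / parts
    rank = max(1, min(n, int(position)))         # drop the fraction, stay in range
    return position, ordered[rank - 1]           # ranks start at 1

salaries = [30531, 32469, 36942, 20754, 23222, 21327, 37400, 45269,
            22216, 32072, 21038, 21038, 21038, 20754, 20754, 20754]

for label, k, parts in [("Q1", 1, 4), ("D7", 7, 10), ("P30", 30, 100)]:
    position, value = quantile_ungrouped(salaries, k, parts)
    print(f"{label}: position {position:.2f}, value Php {value:,}")
```

For the salary data, the values returned are Php 20,754 for Q1, Php 30,531 for D7, and Php 21,038 for P30, as in the solutions above.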

QUANTILES FOR GROUPED DATA

Illustrative Examples:
The data show the time spent on Facebook by 43 students.
Find: (a) Quartile 1, (b) Decile 2, and (c) Percentile 52.
Class interval     f     <cf
25 – 29            3      3
30 – 34            2      5
35 – 39            5     10    (Decile 2 class)
40 – 44            8     18    (Quartile 1 class)
45 – 49            8     26    (Percentile 52 class)
50 – 54            8     34
55 – 59            9     43
Solution:
a) Solving for the 1st quartile, Q1
Quartile 1 is the value on or below which one-fourth (or 25%) of the data
fall. In the formula of the median, replace $\frac{n}{2}$ by 0.25n or $\frac{1(n)}{4}$.
First solve 0.25n or $\frac{1(n)}{4}$ and locate the position of Q1 in the column
for <cf. So, $\dfrac{1(n)}{4} = \dfrac{43}{4} = 10.75$, which falls in the class 40 – 44
(lower boundary L = 39.5, <cf = 10, $f_m$ = 8).

$Q_1 = L + \left( \dfrac{0.25n - {<}cf}{f_m} \right) i$ or $Q_1 = L + \left( \dfrac{\frac{n}{4} - {<}cf}{f_m} \right) i$
$Q_1 = 39.5 + \left( \dfrac{0.25(43) - 10}{8} \right) 5$
$Q_1 = 39.5 + \left( \dfrac{10.75 - 10}{8} \right) 5$
$Q_1 = 39.5 + 0.46875$
$Q_1 = 39.97$
Interpretation: 25% of the students spent 39.97, or about 40, minutes or less
using Facebook.
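b) Solving for the Decile 2:
Decile 2 is the value on or below which two-tenths (or 20%) of the data fall.
First solve 0.20n or $\frac{2(n)}{10}$ and locate the position of D2 in the column
for <cf. So, $\dfrac{2(n)}{10} = \dfrac{2(43)}{10} = 8.6$, which falls in the class 35 – 39
(lower boundary L = 34.5, <cf = 5, $f_m$ = 5).

$D_2 = L + \left( \dfrac{0.20n - {<}cf}{f_m} \right) i$ or $D_2 = L + \left( \dfrac{\frac{2n}{10} - {<}cf}{f_m} \right) i$
$D_2 = 34.5 + \left( \dfrac{0.20(43) - 5}{5} \right) 5$
$D_2 = 34.5 + \left( \dfrac{8.6 - 5}{5} \right) 5$
$D_2 = 34.5 + 3.6$
$D_2 = 38.1$
Interpretation: 20% of the students spent 38.1, or about 38, minutes or less
on Facebook.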

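c) Solving for the Percentile 52 (P52)

Percentile 52 is the value on or below which 52% of the data fall.
First solve 0.52n or $\frac{52(n)}{100}$ and locate the position of P52 in the column
for <cf. So, $\dfrac{52(n)}{100} = \dfrac{52(43)}{100} = 22.36$, which falls in the class
45 – 49 (lower boundary L = 44.5, <cf = 18, $f_m$ = 8).

$P_{52} = L + \left( \dfrac{0.52n - {<}cf}{f_m} \right) i$ or $P_{52} = L + \left( \dfrac{\frac{52n}{100} - {<}cf}{f_m} \right) i$
$P_{52} = 44.5 + \left( \dfrac{0.52(43) - 18}{8} \right) 5$
$P_{52} = 44.5 + 2.73$
$P_{52} = 47.23$
Interpretation: 52% of the students spent 47.23, or about 47, minutes or less
on Facebook.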
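Since the quartile, decile, and percentile formulas for grouped data differ only in the divisor (4, 10, or 100), one function can cover all three. The Python sketch below is an illustration under stated assumptions: the function name `grouped_quantile` and the (lower limit, upper limit, frequency) representation are hypothetical, the classes have whole-number limits, and they are listed in ascending order.

```python
def grouped_quantile(table, k, parts):
    """k-th quantile of grouped data using
    L + ((k*n/parts - cf_before) / f) * i for the quantile class."""
    n = sum(f for _, _, f in table)
    target = k * n / parts
    cf_before = 0                           # <cf below the quantile class
    for lower, upper, f in table:
        if cf_before + f >= target:         # first class whose <cf reaches target
            boundary = lower - 0.5          # lower class boundary
            width = upper - lower + 1       # class width i
            return boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("the table has no observations")

facebook_minutes = [
    (25, 29, 3), (30, 34, 2), (35, 39, 5), (40, 44, 8),
    (45, 49, 8), (50, 54, 8), (55, 59, 9),
]

q1 = grouped_quantile(facebook_minutes, 1, 4)
d2 = grouped_quantile(facebook_minutes, 2, 10)
p52 = grouped_quantile(facebook_minutes, 52, 100)
print(f"Q1 = {q1:.5f}, D2 = {d2:.2f}, P52 = {p52:.3f}")
```

Calling the function with k = 1 and parts = 2 gives the median of the grouped data, which reflects the relationship Q2 = P50 = D5.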



Name: Date:
Program and Section: Score:

Try this!

Direction: Answer the following.

A. A supermart recorded the time, in minutes, that a sample of 28 customers
stayed in the store.
30 12 24 12 16 32 8 24 23 26 18 24 23 26
28 18 22 42 36 26 12 24 18 30 12 24 18 30

Construct a frequency distribution table with less than cumulative
frequency and determine the class marks of the classes.

B. Answer the following questions:


1. What decile is equivalent to percentile 70?

2. What quartile is equivalent to percentile 75?

3. What percentile is equivalent to decile 4?

4. What percentile is equivalent to the median?

5. What percentile is equivalent to quartile 1?

C. Find the lower quartile, upper quartile, interquartile range, decile 3, and
percentile 49 of the following scores of students in a 50-item Science
summative test: 15, 42, 38, 12, 6, 22, 31, 7, 36, 14, 41, 15, 50, 27, 65


D. The data below show the age distribution of residents on 7th Street of
South Hill Subdivision. Compute the (1) Mean, (2) Median, (3) Mode,
(4) Percentile 22, (5) Quartile 3, and (6) Decile 8. Interpret the results.

Class interval     f     <cf
25 – 29            3      3
30 – 34            2      5
35 – 39            5     10
40 – 44            8     18
45 – 49            8     26
50 – 54            8     34
55 – 59            9     43



MEASURES OF VARIABILITY
Measures of variation or dispersion describe how clustered or spread out
the values/observations of a distribution are around the mean of the
distribution.

When the measure of variability is large, the values are widely scattered;
when it is small, the values are tightly clustered.

There are four measures of variability that will be discussed in this lesson:
the range, mean absolute deviation, variance, and standard deviation.

MEASURES OF VARIABILITY OF UNGROUPED DATA


Range (R) is the simplest measure of variability. It is the difference between
the highest and lowest scores in a distribution.
Range (R) = Highest value (H) – Lowest value (L)

Although it is easy to compute, it is not often used as the sole
measure of variability because of its instability: it is based only on the
most extreme scores in the distribution and does not fully reflect the
pattern of variation within the distribution.

Illustrative Example:
The data below are scores in a 20-item quiz in English of Grade 8 boys
and girls. Find the range of the scores of the two groups. Solve also for the
mean.

Girls’ scores     Boys’ scores
      8                 3
      9                 7
     11                10
     12                 7
     10                13
      9                13
     10                17
     11                10
     10                10

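Solution:
Range (R) = Highest value (H) – Lowest value (L)
Girls: R = H – L = 12 – 8 = 4
Boys: R = H – L = 17 – 3 = 14
Mean: the girls’ scores and the boys’ scores each add up to 90, so both
groups have a mean of $\frac{90}{9} = 10$.
Interpretation: Although the two groups have the same mean, the girls’ scores
are more clustered around the mean than the boys’ scores.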



The Mean Absolute Deviation (MAD)
Mean Absolute Deviation (MAD) is the arithmetic mean or
average of the absolute deviations from the mean.

Sample: $MAD = \dfrac{\sum |x - \bar{x}|}{n}$

Population: $MAD = \dfrac{\sum |x - \mu|}{N}$

where: $x$ = raw score

Steps in Computing the MAD:

1. Solve for the mean of the observations/measurements.
2. Get the absolute value of the difference between each observation and
the mean ($|x - \bar{x}|$).
3. Find the sum of all the absolute values of the differences.
4. Divide the sum obtained in Step 3 by the number of observations to find
the MAD.

Illustrative Examples:
Find the mean absolute deviation (MAD) of the girls’ and boys’ scores
in the previous example and interpret.
a. Find the MAD of the scores of girls.
Score (𝑥) Mean (𝑥̅ ) |𝑥 − 𝑥̅ |
8 10 2
9 10 1
11 10 1
12 10 2
10 10 0
9 10 1
10 10 0
11 10 1
10 10 0
∑𝑥 = 90 ∑|𝑥 − 𝑥̅ | = 8

Solution: MAD of girls’ scores

$MAD = \dfrac{\sum |x - \bar{x}|}{n}$
$MAD = \dfrac{8}{9}$
$MAD = 0.89$



b. Find the MAD of the scores of boys.
Score (𝑥) Mean (𝑥̅ ) |𝑥 − 𝑥̅ |
3 10 7
7 10 3
10 10 0
7 10 3
13 10 3
13 10 3
17 10 7
10 10 0
10 10 0
∑𝑥 = 90 ∑|𝑥 − 𝑥̅ | = 26

Solution: MAD of boys’ scores

$MAD = \dfrac{\sum |x - \bar{x}|}{n}$
$MAD = \dfrac{26}{9}$
$MAD = 2.89$
Interpretation: The MADs of the girls’ and boys’ scores are 0.89 and 2.89,
respectively. This shows the same result as the range: the girls’
scores are more clustered than the boys’ scores. It implies that the girls’
scores are more homogeneous while the boys’ scores are quite
heterogeneous.
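The range and the MAD of the two groups can be verified with a short program. The Python sketch below follows the four steps for computing the MAD; the function names `value_range` and `mean_absolute_deviation` are assumptions made for this illustration.

```python
def value_range(data):
    """Range: highest value minus lowest value."""
    return max(data) - min(data)

def mean_absolute_deviation(data):
    """MAD: the average of the absolute deviations from the mean."""
    n = len(data)
    mean = sum(data) / n                                # Step 1
    deviations = [abs(x - mean) for x in data]          # Step 2
    return sum(deviations) / n                          # Steps 3 and 4

girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

for name, scores in [("Girls", girls), ("Boys", boys)]:
    print(f"{name}: range = {value_range(scores)}, "
          f"MAD = {mean_absolute_deviation(scores):.2f}")
```

Both measures are smaller for the girls, which agrees with the interpretation that their scores are more tightly clustered around the common mean of 10.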

The Variance and the Standard Deviation of Ungrouped Data


The variance indicates to what degree the individual observations of a data
set are dispersed or 'spread out' around their mean. The square root of
the variance gives the standard deviation; conversely, squaring the standard
deviation gives the variance.

Variance is the arithmetic mean or average of the squared
deviations from the mean.

Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation is the square root of the variance ($\sigma = \sqrt{\sigma^2}$
for a population, $s = \sqrt{s^2}$ for a sample).



Steps in Computing the Variance:
1. Find the mean.
2. Get the difference between each observation and the mean, and square
each difference.
3. Find the sum of all the squared deviations.
4. Divide the sum of all the squared deviations by the number of
observations (N) for a population, or by n – 1 for a sample.

Illustrative Examples:
Find the variance and standard deviation of the scores of the girls and boys.

a. Girls 𝑥̅ = 10
Score (𝑥) 𝑥 − 𝑥̅ (𝑥 − 𝑥̅ )2
8 −2 4
9 −1 1
11 1 1
12 2 4
10 0 0
9 −1 1
10 0 0
11 1 1
10 0 0
∑(𝑥 − 𝑥̅ )2 = 12

Solution:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{12}{9 - 1} = \frac{12}{8}$$
$$s^2_{girls} = 1.5$$ (variance of the girls’ scores)
$$s = \sqrt{1.5}$$
$$s \approx 1.22$$ (standard deviation of the girls’ scores)

b. Boys 𝑥̅ = 10
Score (𝑥) 𝑥 − 𝑥̅ (𝑥 − 𝑥̅ )2
3 −7 49
7 −3 9
10 0 0
7 −3 9
13 3 9
13 3 9
17 7 49
10 0 0
10 0 0
∑(𝑥 − 𝑥̅ )2 = 134

Solution:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{134}{9 - 1} = \frac{134}{8}$$
$$s^2_{boys} = 16.75$$ (variance of the boys’ scores)
$$s = \sqrt{16.75}$$
$$s \approx 4.09$$ (standard deviation of the boys’ scores)
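
As a cross-check on the two solutions above, the sample variance formula can be sketched in Python. The helper name `sample_variance` is an assumption made for this illustration; it divides by n − 1, as in the sample formula used in the examples.

```python
import math


def sample_variance(scores):
    """Return sum((x - mean)^2) / (n - 1) for a list of ungrouped scores."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

for label, scores in (("girls", girls), ("boys", boys)):
    variance = sample_variance(scores)
    std_dev = math.sqrt(variance)  # standard deviation is the square root
    print(label, round(variance, 2), round(std_dev, 2))
# girls 1.5 1.22
# boys 16.75 4.09
```

Python's standard `statistics.variance` and `statistics.stdev` functions also use the n − 1 divisor, so they can serve as an independent check.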

Direct-from-the-Scores Method of Solving for the Variance and Standard Deviation
Formulas for the Variance (Direct-from-the-Scores Method); the standard
deviation is the square root of the variance.

Population: $$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$

Sample: $$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

where,
𝜎² (sigma-squared) – variance for a population
𝑠² – variance for a sample
𝜎 – standard deviation for a population
𝑠 – standard deviation for a sample
𝑥 – value of each observation or measurement

Illustrative Examples:
Solve for the variance and standard deviation of the boys’ and girls’ scores
using the same data in the previous example.

1. Girls’ scores
Score (𝑥) 𝑥2
8 64
9 81
11 121
12 144
10 100
9 81
10 100
11 121
10 100
∑ 𝑥 = 90 ∑ 𝑥 2 = 912

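Solution:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{912 - \frac{(90)^2}{9}}{9 - 1}$$
$$s^2 = \frac{912 - \frac{8100}{9}}{8} = \frac{912 - 900}{8} = \frac{12}{8}$$
$$s^2 = 1.5$$ (variance of the girls’ scores)
$$s = \sqrt{1.5} \approx 1.22$$ (standard deviation of the girls’ scores)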

2. Boys’ scores
Score (𝑥) 𝑥2
3 9
7 49
10 100
7 49
13 169
13 169
17 289
10 100
10 100
∑ 𝑥 = 90 ∑ 𝑥 2 = 1034

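Solution:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1034 - \frac{(90)^2}{9}}{9 - 1}$$
$$s^2 = \frac{1034 - \frac{8100}{9}}{8} = \frac{1034 - 900}{8} = \frac{134}{8}$$
$$s^2 = 16.75$$ (variance of the boys’ scores)
$$s = \sqrt{16.75} \approx 4.09$$ (standard deviation of the boys’ scores)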

Interpretation: The two formulas give the same results: the girls’
scores are more clustered than the boys’ scores.
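
The direct-from-the-scores computation can be sketched in Python as follows. The function name `sample_variance_direct` is hypothetical and used only for this illustration; it needs just the two column totals ∑x and ∑x².

```python
import math


def sample_variance_direct(scores):
    """Sample variance from raw totals: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

print(sum(x * x for x in girls), sum(x * x for x in boys))   # 912 1034
print(sample_variance_direct(girls))                          # 1.5
print(sample_variance_direct(boys))                           # 16.75
print(round(math.sqrt(sample_variance_direct(boys)), 2))      # 4.09
```

This form avoids computing each deviation from the mean, which is why it is convenient for hand calculation with a table of x and x².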
MEASURES OF VARIABILITY OF GROUPED DATA
The range (R) of grouped data in a frequency distribution may be
determined by either of two methods:
Method 1. Finding the difference between the true upper limit (UL) of the
highest class and the true lower limit (LL) of the lowest class.
𝑅 = 𝑈𝐿 − 𝐿𝐿
Method 2. Finding the difference between the midpoints or class
marks (X) of the highest and lowest classes.
𝑅 = 𝑋𝐻 − 𝑋𝐿

Illustrative Example:
The data below show the monthly electrical consumption (in kWh) of 100
households. Find the range.

Monthly electrical consumption (kWh)    𝑓
40 – 44                                 4
45 – 49                                 11
50 – 54                                 20
55 – 59                                 31
60 – 64                                 19
65 – 69                                 11
70 – 74                                 4
Total                                   100

Solution:
Method 1:
𝑅 = 𝑈𝐿 − 𝐿𝐿
𝑅 = 74.5 − 39.5
𝑅 = 35

Method 2:
𝑅 = 𝑋𝐻 − 𝑋𝐿
𝑅 = 72 − 42
𝑅 = 30
where
𝑋𝐻 (midpoint of the highest class, 70 – 74) = (70 + 74) / 2 = 72
𝑋𝐿 (midpoint of the lowest class, 40 – 44) = (40 + 44) / 2 = 42
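
Both range methods can be sketched in Python. This is only an illustration: the variable and function names are hypothetical, and the 0.5 adjustment for the true limits assumes class limits stated in whole numbers, as in the table above.

```python
# Class limits (lower, upper) copied from the frequency table, lowest class first.
classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]

lowest_class, highest_class = classes[0], classes[-1]


def midpoint(class_limits):
    """Class mark: the average of the lower and upper class limits."""
    lower, upper = class_limits
    return (lower + upper) / 2


# Method 1: true limits extend 0.5 beyond the stated whole-number limits.
true_upper = highest_class[1] + 0.5   # 74.5
true_lower = lowest_class[0] - 0.5    # 39.5
range_true_limits = true_upper - true_lower

# Method 2: difference between the highest and lowest class marks.
range_midpoints = midpoint(highest_class) - midpoint(lowest_class)

print(range_true_limits)  # 35.0
print(range_midpoints)    # 30.0
```

The two methods differ by one class width (5) because Method 1 measures from boundary to boundary while Method 2 measures from class center to class center.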

The Mean Absolute Deviation of Grouped Data


Formula for Sample data: $$MAD = \frac{\sum f|X - \bar{x}|}{n - 1}$$

Formula for Population: $$MAD = \frac{\sum f|X - \mu|}{N}$$

Illustrative Example:
Solve for the MAD of the sample monthly electrical consumption (kWh) of
100 households.

Monthly electrical
consumption (kWh)    𝑓      Midpoint (𝑋)    𝑓𝑋       |𝑋 − 𝑥̅|    𝑓|𝑋 − 𝑥̅|
40 – 44              4      42              168      14.95      59.8
45 – 49              11     47              517      9.95       109.45
50 – 54              20     52              1 040    4.95       99
55 – 59              31     57              1 767    0.05       1.55
60 – 64              19     62              1 178    5.05       95.95
65 – 69              11     67              737      10.05      110.55
70 – 74              4      72              288      15.05      60.2
Total                100                    5 695               536.5

Solution:
a) First, solve for the mean:
$$\bar{x} = \frac{\sum fX}{n} = \frac{5\,695}{100} = 56.95$$

b) Then solve for the MAD:
$$MAD = \frac{\sum f|X - \bar{x}|}{n - 1} = \frac{536.50}{100 - 1} = \frac{536.50}{99}$$
$$MAD \approx 5.42$$
The mean absolute deviation of the monthly electrical consumption is
5.42 kWh.
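
The grouped-data MAD above can be reproduced with a short Python sketch. The variable names are assumptions made for illustration, and the divisor n − 1 follows the sample formula given in this worktext.

```python
# Class midpoints and frequencies copied from the table above.
midpoints = [42, 47, 52, 57, 62, 67, 72]
frequencies = [4, 11, 20, 31, 19, 11, 4]

n = sum(frequencies)

# a) Mean of grouped data: sum(f * X) / n
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / n

# b) MAD of a sample of grouped data: sum(f * |X - mean|) / (n - 1),
#    using the n - 1 divisor stated in the worktext's sample formula.
sum_f_abs_dev = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints))
mad_sample = sum_f_abs_dev / (n - 1)

print(n, sum_fx, mean)          # 100 5695 56.95
print(round(sum_f_abs_dev, 2))  # 536.5
print(round(mad_sample, 2))     # 5.42
```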

The Variance and Standard Deviation of Grouped Data


Population Standard Deviation: $$\sigma = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{N}}{N}}$$

Sample Standard Deviation: $$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$$

where,
𝑓 – frequency
𝑋 – class mark or midpoint

Illustrative Example:
Monthly electrical
consumption (kWh)    𝑓      Midpoint (𝑋)    𝑓𝑋       𝑓𝑋²
40 – 44              4      42              168      7 056
45 – 49              11     47              517      24 299
50 – 54              20     52              1 040    54 080
55 – 59              31     57              1 767    100 719
60 – 64              19     62              1 178    73 036
65 – 69              11     67              737      49 379
70 – 74              4      72              288      20 736
Total                100                    5 695    329 305
Solution:
The calculations needed for computing the standard deviation are
shown above. Computing the standard deviation of the sample:

$$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$$
$$s = \sqrt{\frac{329\,305 - \frac{(5\,695)^2}{100}}{100 - 1}} = \sqrt{\frac{329\,305 - 324\,330.25}{99}} = \sqrt{\frac{4\,974.75}{99}}$$
$$s = \sqrt{50.25}$$
$$s = 7.0887 \text{ or } 7.09$$

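Interpretation: The monthly electrical consumption of the 100 households
has a standard deviation of 7.09 kWh; that is, the consumption values
typically differ from the mean of 56.95 kWh by about 7 kWh.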
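
The sample standard deviation of the grouped data can be verified with the following Python sketch. The variable names are hypothetical; the midpoints and frequencies are those of the illustrative example above.

```python
import math

# Class midpoints and frequencies from the illustrative example.
midpoints = [42, 47, 52, 57, 62, 67, 72]
frequencies = [4, 11, 20, 31, 19, 11, 4]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))       # sum of fX
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))  # sum of fX^2

# Sample variance of grouped data: (sum(fX^2) - (sum(fX))^2 / n) / (n - 1)
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(sum_fx, sum_fx2)     # 5695 329305
print(variance)            # 50.25
print(round(std_dev, 2))   # 7.09
```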

Name: Date:
Program and Section: Score:

Try this!

Directions: Answer the following.

1. The distribution of the number of children in a sample of 40 families
selected at random is shown below:

Number of child(ren)    𝑓
0 5
1 11
2 9
3 5
4 5
5 0
6 4
7 1

Find the following descriptive measures:


a. range
b. MAD
c. variance and standard deviation

2. The ages of the customers at the local theater are shown below.

Age 𝑓
18 – 22 15
23 – 27 33
28 – 32 45
33 – 37 26
38 – 42 13
43 – 47 8
n = 140

Find the following descriptive measures:


a. range
b. MAD
c. variance and standard deviation

LINEAR CORRELATION
Key Concepts:
Correlation analysis measures the strength of the relationship between
two variables by means of a single number.

Scatter diagram is a graph of points representing two series, with the
known (independent) variable plotted on the x-axis and the variable to be
estimated (dependent) plotted on the y-axis.

Linear relationship is a type of correlation between two variables that can
be described mathematically by a straight line.

Correlation coefficient (r) is an estimate of the measure of linear
association between two variables x and y.

Scatter Diagram

Pearson Product-Moment Correlation Coefficient (r, or Pearson’s r) is a
parametric test of relationship that requires at least an interval scale
of measurement.

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$

where,
𝑟 – coefficient of correlation
𝑥, 𝑦 – paired observed values of the two variables
𝑛 – sample size (number of pairs)

Illustrative Example:
A school psychologist is interested in finding the relationship between a
child’s Emotional Quotient (EQ) and Creativity. To investigate this problem, the
psychologist samples some children and administers a standard Emotional
Quotient test and a test of creativity to each child.

The data are shown below:
Student    Emotional Quotient (𝑥)    Creativity (𝑦)
1 138 51
2 88 30
3 120 38
4 90 24
5 86 20
6 131 50
7 113 35
8 120 27
9 95 30
10 110 24
11 100 52
12 127 39
13 117 30
14 81 18

Solution:
a) Compute 𝑥𝑦, 𝑥², and 𝑦² for each student and find the sum of each column.
b) Then solve for r.

Student EQ Creativity 𝒙𝒚 𝒙𝟐 𝒚𝟐
1 138 51 7 038 19 044 2 601
2 88 30 2 640 7 744 900
3 120 38 4 560 14 400 1 444
4 90 24 2 160 8 100 576
5 86 20 1 720 7 396 400
6 131 50 6 550 17 161 2 500
7 113 35 3 955 12 769 1 225
8 120 27 3 240 14 400 729
9 95 30 2 850 9 025 900
10 110 24 2 640 12 100 576
11 100 52 5 200 10 000 2 704
12 127 39 4 953 16 129 1 521
13 117 30 3 510 13 689 900
14 81 18 1 458 6 561 324
∑𝑥 = 1 516 ∑𝑦 = 468 ∑𝑥𝑦 = 52 474 ∑𝑥 2 = 168 518 ∑𝑦 2 = 17 300

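Using the formula:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
$$r = \frac{14(52\,474) - (1\,516)(468)}{\sqrt{[14(168\,518) - (1\,516)^2][14(17\,300) - (468)^2]}}$$
$$r = \frac{734\,636 - 709\,488}{\sqrt{[2\,359\,252 - 2\,298\,256][242\,200 - 219\,024]}}$$
$$r = \frac{25\,148}{\sqrt{[60\,996][23\,176]}}$$
$$r = \frac{25\,148}{\sqrt{1\,413\,643\,296}}$$
$$r = \frac{25\,148}{37\,598.45}$$
$$r = 0.67$$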

Interpretation: Since r = 0.67, the children’s emotional quotient and
creativity are moderately and positively correlated, which means that
children with a higher emotional quotient tend to have higher creativity
scores.
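
The computation of r can be checked with a short Python sketch of the formula used above. The function name `pearson_r` and the list names are assumptions made for this illustration; the data are the 14 pairs from the table.

```python
import math

# Paired data from the table: Emotional Quotient (x) and Creativity (y).
eq = [138, 88, 120, 90, 86, 131, 113, 120, 95, 110, 100, 127, 117, 81]
creativity = [51, 30, 38, 24, 20, 50, 35, 27, 30, 24, 52, 39, 30, 18]


def pearson_r(x, y):
    """Pearson's r from the column totals used in the worked solution."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


print(sum(eq), sum(creativity))              # 1516 468
print(round(pearson_r(eq, creativity), 2))   # 0.67
```

Recomputing the column totals this way is also a quick check on the ∑xy, ∑x², and ∑y² values in the table.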
