Course Code: ENG’G 304
Course Requirements:
Assessment Tasks - 60%
Major Exams - 40%
_________
Periodic Grade 100%
Computation of Grades:
MIDTERM GRADE = 30% (Prelim Grade) + 70% [60% (Activity 5-7) + 40% (Midterm exam)]
FINAL GRADE = 30% (Midterm Grade) + 70% [60% (Activity 8-10) + 40% (Final exam)]
MODULE 8
CONCEPTS OF STATISTICS
Introduction
Learning Outcomes
Lesson 1. Basic concept of statistics
STATISTICS
According to Matloff (2019), statistics is a science that deals with the methods of collecting,
organizing, and summarizing quantitative data which are analyzed and interpreted.
a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns, to
summarize and to present the information in a set of data.
b. Inferential Statistics → utilizes sample data to make estimates, decisions, predictions
about a larger set of data.
a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.
d. Ratio Data → these are measurements that enable the determination of the multiple of
the characteristic being measured between one unit of the sample or population and
another.
e. Quantitative Data → these are measurements that have meaningful numbers associated
with them; these include interval and ratio data (Matloff, 2019).
FREQUENCY DISTRIBUTION
This refers to the organization of data in tabular form showing the frequency of occurrence
of the values or objects in each class or category (Matloff, 2019).
$$\%f = \frac{f}{n} \times 100\%$$
where:
f → frequency of each class
n → sample size
c. The cumulative frequency distribution can be obtained by simply adding the class
frequencies.
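The percentage and cumulative frequency calculations above can be sketched in Python. The survey responses below are made-up illustration data, not taken from the module, and the variable names are arbitrary.

```python
from collections import Counter

# Hypothetical nominal data (illustration only).
responses = ["yes", "no", "yes", "yes", "undecided",
             "no", "yes", "no", "yes", "yes"]

n = len(responses)            # sample size
freq = Counter(responses)     # frequency f of each category

# Percentage frequency: %f = (f / n) * 100
percent = {category: f / n * 100 for category, f in freq.items()}

# Cumulative frequency: running total of the class frequencies
cumulative = {}
running_total = 0
for category, f in freq.items():
    running_total += f
    cumulative[category] = running_total

for category in freq:
    print(f"{category:10s} f={freq[category]:2d} "
          f"%f={percent[category]:5.1f} cumf={cumulative[category]:2d}")
```

The last cumulative frequency always equals the sample size n, which is a quick check that no observation was missed.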
a. Class Interval → refers to the grouping per category defined by the lower limit and the
upper limit.
b. Class Mark → defined as the midpoint of a class interval and is computed by:
$$\text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2}$$
c. Class Boundary → a point that represents the halfway or dividing point between
successive classes.
i. The class size or class width is equal to the difference between the upper limits
(or the lower limits) of two consecutive classes.
ii. The range R refers to the difference between the highest and lowest value in the
distribution.
$$k = 1 + 3.322 \log_{10} n$$
d. Find the size of the class interval. The value can be obtained by dividing the
range by the desired number of classes.
e. Construct the classes by choosing a convenient value to start the first class.
f. Determine the frequency of each class by counting the number of items that fall
in each interval.
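The construction steps above can be sketched as follows. This is a minimal illustration, not the module's prescribed procedure: the scores are invented, the number of classes uses the common form of Sturges' rule (k = 1 + 3.322 log10 n, an assumption here), and the class width is rounded up to a whole number so that the k classes cover every value.

```python
import math

# Hypothetical exam scores (illustration only).
scores = [52, 55, 61, 63, 64, 67, 68, 70, 71, 71,
          73, 74, 75, 77, 78, 80, 82, 85, 88, 94]

n = len(scores)
data_range = max(scores) - min(scores)     # R = highest value - lowest value

# Number of classes from Sturges' rule (assumed coefficient 3.322, base-10 log).
k = round(1 + 3.322 * math.log10(n))

# Smallest whole-number class size such that k classes cover every score.
width = data_range // k + 1

start = min(scores)                         # convenient starting value
table = []
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1               # whole-number data: limits inclusive
    frequency = sum(lower <= s <= upper for s in scores)
    class_mark = (lower + upper) / 2
    table.append((lower, upper, class_mark, frequency))

for lower, upper, class_mark, frequency in table:
    print(f"{lower}-{upper}  mark={class_mark}  f={frequency}")
```

Starting the first class at the minimum is only one convenient choice; any starting value at or below the minimum works as long as the last class still reaches the maximum.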
Assessment Task 8
On a clean short bond paper, answer the following and provide a complete solution.
3. Give an example of sets of data, identify and explain what method of describing
sets of data was used.
Summary
Types of Statistics
a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns,
to summarize and to present the information in a set of data.
a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.
Reference
Matloff, N. (2019). Probability and Statistics for Data Science. Chapman & Hall.
MODULE 9
MEASURE OF CENTRAL TENDENCY
Introduction
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use than
others. In the following sections, we will look at the mean, mode and median, and learn how to
calculate them and under what conditions they are most appropriate to be used (Matloff, 2019).
Learning Outcomes
Lesson 1. Mean
According to Matloff (2019), mean is the value obtained by adding the values in the
distribution and dividing the sum by the total number of values or items; it is also the simplest and
most efficient measure of central tendency.
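$$\bar{x} = \frac{\sum x}{n}$$
When each value x carries a weight w, the weighted mean is
$$\bar{x} = \frac{\sum wx}{\sum w}$$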
In the Midpoint Method, the midpoint of each class interval is taken as the
representative of each class. Hence
$$\bar{x} = \frac{\sum fx}{n}$$
where:
f → represents the frequency of each class
x → midpoint of each class
n → total number of frequencies or the sample size
The Unit Deviation Method uses unit deviation and is usually implemented by considering
an arbitrary point as the initial step in approximating the value of the mean. Hence
$$\bar{x} = x_a + \left[\frac{\sum fd}{n}\right] c$$
where:
xa → assumed mean; or the midpoint of the class interval with the highest frequency
f → frequency of each class
d → unit deviation
c → size of the class interval
n → sample size
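Both grouped-data methods can be compared on the same table. The sketch below uses an invented frequency table with equal class sizes and takes the unit deviation d of a class as the number of classes it sits above or below the assumed-mean class; these choices are assumptions for illustration.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(52, 60, 2), (61, 69, 5), (70, 78, 8), (79, 87, 3), (88, 96, 2)]

freqs = [f for _, _, f in classes]
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
n = sum(freqs)                              # total number of frequencies
c = classes[1][0] - classes[0][0]           # class size (equal for all classes)

# Midpoint Method: mean = sum(f * x) / n
mean_midpoint = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Unit Deviation Method: mean = x_a + [sum(f * d) / n] * c
modal_index = freqs.index(max(freqs))       # class with the highest frequency
x_a = midpoints[modal_index]                # assumed mean
d = [i - modal_index for i in range(len(classes))]   # ..., -1, 0, 1, ...
mean_unit_deviation = x_a + (sum(f * di for f, di in zip(freqs, d)) / n) * c

print(mean_midpoint, mean_unit_deviation)
```

With equal class sizes the two methods give the same mean; the unit deviation form simply keeps the hand arithmetic smaller.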
Example:
Find the mean of the following set of integers 8, 11, –6, 22, –3.
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{8 + 11 + (-6) + 22 + (-3)}{5} = \frac{32}{5} = 6.4$$
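The same result can be checked in Python. The first part uses the five integers from the example; the weighted-mean part uses invented grades and weights purely for illustration.

```python
from statistics import mean

# Data from the example above.
values = [8, 11, -6, 22, -3]
x_bar = mean(values)                       # sum of the values divided by 5

# Weighted mean with hypothetical grades and unit weights (illustration only).
grades = [85, 90, 78]
weights = [3, 2, 5]
weighted_mean = sum(w * x for w, x in zip(weights, grades)) / sum(weights)

print(x_bar, weighted_mean)
```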
Example:
The mean of 40 numbers was found to be 38. Later on, it was detected that a number 56 was
misread as 36. Find the correct mean of given numbers.
Solution:
Calculated sum of the 40 numbers = 38 × 40 = 1520.
Correct sum = (1520 - 36 + 56)
= 1540.
Correct mean = 1540 / 40 = 38.5.
Example:
The mean of the heights of 6 boys is 152 cm. If the individual heights of five of them are 151
cm, 153 cm, 155 cm, 149 cm and 154 cm, find the height of the sixth boy.
Solution:
Sum of the heights of 6 boys = (152 × 6) cm = 912 cm.
Sum of the heights of 5 boys = (151 + 153 + 155 + 149 + 154) cm = 762 cm.
Height of the sixth boy = (912 - 762) cm = 150 cm.
Example:
Solution:
⇒ 39 + x = 15 × 5
⇒ 39 + x = 75
⇒ 39 - 39 + x = 75 - 39
⇒ x = 36
Hence, x = 36.
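The mean examples in this lesson all rest on the identity sum = mean × n. A small sketch, assuming a hypothetical helper named `missing_value`, applies it to the heights example and to the misread-number example:

```python
def missing_value(known_values, target_mean, total_count):
    """Return the one unknown value that makes the mean equal target_mean."""
    return target_mean * total_count - sum(known_values)

# Heights example: six boys with mean 152 cm, five heights known.
heights = [151, 153, 155, 149, 154]
sixth_height = missing_value(heights, 152, 6)

# Misread-number example: 40 numbers with mean 38, where 56 was read as 36.
recorded_sum = 38 * 40
corrected_mean = (recorded_sum - 36 + 56) / 40

print(sixth_height, corrected_mean)
```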
Lesson 2. Median
According to Matloff (2019), the median is the middlemost value in the distribution and is
denoted by 𝐱̃.
In determining the median for ungrouped data, the values must first be arranged in order of
magnitude, either from lowest to highest or vice versa. Hence
$$\tilde{x} = x_{\frac{n+1}{2}} \quad \text{if } n \text{ is odd}$$
$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \quad \text{if } n \text{ is even}$$
For grouped data, the procedure requires the construction of the less-than cumulative
frequency column (< cumf). Hence
$$\tilde{x} = x_b + \left[\frac{\frac{n}{2} - cumf_b}{f_m}\right] c$$
where:
x_b → lower boundary limit of the median class
cumf_b → cumulative frequency before the median class
f_m → frequency of the median class
c → size of the class interval
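The grouped-data median formula can be traced step by step in code. The frequency table is invented for illustration, and the lower class boundary is taken as 0.5 below the lower limit because the limits are whole numbers; both are assumptions of this sketch.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(52, 60, 2), (61, 69, 5), (70, 78, 8), (79, 87, 3), (88, 96, 2)]

n = sum(f for _, _, f in classes)
c = classes[1][0] - classes[0][0]           # size of the class interval

cumf_b = 0                                  # cumulative frequency before the class
median = None
for lower, upper, f_m in classes:
    # The median class is the first whose less-than cumf reaches n / 2.
    if cumf_b + f_m >= n / 2:
        x_b = lower - 0.5                   # lower boundary of the median class
        median = x_b + ((n / 2 - cumf_b) / f_m) * c
        break
    cumf_b += f_m

print(median)
```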
Example:
Find the median of the data 25, 37, 47, 18, 19, 26, 36.
Solution:
Arranging the data in ascending order, we get 18, 19, 25, 26, 36, 37, 47
Since n = 7 is odd, median = ((7 + 1)/2)th observation
= (8/2)th observation
= 4th observation
= 26.
Example:
Find the median of the data 24, 33, 30, 22, 21, 25, 34, 27.
Solution:
Arranging the data in ascending order, we get 21, 22, 24, 25, 27, 30, 33, 34
Since n = 8 is even, median = (4th observation + 5th observation)/2
= (25 + 27)/2
= 52/2
= 26
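Both ungrouped cases can be handled by one short function. This is a sketch with a hypothetical function name; Python lists are 0-based, so the ((n + 1)/2)th observation sits at index n // 2 when n is odd.

```python
def median_ungrouped(values):
    """Median of raw data: sort, then take the middle value or middle pair."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle ones

odd_case = median_ungrouped([25, 37, 47, 18, 19, 26, 36])
even_case = median_ungrouped([24, 33, 30, 22, 21, 25, 34, 27])
print(odd_case, even_case)
```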
Lesson 3. Mode
According to Matloff (2019), mode is the most frequent value in the distribution and is
denoted by 𝐱̂.
$$\hat{x} = x_b + \left[\frac{d_1}{d_1 + d_2}\right] c$$
where:
Example:
Solution:
2, 2, 2, 3, 3, 3, 3, 4, 5, 5
Therefore, mode of this data is 3.
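For ungrouped data the mode is a simple count. The sketch below uses the ordered list from the solution above; `multimode` from Python's standard `statistics` module returns every value tied for the highest frequency, which also covers data with more than one mode. The second list is invented to show a tie.

```python
from collections import Counter
from statistics import multimode

data = [2, 2, 2, 3, 3, 3, 3, 4, 5, 5]       # list from the solution above

counts = Counter(data)                      # how often each value occurs
modes = multimode(data)                     # all values with the top count

# Hypothetical list with two modes (illustration only).
tied_modes = multimode([1, 1, 2, 2, 3])

print(counts, modes, tied_modes)
```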
Note:
Example:
Example:
Lesson 4. Range
According to Matloff (2019), Range is the difference between the lowest and highest
values. Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So, the range is 9 −
3 = 6.
Example:
What is the range for the following set of numbers? 10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65, 98?
Solution:
Step 2: Subtract the smallest number in the set from the largest number in the set:
99 – 7 = 92
The range is 92
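The two steps reduce to one subtraction once the smallest and largest values are known. A minimal check with the numbers from this example:

```python
numbers = [10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65, 98]

ordered = sorted(numbers)                   # Step 1: smallest to largest
data_range = ordered[-1] - ordered[0]       # Step 2: largest minus smallest

print(data_range)
```

Sorting is only needed for reading the data by hand; `max(numbers) - min(numbers)` gives the same value.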
Example:
Solution:
Step 2: Subtract the smallest number in the set from the largest number in the set:
19 − (−12) = 19 + 12 = 31
Example:
Solution:
Step 1: Sort the numbers in order, from smallest to largest:
Step 2: Subtract the smallest number in the set from the largest number in the set:
Assessment Task 9
On a clean short bond paper, answer the following and provide a complete solution.
1. Find the mean, median, mode and range for the following list of values
5, 13, 9, 7, 1, 9, 2, 9, and 11
2. Find the mean, median, mode, and range for the following list of values:
10, 6, 4, 4, 6, 4, 1
5. The set of scores 12, 5, 7, -8, x, 10 has a mean of 5. Find the value of x.
Summary
Mean is the value obtained by adding the values in the distribution and dividing the sum
by the total number of values or items; it is also the simplest and most efficient measure of central
tendency.
Median is the middlemost value in the distribution and is denoted by 𝐱̃.
Mode is the most frequent value in the distribution and is denoted by 𝐱̂.
Reference
Matloff, N. (2019). Probability and Statistics for Data Science. Chapman & Hall.
MODULE 10
MEASURES OF DISPERSION
Introduction
We live in a changing world, and changes are taking place in all areas of life. The study of
statistics does not show much interest in things which are constant. Many experts are engaged
in the study of changing phenomena. Agricultural, industrial and mineral production and their
transportation are of great interest to economists, statisticians, and other experts. Changes in
human populations, changes in standards of living and changes in literacy rates attract experts to
perform detailed studies and then correlate these changes to human life. Thus, variability or
variation is connected with human life and its study is very important for mankind.
The word dispersion has a technical meaning in statistics. The average measures the
center of the data, and it is one aspect of observation. Another feature of the observation is how
the observations are spread about the center. The observations may be close to the center or
they may be spread away from the center. If the observations are close to the center (usually the
arithmetic mean or median), we say that dispersion, scatter or variation is small. If the
observations are spread away from the center, we say dispersion is large (Matloff, 2019).
Learning Outcomes
Lesson 1. Variance and Standard Deviation
According to Matloff (2019), the standard deviation is a measure of the amount of variation
or dispersion of a set of values. A low standard deviation indicates that the values tend to be close
to the mean (also called the expected value) of the set, while a high standard deviation indicates
that the values are spread out over a wider range.
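$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{; sample variance}$$
or
$$\sigma^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{; sample standard deviation}$$
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N} \quad \text{; population variance}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} \quad \text{; population standard deviation}$$
For grouped data with k classes, where f_i is the frequency and x_i is the midpoint of class i:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} \quad ; \quad \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
or
$$\sigma^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$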
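The definition form and the computing form of the sample variance give the same number, which a short script can confirm. The data set is invented for illustration. The population versions divide by the number of values N rather than by n − 1, matching Python's `statistics.pvariance`.

```python
import math
import statistics

# Hypothetical data set (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

# Definition form: squared deviations from the mean, divided by n - 1.
sample_var = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Computing form: [sum(x^2) - (sum(x))^2 / n] / (n - 1).
shortcut_var = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

sample_sd = math.sqrt(sample_var)

# Treating the same values as a whole population: divide by N.
population_var = sum((x - x_bar) ** 2 for x in data) / n
population_sd = math.sqrt(population_var)

# Cross-check against the standard library.
print(sample_var, statistics.variance(data))
print(population_var, statistics.pvariance(data))
```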
Example:
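For the set of values 8, 16, 12, 10, 14, 6, find
(a) $\sum_{i=1}^{6} (x_i - \bar{x})$,
(b) $\sum_{i=1}^{6} |x_i - \bar{x}|$, and
(c) the mean deviation, M.D.
Solution:
a. determine $\sum_{i=1}^{6} (x_i - \bar{x})$
$$\bar{x} = \frac{8 + 16 + 12 + 10 + 14 + 6}{6} = \frac{66}{6}$$
x̅ = 11
$$\sum_{i=1}^{6} (x_i - \bar{x}) = (8 - 11) + (16 - 11) + (12 - 11) + (10 - 11) + (14 - 11) + (6 - 11) = 0$$
b. determine $\sum_{i=1}^{6} |x_i - \bar{x}|$
$$\sum_{i=1}^{6} |x_i - \bar{x}| = |8 - 11| + |16 - 11| + |12 - 11| + |10 - 11| + |14 - 11| + |6 - 11| = 18$$
c. determine M.D.
$$M.D. = \frac{\sum_{i=1}^{6} |x_i - \bar{x}|}{n} = \frac{18}{6}$$
M.D. = 3 ← Answer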
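The mean deviation (M.D.) follows the same pattern for any data set: find the mean, take absolute deviations, and average them. The sketch below uses a small invented list and a hypothetical function name; it also shows that the signed deviations from the mean sum to zero, which is why absolute values are needed.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

# Hypothetical data (illustration only).
sample = [2, 4, 6, 8, 10]
x_bar = sum(sample) / len(sample)

signed_sum = sum(x - x_bar for x in sample)   # deviations cancel out
md = mean_deviation(sample)

print(x_bar, signed_sum, md)
```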
Example:
For the following data 18, 19, 16, 12, 7, 10, 23; find
(a) x̅,
(b) M.D.,
(c) σ², and
(d) σ
Solution:
a. determine x̅
$$\bar{x} = \frac{18 + 19 + 16 + 12 + 7 + 10 + 23}{7} = \frac{105}{7}$$
x̅ = 15 ← Answer
b. determine M.D.
$$M.D. = \frac{|18 - 15| + |19 - 15| + |16 - 15| + |12 - 15| + |7 - 15| + |10 - 15| + |23 - 15|}{7} = \frac{32}{7}$$
M.D. ≈ 4.571 ← Answer
c. determine σ²
$$\sigma^2 = \frac{\sum_{i=1}^{7} (x_i - 15)^2}{7 - 1}$$
$$= \frac{(18 - 15)^2 + (19 - 15)^2 + (16 - 15)^2 + (12 - 15)^2 + (7 - 15)^2 + (10 - 15)^2 + (23 - 15)^2}{6} = \frac{188}{6}$$
σ² = 31.33 ← Answer
d. determine σ
$$\sigma = \sqrt{\frac{\sum_{i=1}^{7} (x_i - 15)^2}{7 - 1}} = \sqrt{31.33}$$
σ = 5.597 ← Answer
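The worked example can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 (sample) denominator. Because the script carries full precision, the last printed digit of σ may differ slightly from a hand calculation that rounds the variance first.

```python
import statistics

data = [18, 19, 16, 12, 7, 10, 23]          # data from the example above
n = len(data)

x_bar = statistics.mean(data)                               # part (a)
mean_dev = sum(abs(x - x_bar) for x in data) / n            # part (b)
variance = statistics.variance(data)                        # part (c), n - 1
std_dev = statistics.stdev(data)                            # part (d)

print(f"mean = {x_bar}")
print(f"M.D. = {mean_dev:.3f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.4f}")
```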
Assessment Task 10
On a clean short bond paper, answer the following and provide a complete solution.
2. Sam has 20 rose bushes, but only counted the flowers on 6 of them!
The "population" is all 20 rose bushes, and the "sample" is the 6 bushes that Sam
Summary
Measures of central tendency (mean, median and mode) provide information on the data
values at the center of the data set. Measures of dispersion (quartiles, percentiles, ranges)
provide information on the spread of the data around the center. In this section we will look at
two more measures of dispersion called the variance and the standard deviation.
a. Variance, denoted by σ², is a measure found by averaging the squared deviations of the
values from the mean.
b. Standard deviation, denoted by σ, is found by getting the square root of the variance.
Reference
Matloff, N. (2019). Probability and Statistics for Data Science. Chapman & Hall.