Data Analysis

Engineering Data Analysis
Jan Lexver C. Tiangco
Table of Contents
Module 8: Concepts of Statistics

Introduction
Learning Outcomes
Lesson 1. Basic concept of statistics
Lesson 2. Methods for describing sets of data
Lesson 3. Construction of a frequency distribution
Assessment Task 8
Summary
Reference
Module 9: Measure of Central Tendency

Introduction
Lesson 1. Mean
Lesson 2. Median
Lesson 3. Mode
Lesson 4. Range
Summary
Reference
Module 10: Measures of Dispersion

Introduction
Lesson 1. Variance and Standard Deviation
Summary
Reference
Course Code: ENG’G 304
Course Description: This course is designed for undergraduate engineering
students with a focus on problem solving related to societal issues that engineers and
scientists are called upon to tackle. This incorporates different methods of data
collection and the suitability of a particular method for a given situation. The
relationship between probability and statistics is also explored, providing students with
the tools they need to understand how "chance" plays a role in statistical analysis.
Course Intended Learning Outcomes (CILO):

At the end of the course, students should be able to:
1. Apply statistical methods in the analysis of data
2. Design experiments involving several factors
3. Solve problems involving probability
Course Requirements:
Assessment Tasks - 60%
Major Exams - 40%
Periodic Grade - 100%
Computation of Grades:
PRELIM GRADE = 60% (Activity 1-4) + 40% (Prelim exam)
MIDTERM GRADE = 30% (Prelim Grade) + 70% [60% (Activity 5-7) + 40% (Midterm exam)]
FINAL GRADE = 30% (Midterm Grade) + 70% [60% (Activity 8-10) + 40% (Final exam)]
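The grading scheme above is plain weighted arithmetic. As a minimal sketch, assuming every component (the averaged activity score and the exam score) is already expressed on a 0-100 scale, the three formulas can be written as Python functions. The function names and the scores used below are hypothetical.

```python
def prelim_grade(activities: float, exam: float) -> float:
    """PRELIM GRADE = 60% (Activity 1-4) + 40% (Prelim exam)."""
    return 0.60 * activities + 0.40 * exam


def carried_grade(previous: float, activities: float, exam: float) -> float:
    """MIDTERM or FINAL GRADE = 30% (previous grade) + 70% [60% activities + 40% exam]."""
    current_term = 0.60 * activities + 0.40 * exam
    return 0.30 * previous + 0.70 * current_term


# Hypothetical scores (0-100 scale assumed), for illustration only.
prelim = prelim_grade(activities=90, exam=80)
midterm = carried_grade(prelim, activities=85, exam=75)
final = carried_grade(midterm, activities=88, exam=82)
print(f"Prelim {prelim:.2f}, Midterm {midterm:.2f}, Final {final:.2f}")
```

With these hypothetical scores the sketch gives a prelim grade of 86.00, a midterm grade of 82.50, and a final grade of 84.67. The same helper serves both the midterm and the final grade because both formulas carry 30% of the previous grade forward.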
MODULE 8
CONCEPTS OF STATISTICS
Introduction
Statistics is a mathematical science that includes methods of collecting, organizing, and
analyzing data in such a way that meaningful conclusions can be drawn from them. In general,
its investigations and analyses fall into two broad categories: descriptive and inferential
statistics.
Descriptive statistics deals with the processing of data without attempting to draw any
inferences from it. The data are presented in the form of tables and graphs, and the characteristics
of the data are described in simple terms. Events dealt with include everyday happenings
such as accidents, prices of goods, business, incomes, epidemics, sports data, and population data
(Matloff, 2019).
Learning Outcomes
At the end of this module, students should be able to:
1. Describe the types of statistics;
2. Become familiar with the methods for describing sets of data; and
3. Identify the population, sample, and variable.
Lesson 1. Basic concept of statistics
STATISTICS
According to Matloff (2019), statistics is a science that deals with the methods of collecting,
organizing, and summarizing quantitative data which are analyzed and interpreted.
a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns, to
summarize and to present the information in a set of data.
b. Inferential Statistics → utilizes sample data to make estimates, decisions, and predictions
about a larger set of data.
POPULATION, SAMPLE, AND VARIABLE
a. Population → a set of existing units or items such as people, objects, transactions, or
events.
b. Sample → a sub-collection of items drawn from the population.
c. Variable → a characteristic or property of an individual unit, such as the height of a person,
the time of a reflex, or the amount of a transaction (Matloff, 2019).
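To make the distinction between a population and a sample (and between descriptive and inferential statistics) concrete, here is a minimal Python sketch. The "population" is simulated with a fixed random seed and is purely hypothetical. Computing the sample mean is a descriptive summary of the sample; using it as an estimate of the population mean is the inferential step.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 1,000 simulated heights in cm (not real data).
population = [random.gauss(165, 7) for _ in range(1000)]

# A sample is a sub-collection of items drawn from the population.
sample = random.sample(population, 50)

# Descriptive: summarize the sample we actually observed.
sample_mean = statistics.mean(sample)

# Inferential: use the sample mean as an estimate of the (usually unknown) population mean.
population_mean = statistics.mean(population)
print(f"sample mean = {sample_mean:.2f} cm, population mean = {population_mean:.2f} cm")
```

In practice the population mean is rarely available; it is computed here only so the estimate can be compared with the value it is trying to estimate.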
Lesson 2. Methods for describing sets of data
a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.
d. Ratio Data → these are measurements that enable the determination of the multiple of
the characteristic being measured between one unit of the sample or population and
another.
e. Quantitative Data → these are measurements that have meaningful numbers associated
with them; these include interval and ratio data (Matloff, 2019).
FREQUENCY DISTRIBUTION
This refers to the organization of data in tabular form showing the frequency of occurrence
of the values or objects in each class or category (Matloff, 2019).
a. Frequency is the number of times a value appears in the listing or data.
b. The relative frequency distribution of a given set of data shows, in percent, the proportion
of the frequency of each class to the total frequency.
The relative frequency, denoted by %f, is given by
$\%f = \dfrac{f}{n} \times 100\%$
where:
f → frequency of each class
n → sample size
c. The cumulative frequency distribution can be obtained by successively adding the class
frequencies.
There are two types of cumulative frequency distribution:
i. Less than cumulative frequency distribution (<cumf) → refers to a distribution whose
cumulative frequencies give the number of observations less than or below the upper class
boundary of the class to which they correspond.
ii. Greater than cumulative frequency distribution (>cumf) → refers to a distribution whose
cumulative frequencies give the number of observations greater than or above the lower class
boundary of the class to which they correspond.
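A minimal Python sketch of the relative and cumulative frequency columns, assuming the classes are listed from lowest to highest. The function name `frequency_columns` and the class frequencies are hypothetical.

```python
def frequency_columns(freqs):
    """Return (f, %f, <cumf, >cumf) for each class, with classes listed in increasing order."""
    n = sum(freqs)
    rows = []
    less_than = 0        # running total from the first class upward
    greater_than = n     # observations at or above the current class's lower boundary
    for f in freqs:
        less_than += f
        rows.append((f, 100 * f / n, less_than, greater_than))
        greater_than -= f
    return rows


# Hypothetical class frequencies for five classes (n = 20).
for row in frequency_columns([2, 5, 8, 4, 1]):
    print(row)
```

For these frequencies the <cumf column is 2, 7, 15, 19, 20 and the >cumf column is 20, 18, 13, 5, 1; the last <cumf and the first >cumf both equal n, and the %f column sums to 100%.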
CLASS INTERVALS, CLASS MARK, AND CLASS BOUNDARIES
a. Class Interval → refers to the grouping per category defined by the lower limit and the
upper limit.
b. Class Mark → defined as the midpoint of a class interval and is computed by:
$\text{Class Mark} = \dfrac{(\text{lower limit of the class}) + (\text{upper limit of the class})}{2}$
c. Class Boundary → a point that represents the halfway or dividing point between
successive classes.
i. The class size or class width is equal to the difference between the upper limits of two
consecutive classes (equivalently, the upper class boundary minus the lower class boundary).
ii. The range R refers to the difference between the highest and lowest values in the
distribution.
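The following Python sketch computes class marks, class boundaries, and the class width from class limits. It assumes whole-number data, so consecutive classes such as 10-19 and 20-29 are separated by a gap of 1 and each boundary lies 0.5 beyond a limit; `class_summary` is a hypothetical helper, not part of any library.

```python
def class_summary(limits):
    """limits: (lower, upper) class limits in increasing order, at least two classes.
    Assumes equal-width classes separated by a constant gap (1 for whole numbers)."""
    gap = limits[1][0] - limits[0][1]        # e.g. 20 - 19 = 1
    width = limits[1][1] - limits[0][1]      # difference of consecutive upper limits
    rows = []
    for lower, upper in limits:
        mark = (lower + upper) / 2           # class mark: midpoint of the interval
        lower_boundary = lower - gap / 2     # halfway point to the previous class
        upper_boundary = upper + gap / 2     # halfway point to the next class
        rows.append((mark, lower_boundary, upper_boundary))
    return rows, width


rows, width = class_summary([(10, 19), (20, 29), (30, 39)])
print(rows, width)
```

For the classes 10-19, 20-29, and 30-39 it gives class marks 14.5, 24.5, and 34.5, boundaries 9.5-19.5, 19.5-29.5, and 29.5-39.5, and a width of 10, which also equals the upper boundary minus the lower boundary of any class.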
Lesson 3. Construction of a frequency distribution
Consider the following steps in constructing a frequency distribution (Matloff, 2019).
a. Get the highest and lowest values in the distribution.
b. Compute the value of the range.
c. Determine the number of classes.
i. There is no standard method to follow in determining the number of classes.
ii. The number of classes must not be less than 5 and should not be more than 15.
iii. In some instances, the number of classes k can be approximated by Sturges' formula
$k = 1 + 3.322 \log n$
where n is the number of observations and log is the base-10 logarithm.
d. Find the size of the class interval. The value can be obtained by dividing the range by the
desired number of classes, c = R/k, usually rounded up to a convenient value.
e. Construct the classes by choosing a convenient value to start the first class.
f. Determine the frequency of each class by counting the number of items that fall
in each interval.
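The steps above can be sketched in Python for whole-number data. This is an illustration under stated assumptions, not a standard procedure: it uses Sturges' approximation for k, rounds k to the nearest integer and keeps it between 5 and 15, rounds the class size up, and starts the first class at the lowest value (one possible "convenient" start). The function name and the scores are hypothetical.

```python
import math


def build_frequency_distribution(data):
    """Steps a-f for whole-number data; returns a list of ((lower, upper), frequency)."""
    lowest, highest = min(data), max(data)                    # a. highest and lowest values
    r = highest - lowest                                      # b. range
    k = round(1 + 3.322 * math.log10(len(data)))              # c. Sturges' approximation
    k = min(max(k, 5), 15)                                    #    keep between 5 and 15 classes
    c = max(1, math.ceil(r / k))                              # d. class size, rounded up
    table = []
    lower = lowest                                            # e. start the first class at the lowest value
    while lower <= highest:
        upper = lower + c - 1                                 # whole-number class limits
        f = sum(1 for x in data if lower <= x <= upper)       # f. count items in the class
        table.append(((lower, upper), f))
        lower += c
    return table


# Hypothetical exam scores, for illustration only.
scores = [62, 75, 81, 58, 90, 67, 73, 88, 79, 55, 94, 70, 84,
          66, 77, 61, 86, 72, 69, 80, 91, 59, 74, 83, 68]
for (lo, hi), f in build_frequency_distribution(scores):
    print(f"{lo}-{hi}: {f}")
```

Because c is rounded up, the loop may produce one class more than k when the range divides evenly; the `while` condition simply keeps adding classes until the highest value is covered.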
Assessment Task 8
On a clean sheet of short bond paper, answer the following with complete solutions.
1. Define and give an example of Descriptive Statistics.
2. Define and give an example of Inferential Statistics.
3. Give an example of a set of data, then identify and explain which method of describing
sets of data was used.
Summary
Types of Statistics
a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns,
to summarize and to present the information in a set of data.
b. Inferential Statistics → utilizes sample data to make estimates, decisions, and predictions
about a larger set of data.
Methods for describing sets of data
a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.
Reference
Matloff, Norman (2019). Probability and Statistics for Data Science. Chapman & Hall.
MODULE 9
MEASURE OF CENTRAL TENDENCY
Introduction
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use than
others. In the following sections, we will look at the mean, mode and median, and learn how to
calculate them and under what conditions they are most appropriate to be used (Matloff, 2019).
Learning Outcomes
At the end of this module, students should be able to:
1. Describe the importance of measures of central tendency;
2. Describe the mean, median, and mode; and
3. Solve for the mean, median, and mode.
Lesson 1. Mean
According to Matloff (2019), the mean is the value obtained by adding the values in the
distribution and dividing the sum by the total number of values or items; it is also the simplest and
most efficient measure of central tendency.
i. Mean for Ungrouped Data
The mean for ungrouped data, denoted by x̅, is given by:
$\bar{x} = \dfrac{\sum x}{n}$
When each value in the distribution is associated with a certain weight or degree of
importance, the weighted mean is:
$\bar{x} = \dfrac{\sum wx}{\sum w}$
where w → weight of each value
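As a quick check of both formulas, here is a minimal Python sketch. `mean` and `weighted_mean` are illustrative names; the first call uses the integers from the worked example in this lesson, and the weighted example (grades weighted by subject units) uses hypothetical numbers.

```python
def mean(values):
    """x̄ = Σx / n"""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """x̄ = Σwx / Σw, where each value x carries weight w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(mean([8, 11, -6, 22, -3]))                 # 6.4
# Hypothetical example: grades 90, 85, 80 in subjects worth 3, 2 and 1 units.
print(weighted_mean([90, 85, 80], [3, 2, 1]))
```

When all weights are equal, the weighted mean reduces to the ordinary mean.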
ii. Mean of Grouped Data
In the Midpoint Method, the midpoint of each class interval is taken as the
representative of each class. Hence
$\bar{x} = \dfrac{\sum fx}{n}$
where:
f → represents the frequency of each class
x → midpoint of each class
n → total number of frequencies or the sample size
The Unit Deviation Method uses unit deviations and is usually implemented by taking an
arbitrary point, the assumed mean, as the initial step in approximating the value of the mean. Hence
$\bar{x} = x_a + \left[ \dfrac{\sum fd}{n} \right] c$
where:
x_a → assumed mean; or the midpoint of the class interval with the highest frequency
f → frequency of each class
d → unit deviation, the number of class intervals between a class midpoint x and x_a, i.e., d = (x − x_a)/c
c → size of the class interval
n → sample size
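Both grouped-data methods should give the same mean when all classes have the same size c. A minimal Python sketch, with hypothetical class marks and frequencies; it follows the document's choice of the midpoint of the highest-frequency class as the assumed mean x_a.

```python
def grouped_mean_midpoint(midpoints, freqs):
    """Midpoint Method: x̄ = Σfx / n."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / n


def grouped_mean_unit_deviation(midpoints, freqs, c):
    """Unit Deviation Method: x̄ = xa + (Σfd / n) c."""
    n = sum(freqs)
    xa = midpoints[freqs.index(max(freqs))]        # assumed mean: midpoint of the highest-frequency class
    d = [(x - xa) / c for x in midpoints]          # unit deviations ..., -1, 0, 1, ...
    return xa + c * sum(f * di for f, di in zip(freqs, d)) / n


# Hypothetical grouped data: class marks 14.5, 24.5, ... with class size c = 10.
marks = [14.5, 24.5, 34.5, 44.5]
freqs = [3, 7, 6, 4]
print(grouped_mean_midpoint(marks, freqs))            # 30.0
print(grouped_mean_unit_deviation(marks, freqs, 10))  # 30.0
```

Here x_a = 24.5, the unit deviations are −1, 0, 1, 2, Σfd = 11, and 24.5 + (11/20)(10) = 30.0, matching the midpoint method.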
Example:
Find the mean of the following set of integers: 8, 11, –6, 22, –3.
Solution:
$\bar{x} = \dfrac{\sum x}{n}$
$\bar{x} = \dfrac{8 + 11 + (-6) + 22 + (-3)}{5} = \dfrac{32}{5}$
x̅ = 6.4
Example:
The mean of 40 numbers was found to be 38. Later on, it was detected that a number 56 was
misread as 36. Find the correct mean of given numbers.
Solution:
Calculated mean of 40 numbers = 38.
Therefore, calculated sum of these numbers = (38 × 40) = 1520.
Correct sum of these numbers
= [1520 - (wrong item) + (correct item)]
= (1520 - 36 + 56)
= 1540.
Therefore, the correct mean = 1540/40 = 38.5.
Example:
The mean of the heights of 6 boys is 152 cm. If the individual heights of five of them are 151
cm, 153 cm, 155 cm, 149 cm, and 154 cm, find the height of the sixth boy.
Solution:
Mean height of 6 boys = 152 cm.
Sum of the heights of 6 boys = (152 × 6) = 912 cm
Sum of the heights of 5 boys = (151 + 153 + 155 + 149 + 154) cm = 762 cm.
Height of the sixth boy
= (sum of the heights of 6 boys) - (sum of the heights of 5 boys)
= (912 - 762) cm = 150 cm.
Hence, the height of the sixth boy is 150 cm.
Example:
If the mean of 9, 8, 10, x, 12 is 15, find the value of x.
Solution:
Mean of the given numbers = (9 + 8 + 10 + x + 12)/5 = (39 + x)/5
According to the problem, mean = 15 (given).
Therefore, (39 + x)/5 = 15
⇒ 39 + x = 15 × 5
⇒ 39 + x = 75
⇒ 39 - 39 + x = 75 - 39
⇒ x = 36
Hence, x = 36.
Lesson 2. Median
According to Matloff (2019), the median is the middlemost value in the distribution and is
denoted by 𝐱̃.
i. Median for Ungrouped Data
In determining the median for ungrouped data, the values must first be arranged in order of
magnitude, either from lowest to highest or vice versa. Hence
$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$ ; if n is odd
$\tilde{x} = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$ ; if n is even
where x_(i) denotes the ith value of the arranged data.
ii. Median for Grouped Data
The procedure requires the construction of the less than cumulative frequency column (<cumf).
The median class is the first class whose <cumf is at least n/2. Hence
$\tilde{x} = x_b + \left[ \dfrac{\frac{n}{2} - cumf_b}{f_m} \right] c$
where:
x_b → lower boundary of the median class
cumf_b → cumulative frequency before the median class
f_m → frequency of the median class
c → size of the class interval
n → sample size
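A minimal Python sketch of both median formulas. `median_ungrouped` reproduces the two worked examples that follow; `median_grouped` locates the median class with a running less than cumulative frequency. The function names are illustrative, and the grouped data in the note below are hypothetical.

```python
def median_ungrouped(values):
    xs = sorted(values)                        # arrange in order of magnitude first
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]            # ((n + 1)/2)th value; Python lists are 0-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2   # mean of the (n/2)th and (n/2 + 1)th values


def median_grouped(lower_boundaries, freqs, c):
    n = sum(freqs)
    cumf = 0                                   # <cumf of the classes before the current one
    for xb, fm in zip(lower_boundaries, freqs):
        if cumf + fm >= n / 2:                 # first class whose <cumf reaches n/2
            return xb + (n / 2 - cumf) / fm * c
        cumf += fm
    raise ValueError("frequencies must be non-empty and positive")


print(median_ungrouped([25, 37, 47, 18, 19, 26, 36]))        # 26
print(median_ungrouped([24, 33, 30, 22, 21, 25, 34, 27]))    # 26.0
```

For example, with hypothetical lower boundaries 9.5, 19.5, 29.5, 39.5, frequencies 3, 6, 7, 4, and c = 10, n/2 = 10 falls in the third class (<cumf = 16), so `median_grouped` returns 29.5 + [(10 − 9)/7](10) ≈ 30.93.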
Example:
Find the median of the data 25, 37, 47, 18, 19, 26, 36.
Solution:
Arranging the data in ascending order, we get 18, 19, 25, 26, 36, 37, 47
Here, the number of observations is odd, i.e., 7.
Therefore, median = ((n + 1)/2)th observation
= ((7 + 1)/2)th observation
= (8/2)th observation
= 4th observation.
4th observation is 26.
Therefore, median of the data is 26.
Example:
Find the median of the data 24, 33, 30, 22, 21, 25, 34, 27.
Solution:
Here, the number of observations is even, i.e., 8.
Arranging the data in ascending order, we get 21, 22, 24, 25, 27, 30, 33, 34
Therefore, median = {(n/2)th observation + (n/2 + 1)th observation}/2
= {(8/2)th observation + (8/2 + 1)th observation}/2
= {4th observation + 5th observation}/2
= {25 + 27}/2
= 52/2
= 26
Therefore, the median of the given data is 26.
Lesson 3. Mode
According to Matloff (2019), the mode is the most frequent value in the distribution and is
denoted by 𝐱̂.
i. Mode for Ungrouped Data
The mode can be obtained by inspection.
ii. Mode for Grouped Data
Identify the modal class, that is, the interval which contains the highest
frequency in the distribution. Hence
$\hat{x} = x_b + \left[ \dfrac{d_1}{d_1 + d_2} \right] c$
where:
x_b → lower boundary of the modal class
d_1, d_2 → difference between the frequency of the modal class and the frequency
of the interval before and after the modal class, respectively
c → size of the class interval
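A minimal Python sketch of the grouped-data mode formula. Two assumptions are made here that are not from the text: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes share the highest frequency, the first one is used. The formula is undefined when d_1 + d_2 = 0 (both neighbours as frequent as the modal class), and the sketch does not guard against that case. The function name and data are hypothetical.

```python
def mode_grouped(lower_boundaries, freqs, c):
    m = freqs.index(max(freqs))                          # modal class (first one if tied)
    f_before = freqs[m - 1] if m > 0 else 0              # assumption: no class before -> 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: no class after -> 0
    d1 = freqs[m] - f_before
    d2 = freqs[m] - f_after
    return lower_boundaries[m] + d1 / (d1 + d2) * c


print(mode_grouped([9.5, 19.5, 29.5, 39.5], [3, 6, 7, 4], 10))  # 32.0
```

Here the modal class is the third one, d_1 = 7 − 6 = 1, d_2 = 7 − 4 = 3, and the mode is 29.5 + (1/4)(10) = 32.0, which lies inside the modal class as expected.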
Example:
Find the mode of the given set of numbers: 2, 2, 3, 5, 4, 3, 2, 3, 3, 5
Solution:
Arranging the numbers with the same values together, we get
2, 2, 2, 3, 3, 3, 3, 4, 5, 5
We observe that 3 occurs the maximum number of times, i.e., four times.
Therefore, the mode of this data is 3.
Note:
A data set may have no mode.
Example:
The data 3, 4, 1, 5, 2 have no mode because no number occurs more times
than any other number.
A data set may have more than one mode.
Example:
The data 2, 5, 1, 3, 5, 7, 6, 3, 8 have two modes, 3 and 5,
because each is repeated two times, which is the maximum.
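Finding the mode of ungrouped data by inspection amounts to counting. A minimal Python sketch using `collections.Counter`; `modes` is an illustrative name. Python's standard `statistics.multimode` also returns all tied values, but it returns every value when none repeats, so this sketch adds the document's "no mode" rule explicitly.

```python
from collections import Counter


def modes(values):
    """Return every mode, or [] when no value occurs more often than the others."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                   # all values tie: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 3, 5, 4, 3, 2, 3, 3, 5]))   # [3]
print(modes([2, 5, 1, 3, 5, 7, 6, 3, 8]))      # [3, 5]
print(modes([1, 2, 3, 4]))                     # []
```

Returning a list (possibly empty or with several entries) covers the single-mode, no-mode, and multi-mode cases with one function.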
Lesson 4. Range
According to Matloff (2019), the range is the difference between the highest and lowest
values. Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So, the range is 9 −
3 = 6.
Example:
What is the range for the following set of numbers? 10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65, 98?
Solution:
Step 1: Sort the numbers in order, from smallest to largest:
7, 10, 21, 33, 43, 45, 45, 65, 67, 87, 98, 99
Step 2: Subtract the smallest number in the set from the largest number in the set:
99 – 7 = 92
The range is 92
Example:
What is the range of these integers?
14, -12, 7, 0, -5, -8, 17, -11, 19
Solution:
-12, -11, -8, -5, 0, 7, 14, 17, 19
19 − (−12) = 19 + 12 = 31
The range is 31.
Example:
What is the range of the following times?
2.7 hrs, 8.3 hrs, 3.5 hrs, 5.1 hrs, 4.9 hrs
Solution:
2.7, 3.5, 4.9, 5.1, 8.3
8.3 hr – 2.7 hr = 5.6 hr
The range is 5.6 hr.
Assessment Task 9
1. Find the mean, median, mode, and range for the following list of values:
5, 13, 9, 7, 1, 9, 2, 9, 11
2. Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
3. What is the arithmetic mean of the following numbers?
10, 6, 4, 4, 6, 4, 1
4. What is the mode of the following numbers?
10, 6, 4, 4, 6, 4, 1
5. The set of scores 12, 5, 7, -8, x, 10 has a mean of 5. Find the value of x.
Summary
Mean is the value obtained by adding the values in the distribution and dividing the sum
by the total number of values or items; it is also the simplest and most efficient measure of central
tendency.
Median is the middlemost value in the distribution and is denoted by 𝐱̃.
Mode is the most frequent value in the distribution and is denoted by 𝐱̂.
Range is the difference between the highest and lowest values.
Reference
Matloff, Norman (2019). Probability and Statistics for Data Science. Chapman & Hall.
MODULE 10
MEASURES OF DISPERSION
Introduction
We live in a changing world, and changes are taking place in all areas of life. The study of
statistics does not show much interest in things which are constant. Many experts are engaged
in the study of changing phenomena. Agricultural, industrial and mineral production and their
transportation are of great interest to economists, statisticians, and other experts. Changes in
human populations, changes in standards of living and changes in literacy rates attract experts to
perform detailed studies and then correlate these changes to human life. Thus, variability or
variation is connected with human life and its study is very important for mankind.
The word dispersion has a technical meaning in statistics. The average measures the
center of the data, which is one aspect of the observations. Another aspect is how
the observations are spread about the center. The observations may be close to the center or
they may be spread away from the center. If the observations are close to the center (usually the
arithmetic mean or median), we say that dispersion, scatter or variation is small. If the
observations are spread away from the center, we say dispersion is large (Matloff, 2019).
Learning Outcomes
1. Describe the applications of measures of dispersion;
2. Become familiar with the variance and standard deviation; and
3. Solve for the variance and standard deviation of sets of values.
Lesson 1. Variance and Standard Deviation
According to Matloff (2019), the standard deviation is a measure of the amount of variation
or dispersion of a set of values. A low standard deviation indicates that the values tend to be close
to the mean (also called the expected value) of the set, while a high standard deviation indicates
that the values are spread out over a wider range.
a. Variance, denoted by σ², is a measure found by squaring the individual deviations from the mean,
adding them, and dividing the sum by n − 1 for a sample or by N for a population.
b. Standard deviation, denoted by σ, is found by getting the square root of the variance.
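i. For Ungrouped Data
$\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ ; sample variance
or
$\sigma^2 = \dfrac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \dfrac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ ; sample standard deviation
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ ; population variance
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ ; population standard deviation
where:
μ → population mean
N → population size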
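A minimal Python sketch comparing the definitional and computational forms of the sample variance with the population variance, checked against Python's standard `statistics` module (`variance` and `stdev` divide by n − 1; `pvariance` and `pstdev` divide by N). The data are those of the second worked example in this lesson; the function names are illustrative.

```python
import math
import statistics


def sample_variance(xs):
    """σ² = Σ(xᵢ − x̄)² / (n − 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """σ² = [Σxᵢ² − (Σxᵢ)²/n] / (n − 1)"""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance(xs):
    """σ² = Σ(xᵢ − μ)² / N"""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


data = [18, 19, 16, 12, 7, 10, 23]
print(sample_variance(data), sample_variance_shortcut(data))     # both ≈ 31.33
print(math.sqrt(sample_variance(data)), statistics.stdev(data))  # both ≈ 5.598
print(population_variance(data), statistics.pvariance(data))
```

The only difference between the sample and population versions is the divisor; dividing by n − 1 compensates for measuring deviations from the sample mean rather than the unknown population mean.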
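ii. For Grouped Data
$\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$ ; $\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{n}$
or
$\sigma^2 = \dfrac{1}{n - 1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \dfrac{\left( \sum_{i=1}^{k} f_i x_i \right)^2}{n} \right]$
where:
f_i → frequency of the ith class
x_i → midpoint of the ith class
k → number of classes
n → sample size, n = Σf_i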
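A minimal Python sketch of both grouped-data forms, using hypothetical class marks and frequencies. As a cross-check, expanding each class mark by its frequency and applying `statistics.variance` gives the same result, since the grouped formulas treat every observation in a class as sitting at the class midpoint.

```python
import statistics


def grouped_variance(midpoints, freqs):
    """Sample variance of grouped data using class midpoints xᵢ and frequencies fᵢ.
    Returns (definitional form, computational form)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    xbar = sum_fx / n
    definitional = sum(f * (x - xbar) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    computational = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return definitional, computational


marks = [14.5, 24.5, 34.5, 44.5]   # hypothetical class marks
freqs = [3, 7, 6, 4]               # hypothetical class frequencies
print(grouped_variance(marks, freqs))

expanded = [x for x, f in zip(marks, freqs) for _ in range(f)]
print(statistics.variance(expanded))
```

For these data x̄ = 30, Σf(x − x̄)² = 1895, and σ² = 1895/19 ≈ 99.74 by either form.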
Example:
For the set of values 8, 16, 12, 10, 14, 6, find
(a) $\sum_{i=1}^{6} (x_i - \bar{x})$,
(b) $\sum_{i=1}^{6} |x_i - \bar{x}|$, and
(c) the mean deviation, M.D.
Solution:
a. determine $\sum_{i=1}^{6} (x_i - \bar{x})$
$\bar{x} = \dfrac{8 + 16 + 12 + 10 + 14 + 6}{6} = \dfrac{66}{6}$
x̅ = 11
$\sum_{i=1}^{6} (x_i - \bar{x}) = (8 - 11) + (16 - 11) + (12 - 11) + (10 - 11) + (14 - 11) + (6 - 11)$
$\sum_{i=1}^{6} (x_i - \bar{x}) = 0$ ← Answer
b. determine $\sum_{i=1}^{6} |x_i - \bar{x}|$
$\sum_{i=1}^{6} |x_i - \bar{x}| = |8 - 11| + |16 - 11| + |12 - 11| + |10 - 11| + |14 - 11| + |6 - 11|$
$\sum_{i=1}^{6} |x_i - \bar{x}| = 18$ ← Answer
c. determine M.D.
$M.D. = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \dfrac{18}{6}$
M.D. = 3 ← Answer
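A minimal Python sketch of the mean deviation. It also shows why a sum like part (a) always comes out as 0: the signed deviations from the mean cancel, which is why M.D. uses absolute values instead. The data are those of the next example (18, 19, 16, 12, 7, 10, 23); `mean_deviation` is an illustrative name.

```python
def mean_deviation(xs):
    """M.D. = Σ|xᵢ − x̄| / n"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) for x in xs) / n


data = [18, 19, 16, 12, 7, 10, 23]
xbar = sum(data) / len(data)
print(sum(x - xbar for x in data))     # 0.0: signed deviations cancel out
print(mean_deviation(data))            # 32/7 ≈ 4.57
```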
Example:
For the following data 18, 19, 16, 12, 7, 10, 23, find
(a) x̅,
(b) M.D.,
(c) σ², and
(d) σ
Solution:
a. determine x̅
$\bar{x} = \dfrac{18 + 19 + 16 + 12 + 7 + 10 + 23}{7} = \dfrac{105}{7}$
x̅ = 15 ← Answer
b. determine M.D.
$M.D. = \dfrac{|18 - 15| + |19 - 15| + |16 - 15| + |12 - 15| + |7 - 15| + |10 - 15| + |23 - 15|}{7} = \dfrac{32}{7}$
M.D. = 4.57 ← Answer
c. determine σ²
$\sigma^2 = \dfrac{\sum_{i=1}^{7} (x_i - 15)^2}{7 - 1}$
$= \dfrac{(18 - 15)^2 + (19 - 15)^2 + (16 - 15)^2 + (12 - 15)^2 + (7 - 15)^2 + (10 - 15)^2 + (23 - 15)^2}{6} = \dfrac{188}{6}$
σ² = 31.33 ← Answer
d. determine σ
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{7} (x_i - 15)^2}{7 - 1}} = \sqrt{31.33}$
σ = 5.598 ← Answer
Assessment Task 10
1. Sam has 20 rose bushes. The number of flowers on each bush is
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the standard deviation.
2. Sam has 20 rose bushes but counted the flowers on only 6 of them.
The "population" is all 20 rose bushes, and the "sample" is the 6 bushes whose flowers Sam
counted.
Let us say Sam's flower counts are: 9, 2, 5, 4, 12, 7
Work out the standard deviation.
Summary
Measures of central tendency (mean, median and mode) provide information on the data
values at the center of the data set. Measures of dispersion (quartiles, percentiles, ranges)
provide information on the spread of the data around the center. In this module, we looked at
two more measures of dispersion called the variance and the standard deviation.
a. Variance, denoted by σ², is a measure found by squaring the individual deviations from the mean,
adding them, and dividing the sum by n − 1 for a sample or by N for a population.
b. Standard deviation, denoted by σ, is found by getting the square root of the variance.
Reference
Matloff, Norman (2019). Probability and Statistics for Data Science. Chapman & Hall.
