
Engineering Data Analysis

Jan Lexver C. Tiangco

Table of Contents

Module 8: Concepts of Statistics

Introduction
Learning Outcomes
Lesson 1. Basic concept of statistics
Lesson 2. Methods for describing sets of data
Lesson 3. Construction of a frequency distribution
Assessment Task 8
Summary
Reference

Module 9: Measure of Central Tendency

Introduction
Learning Outcomes
Lesson 1. Mean
Lesson 2. Median
Lesson 3. Mode
Lesson 4. Range
Assessment Task 9
Summary
Reference

Module 10: Measures of Dispersion

Introduction
Learning Outcomes
Lesson 1. Variance and Standard Deviation
Assessment Task 10
Summary
Reference

Course Code: ENG’G 304

Course Description: This course is designed for undergraduate engineering
students with a focus on problem solving related to societal issues that engineers and
scientists are called upon to tackle. This incorporates different methods of data
collection and the suitability of a particular method for a given situation. The
relationship between probability and statistics is also explored, providing students with
the tools they need to understand how "chance" plays a role in statistical analysis.

Course Intended Learning Outcomes (CILO):


At the end of the course, students should be able to:
1. Apply statistical methods in the analysis of data
2. Design experiments involving several factors
3. Solve problems involving probability

Course Requirements:
Assessment Tasks - 60%
Major Exams - 40%
_________
Periodic Grade 100%

Computation of Grades:

PRELIM GRADE = 60% (Activity 1-4) + 40% (Prelim exam)

MIDTERM GRADE = 30%(Prelim Grade) + 70 %[60% (Activity 5-7) + 40% (Midterm exam)]

FINAL GRADE = 30%(Midterm Grade) + 70 %[60% (Activity 8-10) + 40% (Final exam)]

MODULE 8
CONCEPTS OF STATISTICS

Introduction

Statistics is a mathematical science that includes methods of collecting, organizing, and
analyzing data in such a way that meaningful conclusions can be drawn from them. In general,
its investigations and analyses fall into two broad categories called descriptive and inferential
statistics.
Descriptive statistics deals with the processing of data without attempting to draw any
inferences from it. The data are presented in the form of tables and graphs, and the characteristics
of the data are described in simple terms. Events that are dealt with include everyday happenings
such as accidents, prices of goods, business, incomes, epidemics, sports data, and population data
(Matloff, 2019).

Learning Outcomes

At the end of this module, students should be able to:

1. Describe the types of statistics;
2. Be familiar with the methods for describing sets of data; and
3. Identify the population, sample, and variable.

Lesson 1. Basic concept of statistics

STATISTICS

According to Matloff (2019), statistics is a science that deals with the methods of collecting,
organizing, and summarizing quantitative data which are analyzed and interpreted.

a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns, to
summarize and to present the information in a set of data.
b. Inferential Statistics → utilizes sample data to make estimates, decisions, and predictions
about a larger set of data.

POPULATION, SAMPLE, AND VARIABLE

a. Population → a set of existing units or items such as people, objects, transactions, or
events.
b. Sample → a sub-collection of items drawn from the population.
c. Variable → a characteristic or property of an individual unit, such as the height of a person,
the time of a reflex, the amount of a transaction, etc. (Matloff, 2019).

Lesson 2. Methods for describing sets of data

a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.

d. Ratio Data → these are measurements that enable the determination of the multiple of
the characteristic being measured between one unit of the sample or population and
another.
e. Quantitative Data → these are measurements that have meaningful numbers associated
with them; these include interval and ratio data (Matloff, 2019).

FREQUENCY DISTRIBUTION

This refers to the organization of data in tabular form showing the frequency of occurrence
of the values or objects in each class or category (Matloff, 2019).

a. Frequency is the number of times a value appears in the data.
b. The relative frequency distribution of a given set of data shows the frequency of each
class as a percentage of the total frequency.

The relative frequency, denoted by %f, is given by

$$\%f = \frac{f}{n} \times 100\%$$

where:
f → frequency of each class
n → sample size

c. The cumulative frequency distribution is obtained by successively adding the class
frequencies.

There are two types of cumulative frequency distribution:

i. Less than cumulative frequency distribution; < cumf → for each class, the total
frequency of all values below the upper boundary of that class.
ii. Greater than cumulative frequency distribution; > cumf → for each class, the total
frequency of all values above the lower boundary of that class.
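
The definitions above can be sketched in a few lines of Python. This is only an illustration: the data set is hypothetical (it is not from the module), and each distinct value is treated as its own class.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: number of defects found in 10 inspected parts.
data = [2, 3, 2, 1, 3, 3, 2, 4, 1, 3]

n = len(data)                                   # sample size
freq = dict(sorted(Counter(data).items()))      # value -> frequency f
rel_freq = {v: 100 * f / n for v, f in freq.items()}  # %f = (f / n) x 100%

values = list(freq)
counts = list(freq.values())
# "Less than" cumf: running total starting from the lowest class.
less_than_cumf = list(accumulate(counts))
# "Greater than" cumf: running total starting from the highest class.
greater_than_cumf = list(accumulate(counts[::-1]))[::-1]

for v, f, lt, gt in zip(values, counts, less_than_cumf, greater_than_cumf):
    print(f"value={v}  f={f}  %f={rel_freq[v]:.0f}%  <cumf={lt}  >cumf={gt}")
```

The last entry of the less than column and the first entry of the greater than column both equal n, which is a quick check that no observation was missed.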

CLASS INTERVALS, CLASS MARK, AND CLASS BOUNDARIES

a. Class Interval → refers to the grouping per category defined by the lower limit and the
upper limit.
b. Class Mark → defined as the midpoint of a class interval and is computed by:

$$\text{Class Mark} = \frac{\text{lower limit of the class} + \text{upper limit of the class}}{2}$$

c. Class Boundary → a point that represents the halfway or dividing point between
successive classes.
i. The class size or class width is equal to the difference between the upper limits of
two consecutive classes.
ii. The range R refers to the difference between the highest and lowest value in the
distribution.
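
As a rough illustration of these terms, the sketch below computes class marks, class boundaries, and the class size for a set of hypothetical class intervals. The intervals are invented for the example and assume integer class limits with equal widths.

```python
# Hypothetical class intervals given as (lower limit, upper limit).
classes = [(10, 14), (15, 19), (20, 24), (25, 29)]

# Class mark: midpoint of the lower and upper limits.
marks = [(lower + upper) / 2 for lower, upper in classes]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

# Class boundaries: the halfway points between successive classes.
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in classes]

# Class size: difference between two consecutive upper limits.
width = classes[1][1] - classes[0][1]

print(marks)
print(boundaries)
print(width)
```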

Lesson 3. Construction of a frequency distribution

Consider the following steps in constructing a frequency distribution (Matloff, 2019).


a. Get the highest and lowest value in the distribution.
b. Compute the value of the range.
c. Determine the number of classes
i. There is no standard method to follow in determining the number of classes.
ii. The number of classes must not be less than 5 and should not be more than 15.
iii. In some instances, the number of classes k can be approximated by using the formula

k = 1 + 3.322 log n (Sturges' rule, with the logarithm taken to base 10)

d. Find the size of the class interval. The value can be obtained by dividing the range by the desired
number of classes.
e. Construct the classes by choosing a convenient value to start the first class.
f. Determine the frequency of each class by counting the number of items that fall
in each interval.
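
The steps above can be sketched as a small Python function. This is a minimal sketch, not a standard routine: the function name `frequency_distribution`, the scores, and the choice of 5 classes are all assumptions made for illustration, and the data are assumed to be integers. The class size is the range divided by the number of classes, rounded up, and the first class starts at the lowest value.

```python
import math

def frequency_distribution(data, k):
    """Build (lower limit, upper limit, frequency) rows for integer data."""
    lowest, highest = min(data), max(data)           # step a
    value_range = highest - lowest                   # step b
    size = max(1, math.ceil(value_range / k))        # step d (k chosen in step c)
    table = []
    lower = lowest                                   # step e: start at the lowest value
    while lower <= highest:
        upper = lower + size - 1
        f = sum(lower <= x <= upper for x in data)   # step f: count items in the class
        table.append((lower, upper, f))
        lower += size
    return table

# Hypothetical exam scores of 20 students.
scores = [52, 67, 71, 58, 90, 84, 76, 63, 69, 73,
          88, 95, 61, 79, 82, 70, 66, 74, 85, 77]

table = frequency_distribution(scores, k=5)
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
```

When the range is an exact multiple of the number of classes, the loop adds one extra class so that the highest value is still covered.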

Assessment Task 8

On a clean sheet of short bond paper, answer the following and provide a complete solution.

1. Define and give an example of Descriptive Statistics.

2. Define and give an example of Inferential Statistics.

3. Give an example of sets of data, identify and explain what method of describing
sets of data was used.

Summary

Types of Statistics

a. Descriptive Statistics → utilizes numerical and graphical methods to look for patterns,
to summarize and to present the information in a set of data.

b. Inferential Statistics → utilizes sample data to make estimates, decisions, and predictions
about a larger set of data.

Methods for describing sets of data

a. Nominal Data → these are measurements that simply classify the units of the sample or
population into categories.
b. Ordinal Data → these are measurements that enable the units of the sample or
population to be ordered or ranked with respect to the variable of interest.
c. Interval Data → these are measurements that enable the determination of the differential
of the characteristic being measured between one unit of the sample or population and
another.

Reference

Matloff, Norman. (2019). Probability and Statistics for Data Science. Chapman & Hall.

MODULE 9
MEASURE OF CENTRAL TENDENCY

Introduction

A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use than
others. In the following sections, we will look at the mean, mode and median, and learn how to
calculate them and under what conditions they are most appropriate to be used (Matloff, 2019).

Learning Outcomes

At the end of this module, students should be able to:

1. Describe the importance of measures of central tendency;
2. Describe the mean, median, and mode; and
3. Solve for the values of the mean, median, and mode.

Lesson 1. Mean

According to Matloff (2019), mean is the value obtained by adding the values in the
distribution and dividing the sum by the total number of values or items; it is also the simplest and
most efficient measure of central tendency.

i. Mean for Ungrouped Data

The mean for ungrouped data, denoted by x̅, is given by:

$$\bar{x} = \frac{\sum x}{n}$$

When each value in the distribution is associated with a certain weight or degree of
importance, the weighted mean is:

$$\bar{x} = \frac{\sum wx}{\sum w}$$

ii. Mean of Grouped Data

In the Midpoint Method, the midpoint of each class interval is taken as the
representative of each class. Hence

$$\bar{x} = \frac{\sum fx}{n}$$

where:
f → represents the frequency of each class
x → midpoint of each class
n → total number of frequencies or the sample size

The Unit Deviation Method uses unit deviations and is usually implemented by taking
an arbitrary point (the assumed mean) as the initial step in approximating the value of the mean. Hence

$$\bar{x} = x_a + \left[\frac{\sum fd}{n}\right]c$$

where:
x_a → assumed mean; or the midpoint of the class interval with the highest frequency
f → frequency of each class
d → unit deviation; the number of class intervals by which a class lies above (+) or below (−) the class containing x_a
c → size of the class interval
n → sample size
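
To see that the formulas above agree with one another, here is a minimal Python sketch of the weighted mean and of both grouped-data methods. The function names, the grades and weights, and the class midpoints and frequencies are all hypothetical values chosen for illustration. The unit deviation version assumes equal class sizes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def grouped_mean_midpoint(midpoints, freqs):
    """Midpoint Method: sum(f * x) / n, with x the class midpoints."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / n

def grouped_mean_unit_deviation(midpoints, freqs, c):
    """Unit Deviation Method: x_a + (sum(f * d) / n) * c (equal class sizes assumed)."""
    n = sum(freqs)
    a = freqs.index(max(freqs))                 # class with the highest frequency
    x_a = midpoints[a]                          # assumed mean
    d = [i - a for i in range(len(freqs))]      # unit deviations ..., -1, 0, 1, ...
    return x_a + (sum(f * di for f, di in zip(freqs, d)) / n) * c

# Hypothetical grades with credit units as weights.
print(weighted_mean([85, 90, 78], [3, 2, 1]))

# Hypothetical grouped data: class midpoints, frequencies, class size 9.
midpoints = [56, 65, 74, 83, 92]
freqs = [2, 5, 6, 4, 3]
print(grouped_mean_midpoint(midpoints, freqs))
print(grouped_mean_unit_deviation(midpoints, freqs, c=9))
```

Both grouped-data methods return the same value; the unit deviation form only rearranges the arithmetic so that the numbers being multiplied stay small.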

Example:

Find the mean of the following set of integers 8, 11, –6, 22, –3.

Solution:

$$\bar{x} = \frac{\sum x}{n}$$

$$\bar{x} = \frac{8 + 11 + (-6) + 22 + (-3)}{5} = \frac{32}{5}$$

$$\bar{x} = 6.4$$

Example:

The mean of 40 numbers was found to be 38. Later on, it was detected that a number 56 was
misread as 36. Find the correct mean of given numbers.

Solution:

Calculated mean of 40 numbers = 38.

Therefore, calculated sum of these numbers = (38 × 40) = 1520.

Correct sum of these numbers

= [1520 - (wrong item) + (correct item)]

= (1520 - 36 + 56)

= 1540.

Therefore, the correct mean = 1540/40 = 38.5.

Example:

The mean of the heights of 6 boys is 152 cm. If the individual heights of five of them are 151
cm, 153 cm, 155 cm, 149 cm and 154 cm, find the height of the sixth boy.

Solution:

Mean height of 6 boys = 152 cm.

Sum of the heights of 6 boys = (152 × 6) = 912 cm

Sum of the heights of 5 boys = (151 + 153 + 155 + 149 + 154) cm = 762 cm.

Height of the sixth boy

= (sum of the heights of 6 boys) - (sum of the heights of 5 boys)

= (912 - 762) cm = 150 cm.

Hence, the height of the sixth boy is 150 cm.

Example:

If the mean of 9, 8, 10, x, 12 is 15, find the value of x.

Solution:

Mean of the given numbers = (9 + 8 + 10 + x + 12)/5 = (39 + x)/5

According to the problem, mean = 15 (given).

Therefore, (39 + x)/5 = 15

⇒ 39 + x = 15 × 5

⇒ 39 + x = 75

⇒ 39 - 39 + x = 75 - 39

⇒ x = 36

Hence, x = 36.
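
The four worked examples in this lesson can be checked with a short Python sketch. The helper names `corrected_mean` and `missing_value` are hypothetical, introduced only for this illustration; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

# Mean of a set of integers.
first_mean = mean([8, 11, -6, 22, -3])

def corrected_mean(reported_mean, n, wrong_value, right_value):
    """Recover the sum from the reported mean, swap the misread item, re-divide."""
    total = reported_mean * n
    return (total - wrong_value + right_value) / n

def missing_value(target_mean, n, known_values):
    """Required total (mean x n) minus the sum of the known values."""
    return target_mean * n - sum(known_values)

print(first_mean)
print(corrected_mean(38, 40, wrong_value=36, right_value=56))
print(missing_value(152, 6, [151, 153, 155, 149, 154]))
print(missing_value(15, 5, [9, 8, 10, 12]))
```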

Lesson 2. Median

According to Matloff (2019), the median is the middlemost value in the distribution and is
denoted by x̃.

i. Median for Ungrouped Data

In determining the median for ungrouped data, the values must first be arranged in order of
magnitude, either from lowest to highest or vice versa. Hence

$$\tilde{x} = x_{\frac{n+1}{2}} \quad \text{; if } n \text{ is odd}$$

$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \quad \text{; if } n \text{ is even}$$

ii. Median for Grouped Data

The procedure requires the construction of the less than cumulative frequency column (<
cumf). Hence

$$\tilde{x} = x_b + \left[\frac{\frac{n}{2} - \mathit{cumf}_b}{f_m}\right]c$$

where:
x_b → lower boundary of the median class
cumf_b → cumulative frequency before the median class
f_m → frequency of the median class
c → size of the class interval
n → sample size
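
A minimal Python sketch of the grouped-data formula follows. The function name `grouped_median` and the class boundaries and frequencies are hypothetical values chosen for illustration. The median class is taken as the first class whose less than cumulative frequency reaches n/2.

```python
def grouped_median(lower_boundaries, freqs, c):
    """Median for grouped data: x_b + ((n/2 - cumf_b) / f_m) * c."""
    n = sum(freqs)
    cumf_before = 0                      # cumulative frequency before the current class
    for x_b, f_m in zip(lower_boundaries, freqs):
        if cumf_before + f_m >= n / 2:   # this is the median class
            return x_b + ((n / 2 - cumf_before) / f_m) * c
        cumf_before += f_m
    raise ValueError("empty distribution")

# Hypothetical grouped data: lower class boundaries, frequencies, class size 9.
lower_boundaries = [51.5, 60.5, 69.5, 78.5, 87.5]
freqs = [2, 5, 6, 4, 3]

print(grouped_median(lower_boundaries, freqs, c=9))
```

Here n = 20, so n/2 = 10; the less than cumulative frequencies are 2, 7, 13, which makes the third class the median class with cumf_b = 7 and f_m = 6.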

Example:

Find the median of the data 25, 37, 47, 18, 19, 26, 36.

Solution:

Arranging the data in ascending order, we get 18, 19, 25, 26, 36, 37, 47

Here, the number of observations is odd, i.e., 7.

Therefore, median = ((n + 1)/2)th observation

= ((7 + 1)/2)th observation

= (8/2)th observation

= 4th observation.

4th observation is 26.

Therefore, median of the data is 26.

Example:

Find the median of the data 24, 33, 30, 22, 21, 25, 34, 27.

Solution:

Here, the number of observations is even, i.e., 8.

Arranging the data in ascending order, we get 21, 22, 24, 25, 27, 30, 33, 34

Therefore, median = {(n/2)th observation + (n/2 + 1)th observation}/2

= {(8/2)th observation + (8/2 + 1)th observation}/2

= {4th observation + 5th observation}/2

= {25 + 27}/2

= 52/2

= 26

Therefore, the median of the given data is 26.
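
The odd and even cases can be sketched in Python as follows. The function name `median_ungrouped` is a hypothetical helper written for this illustration; Python lists are indexed from 0, so the kth observation is at index k − 1.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd/even rule."""
    ordered = sorted(values)             # arrange from lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        # ((n + 1) / 2)th observation
        return ordered[(n + 1) // 2 - 1]
    # average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_ungrouped([25, 37, 47, 18, 19, 26, 36]))
print(median_ungrouped([24, 33, 30, 22, 21, 25, 34, 27]))
```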

Lesson 3. Mode

According to Matloff (2019), mode is the most frequent value in the distribution and is
denoted by 𝐱̂

i. Mode for Ungrouped Data


Mode can be obtained through inspection.

ii. Mode for Grouped Data

Identify the modal class, that is, the interval which contains the highest
frequency in the distribution. Hence

$$\hat{x} = x_b + \left[\frac{d_1}{d_1 + d_2}\right]c$$

where:
x_b → lower boundary of the modal class
d_1 → difference between the frequency of the modal class and the frequency of the interval before it
d_2 → difference between the frequency of the modal class and the frequency of the interval after it
c → size of the class interval
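
The grouped-data formula can be sketched in Python as shown below. The function name `grouped_mode` and the data are hypothetical. One assumption goes beyond the text: when the modal class is the first or last class, the missing neighbouring frequency is treated as 0.

```python
def grouped_mode(lower_boundaries, freqs, c):
    """Mode for grouped data: x_b + (d1 / (d1 + d2)) * c."""
    m = freqs.index(max(freqs))                          # modal class (first one if tied)
    before = freqs[m - 1] if m > 0 else 0                # assumption: 0 if no class before
    after = freqs[m + 1] if m < len(freqs) - 1 else 0    # assumption: 0 if no class after
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_boundaries[m] + (d1 / (d1 + d2)) * c

# Hypothetical grouped data: lower class boundaries, frequencies, class size 9.
lower_boundaries = [51.5, 60.5, 69.5, 78.5, 87.5]
freqs = [2, 5, 6, 4, 3]

print(grouped_mode(lower_boundaries, freqs, c=9))
```

For these values the modal class is the third one (frequency 6), so d1 = 6 − 5 = 1 and d2 = 6 − 4 = 2.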

Example:

Find the mode of the given set of number 2, 2, 3, 5, 4, 3, 2, 3, 3, 5

Solution:

Arranging equal values together, we get

2, 2, 2, 3, 3, 3, 3, 4, 5, 5

We observe that 3 occurs the maximum number of times, i.e., four times.

Therefore, the mode of this data is 3.

Note:

● A data set may not have a mode.

Example:

The data 3, 4, 1, 5, 6, 2 has no mode because no number occurs more often than any other
number.

● A data set may have more than one mode.

Example:

The data 2, 5, 1, 3, 5, 7, 6, 3, 8 has two modes, 3 and 5, because each occurs twice, which is
the maximum frequency.
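
Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch, assuming the convention used in this lesson that a data set in which every value occurs equally often has no mode (the function name `modes` and the list 1, 2, 3, 4 are hypothetical):

```python
from collections import Counter

def modes(values):
    """Return all modes in ascending order, or [] when no value stands out."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 2, 3, 5, 4, 3, 2, 3, 3, 5]))   # one mode
print(modes([2, 5, 1, 3, 5, 7, 6, 3, 8]))      # two modes
print(modes([1, 2, 3, 4]))                     # hypothetical data with no mode
```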

Lesson 4. Range

According to Matloff (2019), Range is the difference between the lowest and highest
values. Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So, the range is 9 −
3 = 6.

Example:

What is the range for the following set of numbers? 10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65, 98?

Solution:

Step 1: Sort the numbers in order, from smallest to largest:


7, 10, 21, 33, 43, 45, 45, 65, 67, 87, 98, 99

Step 2: Subtract the smallest number in the set from the largest number in the set:

99 – 7 = 92

The range is 92

Example:

What is the range of these integers?


14, -12, 7, 0, -5, -8, 17, -11, 19

Solution:

Step 1: Sort the numbers in order, from smallest to largest:

-12, -11, -8, -5, 0, 7, 14, 17, 19

Step 2: Subtract the smallest number in the set from the largest number in the set:

19 − (−12) = 19 + 12 = 31

The range is 31.

Example:

What is the range of the following times?


2.7 hrs, 8.3 hrs, 3.5 hrs, 5.1 hrs, 4.9 hrs

Solution:

Step 1: Sort the numbers in order, from smallest to largest:

2.7, 3.5, 4.9, 5.1, 8.3

Step 2: Subtract the smallest number in the set from the largest number in the set:

8.3 hr – 2.7 hr = 5.6 hr

The range is 5.6 hr.
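
The three range examples can be checked with a one-line Python helper. The name `value_range` is hypothetical. Sorting helps when working by hand, but in code the largest and smallest values can be read directly with `max` and `min`.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

print(value_range([10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65, 98]))
print(value_range([14, -12, 7, 0, -5, -8, 17, -11, 19]))
# Decimal inputs are stored as binary floats, so round for display.
print(round(value_range([2.7, 8.3, 3.5, 5.1, 4.9]), 1))
```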

Assessment Task 9

On a clean sheet of short bond paper, answer the following and provide a complete solution.

1. Find the mean, median, mode and range for the following list of values

5, 13, 9, 7, 1, 9, 2, 9, and 11

2. Find the mean, median, mode, and range for the following list of values:

13, 18, 13, 14, 13, 16, 14, 21, 13

3. What is the arithmetic mean of the following numbers?

10, 6, 4, 4, 6, 4, 1

4. What is the mode of the following numbers?

10, 6, 4, 4, 6, 4, 1

5. The set of scores 12, 5, 7, -8, x, 10 has a mean of 5. Find the value of x.

Summary

Mean is the value obtained by adding the values in the distribution and dividing the sum
by the total number of values or items; it is also the simplest and most efficient measure of central
tendency.

Median is the middle most value in the distribution and is denoted by 𝐱̃.
Mode is the most frequent value in the distribution and is denoted by 𝐱̂.

Range is the difference between the lowest and highest values.

Reference

Matloff, Norman. (2019). Probability and Statistics for Data Science. Chapman & Hall.

MODULE 10
MEASURES OF DISPERSION

Introduction

We live in a changing world, and changes are taking place in all areas of life. The study of
statistics does not show much interest in things which are constant. Many experts are engaged
in the study of changing phenomena. Agricultural, industrial and mineral production and their
transportation are of great interest to economists, statisticians, and other experts. Changes in
human populations, changes in standards of living and changes in literacy rates attract experts to
perform detailed studies and then correlate these changes to human life. Thus, variability or
variation is connected with human life and its study is very important for mankind.

The word dispersion has a technical meaning in statistics. The average measures the
center of the data, and it is one aspect of observation. Another feature of the observation is how
the observations are spread about the center. The observations may be close to the center or
they may be spread away from the center. If the observations are close to the center (usually the
arithmetic mean or median), we say that dispersion, scatter or variation is small. If the
observations are spread away from the center, we say dispersion is large (Matloff, 2019).

Learning Outcomes

At the end of this module, students should be able to:

1. Describe the applications of measures of dispersion;
2. Be familiar with the variance and standard deviation; and
3. Solve for the variance and standard deviation of sets of values.

Lesson 1. Variance and Standard Deviation

According to Matloff (2019), the standard deviation is a measure of the amount of variation
or dispersion of a set of values. A low standard deviation indicates that the values tend to be close
to the mean (also called the expected value) of the set, while a high standard deviation indicates
that the values are spread out over a wider range.

a. Variance is a measure found by averaging the squared deviations of the individual values
from the mean and is denoted by σ².
b. Standard deviation, denoted by σ, is found by taking the square root of the variance.

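i. For Ungrouped Data

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \quad \text{; sample variance}$$

or

$$\sigma^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \quad \text{; sample standard deviation}$$

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N} \quad \text{; population variance}$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} \quad \text{; population standard deviation}$$

where n is the sample size and N is the population size.

ii. For Grouped Data

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1} \ ; \quad \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$

$$\sigma^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$

where k is the number of classes, f_i and x_i are the frequency and midpoint of class i, and n is the total frequency.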
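
The variance formulas can be compared side by side in a short Python sketch. All function names and data values here are hypothetical. The sketch follows the usual convention that the sample variance divides by n − 1 while the population variance divides by N.

```python
import math

def sample_variance(values):
    """Definition form: sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: [sum(x^2) - (sum(x))^2 / n] / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

def population_variance(values):
    """Sum of squared deviations divided by the population size N."""
    N = len(values)
    mean = sum(values) / N
    return sum((x - mean) ** 2 for x in values) / N

def grouped_sample_variance(midpoints, freqs):
    """Grouped data: sum(f * (x - x_bar)^2) / (n - 1), with n = sum(f)."""
    n = sum(freqs)
    x_bar = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)

data = [4, 8, 6, 5, 7]                       # hypothetical values, mean = 6
print(sample_variance(data))                 # 10 / 4
print(sample_variance_shortcut(data))        # same value by the second formula
print(population_variance(data))             # 10 / 5
print(math.sqrt(sample_variance(data)))      # sample standard deviation

# Hypothetical grouped data: class midpoints and frequencies.
print(grouped_sample_variance([56, 65, 74, 83, 92], [2, 5, 6, 4, 3]))
```

The two sample variance functions always agree up to rounding; the shortcut form is convenient by hand because it avoids computing each deviation separately.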

Example:

For the set of values 8, 6, 12, 10, 14, 16, find
(a) $\sum_{i=1}^{6}(x_i - \bar{x})$,
(b) $\sum_{i=1}^{6}|x_i - \bar{x}|$, and
(c) the mean deviation, M.D.

Solution:
a. determine $\sum_{i=1}^{6}(x_i - \bar{x})$

$$\bar{x} = \frac{8 + 6 + 12 + 10 + 14 + 16}{6} = \frac{66}{6}$$

x̅ = 11

$$\sum_{i=1}^{6}(x_i - \bar{x}) = (8 - 11) + (6 - 11) + (12 - 11) + (10 - 11) + (14 - 11) + (16 - 11)$$

∑(xi − x̅) = 0 ← Answer

b. determine $\sum_{i=1}^{6}|x_i - \bar{x}|$

$$\sum_{i=1}^{6}|x_i - \bar{x}| = |8 - 11| + |6 - 11| + |12 - 11| + |10 - 11| + |14 - 11| + |16 - 11|$$

∑|xi − x̅| = 18 ← Answer

c. determine M.D.

$$\text{M.D.} = \frac{\sum_{i=1}^{6}|x_i - \bar{x}|}{n} = \frac{18}{6}$$

M.D. = 3 ← Answer

Example:

For the following data 18, 19, 16, 12, 7, 10, 23; find
(a) x̅,
(b) M.D.,
(c) σ², and
(d) σ

Solution:

a. determine x̅

$$\bar{x} = \frac{18 + 19 + 16 + 12 + 7 + 10 + 23}{7} = \frac{105}{7}$$

x̅ = 15 ← Answer

b. determine M.D.

$$\text{M.D.} = \frac{|18 - 15| + |19 - 15| + |16 - 15| + |12 - 15| + |7 - 15| + |10 - 15| + |23 - 15|}{7} = \frac{32}{7}$$

M.D. = 4.57 ← Answer

c. determine σ²

$$\sigma^2 = \frac{\sum_{i=1}^{7}(x_i - 15)^2}{7 - 1}$$

$$\sigma^2 = \frac{(18 - 15)^2 + (19 - 15)^2 + (16 - 15)^2 + (12 - 15)^2 + (7 - 15)^2 + (10 - 15)^2 + (23 - 15)^2}{6} = \frac{188}{6}$$

σ² = 31.33 ← Answer

d. determine σ

$$\sigma = \sqrt{\frac{\sum_{i=1}^{7}(x_i - 15)^2}{7 - 1}} = \sqrt{31.33}$$

σ = 5.598 ← Answer
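
The results for the data 18, 19, 16, 12, 7, 10, 23 can be reproduced with Python's standard `statistics` module, as a quick check rather than a replacement for the hand computation. `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas; the mean deviation is computed directly because the module has no function for it.

```python
from statistics import mean, variance, stdev

data = [18, 19, 16, 12, 7, 10, 23]

x_bar = mean(data)                                              # (a) mean
mean_deviation = sum(abs(x - x_bar) for x in data) / len(data)  # (b) M.D.
sample_var = variance(data)                                     # (c) divides by n - 1
sample_sd = stdev(data)                                         # (d) square root of (c)

print(x_bar)
print(round(mean_deviation, 2))
print(round(sample_var, 2))
print(round(sample_sd, 3))
```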
Assessment Task 10

On a clean sheet of short bond paper, answer the following and provide a complete solution.

1. Sam has 20 Rose Bushes. The number of flowers on each bush is

9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

Work out the Standard Deviation.

2. Sam has 20 rose bushes, but only counted the flowers on 6 of them!

The "population" is all 20 rose bushes, and the "sample" is the 6 bushes whose flowers Sam
counted.

Let us say Sam's flower counts are: 9, 2, 5, 4, 12, 7

Work out the Standard Deviation.

Summary

Measures of central tendency (mean, median and mode) provide information on the data
values at the center of the data set. Measures of dispersion (quartiles, percentiles, ranges)
provide information on the spread of the data around the center. This module covered
two more measures of dispersion called the variance and the standard deviation.

a. Variance is a measure found by averaging the squared deviations of the individual values
from the mean and is denoted by σ².
b. Standard deviation, denoted by σ, is found by taking the square root of the variance.

Reference

Matloff, Norman. (2019). Probability and Statistics for Data Science. Chapman & Hall.
