
COLLEGE OF ENGINEERING

Bachelor of Science in Civil Engineering


1st Semester, SY 2020-2021

Name of Student: __________________________________ Course/ Year: ________________


Week 1 Module 1 Submission: December , 2020

MODULES IN
ES 214 –
ENGINEERING DATA ANALYSIS

MODULE 1:
INTRODUCTION to STATISTICS

JEMIL L. DULAY
SPECIAL LECTURER

ES 214a- ENGINEERING DATA ANALYSIS 1


TABLE OF CONTENTS

Title Page
Table of Contents
About the Module

MODULE 1 Introduction to Statistics


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 2 Probability
▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 3 Discrete Probability Distribution


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 4 Continuous Probability Distribution


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?



▪ Feedback
▪ Summary
▪ References

MODULE 5 Joint Probability Distribution


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 6 Sampling Distributions and Point Estimation of Parameters


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References
MODULE 7 Statistics Interval
▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 8 Test of Hypothesis for a Single Sample


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References



MODULE 9 Statistical Inference of Two Samples
▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

MODULE 10 Simple Linear Regression and Correlation


▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

ABOUT THE LEARNING MODULES

Welcome to ES 214- Engineering Data Analysis!

This module introduces different methods of data collection and the suitability of using a
particular method for a given situation. It covers the relationship of
probability to statistics; probability distributions of random variables and their uses; linear functions
of random variables in the context of their application to data analysis and inference; estimation
techniques for unknown parameters; hypothesis testing used in making inferences from
sample to population; and inference for regression parameters, including building models for estimating means
and predicting future values of key variables under study. Statistically based experimental
design techniques and the analysis of outcomes of experiments are discussed with the aid of
statistical software. It consists of 3 competencies that you, as learners, are expected to achieve, as follows:



1. Compute the probability distribution of a random variable for both discrete and continuous
data;
2. Apply statistical methods in the analysis of data;
3. Identify, formulate and solve complex problems in Civil/Electrical engineering.

These competencies are covered separately in 10 lessons. As shown below, each lesson is
directed to the attainment of various learning outcomes.

MODULE 1 Introduction to Statistics


MODULE 2 Probability
MODULE 3 Discrete Probability Distribution
MODULE 4 Continuous Probability Distribution
MODULE 5 Joint Probability Distribution
MODULE 6 Sampling Distributions and Point Estimation of Parameters
MODULE 7 Statistics Interval
MODULE 8 Test of Hypothesis for a Single Sample
MODULE 9 Statistical Inference of Two Samples
MODULE 10 Simple Linear Regression and Correlation

Your success in this course is shown in your ability to meet the performance standards
found in each learning outcome.

How Do You Use These Learning Modules?

This learning material has 10 modules. Each has the following parts.

▪ Learning Outcomes
▪ Pre-test
▪ What do You Need to Know?
▪ What Have You Learned?
▪ Feedback
▪ Summary
▪ References

To get the most from these Modules, you need to do the following:
1. Begin by reading and understanding the Learning Outcome/s. These tell you what
you should know and be able to do at the end of this Module.
2. Find out what you already know by taking the Pre-test, then check your answers against
the Answer Key. Then, go through the Lesson and review especially those items which you
failed to get.
3. Do the required Learning Activities. They begin with one or more Information Sheets.
An Information Sheet contains important notes or basic information that you need to know.
After reading the Information Sheet, test yourself on how much you learned by
means of the Self-check. Refer to the Answer Key for correction. Do not hesitate to go
back to the Information Sheet when you do not get all test items correct. This will
ensure your mastery of basic information.
4. You must be able to apply what you have learned in another activity or in a real-life
situation.

Each Lesson also provides you with references and definition of key terms for your guide. They can
be of great help. Use them fully.



MODULE NO. 1
The Use and Importance of Statistics, A Brief History of Statistics, Types of
Measurement, Statistical Symbols, Summation Notation, The
Nature of Statistics, and Sample and Population

Course Overview:
This course introduces different methods of data collection and the suitability of using a
particular method for a given situation. It covers the relationship of
probability to statistics; probability distributions of random variables and their uses; linear functions
of random variables in the context of their application to data analysis and inference; estimation
techniques for unknown parameters; hypothesis testing used in making inferences from sample
to population; and inference for regression parameters, including building models for estimating means and
predicting future values of key variables under study. Statistically based experimental design
techniques and the analysis of outcomes of experiments are discussed with the aid of statistical
software.
This is a 3-unit course which covers 3 hours of meeting per week (3 hours of lectures).
Therefore, you must complete all the activities or exercises written in this module within a specific length
of time. You are expected to complete all the quizzes, assignments and activities, and to pass the Midterm
and Final Exams. Your grade will depend on this Grading System: 10% (attendance), 20% (quizzes),
20% (assignment/activity), 25% (midterm exam) and 25% (final exam).
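
The grading system above is a weighted average. The short Python sketch below shows how the weights combine; the function name `final_grade` and the example scores are hypothetical and only illustrate the arithmetic, assuming every component is scored as a percentage from 0 to 100.

```python
# Weights copied from the Grading System stated in the Course Overview.
WEIGHTS = {
    "attendance": 0.10,
    "quizzes": 0.20,
    "assignment_activity": 0.20,
    "midterm_exam": 0.25,
    "final_exam": 0.25,
}


def final_grade(scores):
    """Return the weighted grade; each score is a percentage from 0 to 100."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical scores of one student, for illustration only.
example_scores = {
    "attendance": 100,
    "quizzes": 80,
    "assignment_activity": 90,
    "midterm_exam": 70,
    "final_exam": 80,
}

print(round(final_grade(example_scores), 2))
```

Because the five weights add up to 100%, a student who scores 100 in every component obtains a grade of 100.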

Learning Outcomes:
After completing this module, the students are expected to:
1. Know the use and importance of statistics
2. Learn a brief history of statistics
3. Classify data types of measurement
4. Differentiate the scales of measurement
5. Become familiar with the use of statistical symbols
6. Find the sum of given values by using Summation Notation
7. Learn the Nature of Statistics
8. Compute the sample size



Module Pre-Test:

Direction: Read the statement or question carefully and select the best answer from the given
choices in each item.

________ 1. It is the art and science of collecting, presenting, analyzing and interpreting data. These data
may be in sports, business, politics, education and practically all fields of human endeavor dealing with
statistics.
a. analysis
b. statistics
c. data
d. probability

________ 2. It refers to some techniques which are concerned with the presentation and
collection of data or information.
a. statistics
b. probability
c. collection
d. descriptive statistics
________ 3. It is the manipulation of the data gathered using descriptive and inferential
statistics.
a. interpretation
b. analysis
c. grouped data
d. none of the above

_________ 4. It is the process and methods of gathering information by interview, questionnaire,
experiments, observation and documentary analysis.
a. Data presentation
b. collection
c. data collection
d. data distribution

________ 5. It refers to the elements of objects or individuals selected from the population.
a. population
b. parameter
c. sample
d. Sample size



Key Terms:

Statistics               Analysis                 Data
Interpretation           Data collection          Data presentation
Descriptive statistics   Inferential statistics   Graphical presentation
Parameter                Sampling                 Sample

What Do You Need to Know?

▪ The Use and Importance of Statistics


Statistics is the art and science of collecting, presenting, analyzing and interpreting data.
These data may be in sports, business, politics, education and practically all fields of human
endeavor dealing with statistics.
Statistics as a field of discipline is not only useful but also very important to all of
us. Take, for instance, a child: from the time the child is born, or even from conception, lots of data are
taken and analyzed, such as weight, length, chest measurement and many more. As the child grows
older, more data are taken regarding his characteristics and personality.
Analysis is the manipulation of the data gathered using descriptive and inferential statistics.
Data point to statistical facts, principles, opinions and various items of different sources.
Today, there are vast data associated with sports, business, politics, education, medicine,
traffic, the stock market, inflation, unemployment, popularity ratings, and thousands of other human
activities. The winner of an election and the movement of typhoons are being predicted. The
discovery of new medicines that increase the life expectancy of an individual, and even going to
the moon and reaching Mars, are somehow products of statistics. In a limited sense, every
individual or group of individuals is a statistic.
Statistics may also be considered a measure or estimate of the characteristics of the
population parameters. It may also be considered as a method on how vast data is organized and
presented in a comprehensive manner.
Data collection is the process and methods of gathering information by interview, questionnaire,
experiments, observation and documentary analysis.
Data presentation takes the form of tables and graphs.
Graphical presentation points to the construction of bar graphs, frequency polygons, pie charts and
pictographs, among others.
Statistical methodology may be classified into two major functions: the descriptive and the
inferential statistics.
Descriptive statistics includes frequency distribution, measures of central tendency, measures of
central location, measures of dispersion or variation, graphs, skewness and kurtosis. Likewise, it
refers to some techniques which are concerned with the presentation and collection of data or
information.



Inferential statistics is the technique by which decisions and conclusions about the
population are made using only representative samples. It includes both parametric
and nonparametric tests, which are concerned with generalizing information or making
inferences about the population through representative samples.
For instance, each student entering the first year in a certain university has a folder which
contains the scores of the admission test, the result of the physical examination, and scores from
interest and personality inventories. As a whole, these individual folders show mass information
about the students. To know about the whole class, all these data must be studied. To present these
data, a frequency distribution will be used. To summarize the results of the admission test or mental ability
test, measures of central tendency and central location will be used.
Frequency distribution is the tabulation of data of measures grouped with class interval.

Likewise, measures of variation will be used to find out the variation among the students in
the examination result. We can draw graphs that would show the difference between male and
female or we can show how many students are above average, average and below average through
graphs.
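
The descriptive tools named above can be tried on a small data set. The sketch below is only an illustration: the ten admission-test scores are hypothetical, the helper `frequency_distribution` is an assumed name, and the class intervals (width 10, starting at 60) are chosen just for this example and assume whole-number scores.

```python
import statistics
from collections import Counter

# Hypothetical admission-test scores of ten students (illustration only).
scores = [72, 85, 91, 68, 77, 85, 94, 79, 88, 85]


def frequency_distribution(values, start, width):
    """Tabulate whole-number values into class intervals of equal width."""
    # Each value falls in class number (value - start) // width.
    counts = Counter((v - start) // width for v in values)
    return {
        (start + k * width, start + (k + 1) * width - 1): counts[k]
        for k in sorted(counts)
    }


# Frequency distribution: class interval and frequency f.
table = frequency_distribution(scores, start=60, width=10)
for (low, high), f in table.items():
    print(f"{low}-{high}: f = {f}")

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measure of variation: sample standard deviation.
spread = statistics.stdev(scores)

print(mean, median, mode, round(spread, 2))
```

The frequencies in the table add up to the number of students, which is a quick check that no score was left out of the tabulation.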
These are some of the reasons for studying statistics: not only the collection and
presentation of data, but also knowing how to use the subject in a research task.

Knowledge of statistics is basic in understanding articles published in newspapers,
scientific magazines, and modern texts in science. The winning candidate in an election can be predicted by
means of a poll survey or exit poll survey. The outcomes of games of chance are governed by probability.

▪ A Brief History of Statistics

Statistics was used by ancient chiefs to count the number of effective warriors needed to
defeat the enemy, since wars were then won by the number of trained warriors. Rulers also
figured out how much tax had to be collected to maintain their kingdoms. These statistics were purely
descriptive in nature.
From the seventeenth to the eighteenth century, mathematicians were asked by
gamblers to develop principles that would improve the chances of winning at cards, dice and coins.
Bernoulli and De Moivre were the two mathematicians who studied probability. In the 1730s, De Moivre
developed the equation for the normal curve. During the 19th century, Laplace and Gauss were
two other mathematicians who applied probability principles to astronomy.
In the early 19th century, a famous Belgian statistician named Quetelet applied
statistics to the investigation of social and educational problems. Quetelet was able to develop statistical
theory as a general method of research in science.
Francis Galton had the greatest effect upon the introduction and use of statistics in the social
sciences. He contributed to the fields of heredity and eugenics, psychology, anthropometry, and
statistics. The concept of correlation, or the measure of agreement between two variables, is credited
to him. Galton also contributed to the development of centiles and percentiles. Pearson is another
mathematician who collaborated with Galton in developing many of the correlation and regression
formulas that are being used today.



James McKeen Cattell was a famous American psychologist who studied in Europe in 1880
and contacted Galton and other European statisticians. Upon his return to the United States, he and his
students, including E. L. Thorndike, applied statistical methods to psychological and educational
problems.
In the 20th century, new techniques and methods were applied to the study of small samples
by R. A. Fisher, an English statistician. Most of his contributions were applied in agricultural and
biological settings.
Today, statistics is the major tool used by researchers in the agricultural, biological, business,
medical, behavioural and social sciences.

▪ Types of Measurement
The data can be classified into two types:
1. Continuous and
2. Discontinuous or Discrete data.
Continuous data are measures expressed in units like feet, pounds, kilos, minutes, and meters. These kinds of
data can be measured to varying degrees of precision; for example, 1 yard equals 3
feet, and 1 foot equals 12 inches.
Discontinuous or discrete data are measurements expressed in whole units, such as counts of
people, number of objects, number of cars passing by, number of houses, number of students,
workers, and so on.
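
The difference between the two types can be sketched in Python. This is a minimal illustration, not a formal test: the helper names `yards_to_inches` and `is_discrete` and the sample values are hypothetical, and checking the data type is only a rough indication of whether data are counts or measurements.

```python
# Conversion factors stated in the lesson: 1 yard = 3 feet, 1 foot = 12 inches.
FEET_PER_YARD = 3
INCHES_PER_FOOT = 12


def yards_to_inches(yards):
    """Express a continuous length in a finer unit (yards to inches)."""
    return yards * FEET_PER_YARD * INCHES_PER_FOOT


def is_discrete(values):
    """Return True when every value is a whole-unit count (non-negative integer)."""
    return all(
        isinstance(v, int) and not isinstance(v, bool) and v >= 0 for v in values
    )


# Hypothetical data, for illustration only.
cars_passing_by = [12, 7, 0, 15]      # counts of cars: discrete
lengths_in_yards = [1.5, 2.25, 0.75]  # measured lengths: continuous

print(is_discrete(cars_passing_by))   # counts are whole units
print(is_discrete(lengths_in_yards))  # measurements can take fractional values
print([yards_to_inches(y) for y in lengths_in_yards])
```

A count such as the number of cars cannot be 7.5, while a length can always be reported more precisely by switching to a smaller unit.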

▪ Scales of Measurement
According to Stevens, there are four types of scales that are used in the sciences. These are the
nominal, ordinal, interval, and ratio scales.
Nominal scales are used as measures of identity. Examples of this are classifications of
individuals into categories, such as gender (male and female); yes and no answers; religion (for
instance, Muslim and Christian); political parties (LP, Laban, Lakas, and KNP); dwelling place
(rural and urban); and more of such categories.
Ordinal scales are used in measurements like the ranking of individuals or objects. Ordinal measures
reveal which person or object is larger or smaller, harder or softer. Examples are responses like Strongly Agree,
Agree, No Opinion, Disagree, and Strongly Disagree.
Interval scales are numbers that reflect differences among items. Examples are scores in a
test, grades of students, age, blood pressures, Fahrenheit and Celsius thermometers.
Ratio scale is the highest type of scale. The basic difference between the interval and ratio
scales is that the ratio scale has a true zero point, so that ratios of values are meaningful. Examples are the measures of length, weight, loudness, width, and so on.
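
The four scales can be contrasted with a short Python sketch. The names `rank` and `celsius_to_fahrenheit` and the values used are hypothetical. The interval and ratio parts rest on the usual textbook distinction, taken here as an assumption: an interval scale has an arbitrary zero, while a ratio scale has a true zero.

```python
# Nominal: categories only identify; they can be compared for equality, not ranked.
dwelling_places = ["rural", "urban", "urban", "rural"]
same_category = dwelling_places[0] == dwelling_places[3]

# Ordinal: categories have a meaningful order, from lowest to highest agreement.
LIKERT_ORDER = [
    "Strongly Disagree",
    "Disagree",
    "No Opinion",
    "Agree",
    "Strongly Agree",
]


def rank(response):
    """Return the position of a response in the ordered list of categories."""
    return LIKERT_ORDER.index(response)


# Interval: differences are meaningful, but the zero point is arbitrary.
def celsius_to_fahrenheit(celsius):
    """Convert a temperature reading from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


ratio_in_celsius = 40 / 20
ratio_in_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Ratio: a true zero exists, so a ratio is the same in any unit.
ratio_in_meters = 4 / 2
ratio_in_centimeters = (4 * 100) / (2 * 100)

print(same_category)
print(rank("Agree") > rank("Disagree"))
print(ratio_in_celsius, ratio_in_fahrenheit)
print(ratio_in_meters, ratio_in_centimeters)
```

The temperature ratio changes when the unit changes, so 40 degrees Celsius cannot be called "twice as hot" as 20 degrees Celsius, whereas 4 meters is twice 2 meters whatever the unit.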

▪ Statistical Symbols
You should become familiar with the following symbols. Notational usage varies from
one author to another, but the following are some of the common symbols used in
statistics:



∑    capital letter sigma    denotes summation of, the sum of
f    small letter f          denotes frequencies
F    capital letter F        denotes cumulative frequencies
n    small letter n          denotes sample size
i    small letter i          denotes interval
N    capital letter N        denotes population size
X    capital letter X        denotes independent variable
Y    capital letter Y        denotes dependent variable
X̄    X bar                   denotes mean of the sample
µ    Greek letter mu         denotes population mean

You will find it very useful to become familiar with the following expressions:

x = y    x equals y
x ≠ y    x is not equal to y
x > y    x is greater than y
x < y    x is less than y
x ≥ y    x is greater than or equal to y
x ≤ y    x is less than or equal to y

The characteristics of the population are called Parameters while the characteristics of
the sample are called statistics. Consider the following different symbols on the characteristics,
parameters, and statistics.

Characteristics                                   Parameters    Statistics

Mean                                              µ, mu         x̄
Standard Deviation                                σ, sigma      s
Number of Cases                                   N             n
Proportion                                        P             p
Pearson Product Moment Correlation Coefficient    R             r
Variance                                          σ²            s²
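
The distinction between parameters and statistics can be illustrated with Python's standard `statistics` module, which provides separate functions for population and sample measures. The data below are hypothetical, and the sample is simply the first five values for the sake of a short example; in practice a sample would be drawn so that it represents the population.

```python
import statistics

# Hypothetical population of N = 10 observations (illustration only).
population = [2, 4, 3, 5, 6, 4, 7, 5, 3, 6]
# A sample of n = 5 taken from it; a real study would draw it at random.
sample = population[:5]

# Parameters describe the population.
N = len(population)
mu = statistics.mean(population)             # population mean
sigma = statistics.pstdev(population)        # population standard deviation
sigma_sq = statistics.pvariance(population)  # population variance

# Statistics describe the sample.
n = len(sample)
x_bar = statistics.mean(sample)              # sample mean
s = statistics.stdev(sample)                 # sample standard deviation
s_sq = statistics.variance(sample)           # sample variance

print("Parameters:", N, mu, round(sigma, 3), round(sigma_sq, 3))
print("Statistics:", n, x_bar, round(s, 3), round(s_sq, 3))
```

Note that the population functions divide by N while the sample functions divide by n - 1, which is why they are kept separate in the module.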



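▪ Summation Notation
The symbol ∑ tells you to add the values Xi for every index i from the lower limit (i = 1) to the upper limit (N).
Example 1.
If N=5 and the following observations are X1=2; X2=4; X3=3; X4=5; X5=6, find the sum of the
five values of Xi using summation notation.
Solution:
\[
\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 2 + 4 + 3 + 5 + 6 = 20
\]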
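
Example 1 can be checked with a few lines of Python. This is only a sketch of the same computation; note that Python counts list positions from 0, so the index runs from 0 to N-1 instead of from 1 to N.

```python
# Observations X1..X5 from Example 1.
X = [2, 4, 3, 5, 6]
N = len(X)

# Summation notation written out as a loop over the index i.
total = 0
for i in range(N):
    total += X[i]

# The built-in sum() gives the same result in one step.
assert total == sum(X)
print(total)
```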

Example 2.
Let a be a constant. Find the sum of the values when the constant has been added to
each observation, where N=3 and X1=5; X2=4; X3=1.

Solution:

\[
\sum_{i=1}^{N} (X_i + a) = (X_1 + a) + (X_2 + a) + (X_3 + a) = 5 + a + 4 + a + 1 + a = 10 + 3a
\]

So we can say that the sum of the values of a variable plus a constant is equal to the
sum of the values of the variable plus N times the constant. Therefore,
\[
\sum_{i=1}^{N} (X_i + a) = \sum_{i=1}^{N} X_i + N a
\]
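
The rule for adding a constant can be verified numerically. In the sketch below, the function names are hypothetical and the values of the constant a are chosen arbitrarily; the observations are those of Example 2.

```python
def sum_with_constant(values, a):
    """Left-hand side: add the constant a to every value, then sum."""
    return sum(x + a for x in values)


def sum_plus_n_times_constant(values, a):
    """Right-hand side: sum the values, then add N times the constant."""
    return sum(values) + len(values) * a


# Observations from Example 2: N = 3.
X = [5, 4, 1]

# Both sides agree for several arbitrary choices of the constant a.
for a in (0, 2, -3, 10):
    assert sum_with_constant(X, a) == sum_plus_n_times_constant(X, a)

print(sum_with_constant(X, 2))  # 10 + 3a with a = 2
```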

▪ The Nature of Statistics


Statistical investigation can be classified into two major functions:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics is the method of collecting and presenting data. It includes the
computation of measures of central tendency, measures of central location, likewise the measures
of dispersion or variability. It also includes the construction of tables and graphs.
Inferential Statistics is concerned with a higher degree of critical judgment and more advanced
mathematical models, using different statistical tools, both parametric and
nonparametric tests. It is concerned with the analysis and interpretation of data in order to draw
conclusions and generalizations from organized data. It also includes testing for significant
relationships between the dependent and the independent variables, as well as significant
differences between and among independent samples.



▪ Sample and Population
Population identifies the totality of objects under investigation. The researcher may use the
whole population as the subject of study when it is small and manageable. However, if the
population is too large, the researcher may use a representative
sample.
Sampling is the method of getting a small part of the population, called the sample, that serves as the
representative of the population.

If the population under study is too large to handle and will entail too much time, cost and
effort, taking samples is a very good alternative. It should be noted that if a small part of the
population is considered, sampling error should be expected. Thus in drawing conclusions about
the population from which a sample is drawn, the researcher should learn how to draw samples
that are truly representative of the population.

The problem that is commonly encountered is the sample size. It is not advisable to set a
certain percentage; instead, the margin of error, which ranges from 1% to 10% in social science
research, should be considered.

Sample size computation formula:

\[
n = \frac{N}{1 + N e^{2}}
\]
where
N = the population size
e = the margin of error (expressed as a decimal)
n = the sample size

Example 1. Find the sample size if the population size is 2500 at 95% accuracy.
Solution: At 95% accuracy, the corresponding margin of error is 5% or 0.05.
Using the formula,
\[
n = \frac{N}{1 + N e^{2}} = \frac{2500}{1 + 2500(0.05)^{2}} = \frac{2500}{1 + 6.25} = \frac{2500}{7.25} \approx 344.83
\]

n = 344.83 or 345
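
The sample size formula is easy to turn into a small Python function. This is a sketch: the function name `sample_size` and its input checks are assumptions added for illustration, and the margin of error must be given as a decimal (0.05 for 5%). This formula is commonly known as Slovin's formula, although the module does not name it.

```python
def sample_size(population_size, margin_of_error):
    """Compute n = N / (1 + N * e**2) for population size N and margin of error e."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a decimal between 0 and 1")
    return population_size / (1 + population_size * margin_of_error ** 2)


# Example 1: N = 2500 at a 5% margin of error.
n = sample_size(2500, 0.05)
print(round(n, 2))  # value before rounding to a whole number
print(round(n))     # number of respondents, rounded to the nearest whole number
```

The module rounds the result to the nearest whole number; some references round up instead so that the sample is never smaller than the computed value.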

What Have You Learned?


I. Direction: Read the following sentences carefully. Write the letter of your answer on a separate sheet of
paper

1. These point to statistical facts, principles, opinions and various items of different sources.
A. numbers B. data C. source D. items



2. It refers to some techniques which are concerned with the presentation and collection of
data or information.
A. statistics B. probability C. collection D. descriptive statistics

3. In the seventeenth towards the eighteenth centuries, __________ were the two
mathematicians who studied probability.
A. Bernoulli and De Moivre B. Bernoulli and Gauss
C. De Moivre and Laplace D. Laplace and Gauss

4. _________ identifies the totality of objects under investigation.
A. Population B. Sample C. data D. collection

5. Sampling is the method of getting a small part from the population that serves as the
representative of the population called _______.

A. Population B. Sample C. data D. collection

II. Direction: Read each question carefully. Choose the letter of the correct answer inside the box
below. Write your answer on a separate answer sheet.

1. _________is the manipulation of the data gathered using descriptive and inferential
statistics.
2. ________are measurement expressed in whole units.
3. Capital letter ______ denotes summation of, or the sum of
4. Capital letter ____ denotes population size.
5. Symbolizes the sample size.

A- discontinuous or discrete data     D- n
B- ∑                                  E- short circuit
C- N                                  F- Analysis

III. Direction: Read each question carefully. Write T if the statement is correct and write F if the
statement is incorrect and give the right term or idea to make the statement correct. Write your
answer on a separate answer sheet.

1. Statisticians may also be considered a measure or estimate of the characteristics of the
population parameters.
2. In the early 19th century, a famous Belgian statistician in the name of Quetelet applied
statistics to investigation of social and education problems.
3. Francis Galton had the greatest effect upon the introduction and use of statistics in the social
sciences. The concept of correlation or the measure of agreement between two variables is
credited to him. He also contributed the development of centiles and percentiles.
4. Statistics is the art and science of collecting, presenting, analyzing and interpreting data.
These data may be in sports, business, politics, education and practically all fields of human
endeavor dealing with statistics.
5. Data presentation takes the form of tables and graphs.



IV- Direction: Read each question carefully. Find what is asked in each problem. Write your
answer on a separate answer sheet.

1. If N=3 and the following observations are X1=5; X2=4; X3=1, find the sum of the three values
of Xi, using summation notation.
2. Suppose a constant 𝑎 has been subtracted from each observation Xi. Find the sum of the values
using summation notation, where N=4 and X1=4; X2=7; X3=1; X4=5.
3. A researcher is conducting an investigation regarding the factors affecting the performance
of 200 teachers in the 1st district of Catarman, N. Samar. If the margin of error is 3%, how
many of the teachers should be taken as respondents?

Feedback:
1. From the knowledge and information given to you in this module, what particular benefit do you
find most helpful to you?

2. Can you gather some data showing applications of statistics?

Summary:
Analysis is the manipulation of the data gathered using descriptive and inferential statistics.
Cumulative frequency is used in getting the value for the median, quartiles, deciles and percentiles.
Data point to statistical facts, principles, opinions and various items of different sources.
Data collection is the process and methods of gathering information by interview, questionnaire,
experiments, observation and documentary analysis.
Data presentation takes the form of tables and graphs.
Descriptive statistics includes frequency distribution, measures of central tendency, measures of central
location, measures of dispersion or variation, graphs, skewness and kurtosis. Likewise, it refers to some
techniques which are concerned with the presentation and collection of data or information.
Frequency distribution is the tabulation of data of measures grouped with class interval.
Graphical presentation points to the construction of bar graphs, frequency polygons, pie charts and
pictographs, among others.
Inferential statistics is the technique by which decisions and conclusions about the population are made
using only representative samples. It includes both parametric and nonparametric
tests, which are concerned with generalizing information or making inferences about the population
through representative samples.
Interpretation makes clear the results of the analysis using statistical methods to see whether significant
differences or relationships exist between variables.
Parameter is a characteristic of a population.



Population is the totality of all the actual observable characteristics of a set of objects or individuals.
Random sampling involves the selection of samples such that each sample of a given size has precisely the
same probability of being selected. It includes the simple random, stratified, cluster and multistage sampling
techniques.
Sample refers to the elements of objects or individuals selected from the population.
Schedule is the extensive set of questions and instructions used in a personal interview.
Statistics is the art and science of collecting, presenting, analyzing and interpreting data. These data may be
in sports, business, politics, education and practically all fields of human endeavor dealing with statistics.

References:

Tattao, Luis A. (2012). Basic Concepts in Statistics.

Arao, R.; Arce, M.T.; Copo, A.R.; Laddaran, A.; Mejia, L. (2012). Statistics (Based on CMO 03 Series of 2007).

Nocon, F.; Torrecampo, J.; Balacua, Ma. M.; Daguia, W. (2012). General Statistics Made Simple for Filipinos.

Broto, Antonio S. (2006). Statistics Made Simple, Second Edition.

Altares, P.; Copo, A.; Gabuyo, Y. (2012). Elementary Statistics with Computer Applications, 2nd Ed.



ANSWER KEY

PRETEST

1. b. statistics
2. d. descriptive statistics
3. b. analysis
4. c. data collection
5. c. sample

WHAT HAVE YOU LEARNED?

I-

1. B. data
2. D. descriptive statistics
3. A. Bernoulli and De Moivre
4. A. Population
5. B. Sample

II-

1. F- Analysis

2. A- discontinuous or discrete data

3. B- ∑

4. C- N

5. D- n

III-

1. F (ans. Statistics)
2. T
3. T
4. T
5. T

IV-

1. 10
2. 17- 4a
3. 169.49 or 169

