
ENGINEERING DATA ANALYSIS

University of Southeastern Philippines


COLLEGE OF ENGINEERING
Obrero, Davao City

MATH 212
ENGINEERING DATA ANALYSIS

DALIA M. RECONALLA, Ph.D


August 2020


Welcome Message

Welcome, learners, to the 1st semester of SY 2020-2021.

While the country remains under a state of public health emergency due
to the COVID-19 pandemic, regular face-to-face classes are still suspended.
The University, in support of the memorandum order issued by the
Commission on Higher Education on the implementation of the Alternative
Flexible Learning Modality, has adopted a guided self-paced learning modality
using a course pack. To cover the 54-hour requirement for this 3-unit course,
you will be provided with this course pack, which contains self-directed
activities and tasks for you to perform at your own pace. Outputs of the tasks
and activities will be monitored, so be mindful of the submission schedule.

It is hoped that after completing all the tasks and activities in this
course pack, you will be able to demonstrate the expected outcomes and
will have gained insights that are useful in your field of specialization.

God bless!

Faculty Information:

Name: Dalia M. Reconalla
Email: dalia.reconalla@usep.edu.ph
Contact Number: 0906-209-6611
Office: College of Engineering
Contact Number: (082) 224-3334
Consultation Hours: By appointment, which may be arranged through:
• Official email
• Facebook Messenger/Facebook group chat
• Google Classroom
• UVE chat box
• Text or call

Getting help

For academic concerns (College/Adviser - Contact details)
For administrative concerns (College Dean - Contact details)
For UVE concerns (KMD - Contact details)
For health and wellness concerns (UAGC, HSD and OSAS - Contact details)


TABLE OF CONTENTS

Cover page
Welcome Message
Table of Contents
USeP Vision, Mission and Goals
USeP Graduate Attributes
USeP Core Values
Course Overview
Course Information
Course Assessment
Course Map
Module 1 Overview
Module 1 Outcomes
Lesson 1
Application 1
Lesson 2
Application 2
Lesson 3
Application 3
Summary
Assessment
References


UNIVERSITY OF SOUTHEASTERN PHILIPPINES

VISION

Premier Research University in the ASEAN.

MISSION

USeP shall produce world-class graduates and relevant research and
extension through quality education and sustainable resource management.

GOALS

At the end of the plan period, the University of Southeastern Philippines
(USeP) aims to achieve five comprehensive and primary goals:

1. Recognized ASEAN Research University
2. ASEAN Competitive Graduates and Professionals
3. Vibrant Research Community
4. Proactive Research-based Economic Empowering Extension Services
5. Capacity for Innovative Resource Generation


INSTITUTIONAL GRADUATE ATTRIBUTES

LEADERSHIP SKILLS

Creates and inspires positive changes in the organization; exercises
responsibility with integrity and accountability in the practice of one’s
profession or vocation.

CRITICAL AND ANALYTICAL THINKING SKILLS

Demonstrates creativity, innovativeness, and intellectual curiosity in
optimizing available resources to develop new knowledge, methods,
processes, systems, and value-added technologies.

SERVICE ORIENTED

Demonstrates concern for others, practices professional ethics, honesty, and
exemplifies socio-cultural, environmental concern, and sustainability.

LIFELONG LEARNING

Demonstrates enthusiasm and passion for continuous personal and
professional development.

PROFESSIONAL COMPETENCE

Demonstrates proficiency and flexibility in the area of specialization and in
conveying information in accordance with global standards.

CORE VALUES OF THE UNIVERSITY

UNITY
STEWARDSHIP
EXCELLENCE
PROFESSIONALISM


THE COURSE OVERVIEW

This course introduces students to the fundamental concepts of data
analysis, basic statistical techniques, probability, and predictive modeling,
and how they impact engineering. The course also introduces the design of
experimental and observational studies, which enables students to understand,
analyze, and interpret data and to produce datasets from which robust
conclusions can be drawn.

The course is organized into 6 modules. Each module contains reading
material that students are advised to read sequentially in order to
understand and connect the lessons. Lesson applications and assessments in
the form of problem sets or exercises are also provided, which students are
required to answer and submit for monitoring and evaluation.

Good luck!


COURSE INFORMATION

COURSE TITLE : Engineering Data Analysis
CREDIT : 3.0
SEMESTER : First
TIME FRAME : August 2020 – December 2020

COURSE DESCRIPTION

The course introduces different methods of data collection and the
suitability of using a particular method for a given situation. It covers
the relationship of probability to statistics; probability distributions of
random variables and their uses; linear functions of random variables within
the context of their application to data analysis and inference; estimation
techniques for unknown parameters; hypothesis testing used in making
inferences from sample to population; and inference for future values of key
variables under study. Finally, statistically based experimental design
techniques and analysis of outcomes of experiments are discussed with the aid
of statistical software.

COURSE OUTCOMES:

CO1: Apply statistical methods in the analysis of data.
CO2: Design experiments involving several factors.


COURSE ASSESSMENT

Learning Evidence

Learning Evidence and Description (with other details):

LE1  Results of Major Examinations
     Multiple-choice examination having problem-solving components, given
     through UVE, Edmodo, Google Classroom, or another platform that will
     be agreed upon. Midterm and/or Final examinations may be conducted
     physically in the classroom if the situation permits.

LE2  Results of Module Assessment
     Problem-solving assessment covering the contents of the module in the
     course pack. Students will submit their assessment outputs through UVE,
     Google Classroom, FB Messenger, email, courier, or USeP drop boxes.

AA1  Results of Quizzes
     Problem-solving type of quizzes through UVE, Google Classroom,
     FB Messenger, or another platform that will be agreed upon.

AA2  Lesson Assessments
     Problem-solving type of assessment through UVE, Google Classroom,
     FB Messenger, or another platform that will be agreed upon. Students
     will submit their assessment outputs through UVE, Google Classroom,
     FB Messenger, email, courier, or USeP drop boxes.

Measurement Rubrics

Area to Assess: Solution to Problems

Beyond Expectation (100%)
     Used an appropriate concept to come up with a correct solution and
     used another relevant strategy to arrive at a correct answer.

Expected (75%)
     Used an appropriate concept to come up with a solution, but a part of
     the solution led to an incorrect answer.

Satisfactory (50%)
     Used an appropriate concept but came up with an entirely wrong solution
     that led to an incorrect answer.

Unsatisfactory (25%)
     Attempted to solve the problem but used an inappropriate concept that
     led to a wrong solution.

Grading System

Assessment Activity    Grade Source (Score or Rubric Grade)        Percentage of Final Grade
Quizzes                Score (with rubrics for problem solving)    20%
Lesson Assessments     Score (with rubrics for problem solving)    10%
Major Examinations     Score (with rubrics for problem solving)    60%
Module Assessments     Score (with rubrics for problem solving)    10%
TOTAL                                                              100%

Formula for Computation of Grade

Grade per assessment =

Passing grade: 75%
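
The weighting in the Grading System table can be sketched in a few lines of Python. This is only an illustration: it assumes each component has already been converted to a grade on a 0 to 100 scale, and the function name `final_grade` and the sample student record are hypothetical, not part of the course pack.

```python
# Weights taken from the Grading System table (share of the final grade).
WEIGHTS = {
    "quizzes": 0.20,
    "lesson_assessments": 0.10,
    "major_examinations": 0.60,
    "module_assessments": 0.10,
}
PASSING_GRADE = 75.0  # passing grade stated in the course pack


def final_grade(component_grades):
    """Weighted average of component grades, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(component_grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_grades[name] for name in WEIGHTS)


# Hypothetical student record (illustrative numbers only).
example = {
    "quizzes": 80.0,
    "lesson_assessments": 90.0,
    "major_examinations": 70.0,
    "module_assessments": 85.0,
}
grade = final_grade(example)
print(f"Final grade: {grade:.1f}%, passed: {grade >= PASSING_GRADE}")
```

Because the major examinations carry 60% of the final grade, a low examination grade is hard to offset with the other components.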

Submission of Requirements

o Digital submission

   • For handwritten solutions or answers, students are required to take
     photos of or scan their work, paste these into a Word file, and save
     it with the following information written on the upper left corner of
     each page, with a signature above the name:

     Name, Course, Year and Section
     Module #, Lesson #

     Example: Juan dela Cruz, BSCE, 1-2
              Module 2, Lesson 1

   • Documents can be sent through the professor’s official email,
     FB Messenger, or the class USeP VLE account.

o University Dropbox or Courier Submission

   • Hard copies of assessment outputs must be placed in a long board
     folder of any color, properly fastened, with the following information
     written on the front cover of the folder:

     College:
     Name of Student:
     Course:
     Subject:
     Date of Submission:
     Professor’s Name:

The professor in charge must be informed of the submission through
SMS or FB Messenger.


The Course Map


MODULE OVERVIEW

Welcome, learners! This module will introduce you to the methods of data
collection and to the planning and conduct of surveys and experiments,
including an introduction to the design of experiments.

This module is organized into three lessons as follows:

o Lesson 1: Methods of Data Collection
o Lesson 2: Planning and Conducting Survey
o Lesson 3: Planning and Conducting an Experiment: Introduction to Design of
  Experiment

MODULE OUTCOME

At the completion of the module, you should be able to:

o Distinguish the appropriate data collection method for a particular study.
o Identify the type of statistical design appropriate for a particular study.

Lesson 1: Methods of Data Collection

Learning Outcome:

o Describe the data analysis process.
o Distinguish between observational studies and experimental studies.
o Determine which type of design to use in a particular study.

Time Frame: 2 hours

Introduction

A primary goal of statistical studies is to collect data that can then be used to
make informed decisions. It should come as no surprise that the ability to make good
decisions depends on the quality of the information available. This lesson introduces
the data analysis process, types of data, and the different methods of data collection.

Abstraction

1.1 Data Analysis Process

Statistics involves collecting, summarizing, and analyzing data. All three tasks are
critical. Without summarization and analysis, raw data are of little value, and even
sophisticated analyses can’t produce meaningful information from data that were not
collected in a sensible way.

Statistical studies are undertaken to answer questions about our world.
For instance:
1. Is a new flu vaccine effective in preventing illness? Is the use of bicycle
helmets on the rise?
2. Are injuries that result from bicycle accidents less severe for riders who wear
helmets than for those who do not?
3. How many credit cards do college students have?
4. Do engineering students pay more for textbooks than do education students?

Data collection and analysis allow researchers to answer such questions. The process
can be organized into the following six steps:

1. Understanding the nature of the problem. Effective data analysis requires
an understanding of the research problem: the goal of the research and the
questions we hope to answer. It is important to have a clear direction before
gathering data to ensure that the questions of interest can be answered using
the data collected.


2. Deciding what to measure and how to measure it. The next step in the
process is deciding what information is needed to answer the questions of
interest.

Example 1: In a study of the relationship between students’ achievement in
the courses Calculus 1 and English, you would need to collect test scores in
Calculus 1 and in English.

Example 2: In a study of the relationship between preferred learning style
and intelligence of first-year engineering students, how would you define
learning style and measure it, and what measure of intelligence would you use?

It is important to carefully define the variables to be studied and to develop
appropriate methods for determining their values.

3. Data collection. The data collection step is crucial. The researcher must first
decide whether an existing data source is adequate or whether new data must be
collected. Even if a decision is made to use existing data, it is important to
understand how the data were collected and for what purpose, so that any
resulting limitations are also fully understood and judged to be acceptable. If new
data are to be collected, a careful plan must be developed, because the type of
analysis that is appropriate and the subsequent conclusions that can be drawn
depend on how the data are collected.

4. Data summarization and preliminary analysis. After the data are collected,
the next step usually involves a preliminary analysis that includes summarizing
the data graphically and numerically. This initial analysis provides insight into
important characteristics of the data and can provide guidance in selecting
appropriate methods for further analysis.

5. Formal data analysis. The data analysis step requires the researcher to select
and apply statistical methods.

6. Interpretation of results. Several questions should be addressed in this
final step. Some examples are:
a. What can we learn from the data?
b. What conclusions can be drawn from the analysis?
c. How can our results guide future research?

Illustration 1. The Admission Director of a university is interested in
learning why some applicants who were accepted for the first semester of
SY 2019-2020 failed to enrol at the university. The population of interest
to the director consists of all accepted applicants who did not enrol in the
first semester of SY 2019-2020. Because this population is large and it
may be difficult to contact all the individuals, the director might decide to
collect data from only 100 selected students.


From Illustration 1, deciding how to select the 100 students and what data should
be collected from each student are steps 2 and 3 in the data analysis process.
These 100 students constitute a sample.

Definition 1.

The entire collection of individuals or objects about which information is
desired is called the population of interest. A sample is a subset of the
population, selected for study.

Methods for organizing and summarizing data, such as the use of tables,
graphs, or numerical summaries, make up the branch of statistics called descriptive
statistics.

The second major branch of statistics, inferential statistics, involves
generalizing from a sample to the population from which it was selected.

Definition 2.

Descriptive statistics is the branch of statistics that includes methods for
organizing and summarizing data.

Inferential statistics is the branch of statistics that involves generalizing
from a sample to the population from which the sample was selected and
assessing the reliability of such generalizations.
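
The two branches can be contrasted with a short Python sketch that uses only the standard library. The data values are hypothetical, and the standard error shown is the usual formula for a simple random sample, which is an assumption here rather than something established earlier in this lesson.

```python
import statistics

# Hypothetical sample: number of textbooks purchased by 10 students.
sample = [3, 5, 4, 6, 2, 5, 4, 3, 5, 4]

# Descriptive statistics: organize and summarize the sample itself.
n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample mean as an estimate of the
# population mean and attach a measure of its reliability.
# Assumes the sample is a simple random sample from the population.
standard_error = stdev / n ** 0.5

print(f"n = {n}, mean = {mean}, median = {median}, stdev = {stdev:.2f}")
print(f"estimated population mean = {mean} (standard error {standard_error:.2f})")
```

The first group of numbers describes only the ten students observed; the last line is a statement about the whole population, which is why it carries a measure of uncertainty.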

1.2 Types of Data

The individuals or objects in any particular population typically possess
many characteristics that might be studied.

A variable is any characteristic whose value may change from one individual or
object to another.

Illustration 2. Consider a group of students currently enrolled in a Calculus
course. One characteristic of the students in the population is the brand of
calculator owned (Casio, Sharp, Hewlett-Packard, and so on). Another
characteristic is the number of textbooks purchased that semester, and yet
another is the distance from the university to each student’s permanent
residence.

In this example, calculator brand is a variable, and so are number of
textbooks purchased and distance to the university.


Data result from making observations either on a single variable or
simultaneously on two or more variables.

Definition 3.

A data set consisting of observations on a single characteristic is a
univariate data set.

A univariate data set is categorical (or qualitative) if the individual
observations are categorical responses.

A univariate data set is numerical (or quantitative) if each observation is a
number.

Illustration 3. In Illustration 2, calculator brand is a categorical variable,
because each student’s response to the query, “What brand of calculator do you
own?” is a category. The collection of responses from all these students forms
a categorical data set.

The other two variables, number of textbooks purchased and distance to the
university, are both numerical in nature. Determining the value of such a numerical
variable (by counting or measuring) for each student results in a numerical data set.

Bivariate data result from obtaining a category or value for each of two
different characteristics.

Multivariate data result from obtaining a category or value for each of two or more
attributes (so bivariate data are a special case of multivariate data).

Illustration 4. Both height (in inches) and weight (in pounds) might be
recorded for each student in a class. The resulting data set is called a
bivariate data set. If the researcher is interested in determining height,
weight, age, and systolic blood pressure for each student in the class, the
resulting data set is called a multivariate data set.
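
As a rough illustration of these terms, the sketch below stores a few records in Python and extracts univariate, bivariate, and multivariate data sets from them. The student records and field names are hypothetical and are used only to show the shape of each kind of data set.

```python
# Hypothetical records for three students (values are illustrative only).
students = [
    {"brand": "Casio", "textbooks": 4, "height_in": 65.0, "weight_lb": 130.0},
    {"brand": "Sharp", "textbooks": 2, "height_in": 70.5, "weight_lb": 155.0},
    {"brand": "Casio", "textbooks": 5, "height_in": 62.0, "weight_lb": 118.0},
]

# Univariate categorical data: one categorical characteristic per student.
brands = [s["brand"] for s in students]

# Univariate numerical data: one numerical characteristic per student.
textbooks = [s["textbooks"] for s in students]

# Bivariate data: a (height, weight) pair for each student.
height_weight = [(s["height_in"], s["weight_lb"]) for s in students]

# Multivariate data: two or more characteristics recorded for each student.
multivariate = [tuple(s.values()) for s in students]

print(brands)
print(textbooks)
print(height_weight)
print(multivariate)
```

In every case there is one observation per student; what changes is how many characteristics each observation records.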

1.2.1 Types of Numerical Data

There are two different types of numerical data: discrete and continuous.

Illustration 5. Suppose the following data are available:

a. Time it takes a student to finish answering each item in a test.
b. Weight of children aged 1 to 5 years.
c. Number of days absent from class in a semester.
d. Number of defective products.

Discrete data usually arise when observations are determined by counting, so
data c and d are discrete. Continuous data usually arise when observations are
determined by measuring, so data a and b (time and weight) are continuous.

Definition 4.
A numerical variable results in discrete data if the possible values of the
variable correspond to isolated points on the number line.

A numerical variable results in continuous data if the set of possible values
forms an entire interval on the number line.
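
A small Python sketch can apply the rule of thumb that counting usually gives discrete data and measuring usually gives continuous data to the items in Illustration 5. The labels "count" and "measure" and the helper `numerical_type` are assumptions introduced only for this illustration.

```python
# How each item in Illustration 5 is obtained: by counting or by measuring.
data_sources = {
    "time to answer each test item": "measure",
    "weight of children aged 1 to 5": "measure",
    "number of days absent in a semester": "count",
    "number of defective products": "count",
}


def numerical_type(how_obtained):
    """Rule of thumb: counts give discrete data, measurements give continuous data."""
    return "discrete" if how_obtained == "count" else "continuous"


for description, how in data_sources.items():
    print(f"{description}: {numerical_type(how)}")
```

The rule is a guide rather than a definition; Definition 4 is what decides the type, based on whether the possible values are isolated points or fill an interval.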

1.3 Observation and Experimentation

Data collection is a vital step in the data analysis process. It is important
to keep in mind the questions you hope to answer on the basis of the resulting
data. Sometimes the researcher is interested in answering questions about
characteristics of a single existing population or in comparing two or more
well-defined populations. To accomplish this, the researcher selects a sample
from each population under consideration and uses the sample information to
gain insight into characteristics of those populations.

Illustration 6. A safety engineer is studying industry workers to determine
whether gender and attitude toward safety are related. This study is
observational in nature. The researcher wants to observe characteristics of
workers in an industry, and then use the resulting information to draw
conclusions.

Sometimes the questions you are trying to answer deal with the effect of
certain explanatory variables on some response and cannot be answered using data
from an observational study.

Such questions are often of the form:
• What happens when ... ? or,
• What is the effect of ... ?

Illustration 7. A professor may wonder what would happen to test scores if the
required laboratory time for a chemistry course were increased from 3 hours to
6 hours per week. To answer such questions, the researcher conducts an
experiment to collect relevant data. The value of some response variable (test
score in the chemistry course) is recorded under different experimental
conditions (3-hour lab and 6-hour lab).

In an experiment, the researcher manipulates one or more explanatory
variables, also sometimes called factors, to create the experimental conditions.


Definition 5.
A study is an observational study if the investigator observes characteristics of
a sample selected from one or more existing populations.

A study is an experiment if the investigator observes how a response variable
behaves when one or more explanatory variables, also called factors, are
manipulated.

The goal of an observational study is usually to draw conclusions about the
corresponding population or about differences between two or more populations.

The goal of an experiment is to determine the effect of the manipulated
explanatory variables (factors) on the response variable.

A well-designed experiment can result in data that provide evidence for a
cause-and-effect relationship. This is an important difference between an
observational study and an experiment. In an observational study, it is
impossible to draw clear cause-and-effect conclusions because we cannot rule
out the possibility that the observed effect is due to some variable other
than the explanatory variable being studied. Such variables are called
confounding variables.

Definition 6.
A confounding variable is one that is related to both group membership and
the response variable of interest in a research study.

1.4 Sampling

Many studies are conducted in order to generalize from a sample to the
corresponding population. As a result, it is important that the sample be
representative of the population. To be reasonably sure of this, we must
carefully consider the way in which the sample is selected. Even when the
sample is selected properly, there may be uncertainty about whether the sample
represents the population from which it was selected.

Bias in sampling is the tendency for samples to differ from the corresponding
population in some systematic way. The most common types of bias encountered in
sampling situations are selection bias, measurement or response bias, and
nonresponse bias.

o Selection Bias. Tendency for samples to differ from the corresponding
population as a result of systematic exclusion of some part of the population.

o Measurement or Response Bias. Tendency for samples to differ from the
corresponding population because the method of observation tends to produce
values that differ from the true value. This problem is often due to the
specific wording of questions in a survey, the manner in which the respondent
answers the survey questions, or the fashion in which an interviewer phrases
questions during the interview.

o Nonresponse Bias. Tendency for samples to differ from the corresponding
population because data are not obtained from all individuals selected for
inclusion in the sample.

Note: Bias is introduced by the way in which a sample is selected or by the way
in which the data are collected from the sample. Increasing the size of the sample,
although possibly desirable for other reasons, does nothing to reduce bias if the
method of selecting the sample is flawed or if the nonresponse rate remains high.
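
The point of this note can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is the hypothetical set of values 1 to 10,000, and the flawed sampling frame is assumed to exclude the upper half of the population entirely.

```python
import random
import statistics

random.seed(2020)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values.
population = list(range(1, 10_001))
true_mean = statistics.mean(population)

# Flawed sampling frame: systematically excludes the upper half of the population.
flawed_frame = [x for x in population if x <= 5_000]

for n in (10, 100, 1_000):
    biased = statistics.mean(random.sample(flawed_frame, n))
    unbiased = statistics.mean(random.sample(population, n))
    print(f"n={n:>5}: flawed-frame mean={biased:8.1f}, "
          f"random-sample mean={unbiased:8.1f}, true mean={true_mean}")
```

As n grows, the mean from the properly selected sample settles near the true mean, while the mean from the flawed frame settles near the wrong value. A larger sample makes the biased estimate more stable, not more correct.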

1.4.1 Sampling Methods

o Random Sampling. A simple random sample of size n is a sample that is
selected from a population in a way that ensures that every different possible
sample of the desired size has the same chance of being selected. When
selecting a random sample, researchers can choose to do the sampling with or
without replacement.

   • Sampling with replacement. After each successive item is selected for
     the sample, the item is “replaced” back into the population and may
     therefore be selected again at a later stage. In practice, sampling with
     replacement is rarely used.

   • Sampling without replacement. After being included in the sample, an
     individual or object is not considered for further selection.

o Stratified Random Sampling. A sampling method used when the entire
population can be divided into a set of non-overlapping subgroups, called
strata. In stratified random sampling, separate simple random samples are
independently selected from each stratum.

o Cluster Sampling. This involves dividing the population of interest into
non-overlapping subgroups, called clusters. Clusters are then selected at
random, and all individuals in the selected clusters are included in the
sample.

Note: Be careful not to confuse clustering and stratification. Even though both of
these sampling strategies involve dividing the population into subgroups, both the
way in which the subgroups are sampled and the optimal strategy for creating the
subgroups are different. In stratified sampling, we sample from every stratum,
whereas in cluster sampling, we include only selected whole clusters in the sample.
Because of this difference, to increase the chance of obtaining a sample that is
representative of the population, we want to create homogeneous groups for strata
and heterogeneous (reflecting the variability in the population) groups for clusters.

o Systematic Sampling. A procedure that can be used when it is possible to
view the population of interest as consisting of a list or some other
sequential arrangement. A value k is specified, one of the first k individuals
is selected at random, and then every kth individual in the sequence is
included in the sample.

o Convenience Sampling. Using an easily available or convenient group to form
a sample. Results from such samples are rarely informative, and it is a
mistake to try to generalize from a convenience sample to any larger population.
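
The probability-based methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed procedure: the function names, the student ID format, and the three sections are all hypothetical, and the systematic sample is assumed to use a random start among the first k items followed by every kth item. Convenience sampling is left out because it involves no random selection.

```python
import random


def simple_random_sample(population, n, replace=False):
    """Simple random sample of size n, with or without replacement."""
    if replace:
        return random.choices(population, k=n)  # items may be selected again
    return random.sample(population, n)         # each item selected at most once


def stratified_sample(strata, n_per_stratum):
    """Independent simple random sample from every stratum."""
    return {name: random.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters):
    """Randomly select whole clusters and keep every member of each."""
    chosen = random.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def systematic_sample(ordered_population, k):
    """Random start among the first k items, then every k-th item after it."""
    start = random.randrange(k)
    return ordered_population[start::k]


# Hypothetical population: 120 student IDs in three sections of 40.
random.seed(1)
sections = {f"Section {i}": [f"S{i}-{j:02d}" for j in range(40)]
            for i in range(1, 4)}
everyone = [sid for members in sections.values() for sid in members]

print(simple_random_sample(everyone, 5))
print(stratified_sample(sections, 2))
print(len(cluster_sample(sections, 1)))
print(systematic_sample(everyone, 30))
```

The same `sections` dictionary serves as strata in one call and as clusters in another: stratified sampling takes a few students from every section, while cluster sampling takes every student from a few sections.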

Application: Exercise #1

1. As part of a curriculum review, a certain engineering department would like
to select a simple random sample of 20 of last year’s 140 graduates to obtain
information on how graduates perceived the value of the curriculum.
Describe two different methods that might be used to select the sample.

2. Based on a study of 1570 students in a university between the ages of 18 and
21, researchers of a Medical School concluded that there was an association
between academic performance in English courses and in Mathematics courses.
Describe the sample and the population of interest for this study.

3. For each of the situations described, state whether the sampling procedure is
simple random sampling, stratified random sampling, cluster sampling,
systematic sampling, or convenience sampling.

a. All fourth-year students at a university are enrolled in 1 of 12 sections
of a research course. To select a sample of fourth-year students at this
university, a researcher selects four sections of the research course at
random from the 12 sections, and all students in the four selected sections
are included in the sample.

b. To obtain a sample of students, faculty, and staff at a university, a
researcher randomly selects 50 faculty members from a list of faculty, 100
students from a list of students, and 30 staff members from a list of staff.

c. A university researcher obtains a sample of students at his university by
using the 85 students enrolled in his Math 111 class.

d. To obtain a sample of the seniors at a particular high school, a researcher
writes the name of each senior on a slip of paper, places the slips in a box and
mixes them, and then selects 10 slips. The students whose names are on the
selected slips of paper are included in the sample.

e. To obtain a sample of those attending a basketball game, a researcher
selects the 24th person through the door. Then, every 50th person after that is
also included in the sample.

Closure

Well done! You have just finished Lesson 1 of this module. Should there be
parts of the lesson that need clarification, please ask your tutor during
your face-to-face or online interactions.

Now, if you are ready, please proceed to Lesson 2 of this module, which will
discuss the most widely used data collection procedure, the survey. Information
from surveys impacts nearly every facet of our daily lives. Planning and
conducting surveys will be the focus of our next lesson.

Lesson 2: Planning and Conducting Survey

Learning Outcome:

o Distinguish survey data collection techniques.
o Identify problems associated with surveys.

Time Frame: 2 hours

Introduction:

Many observational studies attempt to measure personal opinion or attitudes
using responses to a survey. In such studies, both the sampling method and the
design of the survey itself are critical to obtaining reliable information.
This lesson presents the sampling designs for surveys and the issues
associated with surveys as a method of obtaining data.

Abstraction:

2.1 Survey Basics

Designing an observational study to compare two populations on the basis of
some easily measured characteristic is relatively straightforward, with attention
focusing on choosing a reasonable method of sample selection. However, many
observational studies attempt to measure personal opinion or attitudes using responses
to a survey. In such studies, both the sampling method and the design of the survey
itself are critical to obtaining reliable information.

Definition 1. A survey is a voluntary encounter between strangers in which an
interviewer seeks information from a respondent by engaging in a special type of
conversation. This conversation might take place in person, over the telephone, or
even in the form of a written questionnaire, and it is quite different from usual social
conversations.

Roles and Responsibilities of Interviewer and Respondents

• The interviewer gets to decide what is relevant to the conversation and may
  ask questions, possibly personal or even embarrassing ones. The
  respondent, in turn, may refuse to participate in the conversation and may
  refuse to answer any particular question. But having agreed to participate in
  the survey, the respondent is responsible for answering the questions
  truthfully.

The Respondent’s Tasks

Task 1: Comprehension. Comprehension is the single most important task
facing the respondent, and fortunately it is the characteristic of a survey question that
is most easily controlled by the question writer. Understandable directions and
questions are characterized by (1) a vocabulary appropriate to the population of
interest, (2) simple sentence structure, and (3) little or no ambiguity.

• Vocabulary is often a problem. As a rule, it is best to use the simplest
  possible word that can be used without sacrificing clear meaning.
• Simple sentence structure also makes it easier for the respondent to
  understand the question.
• Ambiguity can arise from the placement of questions as well as
  from their phrasing. One way to find out whether a question is
  ambiguous is to field-test the question and to ask the respondents whether
  they were unsure how to answer it.

Task 2: Retrieval from Memory. Retrieving relevant information from
memory to answer the question is not always an easy task, and it is not a problem
limited to questions of fact.

Task 3: Reporting the Response. The task of formulating and reporting a
response can be influenced by the social aspects of the survey conversation. In
general, if a respondent agrees to take a survey, he or she will be motivated to answer
truthfully. Therefore, if the questions are not too difficult (taxing the respondent’s
knowledge or memory) and if there are not too many questions (taxing the
respondent’s patience), the answers to questions will be reasonably accurate.

Three things to consider in constructing surveys and writing survey questions:

1. Questions should be understandable by the individuals in the population being
surveyed. Vocabulary should be at an appropriate level, and sentence structure should
be simple.

2. Questions should, as much as possible, recognize that human memory is fickle.
Questions that are specific will aid the respondent by providing better memory cues.

3. Questions should not create opportunities for the respondent to feel threatened or
embarrassed.

In a perfect survey, the target population would be the same as the sampled
population. This type of survey rarely happens. There are always difficulties in
obtaining a sampling frame or being able to identify all elements within the target
population.

2.2 Data Collection Techniques for Survey

Having chosen a particular sample survey, how does one actually collect the
data?
The most commonly used methods of data collection in sample surveys are:


1. Interviews
a. Personal Interview. The procedure usually requires the interviewer
to ask prepared questions and to record the respondent’s answers. The
primary advantage of these interviews is that people will usually respond
when confronted in person. In addition, the interviewer can note specific
reactions and eliminate misunderstandings about the questions asked.

b. Telephone interview. Surveys conducted through telephone interviews are
frequently less expensive than personal interviews, owing to the
elimination of travel expenses. The investigator can also monitor the
interviews to be certain that the specified interview procedure is being
followed.

2. Self-administered questionnaire. These questionnaires usually are mailed to
the individuals included in the sample, although other distribution methods
can be used. The questionnaire must be carefully constructed if it is to
encourage participation by the respondents. It must undergo validity and
reliability testing.

3. Direct observation. Direct observation is used in many surveys that do not
involve measurements on people.

Application: Exercise #2

1. An experimenter wants to estimate the average water consumption per family
in a city. Discuss the relative merits of choosing individual families, dwelling
units (single-family houses, apartment buildings, etc.), and city subdivisions as
sampling units.

2. As part of a curriculum review, the civil engineering department would like to
select a simple random sample of 20 of last year’s 140 graduates to obtain
information on how graduates perceived the value of the curriculum. Describe
two different methods that might be used to select the sample.

3. For the given situation, decide what sampling method you would use. Provide
an explanation of why you selected a particular method of sampling.
The major state university in Region A is attempting to lobby the state
legislature for a bill that would allow the university to charge a higher tuition
rate than the other universities in the country. To provide a justification, the
university plans to conduct a mail survey of its alumni to collect information
concerning their current employment status. The university grants a wide
variety of different degrees and wants to make sure that information is
obtained about graduates from each of the degree types. A 5% sample of
alumni is considered sufficient.


Closure
Congratulations! You have successfully completed the tasks and activities for
Lesson 2. It is expected that you have gained insights about planning and conducting
a survey as a data collection method.
Now, if you are ready, please proceed to Lesson 3 of this module, which
discusses planning and conducting an experiment.


Learning Outcome:

o Understand the concept of experimental design.
o Distinguish the methods of experimental design.

Time Frame: 2 hours

Introduction:

Sometimes the questions you are trying to answer deal with the effect of
certain explanatory variables on some response. Such questions are often of the form,
“What happens when . . . ?” or “What is the effect of . . . ?” Experiments provide a way
to collect data to answer these types of questions.

This lesson presents the key concepts of experimental design and the methods of
experimental design.

Activity
Suppose in an experiment, the researchers decide to use two room
temperature settings, 18°C and 24°C. Further suppose that there are 10 sections of
first-semester Calculus 1 that have agreed to participate in the study. The experiment
is designed in this way:

Set the room temperature to 18°C in five of the rooms and to 24°C in the other
five rooms on test day, and then compare the exam scores for the 18°C group and the
24°C group. Suppose that the average exam score for the students in the 18°C group
was noticeably higher than the average for the 24°C group.

Analysis

Based on the information given in the activity, could you conclude that the
increased temperature resulted in a lower average score? Yes or No.
If no, are there any factors that affect or are related to the exam scores? Can you
enumerate them?

Abstraction:

3.1 Concepts of Experimental Design

For example, an engineer may be considering two different workstation designs
and might want to know whether the choice of design affects work performance.
Experiments provide a way to collect data to answer these types of questions.


Before we describe the concepts of experimental design, the following terms are
defined:

Definition 1.

An experiment is a study in which one or more explanatory variables are
manipulated in order to observe the effect on a response variable.

An experimental condition is any particular combination of values for the
explanatory variables. Experimental conditions are also called treatments.

An experimental unit is the smallest unit to which a treatment is applied.

The explanatory variables are those variables that have values that are controlled
by the experimenter. They are also called independent variables or factors.

The response variable is a variable that is not controlled by the experimenter and
that is measured as part of the experiment. It is also called the dependent variable.

In the language of experimental design, treatments are assigned at random to
experimental units, and replication means that each treatment is applied to more than
one experimental unit.

Illustration 1. Suppose we are interested in determining the effect of room
temperature on performance on a first-year Calculus 1 exam. In this case, the
explanatory variable is room temperature (it can be manipulated by the
experimenter). The response variable is exam performance (the variable that
is not controlled by the experimenter and that will be measured).

In general, we can identify the explanatory variables and the response variable easily
if we can describe the purpose of the experiment in the following terms:

The purpose is to assess the effect of [explanatory variable] on [response variable].

A well-designed experiment requires more than just manipulating the explanatory
variables; the design must also eliminate other possible explanations or the
experimental results will not be conclusive (Peck, R., Olsen, C. and Devore, J.,
2012).

In designing an experiment, our goal is to determine the effects of the
explanatory variables on the chosen response variable. To do this, we must take into
consideration any extraneous variables that, although not of interest in the current
study, might also affect the response variable.


Definition 3.

An extraneous variable is one that is not one of the explanatory variables in the
study but is thought to affect the response variable.

A well-designed experiment copes with the potential effects of extraneous
variables by using random assignment to experimental conditions and sometimes also
by incorporating direct control and/or blocking into the design of the experiment.

Illustration 2. In Illustration 1, the calculus test example, the textbook used is
an extraneous variable because part of the differences in test results might be
attributed to this variable. We could control this variable directly, by requiring
that all sections use the same textbook. Then any observed differences
between temperature groups could not be explained by the use of different
textbooks. The extraneous variable time of day might also be directly
controlled in this way by having all sections meet at the same time.

The effects of some extraneous variables can be filtered out by a process
known as blocking. Extraneous variables that are addressed through blocking are
called blocking variables. Blocking creates groups (called blocks) that are similar
with respect to blocking variables; then all treatments are tried in each block.

Illustration 3. In Illustration 1, we might use instructor as a blocking variable.
If five instructors are each teaching two sections of calculus, we would make
sure that for each instructor, one section was part of the
18°C group and the other section was part of the 24°C group. With this design, if
we see a difference in exam scores for the two temperature groups, the
extraneous variable instructor can be ruled out as a possible explanation,
because all five instructors’ students were present in each temperature group.
(Had we controlled the instructor variable by choosing to have only one
instructor, that would be an example of direct control.)

If one instructor taught all the 18°C sections and another taught all the 24°C
sections, we would be unable to distinguish the effect of temperature from the effect
of the instructor. In this situation, the two variables (temperature and instructor) are
said to be confounded.

Definition 4.
Two variables are confounded if their effects on the response variable cannot be
distinguished from one another.
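
The blocking scheme of Illustration 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the instructor names, the section labels, and the helper function randomized_block_assignment are hypothetical, and the two temperature settings are taken from the Activity (18°C and 24°C). Within each block (instructor), the two sections are assigned at random to the two treatments, so every instructor appears in both temperature groups and instructor is not confounded with temperature.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, randomly assign one unit to each treatment."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # random order of treatments inside this block
        for unit, treatment in zip(units, order):
            assignment[unit] = (block, treatment)
    return assignment

# Hypothetical data: five instructors (blocks), each teaching two sections.
blocks = {f"Instructor {k}": [f"{k}-1", f"{k}-2"] for k in "ABCDE"}
plan = randomized_block_assignment(blocks, ["18°C", "24°C"], seed=1)

for section, (instructor, temperature) in plan.items():
    print(section, instructor, temperature)
```

Which section of an instructor receives which temperature depends on the random seed, but each instructor always contributes one section to each temperature group.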

In Illustration 1, the calculus test example, one of the factors related to exam
scores is student ability, which cannot be controlled by the experimenter and which
would be difficult to use as a blocking variable. Extraneous variables of this kind are
handled by the use of random assignment to experimental groups.


Random assignment can be effective only if the number of subjects or
observations in each experimental condition (treatment) is large enough for each
experimental group to reliably reflect variability in the population.

Replication is the design strategy of making multiple observations for each
experimental condition. Together, replication and random assignment allow the
researcher to be reasonably confident of comparable experimental groups.
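
A small simulation may help show why random assignment needs enough replication. The sketch below is illustrative only: it assumes student ability scores follow a normal distribution with mean 75 and standard deviation 10 (hypothetical values), randomly splits the students into two equal groups, and reports how far apart the two group means typically are for several group sizes.

```python
import random
import statistics

def mean_ability_gap(group_size, rng):
    """Randomly split 2*group_size simulated students into two groups
    and return the absolute difference between the group mean abilities."""
    # Assumed (hypothetical) ability distribution: mean 75, sd 10.
    abilities = [rng.gauss(75, 10) for _ in range(2 * group_size)]
    rng.shuffle(abilities)  # random assignment to the two groups
    group_a = abilities[:group_size]
    group_b = abilities[group_size:]
    return abs(statistics.mean(group_a) - statistics.mean(group_b))

rng = random.Random(212)
typical_gap = {}
for n in (5, 20, 80):
    # Average the gap over many repeated random assignments.
    typical_gap[n] = statistics.mean(mean_ability_gap(n, rng) for _ in range(2000))
    print(f"group size {n:>2}: typical gap in mean ability = {typical_gap[n]:.2f}")
```

Under these assumptions the typical gap shrinks as the group size grows, which is the sense in which larger groups produced by random assignment are more comparable.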
Definition 5.

Random Assignment. Random assignment (of subjects to treatments or of
treatments to trials) is used to ensure that the experiment does not systematically
favor one experimental condition (treatment) over another.

Blocking. Using extraneous variables to create groups (blocks) that are
similar. All experimental conditions (treatments) are then tried in each block.

Direct Control. Holding extraneous variables constant so that their effects
are not confounded with those of the experimental conditions (treatments).

Replication. Ensuring that there is an adequate number of observations for
each experimental condition.

Experimental designs in which experimental units are assigned at random to
treatments or in which treatments are assigned at random to trials are called
completely randomized designs. When blocking is used, the design is called a
randomized block design.
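
As a minimal sketch of a completely randomized design, the code below randomly assigns the 10 calculus sections from the Activity to the two temperature settings, five sections each. The section labels and the helper function completely_randomized_assignment are hypothetical and serve only to illustrate the idea.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

# Hypothetical labels for the 10 participating sections.
sections = [f"Section {i}" for i in range(1, 11)]
groups = completely_randomized_assignment(sections, ["18°C", "24°C"], seed=2020)

for treatment, members in groups.items():
    print(treatment, members)
```

Because every section is equally likely to land in either group, no extraneous variable is systematically tied to one temperature setting.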

Figure 1 shows a diagram highlighting important features of a common
experimental design: the structure of an experiment that is based on random
assignment of experimental units to one of two treatments. The diagram can be easily
adapted for an experiment with more than two treatments.

Figure 1. Diagram of an experiment with random assignment of
experimental units to two treatments
Source: Peck, R., Olsen, C. and Devore, J.L. (2012). Introduction to Statistics and Data
Analysis.


3.2 Use of Control Group

Many experiments compare a group that receives a particular treatment to a control
group that receives no treatment. The use of a control group allows the experimenter
to assess how the response variable behaves when the treatment is not used. This
provides a baseline against which the treatment groups can be compared to determine
whether the treatment had an effect.

Illustration 4. Suppose that a mechanical engineer wants to know whether a
gasoline additive increases fuel efficiency (kilometres per liter). Such an
experiment might use a single car (to eliminate car-to-car variability) and a
sequence of trials in which 1 liter of gas is put in an empty tank, the car is driven
around a racetrack at a constant speed, and the distance travelled on the liter
of gas is recorded. To determine whether the additive increases gas mileage, it
would be necessary to include a control group of trials in which distance
travelled was measured when gasoline without the additive was used. The trials
would be assigned at random to one of the two experimental conditions (additive
or no additive).

Even though this experiment consists of a sequence of trials all with the same
car, random assignment of trials to experimental conditions is still important because
there will always be uncontrolled variability. For example, temperature or other
environmental conditions might change over the sequence of trials, the physical
condition of the car might change slightly from one trial to another, and so on.

Random assignment of experimental conditions to trials will tend to even out
the effects of these uncontrollable factors.
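
The following simulation is a rough sketch of why randomizing the order of trials matters in Illustration 4. Every number in it is a hypothetical assumption made for illustration: a baseline of 12 km per liter, a true additive gain of 0.5 km, a small measurement noise, and a steady loss of 0.1 km per trial standing in for uncontrolled changes over the sequence of trials. The code compares running all additive trials first with assigning the conditions to trials at random.

```python
import random
import statistics

# All numbers below are hypothetical and chosen only for illustration.
BASELINE_KM = 12.0      # assumed distance on 1 liter without the additive
TRUE_EFFECT_KM = 0.5    # assumed true gain from the additive
DRIFT_PER_TRIAL = -0.1  # assumed slow loss of efficiency from trial to trial

def estimated_effect(conditions, rng):
    """Simulate one sequence of trials; return mean(additive) - mean(control)."""
    distances = {"additive": [], "control": []}
    for trial, condition in enumerate(conditions):
        distance = BASELINE_KM + DRIFT_PER_TRIAL * trial + rng.gauss(0, 0.1)
        if condition == "additive":
            distance += TRUE_EFFECT_KM
        distances[condition].append(distance)
    return statistics.mean(distances["additive"]) - statistics.mean(distances["control"])

def shuffled(conditions, rng):
    """Return a randomly ordered copy of the list of conditions."""
    order = list(conditions)
    rng.shuffle(order)
    return order

rng = random.Random(3)
additive_first = ["additive"] * 5 + ["control"] * 5
runs = 2000

# Average the estimated effect over many simulated experiments.
systematic_avg = statistics.mean(
    estimated_effect(additive_first, rng) for _ in range(runs))
randomized_avg = statistics.mean(
    estimated_effect(shuffled(additive_first, rng), rng) for _ in range(runs))

print(f"additive always first: average estimate = {systematic_avg:.2f} km")
print(f"random assignment:     average estimate = {randomized_avg:.2f} km")
```

With these assumed values, running the additive trials first places them on average five trials earlier than the control trials, so the drift adds about 0.5 km to the estimate and the additive looks roughly twice as effective as it is. Under random assignment the drift falls on both conditions about equally, and the average estimate stays close to the assumed true gain of 0.5 km.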

3.3 The Use of Placebo

In experiments that use human subjects, use of a control group may not be
enough to determine whether a treatment really does have an effect. People
sometimes respond merely to the power of suggestion.

Illustration 5. Suppose a study is conducted to determine whether a particular
herbal supplement is effective in promoting weight loss. The study uses an
experimental group that takes the herbal supplement and a control
group that takes nothing. It is possible that those who take the herbal
supplement and believe that they are taking something that will help them to
lose weight may be more motivated and may unconsciously change their eating
behaviour or activity level, resulting in weight loss.

If an experiment is to enable researchers to determine whether a treatment
really has an effect, comparing a treatment group to a control group may not be
enough. To address the problem, many experiments use what is called a placebo.


Definition

A placebo is something that is identical (in appearance, taste, feel, etc.) to the
treatment received by the treatment group, except that it contains no active
ingredients.

As long as the subjects did not know whether they were taking the placebo,
the placebo group would provide a better basis for comparison and would allow the
researchers to determine whether the treatment had any real effect over and above the
“placebo effect.”

Application: Exercise #3

1. The head of the quality control department at a printing company would like to
carry out an experiment to determine which of three different glues results in the
greatest binding strength. Although they are not of interest in the current
investigation, other factors thought to affect binding strength are the number of
pages in the book and whether the book is being bound as a paperback or a
hardback.
a. What is the response variable in this experiment?
b. What explanatory variable will determine the experimental conditions?
c. What two extraneous variables are mentioned in the problem description? Are
there other extraneous variables that should be considered?

2. A study of college students showed a temporary gain of up to 9 IQ points after
listening to Mozart’s music. This conclusion, dubbed the Mozart effect, has
since been criticized by a number of researchers who have been unable to
confirm the result in similar studies. Suppose that you wanted to see whether
there is a Mozart effect for students at your school.

a. Describe how you might design an experiment for this purpose.
b. Does your experimental design include direct control of any extraneous
variables? Explain.
c. Does your experimental design use blocking? Explain why you did or did not
include blocking in your design.
d. What role does random assignment play in your design?

Closure
Congratulations! You have successfully completed the tasks and activities for
Lesson 3. It is expected that you are now knowledgeable about obtaining data through
surveys and experiments.
You are almost done with this module. The module summary and assessment
will follow.


SUMMARY

o Data collection and analysis process:
• Understanding the nature of the problem
• Deciding what to measure and how to measure it
• Data collection
• Data summarization and preliminary analysis
• Formal data analysis
• Interpretation of results
o The entire collection of individuals or objects about which
information is desired is called the population of interest.
o A sample is a subset of the population, selected for study.
o A data set consisting of observations on a single characteristic is a
univariate data set.
o A univariate data set is categorical (or qualitative) if the
individual observations are categorical responses.
o A univariate data set is numerical (or quantitative) if each
observation is a number.
o Data collection techniques for survey:
• Interviews
• Self-administered questionnaire
• Direct observation
o An experiment is a study in which one or more explanatory
variables are manipulated in order to observe the effect on a
response variable.
o An experimental condition is any particular combination of
values for the explanatory variables. Experimental conditions are
also called treatments.
o An experimental unit is the smallest unit to which a treatment is
applied.
o The explanatory variables are those variables that have values
that are controlled by the experimenter. They are also called
independent variables or factors.
o The response variable is a variable that is not controlled by the
experimenter and that is measured as part of the experiment. It is
also called the dependent variable.


ASSESSMENT

1. Two surveys were conducted to measure the effectiveness of an advertising
campaign for a low-fat brand of ice cream. In one of the surveys, the
interviewers visited the home and asked whether the low-fat brand ice cream
was purchased. In the other survey, the interviewers asked the person to show
them the ice cream container when the interviewee stated he or she had
purchased low-fat ice cream.
a. Do you think the two types of surveys will yield similar results on the
percentage of households using the product?
b. What types of biases may be introduced into each of the surveys?

2. The “A” City school district is planning a survey of 300 of its 15,000 parents
or guardians who have students currently enrolled. They want to assess the
parents’ opinion about mandatory drug testing of all students participating in
any extracurricular activities, not just . An alphabetical listing of all parents or
guardians is available for selecting the sample. In each of the following
descriptions of the method of selecting the 300 participants in the survey,
identify the type of sampling method used (simple random sampling, stratified
sampling, or cluster sampling).

a. Each name is randomly assigned a number. The names with numbers 1
through 300 are selected for the survey.

b. The schools are divided into three groups according to grade level
taught at the school:
Grades 6–7, 8–9, and 10–12. Three separate sampling frames are constructed,
one for each group. A simple random sample of 100 parents or guardians is
selected from each group.

c. The school district is also concerned that the parent or guardian’s opinion
may differ depending on the age and sex of the student. Each name is
randomly assigned a number. The names with numbers 1 through 300 are
selected for the survey. The parent is asked to fill out a separate survey for
each of their currently enrolled children.

3. The major private university in the region is attempting to lobby the
Commission on Higher Education to allow the university to charge a
higher tuition rate than the other universities in the country. To provide a
justification, the university plans to conduct a mail survey of its alumni to
collect information concerning their current employment status. The university
grants a wide variety of different degrees and wants to make sure that
information is obtained about graduates from each of the degree types. A 5%
sample of alumni is considered sufficient.
Decide what sampling method you would use. Provide an explanation of why
you selected a particular method of sampling.


4. An experiment is planned to compare three types of schools (public,
private-nonparochial, and parochial), all with respect to the mathematics
problem-solving abilities of freshman engineering students. The researcher selects two
large cities in each of six provinces of Davao region for the study. In each
province, the researcher randomly selects one school of each of the three types
and randomly selects a single freshman class within each school. The scores
on a standardized test are recorded for each of 20 students in each classroom.
The researcher is concerned about differences in family income levels among
the 30 schools, so she obtains the family income for each of the students who
participated in the study.
a. Identify the important features of the design.
b. Identify each of the following components of the experimental design.
i. factors
ii. factor levels
iii. blocks
iv. experimental unit
v. measurement unit
vi. replications
vii. treatments


References

Broto, A.S. (2007). Simplified Approach to Inferential Statistics (1st ed.). National .
Philippines.

Carambas, Zenaida U. (2011). Basic Probability and Statistics. Valencia Educational
Supply, Baguio City.

Peck, R., Olsen, C. and Devore, J.L. (2012). Introduction to Statistics and Data
Analysis (4th ed.). Brooks/Cole, Cengage Learning, 20 Channel
Center Street, Boston, MA 02210, USA.

Ott, R.L. and Longnecker, M. (2010). An Introduction to Statistical Methods and Data
Analysis (6th ed.). Brooks/Cole, Cengage Learning, CA, USA.

Roussas, George (2003). Introduction to Probability and Statistical Inference.
Elsevier Science, USA.

Walpole, R.E. and Myers, R.H. (1993). Probability and Statistics for Engineers and
Scientists (5th ed.). Macmillan Publishing Company, New York.
