
PAMPANGA STATE AGRICULTURAL UNIVERSITY

ENGINEERING DATA ANALYSIS
MATH 4

TARUN, MARIA CAROLINA V.


CoECS - DGE
ACKNOWLEDGEMENT

First, I would like to thank Almighty God for the strength, protection, guidance,
and ability to complete this module in good health despite the pandemic that the
country is currently facing.

To the PSAU Flexible Learning Modality Task Force, for providing webinars and
assistance in order to come up with a better and more efficient module. Through their
efforts and dedication, the transition from face-to-face learning to distance learning
became possible and compelling.

To the faculty and staff of the College of Engineering and Computer Studies, for their
continuous updates and communication in order to complete the module. They worked
actively to provide us with the protected academic time to pursue the completion
of this module.

To the Academic Group, for their hard work in constantly requesting updates on the
status of our module. Without their help, this module would not have been completed.
Their constant communication with the faculty members urged us to be more dedicated
and to spend more time completing our module.

Nobody has been more important to me in the pursuit of this module than the
members of my family. I would like to thank my parents, whose love and guidance are
with me in whatever I pursue. They are my ultimate role models. I also thank my sisters,
whose love and support help me to dream more and aim higher. Most importantly, I wish
to thank my loving and supportive soon-to-be, my Du, who provides unending inspiration
in the completion of this module.

To everyone whose names are not listed above but with whom I have had the
pleasure of working on this and other related projects: this work is for all of you! May
Almighty God richly bless all of you and keep you all safe from the situation that
we are facing today! To God be the Glory!
FOREWORD

Giving, touching others’ lives, expanding the circle of our concern to include others,
being authentic, and always being open to receiving as well as giving. That’s not just a
children’s fairy tale – it’s a good description of many of the most amazing people I have
encountered.

Teaching science involves introducing students to the ways of talking and thinking
of the scientific community. The students are not only gaining genuine experience with
scientific sampling techniques, but also learning about the sensitive ecosystem in their
local area.

Teaching engineering in college requires a special understanding of sciences.


Effective teachers of sciences think about and beyond the content that they teach,
seeking explanations and making connections to other topics, both inside and outside the
course. Students meet curriculum and achievement expectations when they work with
teachers who know what science is important for each topic that they teach.

The topic of each module is an area of mathematics that is difficult for students to
learn, challenging to teach, and critical for students’ success as learners and in their future
lives and careers.

Drawing on experience as a teacher, researcher, and mathematician, the author has
identified the big ideas that are at the heart of each module’s topic. A set of
essential understandings, mathematical points that capture the essence of the topic,
fleshes out each big idea. Taken collectively, the big ideas and essential understandings
give a view of mathematics that is focused, connected, and useful to teachers.

This module was developed during the pandemic in the hope that my students will
grasp the knowledge and skills needed as part of the college curriculum. It was developed
in order to teach mathematics in an easier and more efficient way despite the distance
learning that is being implemented across the country. I offer sincere thanks and
appreciation to everyone who has helped me make this module possible and real.
UNIVERSITY VMGO AND CORE VALUES

Vision: To be the Premiere Science and Agroecological University

Mission: Mainstream science and practice of agroecological and industrial
technologies through distinctive instruction, research, extension and
entrepreneurship for people and nature.

Core Values:
People-centeredness – relevant and socially-responsive services
Systems thinking – integrated, collaborative, and multi-disciplinary
approach to local and global issues and concerns
Accountability – responsibility, trustworthiness, and efficiency in
implementing programs
Unity – solidarity, teamwork, and harmony under the Almighty

Pampanga State Agricultural University


Magalang, Pampanga
COLLEGE OF ENGINEERING AND COMPUTER STUDIES

Course Guide in MATH 4 (Engineering Data Analysis)


2nd Semester, AY 2020-2021

I. COURSE DETAILS

Course Code and Title: MATH 4 (Engineering Data Analysis)

Course Description: This course is designed for undergraduate engineering students, with emphasis on
problem solving related to societal issues that engineers and scientists are called upon to solve. It
introduces different methods of data collection and the suitability of using a particular method for a
given situation.
The relationship of probability to statistics is also discussed, providing students with the tools they
need to understand how "chance" plays a role in statistical analysis. Probability distributions of
random variables and their uses are also considered, along with a discussion of linear functions of
random variables within the context of their application to data analysis and inference. The course
also includes estimation techniques for unknown parameters; hypothesis testing used in making
inferences from sample to population; and inference for regression parameters and model building
for estimating means and predicting future values of key variables under study. Finally,
statistically based experimental design techniques and analysis of outcomes of experiments are
discussed with the aid of statistical software.

Number of Units/Hours per Week: 3 Units (Lecture) – 3 hours per week

Instructor/Contact Information: ENGR. MARIA CAROLINA V. TARUN
09169328664
mariacarolina_tarun@psau.edu.ph

Consultation Hours and Mode of Consultation: 2:00 – 4:00 PM; consultation via Zoom, email,
Messenger, Google Meet, or iSkwela.

Program Outcomes

At the end of this course, the students should be able to:

• Apply knowledge of mathematics, physical sciences, and engineering sciences to the practice of
geodetic engineering;
• Design and conduct experiments to test hypotheses and verify assumptions, as well as to
organize, analyze and interpret data, draw valid conclusions, and develop mathematical models
for processes;
• Design, improve, innovate, and supervise systems or procedures to meet desired needs
within realistic constraints, in accordance with standards; and
• Identify, formulate, and solve geodetic engineering problems.

Course Outcomes

At the end of this course, the students should be able to:

• Apply statistical methods in the analysis of data; and
• Design experiments involving several factors.


II. COURSE OUTLINE

Week 1. Module 1: Obtaining Data
    Topics:
        1.1 Methods of Data Collection
        1.2 Planning and Conducting Surveys
        1.3 Planning and Conducting Experiments: Introduction to Design of Experiments
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 1
        Offline Assessment: To be announced

Week 2. Module 2: Probability
    Topics:
        2.1 Sample Space and Relationships among Events
        2.2 Counting Rules Useful in Probability
        2.3 Rules of Probability
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 2
        Offline Assessment: To be announced

Week 3. Module 3: Discrete Probability Distributions
    Topics:
        3.1 Random Variables and their Probability Distribution
        3.2 Cumulative Distribution Functions
        3.3 Expected Values of Random Variables
        3.4 The Binomial Distribution
        3.5 The Poisson Distribution
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 3
        Offline Assessment: To be announced

Week 4. Module 4: Joint Probability Distribution
    Topics:
        4.1 Two or More Random Variables
        4.2 Linear Functions of Random Variables
        4.3 General Functions of Random Variables
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 4
        Offline Assessment: To be announced

Week 5. Module 5: Sampling Distributions and Point Estimation of Parameters
    Topics:
        5.1 Point Estimation
        5.2 Sampling Distribution and the Central Limit Theorem
        5.3 General Concept of Point Estimation
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 5
        Offline Assessment: To be announced
        Midterm Examination: To be announced
Week 6. Module 6: Statistical Intervals
    Topics:
        6.1 Confidence Intervals: Single Sample
        6.2 Confidence Intervals: Multiple Samples
        6.3 Prediction Intervals
        6.4 Tolerance Intervals
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 6
        Offline Assessment: To be announced

Week 7. Module 7: Test of Hypothesis for a Single Sample
    Topics:
        7.1 Hypothesis Testing
        7.2 Test on the Mean of a Normal Distribution, Variance Known
        7.3 Test on the Mean of a Normal Distribution, Variance Unknown
        7.4 Test on the Variance and Standard Deviation of a Normal Distribution
        7.5 Test on a Population Proportion
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 7
        Offline Assessment: To be announced
        Final Examination: To be announced

Week 8. Module 8: Statistical Inference of Two Samples
    Topics:
        8.1 Inference on the Difference in Means of Two Normal Distributions, Variances Known
        8.2 Inference on the Difference in Means of Two Normal Distributions, Variances Unknown
        8.3 Inference on the Variance of Two Normal Distributions
        8.4 Inference on Two Population Proportions
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 8
        Offline Assessment: To be announced

Week 9. Module 9: Simple Linear Regression and Correlation
    Topics:
        9.1 Empirical Models
        9.2 Regression: Modelling Linear Relationships – The Least Squares Approach
        9.3 Hypothesis Tests in Simple Linear Regression
        9.4 Prediction of New Observations
        9.5 Adequacy of the Regression Model
        9.6 Correlation
    Learning Experience: Online Consultation; Lectures via LMS and Supplementary Reading Materials
    Assessment Tools and Deadlines:
        Discussion Forum: End of Week 9
        Offline Assessment: To be announced
        Final Examination: To be announced

III. COURSE MATERIALS

The main reference materials for the course are the following:

1. Brandt, Siegmund. Data Analysis: Statistical and Computational Methods for Scientists and Engineers. 4th Ed.
2. Ruppert, David, and David S. Matteson. Statistics and Data Analysis for Financial Engineering with R Examples. 2nd Ed.


IV. PROJECT AND FINAL ASSESSMENT/EXAMINATION:

1. Final Project will be announced.
2. Assignments, Worksheets and Unit Exams will be announced.
3. Final Assessment/Examination on MATH 4 (Engineering Data Analysis) is scheduled on May , 2021.

V. CRITERIA FOR GRADING

1. Studentship (Promptness of Submission of Requirements) – 10%
2. Class Standing (Projects, Assignments, Reporting) – 60%
3. Major Exams – 30%

Prepared by:

Engr. Maria Carolina V. Tarun


Instructor I

PAMPANGA STATE AGRICULTURAL UNIVERSITY
Magalang, Pampanga

COLLEGE OF ENGINEERING AND COMPUTER STUDIES


DEPARTMENT OF GEODETIC ENGINEERING

LEARNING CONTRACT

A contract is hereby executed between the Professor of this course _______________________ and the
student enrolled as follows:
Whereas, actual class sessions are an integral part of the course;
Agreement is hereby made as follows:
1. The professor will provide the course syllabus at the start of the semester;
2. The professor will facilitate and moderate the class teaching-learning process and provide the
necessary knowledge relative to the course;
3. The student, in return, will receive the syllabus, familiarize himself/herself with its contents and
requirements, and conform to the provisions in the syllabus;
4. The student will abide by the policies of the program relative to the requirements on attendance and
other course requirements as specified in the syllabus; otherwise, the student will fail the course;
5. This contract is effective upon signing until the end of the current semester.

Signed:

________________________________
Signature over Printed Name of Student
Date Signed: _______________________________

ENGR. MARIA CAROLINA V. TARUN, ICE
Faculty, CoECS
Date Signed: _______________________________

Witness:

ROSEMARIE S. MACMAC, MBM, GE, REB, REA
Department Chair
_____________________________________________________________________________________
CERTIFICATION
This is to certify that I received the syllabus of this course ______________ on_______________, for the
______ sem, SY __________.

________________________________
Signature over Printed Name of Student
TABLE OF CONTENTS
COVER PAGE
ACKNOWLEDGEMENT
FOREWORD
UNIVERSITY VMGO AND CORE VALUES
COURSE GUIDE
LEARNING CONTRACT
TABLE OF CONTENTS

MODULE 1: Obtaining Data
MODULE 2: Probability
MODULE 3: Discrete Probability Distributions
MODULE 4: Joint Probability Distribution
MODULE 5: Sampling Distributions and Point Estimation of Parameters
MODULE 6: Statistical Intervals
MODULE 7: Test of Hypothesis for a Single Sample
MODULE 8: Statistical Inference of Two Samples
MODULE 9: Simple Linear Regression and Correlation


MODULE 1
OBTAINING DATA

At the end of this module, you are expected to:

1. Define data collection;
2. Know the process of obtaining data;
3. Understand the methods of obtaining data;
4. Understand the planning of surveys; and
5. Enumerate the steps in conducting surveys and experiments.

INTRODUCTION
Engineering Data Analysis (EDA) is an indispensable tool that engineering teams in
industry use to analyze processes, integration, and yield (conversion rate) effectively
in order to enhance the competitiveness of the company.

DATA COLLECTION
Data collection is defined as the procedure of collecting, measuring and analyzing
accurate insights for research using standard validated techniques. A researcher can
evaluate their hypothesis on the basis of collected data. In most cases, data collection is
the primary and most important step for research, irrespective of the field of research.
The approach of data collection is different for different fields of study, depending on the
required information.
The most critical objective of data collection is ensuring that information-rich and
reliable data is collected for statistical analysis so that data-driven decisions can be made
for research.

OBTAINING DATA
Statistics may be defined as the science that deals with the collection,
organization, presentation, analysis, and interpretation of data in order to draw
judgments or conclusions that help in the decision-making process. The two parts of this
definition correspond to the two main divisions of Statistics. These are Descriptive
Statistics and Inferential Statistics.
Descriptive Statistics deals with the procedures that organize, summarize and
describe quantitative data. It seeks merely to describe data. Inferential Statistics deals
with making a judgment or a conclusion about a population based on the findings from a
sample that is taken from the population.
Before proceeding to the discussion of the different methods of obtaining data, let
us first define some statistical terms:
1. Population or Universe refers to the totality of objects, persons, places, or things
used in a particular study. It includes all members of a particular group of objects
(items) or people (individuals), etc. that are the subjects or respondents of a study.
2. Sample is any subset of a population, that is, a few members of the population.
3. Data are facts, figures and information collected on some characteristics of a
population or sample. These can be classified as qualitative or quantitative
data.
4. Ungrouped (or raw) data are data which are not organized in any specific
way. They are simply the collection of data as they are gathered.
5. Grouped Data are raw data organized into groups or categories with
corresponding frequencies. Organized in this manner, the data are referred to as
a frequency distribution.
6. Parameter is the descriptive measure of a characteristic of a population.
7. Statistic is a measure of a characteristic of a sample.
8. Constant is a characteristic or property of a population or sample which is
common to all members of the group.
9. Variable is a measure or characteristic or property of a population or sample
that may have a number of different values. It differentiates a particular member
from the rest of the group. It is the characteristic or property that is measured,
controlled, or manipulated in research. Variables differ in many respects, most
notably in the role they are given in the research and in the type of measures
that can be applied to them.
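
The terms above can be illustrated with a short Python sketch. It is only an illustration: the population of quiz scores, the sample size of 25, and the three score classes are all invented for this example and do not come from any real data set.

```python
import random
import statistics
from collections import Counter

random.seed(4)  # fixed seed so the illustration is repeatable

# Hypothetical population: quiz scores (0 to 10) of 200 students.
population = [random.randint(0, 10) for _ in range(200)]

# A sample is a subset of the population (here, 25 members chosen at random).
sample = random.sample(population, 25)

# A parameter describes the population; a statistic describes the sample.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

def score_class(score):
    """Place a raw score into one of three (assumed) classes."""
    if score <= 3:
        return "0-3"
    if score <= 7:
        return "4-7"
    return "8-10"

# Grouped data: the raw sample organized into classes with frequencies.
frequency_distribution = Counter(score_class(s) for s in sample)

print("Parameter (population mean):", round(population_mean, 2))
print("Statistic (sample mean):", round(sample_mean, 2))
for label in ("0-3", "4-7", "8-10"):
    print(label, frequency_distribution[label])
```

The list `sample` is ungrouped (raw) data, while `frequency_distribution` is the same data in grouped form. The sample mean will usually be close to, but not exactly equal to, the population mean.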

METHODS OF DATA COLLECTION


Data collection is an important aspect of any type of research study. Inaccurate
data collection can affect the results of a study and ultimately lead to invalid conclusions.
Data collection methods for impact evaluation vary along a continuum. At one
end of this continuum are quantitative methods, and at the other end are
qualitative methods for data collection.
Every data collection method starts from the data itself. Data can be
classified into two types, namely primary data and secondary data. The primary
importance of data collection in any research or business process is that it helps to
determine many important things about the company, particularly its performance. The
data collection process therefore plays an important role in all fields. Depending on the
type of data, data collection methods are divided into two categories:
• Primary Data Collection methods
• Secondary Data Collection methods

PRIMARY DATA COLLECTION METHODS


Primary data or raw data is a type of information that is obtained directly from the
first-hand source through experiments, surveys or observations. The primary data
collection method is further classified into two types. They are:
• Quantitative Data Collection Methods
• Qualitative Data Collection Methods

QUANTITATIVE DATA COLLECTION METHODS


Quantitative data collection methods rely on random sampling and
structured data collection instruments that fit diverse experiences into predetermined
response categories. They produce results that are easy to summarize, compare, and
generalize.
Quantitative research is concerned with testing hypotheses derived from theory
and/or being able to estimate the size of a phenomenon of interest. Depending on the
research question, participants may be randomly assigned to different treatments. If this
is not feasible, the researcher may collect data on participant and situational
characteristics in order to statistically control for their influence on the dependent, or
outcome, variable. If the intent is to generalize from the research participants to a larger
population, the researcher will employ probability sampling to select participants. Typical
quantitative data gathering strategies include:
a. Experiments/clinical trials.
b. Observing and recording well-defined events (e.g., counting the number of
patients waiting in emergency at specified times of the day).
c. Obtaining relevant data from management information systems.
d. Administering surveys with closed-ended questions (e.g., face-to-face and
telephone interviews, questionnaires, etc.).
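
Random assignment of participants to treatments, mentioned above, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the participant labels, the two group names, and the helper `assign_treatments` are hypothetical and chosen only for illustration.

```python
import random

def assign_treatments(participants, treatments, seed=None):
    """Shuffle the participants, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for index, person in enumerate(shuffled):
        groups[treatments[index % len(treatments)]].append(person)
    return groups

# Twelve hypothetical participants, labelled P01 to P12.
participants = [f"P{i:02d}" for i in range(1, 13)]
groups = assign_treatments(participants, ["control", "treatment"], seed=1)

for name, members in groups.items():
    print(name, sorted(members))
```

Because chance alone decides who lands in which group, participant characteristics tend to balance out between the groups, which is why random assignment is preferred whenever it is feasible.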

1. Interview Method
This is the method of collecting data through oral or verbal responses. It is carried
out in three ways:

• Personal Interview – In this method, a person known as an interviewer
is required to ask questions face to face to the other person. The
personal interview can be structured or unstructured, direct
investigation, focused conversation, etc.
• Telephone Interview – In this method, an interviewer obtains information
by contacting people on the telephone to ask the questions or views
orally.
• Computer Assisted Personal Interviewing (CAPI) – This is a form of personal
interviewing, but instead of completing a questionnaire, the interviewer
brings along a laptop or hand-held computer to enter the information
directly into the database. This method saves time involved in
processing the data, and saves the interviewer from carrying
around hundreds of questionnaires. However, this type of data collection
method can be expensive to set up and requires that interviewers have
computer and typing skills.

2. Questionnaire Method
In this method, a set of questions is mailed to the respondent, who reads the
questions, answers them, and returns the questionnaire. The questions are printed in a
definite order on the form. A good questionnaire should have the following features:
• Short and simple
• Follows a logical sequence
• Provides adequate space for answers
• Avoids technical terms
• Has a good physical appearance, such as the colour and quality of the
paper, to attract the attention of the respondent

This method is carried out in two ways:

• Paper-pencil questionnaires – These can be sent to a large number of people
and save the researcher time and money. People are more truthful
when responding to questionnaires, particularly on controversial issues,
because their responses are anonymous. However, questionnaires
also have drawbacks. The majority of people who receive questionnaires
do not return them, and those who do might not be representative of the
originally selected sample.

• Web-based questionnaires – A new and inevitably growing methodology
is the use of Internet-based research. The respondent receives an
e-mail containing a link to a secure website where the questionnaire is
filled in. This type of research is often quicker and less detailed.
Disadvantages of this method include the exclusion of people who do not
have a computer or are unable to access one. The validity of such
surveys is also in question, as people might be in a hurry to complete them
and so might not give accurate responses.

Questionnaires often make use of checklists and rating scales. These devices help
simplify and quantify people's behaviors and attitudes. A checklist is a list of behaviors,
characteristics, or other entities that the researcher is looking for. Either the researcher
or the survey participant simply checks whether each item on the list is observed, present,
or true, or not. A rating scale is more useful when a behavior needs to be evaluated
on a continuum. A widely used type of rating scale is the Likert scale.
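
As a rough illustration of how checklists and rating scales turn behaviors and attitudes into numbers, the Python sketch below tallies a made-up set of responses. The five-point scale labels, the ten ratings, and the checklist items are assumptions invented for this example.

```python
import statistics
from collections import Counter

# Assumed five-point rating scale (labels are illustrative only).
SCALE = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Hypothetical ratings given by ten respondents to one statement.
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

counts = Counter(responses)                     # how often each rating was chosen
median_rating = statistics.median(responses)    # middle rating of the ordered responses

# Hypothetical checklist: each item is simply observed (True) or not (False).
checklist = {
    "wears safety helmet": True,
    "uses gloves": False,
    "follows signage": True,
}
items_observed = sum(checklist.values())

for rating, label in SCALE.items():
    print(f"{rating} ({label}): {counts[rating]}")
print("Median rating:", median_rating)
print("Checklist items observed:", items_observed, "of", len(checklist))
```

The median is used here because ratings are ordered categories; whether a mean is also appropriate depends on how the scale is interpreted in a given study.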

3. Schedule Method
A schedule is the tool or instrument used to collect data from the respondents while
an interview is conducted. A schedule contains questions, statements (on which opinions
are elicited) and blank spaces/tables for filling in the respondents' answers. The features
of schedules are:
• The schedule is presented by the interviewer. The questions are asked and the
answers are noted down by the interviewer.
• The list of questions is a more formal document; it need not be attractive.
• The schedule can be used in a very narrow sphere of social research.

4. Survey Method
The essence of the survey method can be explained as “questioning individuals on a
topic or topics and then describing their responses”. In business studies, the survey method
of primary data collection is used to test concepts, reflect the attitudes of people,
establish the level of customer satisfaction, conduct segmentation research, and serve a
set of other purposes. The survey method can be used in both quantitative and
qualitative studies.
The survey method pursues two main purposes:
• Describing certain aspects or characteristics of a population and/or
• Testing hypotheses about the nature of relationships within a population.
The survey method can be broadly divided into three categories: mail survey,
telephone survey and personal interview. Each of these methods is briefly
described in the following table.

QUALITATIVE DATA COLLECTION METHODS


Qualitative data collection methods play an important role in impact evaluation
by providing information useful to understand the processes behind observed results and
assess changes in people’s perceptions of their well-being. Furthermore, qualitative
methods can be used to improve the quality of survey-based quantitative evaluations by
helping generate evaluation hypotheses, strengthening the design of survey
questionnaires, and expanding or clarifying quantitative evaluation findings. These
methods are characterized by the following attributes:
• they tend to be open-ended and have less structured protocols (i.e., researchers
may change the data collection strategy by adding, refining, or dropping
techniques or informants)
• they rely more heavily on interactive interviews; respondents may be interviewed
several times to follow up on a particular issue, clarify concepts or check the
reliability of data
• they use triangulation to increase the credibility of their findings (i.e., researchers
rely on multiple data collection methods to check the authenticity of their results)
• generally their findings are not generalizable to any specific population; rather, each
case study produces a single piece of evidence that can be used to seek general
patterns among different studies of the same issue

Regardless of the kinds of data involved, data collection in a qualitative study takes
a great deal of time. The researcher needs to record any potentially useful data
thoroughly, accurately, and systematically, using field notes, sketches, audiotapes,
photographs and other suitable means. The data collection methods must observe the
ethical principles of research.
The qualitative methods most commonly used in evaluation can be classified in
three broad categories:
1. In-depth Interview
In-depth interviews are a qualitative data collection method that allows for the
collection of a large amount of information about the behavior, attitude and perception of
the interviewees.
During in-depth interviews, researchers and participants have the freedom to
explore additional points and change the direction of the process when necessary. It is
an independent research method that can adopt multiple strategies according to the
needs of the research.

2. Observation Methods
Observation, as the name implies, is a way of collecting data through observing.
The observation method is classified as a participatory study because the
researcher has to immerse himself or herself in the setting where the respondents are,
while taking notes and/or recording. Observation is used in the social sciences as a method
for collecting data about people, processes, and cultures. Thus, observation is a technique
that involves systematically selecting, watching and recording the behaviour and
characteristics of living beings, objects or phenomena.
Observation as a data collection method can be structured or unstructured. In
structured or systematic observation, data collection is conducted using specific variables
and according to a pre-defined schedule. Unstructured observation, on the other hand, is
conducted in an open and free manner in a sense that there would be no pre-determined
variables or objectives.

3. Document Review
A qualitative research project may require review of documents such as:
• Course syllabi
• Faculty journals
• Meeting minutes
• Strategic plans
• Newspapers
Depending on the research question, the researcher might utilize a rating scale,
a checklist, content analysis, or matrix analysis.

SECONDARY DATA COLLECTION METHODS


Secondary data is data collected by someone other than the actual user. It means
that the information is already available, and the user analyses it rather than collecting it
first-hand. Sources of secondary data include magazines, newspapers, books, journals, etc.
Secondary data may be either published or unpublished.
Published data are available in various resources, including:
• Government publications
• Public records
• Historical and statistical documents
• Business documents
• Technical and trade journals
Unpublished data include:
• Diaries
• Letters
• Unpublished biographies, etc.

PLANNING AND CONDUCTING SURVEYS

• Well-designed and well-conducted surveys use chance to select random samples and
avoid sources of bias.
• Population: Entire group of individuals about which we want information.
• Sample: Part of the population from which we collect information.
• Bias: Factor which can favor certain outcomes.
• Sampling Methods include Simple Random Sampling (SRS), Stratified Random
Sampling, and Cluster Sampling.

The population in a statistical study is the entire group of individuals, scores,
measurements, etc. about which we want information. A sample is the part of the
population from which we actually collect information and is used to draw conclusions
about the whole. Random Selection is a process of gathering a representative sample for
a particular study. Random means that the people are chosen by chance, so each person
has the same probability of being chosen. When you have a truly random sample, you
reduce the chance that the results are due to particular characteristics of the
participants in the study.

SOURCES OF BIAS IN SAMPLING AND SURVEYS:


Convenience Samples, which select the individuals who are easiest to reach, and
Voluntary Response Samples, in which respondents decide whether they want to be included,
are common methods of data collection that will usually produce biased results. These
sampling methods will usually favor one part of a population over another.
For example, if the High School guidance office wanted to know if students are
interested in an AP Statistics elective, would the district get accurate information if the
counselors asked the Calculus teachers to survey their students? Why would more
accurate results be gathered in an English or History class? Would asking students to
stop by the office at the end of the day to fill out a questionnaire regarding testing policies
in the district yield valid results? What could be changed to make this a more valid
sample?

A sample chosen by chance allows neither favoritism by the sampler nor self-
selection by respondents. All individuals have an equal chance to be chosen. A Simple
Random Sample allows all members of a population an equal chance of being selected,
avoiding bias. Drawing names from a hat works for small populations (students in a
classroom) but would not be practical when conducting a national survey. Computer-
generated Random Digits can be used when working with large populations. A Table of
Random Digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, where each entry in
the table is equally likely to be any of the 10 digits and the entries are independent of
each other. Systematic Sampling selects a starting point and then selects every kth (such
as 50th) element in the population.
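The two selection schemes just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of ID numbers 1 to 1000, the sample size of 20, and the function names `simple_random_sample` and `systematic_sample` are hypothetical and do not come from the module. The computer-generated random digits are supplied by Python's standard `random` module.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members by chance; no member is favored."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Choose a random starting point among the first k members,
    then take every kth member after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: ID numbers 1..1000 (an assumption for illustration).
population = list(range(1, 1001))

srs = simple_random_sample(population, 20, seed=1)
every_50th = systematic_sample(population, 50, seed=1)

print("Simple random sample:", sorted(srs))
print("Systematic sample:   ", every_50th)
```

Fixing `seed` makes the draw reproducible for checking; leaving it as `None` gives a fresh sample on each run.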
Other sampling methods include Stratified Random Sampling and Cluster
Sampling. Both involve forming subgroups before collecting data. Stratified
Random Sampling subdivides the population into at least two different subgroups (strata)
so that subjects within the same subgroup share the same characteristics (gender, age),
then draws a sample from each. Ex. The Orange County DMV plans to test an online
registration system by using a sample consisting of 20 randomly selected men and 20
randomly selected women. Cluster Sampling divides the population into sections
(clusters), randomly selects some of those clusters, and then chooses all members of the
selected clusters. Ex. Pre-election polls randomly select 30 precincts from a large number
of precincts, then survey all members from each of the selected precincts.
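The difference between the two subgroup-based methods can be made concrete with a short Python sketch. The data below are made up for illustration (200 people labelled "M" or "F", and 100 precincts of 10 voters each), and the helper names `stratified_sample` and `cluster_sample` are hypothetical. The sizes 20 per stratum and 30 clusters mirror the two examples in the text.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then draw a random sample from EACH stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose SOME clusters, then keep ALL members of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical data (assumptions for illustration only).
people = [(i, "M" if i % 2 else "F") for i in range(200)]
precincts = {f"precinct_{i}": [f"voter_{i}_{j}" for j in range(10)]
             for i in range(100)}

by_gender = stratified_sample(people, lambda p: p[1], 20, seed=2)
polled = cluster_sample(precincts, 30, seed=2)

print({label: len(members) for label, members in by_gender.items()})
print(len(polled), "precincts,", sum(len(v) for v in polled.values()), "voters")
```

Note the contrast: stratified sampling visits every subgroup but samples within it, while cluster sampling skips most subgroups but surveys the selected ones completely.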

PLANNING AND CONDUCTING EXPERIMENTS: INTRODUCTION TO DESIGN OF EXPERIMENTS

Design of Experiments (or DOE) is a formal, structured method for conducting
experiments that provides more reliable results than an unstructured approach. A key
tool used in DOE is the Design Matrix, which is a map that shows how to run the
experiment. Creating a structured, statistical map for the experiment makes it easier to
analyze the data and ensures conclusions that are more reliable. All experiments are
designed experiments; some are simply poorly designed and others are well designed.
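As a rough illustration of what a design matrix can look like, the sketch below lists every combination of factor levels (a full factorial layout, which is only one of many possible designs and is an assumption here, not a method prescribed by this module). The three baking factors and their levels are hypothetical values chosen for the example.

```python
from itertools import product
import random

# Hypothetical factors and levels (assumptions for illustration only).
factors = {
    "temperature_C": [160, 180],
    "bake_time_min": [30, 40],
    "pan_shape": ["round", "square"],
}

def full_factorial(factors):
    """Return one row (dict of factor -> level) per combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

design_matrix = full_factorial(factors)

# Randomize the run order so that time-related effects are not
# confused with the factor effects.
run_order = design_matrix[:]
random.Random(0).shuffle(run_order)

for run_number, settings in enumerate(run_order, start=1):
    print(run_number, settings)
```

With three factors at two levels each, the matrix has 2 x 2 x 2 = 8 rows, one per experimental run.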
Robustness is a concept that enters into statistics at several points. At the analysis
stage, robustness refers to a technique that isn't overly influenced by bad data: even if
there is an outlier or bad data, you still want to get the right answer. A robust process
likewise keeps working regardless of who or what is involved in it. We will come back to
this notion of robustness later in the course.
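A small numerical sketch can show what "not overly influenced by bad data" means at the analysis stage. The measurements below are invented for illustration, and the comparison of the mean with the median is one common example of a non-robust versus a robust summary, not a technique taken from this module.

```python
from statistics import mean, median

# Hypothetical measurements (made-up values for illustration).
clean = [10.1, 9.9, 10.0, 10.2, 9.8]
contaminated = clean + [100.0]   # one bad reading (outlier) added

mean_shift = abs(mean(contaminated) - mean(clean))
median_shift = abs(median(contaminated) - median(clean))

print("Shift in mean caused by the outlier:  ", round(mean_shift, 2))
print("Shift in median caused by the outlier:", round(median_shift, 2))
```

The single outlier moves the mean by a large amount while the median barely changes, which is why the median is called a robust summary.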

Every experiment design has inputs. Back to the cake baking example: we have
our ingredients such as flour, sugar, milk, eggs, etc. Regardless of the quality of these
ingredients we still want our cake to come out successfully. In every experiment there are
inputs and in addition, there are factors (such as time of baking, temperature, geometry
of the cake pan, etc.), some of which you can control and others that you can't control.
The experimenter must think about factors that affect the outcome. We also talk about
the output and the yield or the response to your experiment. For the cake, the output
might be measured as texture, flavor, height, or size.
The practical steps needed for planning and conducting an experiment include:
recognizing the goal of the experiment, choice of factors, choice of response, choice of
the design, analysis, and then drawing conclusions. These steps closely follow the
scientific method.
1. Recognition and statement of the problem
2. Choice of factors, levels, and ranges
3. Selection of the response variable(s)
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions and making recommendations

What this course will deal with primarily is the choice of the design. This focus includes
all the related issues about how we handle these factors in conducting our experiments.

We usually talk about "treatment" factors, which are the factors of primary interest to
you. In addition to treatment factors, there are nuisance factors which are not your primary
focus, but you have to deal with them. Sometimes these are called blocking factors,
mainly because we will try to block on these factors to prevent them from influencing the
results. There are other ways that we can categorize factors:

1. Experimental vs. Classification Factors


Experimental Factors
These are factors whose levels you can specify and then assign at
random as treatments to the experimental units. Examples would be temperature,
level of an additive, fertilizer amount per acre, etc.
Classification Factors
These can't be changed or assigned; they come as labels on the
experimental units. The age and sex of the participants are classification factors
which can't be changed or randomly assigned. But you can select individuals from
these groups randomly.

2. Quantitative vs. Qualitative Factors


Quantitative Factors
You can assign any specified level of a quantitative factor. Examples: percent
or pH level of a chemical.
Qualitative Factors
These factors have categories which are different types. Examples might be
species of a plant or animal, a brand in the marketing field, or gender. These are not
ordered or continuous but are arranged perhaps in sets.
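The distinction between factors that can be assigned and factors that only label the units can be sketched in Python. In this illustration (all names and values are hypothetical), fertilizer amount is an experimental, quantitative factor whose levels are assigned at random, while soil type is a qualitative label on each plot that cannot be assigned and is used for blocking instead.

```python
import random
from collections import Counter

def assign_treatments(units, treatments, block_of, seed=None):
    """Randomly assign treatment levels to units, balanced within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)            # chance decides which unit gets which level
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical experimental units: 12 plots, each labelled with its soil type.
plots = [(f"plot{i}", "clay" if i < 6 else "sandy") for i in range(12)]
fertilizer_levels = [0, 50, 100]         # hypothetical amounts per acre

assignment = assign_treatments(plots, fertilizer_levels,
                               block_of=lambda plot: plot[1], seed=3)
counts = Counter((plot[1], level) for plot, level in assignment.items())

for plot, level in sorted(assignment.items()):
    print(plot, "->", level)
```

Because the shuffle happens separately inside each soil type, every fertilizer level appears equally often on clay and on sandy plots, so soil type cannot bias the comparison of fertilizer levels.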
GUIDED QUESTIONS 1
I. Identify which type of sampling is used: random, systematic, convenience, stratified, or
cluster.
1. A police sobriety checkpoint stops and interviews every 5th driver.
3. An exit poll randomly selects specific polling stations and all voters are
surveyed as they leave the premises.
4. An engineering student measures the strength of fingers used to push
buttons by testing family members.
5. An IRS researcher investigates cheating on income taxes by surveying all
waiters and waitresses at 20 randomly selected restaurants.
6. A marketing expert for MTV is planning a survey in which 500 people will be
randomly selected from each age group of 10-19, 20- 29, …
7. A teacher surveyed all of his students to obtain a sample consisting of the
number of credit cards students possess.
8. In a poll of 1550 adults, subjects were selected by using a computer to
randomly generate phone numbers that were called.

II. Think about your own field of study and jot down several of the factors that are pertinent
in your own research area. Into what categories do these fall?

III. Data Collection Planning


IV.
