I.

Obtaining Data
Math221E – Engineering Data Analysis
Science
Science is based on the empirical method, which consists of methods for making observations, that is, for systematically obtaining information.
Observations are the basic empirical "stuff" of
science.
Statistics
Statistics is a set of methods for processing and analyzing numbers to help reduce the uncertainty inherent in decision making.
It is the science that deals with the collection,
organization, summarization, presentation,
analysis and interpretation of data to come up
with meaningful information.
It is a set of concepts, rules, and procedures that
help us to:
▪ organize numerical information in the form of
tables, graphs, and charts;
▪ understand statistical techniques underlying
decisions that affect our lives and well-being; and
▪ make informed decisions.
Methods of Data Collection
Data Collection
Data collection is the process of gathering
and measuring information on variables of
interest, in an established systematic
fashion that enables one to answer stated
research questions, test hypotheses, and
evaluate outcomes.
Types of data
1. Primary Data
are those which are collected afresh and for the
first time, and thus happen to be original in
character
2. Secondary Data
are those which have already been collected by
someone else and which have already been
passed through the statistical process
Methods of Data Collection:
Primary Data
1. Observation
2. Interview
3. Questionnaire
4. Case Study
5. Survey
Observation method
The observation method is a method in which data are collected from the field through the observer's own observation, by personally going to the field.
ADVANTAGES
• Subjective bias eliminated
• Current information
• Independent of the respondent's variables
DISADVANTAGES
• Time consuming
• Limited information
• Unforeseen factors
Types of Observation
1. Structured Observation
Observation carried out with a defined style of recording the observed information, standardized conditions of observation, a clear definition of the units to be observed, and a selection of the pertinent data to observe.
Example: An auditor performing an inventory analysis in a store.
2. Unstructured Observation
Observation carried out without planning these elements beforehand.
Example: Observing children playing with new toys.
Types of Observation
1. Participant
When the observer is a member of the group he or she is observing.
ADVANTAGES
• Observation of natural behavior
• Closeness with the group
• Better understanding
2. Non-participant
When the observer observes people without giving them any information.
ADVANTAGES
• Objectivity and neutrality
• More willingness of the respondent
Types of Observation
1. Controlled
When observation takes place according to definite pre-arranged plans, following an experimental procedure. It is generally done in a laboratory under controlled conditions.
2. Uncontrolled
When observation takes place in natural conditions. It is done to get a spontaneous picture of life and persons.
Interview Method
This method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses.
The interview method is oral-verbal communication in which the interviewer asks the respondent questions aimed at obtaining the information required for the study.
Types of Interviews
• Personal interviews. The interviewer
asks questions generally in a face to face
contact to the other person or persons.
• Structured interviews. In this case, a set of predetermined questions is used.
• Unstructured interviews. In this case, we
don’t follow a system of pre-determined
questions.
• Focused interviews. Attention is focused
on the given experience of the respondent
and its possible effects.
• Clinical interviews. Concerned with
broad underlying feelings or motivations
or with the course of individual’s life
experience, rather than with the effects of
the specific experience, as in the case of
focused interview.
• Group interviews. A group of 6 to
individuals is interviewed.
• Qualitative and quantitative interviews.
Divided on the basis of subject matter i.e.
whether qualitative or quantitative.
• Individual interviews. Interviewer meets
a single person and interviews him.
• Depth interviews. It deliberately aims to
elicit unconscious as well as other types
of material relating especially to
personality dynamics and motivations.
• Telephonic interviews. Contacting sampled respondents by telephone.
Questionnaire Method
This method of data collection is quite popular, particularly in large-scale inquiries.
The questionnaire is mailed to respondents
who are expected to read and understand
the questions and write down the reply in
the space meant for the purpose in the
questionnaire itself. The respondents have
to answer the questions on their own.
Questionnaire Method
ADVANTAGES
• Low cost even if the geographical area is very large
• Answers are in the respondents' own words, so they are free from bias
• Adequate time to think about answers
• Respondents who are not easily approachable may be conveniently contacted
• Large samples can be used, so results are more reliable
DISADVANTAGES
• Low rate of return of duly filled questionnaires
• Slowest method of data collection
• Difficult to know whether the expected respondent filled in the form or someone else did
Case Study Method
Case study method is essentially an intensive
investigation of the particular unit under
consideration. The object of the case study
method is to locate the factors that account for
the behavior-patterns of the given unit as an
integrated totality.
ADVANTAGES
• Less costly and less time-consuming
• Advantageous when exposure data are expensive or hard to obtain
• Advantageous when studying dynamic populations in which follow-up is difficult
DISADVANTAGES
• Subject to selection bias
• Generally do not allow calculation of incidence (absolute risk)
Survey Method
Undertaking surveys is one of the common methods of diagnosing and solving social problems.
ADVANTAGES
• Relatively easy to administer
• Can be developed in less time (compared to other data collection methods)
• Cost-effective, but cost depends on the survey mode
DISADVANTAGES
• Respondents may not feel encouraged to provide accurate, honest answers
• Surveys with closed-ended questions may have a lower validity rate than other question types
• Data errors due to question non-responses may exist
Sources of Secondary Data
• Publications of Central, state, and local governments
• Technical and trade journals
• Books, magazines, newspapers
• Reports and publications of industries, banks, stock exchanges
• Reports by research scholars, universities, economists
• Public records
Factors to be Considered Before Using Secondary Data
• Reliability of data. Who collected the data, when, by which methods, etc.
• Suitability of data. The object, scope, and nature of the original inquiry should be studied: if the original study had a different objective, the data may not be suitable for the current study.
• Adequacy of data. Level of accuracy.
Selection of Proper Method for Collection of Data
• Nature, Scope and object of inquiry
• Availability of Funds
• Time Factor
• Precision Required
Planning and Conducting Surveys
Designing A Survey
Surveys can take different forms. They can
be used to ask only one question or they can
ask a series of questions. We can use
surveys to test out people’s opinions or to
test a hypothesis.
1. Determine the goal of your survey. What question do
you want to answer?
2. Identify the sample population. Whom will you
interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask in what order,
and how to phrase them.
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing
conclusions.
Example
Martha wants to construct a survey that shows which sports students at her school like to play the most.
Step 1. Goal
The goal of the survey is to find the answer
to the question: “Which sports do students
at Martha’s school like to play the most?”
Step 2. Population
A sample of the population would include a
random sample of the student population
in Martha’s school. A good strategy would be
to randomly select students (using dice
or a random number generator) as they
walk into an all-school assembly.
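The random-number-generator strategy can be sketched in Python. This is a minimal illustration, assuming Martha knows roughly how many students will attend the assembly; the function name `arrival_positions_to_survey` and the numbers used are hypothetical. It picks which arriving students (the 4th, the 17th, and so on) to interview, without replacement.

```python
import random


def arrival_positions_to_survey(expected_attendance, sample_size, seed=None):
    """Choose which arriving students to interview (e.g. the 4th, 17th, ... to enter).

    Every position from 1 to expected_attendance is equally likely to be picked,
    and no position is picked twice (sampling without replacement).
    """
    if sample_size > expected_attendance:
        raise ValueError("sample_size cannot exceed expected_attendance")
    rng = random.Random(seed)  # seeded generator so the selection can be reproduced
    positions = rng.sample(range(1, expected_attendance + 1), sample_size)
    return sorted(positions)  # sorted so the interviewer can follow them in arrival order


# Hypothetical numbers: about 600 students attend, and 40 are to be interviewed.
print(arrival_positions_to_survey(600, 40, seed=2024))
```

If fewer students show up than expected, positions beyond the actual attendance are never reached, so a conservative attendance estimate is the safer choice.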
Step 3. Methods
Face-to-face interviews are a good choice in this case.
Interviews will be easy to conduct, since the survey consists of only one question that can be quickly answered and recorded, and asking the question face to face will help reduce non-response bias.
Step 4. Question
“What sport do you like to play the most?”
Step 5. Information
Sport: number of students
• Baseball - 12
• Basketball - 10
• Football - 15
• Soccer - 10
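Step 6 (analyze the results) can start with a simple tally. The Python sketch below uses Martha's counts as given and computes each sport's share of the responses; it is one possible way to summarize the data before graphing it, not a prescribed procedure.

```python
from collections import Counter

# Tallies from Martha's survey (number of students naming each sport).
responses = Counter({"Baseball": 12, "Basketball": 10, "Football": 15, "Soccer": 10})

total = sum(responses.values())  # number of students who answered
for sport, count in responses.most_common():  # most popular sport first
    print(f"{sport:<11}{count:>3}  {count / total:.1%}")

favorite, favorite_count = responses.most_common(1)[0]
print(f"Most popular: {favorite} ({favorite_count} of {total} students)")
```

With these counts, 47 students answered and football comes out on top with 15 responses (about 32%).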
Example
Juan wants to construct a survey that shows how many hours per week the average student at his school works.
Step 1. Goal
The goal of the survey is to find the answer
to the question “How many hours per week
do you work?”
Step 2. Population
Juan suspects that older students might
work more hours per week than younger
students. He decides that a stratified
sample of the student population would be
appropriate in this case.
The strata are grade levels 9th through 12th.
He would need to find out what proportion
of the students in his school are in each
grade level, and then include the same
proportions in his sample.
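Juan's proportional allocation can be sketched in Python as follows. The enrollment figures in the usage line are hypothetical placeholders, and the largest-remainder rounding rule (used so that whole-student allocations add up exactly to the sample size) is an assumption of this sketch, not something the example specifies.

```python
import math


def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Fractional allocations are rounded with the largest-remainder rule so the
    whole-number allocations sum exactly to n.
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining students to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical enrollments per grade; Juan plans to interview 40 students in total.
print(proportional_allocation({"9th": 320, "10th": 290, "11th": 230, "12th": 160}, 40))
```

For these placeholder enrollments the exact shares are 12.8, 11.6, 9.2, and 6.4 students, which round to 13, 12, 9, and 6.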
Step 3. Methods
Face-to-face interviews are a good choice in
this case since the survey consists of two
short questions which can be quickly
answered and recorded.
Step 4. Question
“In what grade level are you?”
“How many hours per week do you work?”
Step 5. Information
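Once the answers are in, one way to estimate the school-wide average is the standard stratified estimate of the mean, which weights each grade's sample mean by that grade's share of enrollment: sum over strata h of (N_h / N) × (sample mean of stratum h). The Python sketch below is a minimal illustration, assuming every grade has at least one respondent; the function name is hypothetical. With an exactly proportional sample it gives the same answer as the plain sample mean, but it stays correct if some grades end up over- or under-represented.

```python
def stratified_mean(stratum_sizes, samples):
    """Estimate the school-wide mean as sum over strata of W_h * (sample mean in stratum h).

    stratum_sizes maps each grade to its enrollment N_h; samples maps each grade
    to the list of weekly work hours reported by the students interviewed in it.
    """
    N = sum(stratum_sizes.values())
    estimate = 0.0
    for grade, hours in samples.items():
        weight = stratum_sizes[grade] / N               # W_h = N_h / N
        estimate += weight * sum(hours) / len(hours)    # W_h * (mean hours in grade h)
    return estimate
```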
Planning and Conducting Experiments: Introduction to Design of Experiments
Design of Experiment (DOE)
• The focus is on the generation or
collection of data.
• Involves tests in which purposeful changes are made to the input variables of a process or system so that we can observe and identify the reasons for changes in the output response.
Usual objectives:
1. Which variables are most influential on the
response.
2. Where to set the influential variables so that
the response is almost always near the
desired value (optimization).
3. Where to set the influential variables so that
variability in the response is small.
4. Where to set the influential variables so that
the effects of uncontrollable variables are
minimized.
Applications of Experimental Design
1. In process development:
• Improve process yield
• Reduce variability and achieve closer conformance to nominal or target requirements
• Reduce development time
• Reduce overall cost
2. In engineering design:
• Evaluation and comparison of basic design
configurations
• Evaluation of material alternatives
• Selection of design parameters so that the
product will work well under a wide variety
of field conditions
• Determination of key product design parameters that impact product performance
How to apply DOE?
1. Plan the experiment
• Define the objectives
• Identify response variables
• Identify relevant factors
• Classify and prioritize factors
• Design the experiment
2. Execute the experiment.
3. Statistically analyze and interpret the data.
4. Take action based on data.
Example
Suppose you are asked to determine the type of rubber best suited for the manufacture of a motorcycle tire. Apply the procedure presented to come up with an experimental design.
Step 1: Define the Objective
• Experiments should be planned around
specific and clearly defined objectives.
• It makes little or no sense to plan or conduct an experiment without knowing why.
Step 1: Define the Objective
• To determine which type of rubber
(Rubber A or Rubber B) is best suited for
use in motorcycle tire production
Step 2: Identify Response Variables
• What variables provide direct
information to address the objective of
the experiment?
• How would you measure them?
Step 2: Identify Response Variables
• The amount of wear on tires.
• How would you measure tire wear?
• Visual inspection
• Weigh the tire before and after
• Tread depth gauge
• Insert a coin in the tread
• Cross section thickness
• Steel belt exposure (yes or no)
• Which metrology would you pick?
Step 3: Identify Relevant Factors
List additional factors that might affect the response.
It is usually not possible to list ALL factors; this is a "best effort" exercise.
Always remember the 5 M's and E (Man, Machine, Method, Material, Measurement, and Environment) when using the fishbone diagram!
Example: Basketball Fishbone Diagram
Step 3: Identify Relevant Factors
Identify relevant factors by category:
Category | Factors
Man | ___, ___, ___
Method | ___, ___, ___
Material | ___, ___, ___
Measurement | ___, ___, ___
Machine | ___, ___, ___
Environment | ___, ___, ___
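The worksheet can also be kept as a small data structure so that categories nobody has brainstormed yet stand out. In this Python sketch the factors come from the motorcycle example (driver, type of motorcycle, air pressure, type of rubber, tread depth gauge, course), but the category assigned to each one is an illustrative judgment call, and the helper name `empty_categories` is hypothetical.

```python
# The six fishbone categories used in the worksheet above.
CATEGORIES = ("Man", "Machine", "Method", "Material", "Measurement", "Environment")


def empty_categories(fishbone):
    """Return the categories for which no candidate factor has been listed yet."""
    return [c for c in CATEGORIES if not fishbone.get(c)]


# A first-pass fishbone for the tire-wear example. The category assigned to
# each factor is an illustrative choice, not part of the original slides.
tire_wear_fishbone = {
    "Man": ["driver"],
    "Machine": ["type of motorcycle", "air pressure"],
    "Material": ["type of rubber"],
    "Measurement": ["tread depth gauge"],
    "Environment": ["course"],
}

print(empty_categories(tire_wear_fishbone))  # ['Method']
```

An empty category does not mean no factor exists there; it is a prompt to keep brainstorming.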
Step 4: Classify & Prioritize Factors
• Process vs. Noise Factors
• Controllable vs. Uncontrollable Factors
• Degree of influence on response variable
Process vs. Noise Factors
• A process factor is a factor directly under
study per the objective.
• A noise factor may affect the response but
is not under study.
Step 4: Classify & Prioritize Factors
• Process factor: type of rubber
• Noise factor: type of motorcycle, driver,
course, air pressure, etc.
Controllable vs. Uncontrollable Factor
A controllable factor is a factor whose values can be predetermined (set in advance).
An uncontrollable factor is a factor whose values cannot be predetermined.
Step 4: Classify & Prioritize Factors
Controllable | Uncontrollable
Indoor temperature | Outdoor temperature
Operators | Operator's mood
Settings on machines | Variation around a machine set point
Batch ID of raw material | Properties of raw material in house
Step 4: Classify & Prioritize Factors
Degree of Influence
Prioritize the factors by their expected
impact on the response.
• High
• Moderate
• Low
The more factors an experiment includes, the more complicated it is to design and execute.
Review: Steps 1-4
1. Clearly define objectives of the
experiment
2. Identify response variables and how they
will be measured
3. Identify relevant factors that may affect
response variables
4. Classify and prioritize factors
Step 5: Design the Experiment
• Which noise factors to include?
• Factor settings
• Factor combinations
• Sample size
• Run order
Step 5: Design the Experiment
Which noise factors to include?
Uncontrollable noise factors
• Vary beyond our control.
• Can only be monitored throughout the
experiment.
Controllable noise factors may either be:
• Fixed at a single level throughout the
experiment.
• Varied according to the experimental plan.
Noise Factors: Motorcycle Example
Consider these two possible designs:
1. All controllable noise factors fixed.
• Perform all replicates with the same motorcycle, driver, course, air pressure, etc.
2. All controllable noise factors varied.
• Perform each replicate with a different
motorcycle, driver, course, air pressure, etc.
Step 5: Design the Experiment
Factor Settings
For each factor varied in the experiment,
one must determine:
• Number of levels
• Settings for each level
Step 5: Design the Experiment
Factor Settings
Number of levels
• Depends on purpose of experiment
• Screening: Many factors, few levels (usually two)
• Modeling: Few factors, more levels
Available resources
• Number of operators
• Available machines
• Time
• Raw material
Step 5: Design the Experiment
Factor Settings
Settings for each level
• Example: Tire Air Pressure
• Too Narrow: 39 and 40 PSI - Not far enough
apart to observe a difference
• Too Wide: 5 and 85 PSI - Invalid results (flat
tire and blow out)
• About Right: 35 and 45 PSI
• Process knowledge and engineering judgment are required for best results.
Step 5: Design the Experiment
Factor Combinations
• Confounding
• Blocking
Combinations: Motorcycle Example
• Suppose you have chosen to include the
following factors and levels in an
experiment:
• Rubber type: A and B
• Motorcycles: Kawasaki and Harley
• Drivers: professional and amateur
• Routes: racetrack and highway
• How would you determine the factor
combinations?
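One systematic answer is to list every combination of levels, a full factorial arrangement. This is offered as one option, not necessarily the design the course intends. With four factors at two levels each, that gives 2 × 2 × 2 × 2 = 16 runs. The Python sketch below enumerates them with `itertools.product`.

```python
from itertools import product

# Factors and levels listed above for the motorcycle example.
factors = {
    "Rubber": ["A", "B"],
    "Motorcycle": ["Kawasaki", "Harley"],
    "Driver": ["professional", "amateur"],
    "Route": ["racetrack", "highway"],
}

# Every combination of levels (a full factorial arrangement): 2 * 2 * 2 * 2 = 16 runs.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(runs))  # 16
for run in runs[:3]:
    print(run)
```

Each rubber type then appears with every motorcycle, driver, and route, which is what keeps its effect separable from theirs.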
Combinations: Motorcycle Example
Experimental Strategy 1:
• Rubber Type A:
• Professional drives a Kawasaki 400 miles
around a race track.
• Route is repeated 8 times with new tires on
each route.
• Rubber Type B:
• Amateur drives a Harley 400 miles from
Phoenix, Arizona to Mexico and back.
• Trip is repeated 8 times with new tires on each trip.
Design 1: Factor Combinations
Design Matrix
Confounding: Motorcycle Example
• Since the effects of rubber type,
motorcycle type, driving style, and
surfaces cannot be separated in this
experiment, we say that these effects are
confounded.
• An experiment must be planned carefully
to avoid confounding between any
important factors
• Can you think of a better way to design
this experiment?
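A small Python sketch makes the confounding in Experimental Strategy 1 visible. Code each factor's two levels as -1 and +1 and compare the resulting columns: if two columns are identical (or mirror images), the data cannot tell those factors apart. The coding scheme and helper names are illustrative assumptions, and the check only detects complete confounding, not partial correlation between factors. For contrast, the same check is run on a design that uses every combination of levels.

```python
from itertools import product

NAMES = ("Rubber", "Motorcycle", "Driver", "Route")
LOW = ("A", "Kawasaki", "professional", "racetrack")   # levels coded as -1
HIGH = ("B", "Harley", "amateur", "highway")          # levels coded as +1


def coded_columns(design):
    """Turn each factor's column of levels into a column of -1/+1 codes."""
    return {
        name: [-1 if run[i] == LOW[i] else 1 for run in design]
        for i, name in enumerate(NAMES)
    }


def confounded_with(target, design):
    """List the factors whose coded column equals the target's column or its negative."""
    cols = coded_columns(design)
    t = cols[target]
    negated = [-x for x in t]
    return [n for n in NAMES if n != target and cols[n] in (t, negated)]


# Experimental Strategy 1: 8 runs of one fixed combination, then 8 of the other.
design1 = [LOW] * 8 + [HIGH] * 8
# For contrast: every combination of levels, once each.
design2 = list(product(*zip(LOW, HIGH)))

print(confounded_with("Rubber", design1))  # ['Motorcycle', 'Driver', 'Route']
print(confounded_with("Rubber", design2))  # []
```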
Design 2: Factor Combinations
Design Matrix
• Utilize both
drivers, routes,
and motorcycles
for both rubber
types A and B.
Blocking
• Blocking is a technique to make a
more precise comparison in the
presence of variability.
• Units that belong to the same block
are more similar than units of
different blocks.
• Compare treatments or processes
within each block.
• Example of blocks: FabLots,
Assembly Lots, Material Lots,
Equipment, Operators, Days, etc

Motorcycle Example: Blocking
• Design 3: Factor Combinations
• Include multiple drivers, motorcycles, and
routes.
• Consider each motorcycle trip a block.
• Randomly assign a tire of each type to the
front and back wheel of each motorcycle.
• Rotate tires at the halfway point of the trip.
• What’s the benefit of the block design?
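Because both rubber types ride on the same motorcycle, with the same driver, over the same route, differences between trips affect both tires alike; comparing A with B inside each trip removes that trip-to-trip variability from the comparison. The Python sketch below shows one way to randomize tire placement within each trip and to form the within-block differences. The trip labels and function names are hypothetical, and the wear values would come from whichever measurement method was chosen in Step 2.

```python
import random


def assign_tires(trips, seed=None):
    """For each trip (block), randomly put one rubber-A and one rubber-B tire on the
    front and back wheels, then swap them at the halfway rotation."""
    rng = random.Random(seed)
    plan = {}
    for trip in trips:
        front, back = rng.sample(["A", "B"], 2)  # random order of the two rubber types
        plan[trip] = {
            "first half": {"front": front, "back": back},
            "second half": {"front": back, "back": front},  # tires rotated
        }
    return plan


def within_block_differences(wear):
    """Wear of rubber A minus wear of rubber B, computed inside each trip."""
    return {trip: w["A"] - w["B"] for trip, w in wear.items()}


plan = assign_tires(["trip 1", "trip 2", "trip 3"], seed=7)
for trip, halves in plan.items():
    print(trip, halves)
```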
Step 5: Design the Experiment
Sample Size
• Replication
Replication
• Replication: Performing the same
treatment combination more than once.
• Required to determine whether the
experimental results are real or could
have occurred purely by chance.
False Replication
• Taking repeated measurements on the
same unit.
• Taking measurements from n units run
consecutively from a single set up.
Sample Size
• Sample size: The number of times an
experiment is replicated.
• Two most important items that
determine sample size:
• Sigma (σ): the amount of variability among different units treated alike.
• Delta (δ): the size of the effect one wants to detect.
Variation and Sample Size
• Means of A and B are the same in both scenarios.
• Variation is smaller in scenario 2.
• Which scenario would require larger sample sizes?
Delta and Sample Size
• Variation is the same in both scenarios.
• Delta is smaller in scenario 2.
• Which scenario would require larger sample sizes?
Sample Size
• The larger the variation within groups (σ), the larger the sample size.
• The smaller the difference between groups (δ) to be detected, the larger the sample size.
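One common way to make these two rules quantitative is the normal-approximation formula for comparing two means with a two-sided test at significance level α and power 1 − β. This formula is assumed here and is not taken from the slides: n per group = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ². The required n grows with σ² and with 1/δ². The defaults α = 0.05 and power = 0.80 in the Python sketch below are conventional choices, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate replicates per treatment needed to detect a mean difference delta
    when units treated alike vary with standard deviation sigma (two-sided z-test)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


print(n_per_group(sigma=1.0, delta=1.0))  # 16
print(n_per_group(sigma=2.0, delta=1.0))  # 63: more variation, more replicates
print(n_per_group(sigma=1.0, delta=0.5))  # 63: smaller delta, more replicates
```

Doubling σ or halving δ roughly quadruples the required replicates, which matches both bullet points above.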
Step 5: Design the Experiment
Run order
• Randomization
Run Order
• To ensure reasonable validity from the results of an experiment, all unknown sources of variation should affect each experimental run equally.
• The most effective way of achieving this is to carry out the experimental runs in a completely randomized order.
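A completely randomized run order can be generated with a seeded shuffle, so the worksheet can be regenerated if needed. This is a minimal Python sketch; the run labels (rubber type, replicate number) and the seed are illustrative assumptions.

```python
import random


def randomized_run_order(runs, seed=None):
    """Return a completely randomized copy of the planned runs (the input is not modified)."""
    order = list(runs)
    random.Random(seed).shuffle(order)  # every ordering is equally likely
    return order


# Hypothetical plan: 8 replicates of each rubber type, labelled (rubber, replicate).
planned = [(rubber, rep) for rubber in ("A", "B") for rep in range(1, 9)]
for position, run in enumerate(randomized_run_order(planned, seed=42), start=1):
    print(position, run)
```

Running the plan in its listed order (all A runs first) would let any unknown source of variation that drifts over time look like a rubber effect. Shuffling spreads such drift across both rubber types.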
Execute the experiment
• Carefully execute the experiments to
assure clean data.
• Follow the order given on the worksheet.
• Record anything unusual that happens.
• Except for specified changes, do
everything the same way each time.
• How can you goof up an experiment?
Reasons for Poor Experimental Results
• Too much noise
• Incapable or unstable metrology
• Scope too limited
• Confounded effects
• Some uncontrolled factor changed
• Inadequate sample size
• Incorrect factor settings
