EDA

MATH 403 – ENGINEERING DATA ANALYSIS
Mr. MARK JAVE C. GUALBERTO, RME
Lecturer I

Chapter 1: OBTAINING DATA
Methods of Data Collection
Planning and Conducting Surveys
Planning and Conducting Experiments: Introduction to Design of Experiments

Obtaining Data
Introduction
Statistics may be defined as the science that deals with the collection,
organization, presentation, analysis, and interpretation of data in order to
be able to draw judgments or conclusions that help in the decision-making
process. The two parts of this definition correspond to the two main
divisions of Statistics: Descriptive Statistics and Inferential Statistics.
Descriptive Statistics, which is referred to in the first part of the
definition, deals with the procedures that organize, summarize and describe
quantitative data. It seeks merely to describe data. Inferential Statistics,
implied in the second part of the definition, deals with making a judgment
or a conclusion about a population based on the findings from a sample that
is taken from the population.

Statistical Terms
Before proceeding to the discussion of the different methods of obtaining
data, let us first define some statistical terms:

Population or Universe refers to the totality of objects, persons, places,
or things used in a particular study: all members of a particular group of
objects (items) or people (individuals) that are the subjects or respondents
of a study.

Sample is any subset of a population, or a few members of a population.

Data are facts, figures and information collected on some characteristics
of a population or sample. These can be classified as qualitative or
quantitative data.

Ungrouped (or raw) data are data which are not organized in any specific
way. They are simply the collection of data as they are gathered.

Grouped data are raw data organized into groups or categories with
corresponding frequencies. Organized in this manner, the data are referred
to as a frequency distribution.

Parameter is the descriptive measure of a characteristic of a population.

Statistic is a measure of a characteristic of a sample.

Constant is a characteristic or property of a population or sample which is
common to all members of the group.

Variable is a measure or characteristic or property of a population or
sample that may have a number of different values. It differentiates a
particular member from the rest of the group. It is the characteristic or
property that is measured, controlled, or manipulated in research. Variables
differ in many respects, most notably in the role they are given in the
research and in the type of measures that can be applied to them.

Methods of Data Collection
Collection of the data is the first step in conducting statistical inquiry.
It simply refers to data gathering, a systematic method of collecting and
measuring data from different sources of information in order to provide
answers to relevant questions. This involves acquiring information from
published literature, surveys through questionnaires or interviews,
experimentation, documents and records, tests or examinations, and other
forms of data gathering instruments.

The person who conducts the inquiry is an investigator, the one who helps
in collecting information is an enumerator, and information is collected
from a respondent. Data can be primary or secondary. According to Wessel,
“Data collected in the process of investigation are known as primary data.”
These are collected for the investigator’s use from the primary source.
Secondary data, on the other hand, are collected by some other organization
for its own use, but the investigator also obtains them for his use.
According to M.M. Blair, “Secondary data are those already in existence for
some other purpose than answering the question in hand.”

Planning and Conducting Surveys
A survey is a method of asking respondents some well-constructed questions.
It is an efficient way of collecting information and easy to administer,
and a wide variety of information can be collected. The researcher can be
focused and can stick to the questions that interest him and are necessary
in his statistical inquiry or study. However, surveys depend on the
respondents’ honesty, motivation, memory and ability to respond. Sometimes
answers may lead to vague data.

Surveys can be done through face-to-face interviews or self-administered
through the use of questionnaires. The advantages of face-to-face
interviews include fewer misunderstood questions, fewer incomplete
responses, higher response rates, and greater control over the environment
in which the survey is administered; also, the researcher can collect
additional information if any of the respondents’ answers need clarifying.
The disadvantages of face-to-face interviews are that they can be expensive
and time-consuming and may require a large staff of trained interviewers.
In addition, the response can be biased by the appearance or attitude of
the interviewer.

Self-administered surveys are less expensive than interviews. They can be
administered in large numbers, do not require many interviewers, and put
less pressure on respondents. However, in self-administered surveys, the
respondents are more likely to stop participating mid-way through the
survey, and respondents cannot ask for clarification. Response rates are
also lower than in personal interviews.
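
As a small illustration of the statistical terms defined in this chapter
(population, sample, parameter, statistic, ungrouped and grouped data), the
following Python sketch may help. The exam scores, the sample size of 30,
and the class width of 10 are made-up assumptions, not data from the
course.

```python
import random
from collections import Counter
from statistics import mean

random.seed(403)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam scores of 200 students (made-up data).
population = [random.randint(50, 100) for _ in range(200)]

# A sample is any subset of the population.
sample = random.sample(population, 30)

# Parameter: a measure computed from the whole population.
population_mean = mean(population)
# Statistic: the same measure computed from the sample only.
sample_mean = mean(sample)

# The list `sample` is ungrouped (raw) data. Grouping the scores into
# classes of width 10 with their frequencies gives grouped data,
# that is, a frequency distribution keyed by each class's lower limit.
CLASS_WIDTH = 10
frequency_distribution = Counter(
    (score // CLASS_WIDTH) * CLASS_WIDTH for score in sample
)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
for lower in sorted(frequency_distribution):
    upper = lower + CLASS_WIDTH - 1
    print(f"{lower}-{upper}: {frequency_distribution[lower]}")
```

The sample mean will generally differ a little from the population mean,
which is why inferential statistics is needed to judge how far a statistic
can be trusted as an estimate of a parameter.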
Planning and Conducting Surveys
In choosing the respondents, sampling techniques are necessary. Sampling is
the process of selecting units (e.g., people, organizations) from a
population of interest. The sample must be representative of the target
population. The target population is the entire group a researcher is
interested in; the group about which the researcher wishes to draw
conclusions. There are two ways of selecting a sample: non-probability
sampling and probability sampling.

Non-Probability Sampling
Non-probability sampling is also called judgment or subjective sampling.
This method is convenient and economical, but the inferences made based on
the findings are not so reliable. The most common types of non-probability
sampling are convenience sampling, purposive sampling and quota sampling.

In convenience sampling, the researcher uses a device for obtaining
information from the respondents that favors the researcher but can bias
the responses.

In purposive sampling, the selection of respondents is predetermined
according to the characteristic of interest set by the researcher.
Randomization is absent in this type of sampling.

There are two types of quota sampling: proportional and non-proportional.

In proportional quota sampling, the major characteristics of the population
are represented by sampling a proportional amount of each. For instance, if
you know the population has 40% women and 60% men, and that you want a
total sample size of 100, you will continue sampling until you get those
percentages and then you will stop.

Non-proportional quota sampling is a bit less restrictive. In this method,
a minimum number of sampled units in each category is specified, and the
researcher is not concerned with having numbers that match the proportions
in the population.

Probability Sampling
In probability sampling, every member of the population is given an equal
chance to be selected as a part of the sample. There are several
probability techniques. Among these are simple random sampling, stratified
sampling and cluster sampling.

Simple random sampling is the basic sampling technique where a group of
subjects (a sample) is selected for study from a larger group (a
population). Each individual is chosen entirely by chance and each member
of the population has an equal chance of being included in the sample.
Every possible sample of a given size has the same chance of selection;
i.e. each member of the population is equally likely to be chosen at any
stage in the sampling process.

Stratified Sampling
There may often be factors which divide up the population into
sub-populations (groups / strata), and the measurement of interest may vary
among the different sub-populations. This has to be accounted for when a
sample from the population is selected in order to obtain a sample that is
representative of the population. This is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or
sub-group of a population. When a sample is to be taken from a population
with several strata, the proportion of each stratum in the sample should be
the same as in the population.

Stratified sampling techniques are generally used when the population is
heterogeneous, or dissimilar, where certain homogeneous, or similar,
sub-populations can be isolated (strata). Simple random sampling is most
appropriate when the entire population from which the sample is taken is
homogeneous.

Cluster Sampling
Cluster sampling is a sampling technique where the entire population is
divided into groups, or clusters, and a random sample of these clusters is
selected. All observations in the selected clusters are included in the
sample.

Planning and Conducting Experiments: Introduction to Design of Experiments
The products and processes in the engineering and scientific disciplines
are mostly derived from experimentation. An experiment is a series of tests
conducted in a systematic manner to increase the understanding of an
existing process or to explore a new product or process. Design of
Experiments, or DOE, is a tool to develop an experimentation strategy that
maximizes learning using minimum resources. Design of Experiments is widely
and extensively used by engineers and scientists in improving existing
processes through maximizing the yield and decreasing the variability, or
in developing new products and processes. It is a technique needed to
identify the "vital few" factors in the most efficient manner and then
direct the process to its best setting to meet the ever-increasing demand
for improved quality and increased productivity.

The methodology of DOE ensures that all factors and their interactions are
systematically investigated, resulting in reliable and complete
information. There are five stages to be carried out for the design of
experiments. These are planning, screening, optimization, robustness
testing and verification.

Planning
It is important to carefully plan for the course of experimentation before
embarking upon the process of testing and data collection. At this stage,
the objectives of conducting the experiment or investigation are
identified, and the time and available resources needed to achieve the
objectives are assessed. Individuals from different disciplines related to
the product or process should compose the team that will conduct the
investigation.

Screening
Screening experiments are used to identify the important factors that
affect the process under investigation out of the large pool of potential
factors. The screening process eliminates unimportant factors so that
attention is focused on the key factors. Screening experiments are usually
efficient designs which require few executions and focus on the vital
factors and not on interactions.

Optimization
After narrowing down the important factors affecting the process, the next
step is to determine the best setting of these factors to achieve the
objectives of the investigation.
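
The screening and optimization stages can be sketched with a small
two-level experiment in Python. Everything specific here is an assumption
for illustration: the three factor names, their levels, and the
`simulated_yield` function, which merely stands in for a real measured
response. A full factorial design is used for simplicity; practical
screening designs often use fewer runs.

```python
from itertools import product

# Hypothetical factors, each with a (low, high) level.
factors = {
    "temperature_C": (150, 180),
    "pressure_kPa": (200, 300),
    "time_min": (10, 20),
}

def simulated_yield(temperature_C, pressure_kPa, time_min):
    """Made-up response standing in for a measured process yield."""
    return (0.20 * temperature_C + 0.01 * pressure_kPa
            + 0.50 * time_min + 0.002 * temperature_C * time_min)

# Full factorial design: every combination of factor levels is one run.
names = list(factors)
runs = [dict(zip(names, levels)) for levels in product(*factors.values())]
results = [(run, simulated_yield(**run)) for run in runs]

def main_effect(name):
    """Mean response at the high level minus mean response at the low level."""
    low, high = factors[name]
    at_high = [y for run, y in results if run[name] == high]
    at_low = [y for run, y in results if run[name] == low]
    return sum(at_high) / len(at_high) - sum(at_low) / len(at_low)

# Screening: rank factors by the size of their main effect.
effects = {name: main_effect(name) for name in names}
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

# Optimization: choose the factor settings with the best response.
best_run, best_yield = max(results, key=lambda pair: pair[1])

print("factors ranked by effect:", ranked)
print("best settings:", best_run, "yield:", round(best_yield, 2))
```

With this made-up response, pressure has the smallest main effect, so a
screening step would drop it and later runs would concentrate on time and
temperature.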
Robustness Testing
Once the optimal settings of the factors have been determined, it is
important to make the product or process insensitive to variations
resulting from changes in factors that affect the process but are beyond
the control of the analyst. Such factors are referred to as noise or
uncontrollable factors that are likely to be experienced in the application
environment. It is important to identify such sources of variation and take
measures to ensure that the product or process is made robust or
insensitive to these factors.

Verification
This final stage involves validation of the optimum settings by conducting
a few follow-up experimental runs. This is to confirm that the process
functions as expected and all objectives are achieved.

PROBABILITY
MATH 403 – ENGINEERING DATA ANALYSIS

Probability is the likelihood or chance of an event occurring.

SAMPLE SPACE, EVENT, AND ELEMENT
Sample Space is the set of all possible outcomes or results of a random
experiment. It is represented by the letter S.

Event is a subset of the sample space. It is represented by the letter E.

Element is each outcome in the sample space.

Null Space is a subset of the sample space that contains no elements and is
denoted by the symbol Ø; it is also called the empty space.

VENN DIAGRAM is a rectangle (the universal set) that includes circles
depicting the subsets.

INTERSECTION OF EVENTS
The intersection of two events A and B is denoted by the symbol A ∩ B. It
is the event containing all elements that are common to A and B.

UNION OF EVENTS
The union of events A and B is the event containing all the elements that
belong to A or to B or to both and is denoted by the symbol A ∪ B.

COMPLEMENT OF AN EVENT
The complement of an event A with respect to S is the set of all elements
of S that are not in A and is denoted by Aᶜ.

Permutation is an arrangement of a set of objects or things in a specific
or definite order.

RING OR CYCLIC PERMUTATION
The number of ways of arranging n different things around a circle.

COMBINATION
It is an arrangement of a set of objects or things where order does not
count.

ODDS - It is the ratio of the probability of an event’s occurring to the
probability of its not occurring.

PROBABILITY OF MUTUALLY EXCLUSIVE EVENTS
Two or more events are mutually exclusive if they cannot occur
simultaneously, that is, they have no common outcome.

PROBABILITY OF INCLUSIVE EVENTS
Two or more events are said to be inclusive when one or the other or both
can occur. In other words, two events are said to be inclusive if they have
a common outcome.

PROBABILITY OF INDEPENDENT EVENTS
Two events are independent if the occurrence or non-occurrence of one has
no effect on the probability of the occurrence of the other.

PROBABILITY OF DEPENDENT EVENTS
Two events are dependent if the occurrence or non-occurrence of one has an
effect on the probability of the occurrence of the other.
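
The set operations on events, the odds, and the counting rules defined in
this chapter can be tried out with a short Python sketch. The fair
six-sided die, the particular events A, B, C and D, and the values n = 5
and r = 3 are assumptions chosen only for illustration. The circular count
uses the standard result that n different things can be arranged around a
circle in (n - 1)! ways.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Random experiment (assumed): one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # event: an even number
B = {4, 5, 6}            # event: a number greater than 3
C = {1, 3}               # event: a 1 or a 3
D = {1, 2}               # event: a number less than 3

def probability(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

intersection = A & B     # A ∩ B: elements common to A and B
union = A | B            # A ∪ B: elements in A or B or both
complement_A = S - A     # complement of A with respect to S

# Mutually exclusive events share no outcome: P(A ∪ C) = P(A) + P(C).
mutually_exclusive = A.isdisjoint(C)
# Inclusive events share outcomes: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
p_union = probability(A) + probability(B) - probability(A & B)

# Independent events satisfy P(A ∩ D) = P(A) * P(D); A and B do not.
a_d_independent = probability(A & D) == probability(A) * probability(D)
a_b_independent = probability(A & B) == probability(A) * probability(B)

# Odds of A: P(A) divided by the probability of A not occurring.
odds_A = probability(A) / (1 - probability(A))

# Counting rules for n different things taken r at a time.
n, r = 5, 3
permutations = perm(n, r)      # arrangements in a definite order
combinations = comb(n, r)      # selections where order does not count
circular = factorial(n - 1)    # arrangements of n things around a circle

print(p_union, odds_A, permutations, combinations, circular)
```

Using `Fraction` keeps the probabilities exact, so equalities such as the
addition rule can be checked without rounding error.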
DISCRETE PROBABILITY DISTRIBUTION

RANDOM VARIABLES
• Variable is a characteristic or attribute that can assume different
values.
• Random variable assigns a numerical value to each outcome of a chance
event.
• It is always represented by capital letters.

DISCRETE RANDOM VARIABLES
• Discrete random variables can take on either a finite or at most a
countably infinite set of discrete values.
• The outcome can assume only a specific number of values.

PROBABILITY DISTRIBUTION of a random variable is the list of all values of
the random variable (X) and their probabilities {P(X)}.

DISCRETE PROBABILITY DISTRIBUTION
Consists of the values a discrete random variable (X) can assume and the
corresponding probabilities {P(X)} of the values.
▪ It can be represented by:
·Table
·Graph
·Formula

The EXPECTED VALUE of a discrete random variable is the weighted average of
all possible values that this random variable can take on. It is also
called the mean, mathematical expectation, expectation, or first moment. It
is the sum of the products of each value of the random variable and its
probability.

Variance and Standard Deviation measure or describe how far a set of
assumed values of a random variable is “spread out”. A small variance or
standard deviation means that the assumed values or data points tend to be
very close to the mean. A higher variance or standard deviation means that
the assumed values or data points are spread out from the mean.

A binomial random variable is the number of successes “r” in “n” repeated
trials of a binomial experiment.

The probability distribution of a binomial random variable is called a
BINOMIAL DISTRIBUTION.
▪ It can be represented by:
·Table
·Graph
·Formula

CUMULATIVE BINOMIAL PROBABILITY refers to the probability that the binomial
random variable falls within a specified range (e.g., is greater than or
equal to a stated lower limit and less than or equal to a stated upper
limit).

A POISSON RANDOM VARIABLE is the number of successes that result from a
Poisson experiment.

The probability distribution of a Poisson random variable is called a
POISSON DISTRIBUTION.
▪ It can be represented by:
·Table
·Graph
·Formula

A CUMULATIVE POISSON PROBABILITY refers to the probability that the Poisson
random variable is greater than some specified lower limit and less than
some specified upper limit.
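
The expected value, variance, and the binomial and Poisson probabilities
described in this chapter can be computed directly from their standard
formulas. The Python sketch below is one way to do so; the coin-toss
example, the Poisson mean of 2, and the helper names are assumptions for
illustration. The `cumulative` helper treats both limits as inclusive, so
the limits should be shifted by one when a strict inequality is intended.

```python
from math import comb, exp, factorial, sqrt

def expected_value(dist):
    """Sum of each value times its probability (dist maps x to P(x))."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Probability-weighted squared distance of each value from the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def binomial_pmf(r, n, p):
    """P(exactly r successes in n trials), success probability p per trial."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(x, mu):
    """P(exactly x successes) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

def cumulative(pmf, lower, upper):
    """P(lower <= X <= upper), with both limits inclusive."""
    return sum(pmf(x) for x in range(lower, upper + 1))

# Table form of a discrete distribution: number of heads in 3 fair tosses.
n, p = 3, 0.5
table = {r: binomial_pmf(r, n, p) for r in range(n + 1)}

mean_heads = expected_value(table)
var_heads = variance(table)
sd_heads = sqrt(var_heads)

# Cumulative binomial probability: between 1 and 2 heads inclusive.
p_one_to_two = cumulative(lambda r: binomial_pmf(r, n, p), 1, 2)

# Cumulative Poisson probability: at most 2 successes when the mean is 2.
p_at_most_two = cumulative(lambda x: poisson_pmf(x, 2.0), 0, 2)

for r, prob in table.items():
    print(r, prob)
print(mean_heads, var_heads, round(sd_heads, 4))
print(p_one_to_two, round(p_at_most_two, 4))
```

The dictionary `table` is the "table" representation of the distribution,
while `binomial_pmf` and `poisson_pmf` are the "formula" representation of
the same kind of information.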
