Professional Documents
Culture Documents
STAT 012 Engineering Data Analysis IMs
STAT 012 Engineering Data Analysis IMs
INSTRUCTIONAL MATERIALS
FOR
STAT 20023
ENGINEERING DATA ANALYSIS
Approved by:
VISION
MISSION
Ensuring inclusive and equitable quality education and promoting lifelong learning opportunities
through a re-engineered polytechnic university by committing to:
PHILOSOPHY
As a state university, the Polytechnic University of the Philippines believes that:
• Education is an instrument for the development of the citizenry and for the enhancement
of nation building;
• Meaningful growth and transformation of the country are best achieved in an
atmosphere of brotherhood, peace, freedom, justice and a nationalist-oriented education
imbued with the spirit of humanist internationalism.
TEN PILLARS
GOALS
PROGRAM OBJECTIVES
COURSE DESCRIPTION
This course is designed for undergraduate engineering students with emphasis on problem
solving related to social issues that engineers and scientists are called upon to solve. It
introduces different methods of data collection and the suitability of using a particular method for
a given situation. The relationship of probability to statistics is also discussed, providing
students with the tools they need to understand how “chance” plays a role in statistical analysis.
Probability distributions of random variables and their uses are also considered, along with a
discussion of linear functions of random variables within the context of their application of data
analysis and inference. The course also includes estimation techniques for unknown
parameters; and hypothesis testing used in making inferences from sample to population;
inference for regression parameters and build models for estimating means and predicting
future values of key variables under study. Finally, statistically based experiment design
techniques and analysis of outcomes of experiments are discussed with the aid of statistical
software.
COURSE OBJECTIVES
COURSE REQUIREMENTS
1. Students are required to attend the online class sessions regularly. The University
guidelines on attendance will be implemented.
2. The course is expected to have a minimum of four (3) quizzes and two (2) major
examination (Midterm and Final Examination).
3. Other requirements such as written outputs, exercises, assignments and the likes will be
given throughout the sessions. These shall be submitted on the due dates set by the
teacher.
GRADING SYSTEM
The grading system will determine if the student passed or failed the course. There will be two
grading periods: Midterm and Final Period. Each period has components of: 70% Class
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS vi
COMPILED BY: MARIZ N. LANSAK, ,LPT
Standing + 30% Major Examination. Final Grade will be the average of the two periodical
grades.
Midterm Finals
Class Standing 70% Class Standing 70%
• Quizzes • Quizzes
• Activities • Activities
• Attendance • Project
Mid-term Examination 30% • Attendance
Final Examination 30%
FINAL GRADE = (Midterm + Finals) /2
COURSE GUIDE
Week Topic Learning Outcomes Methodology Resources Assessment
1 Orientation on Make the students at ease • Presentation of Course Syllabus
course syllabus; and eager to learn. the grading PUP Handbook
grading system; system; class
class management
management policies and the
policies; rest of the
syllabus. Montgomery,
• Clarification of Douglas C.
expectations. (2003). Applied
Obtaining Data, Obtain data using different Statistics and
and its method methods of data collection. Lecture Probability for
of data Engineers. 3rd Ed.
collection.
2 Planning and Understand and Lecture Montgomery, Assignment,
Conducting differentiate design of Discussion Douglas C. Seatwork
Surveys and experiments. Recitation (2003). Applied
Experiments Conduct survey and apply Problem Solving Statistics and
the method. Probability for
. Engineers. 3rd Ed.
3 Probability List sample space of Lecture Montgomery, Assignment
events. Discussion Douglas C. Quiz
Understand relationship Recitation (2003). Applied
among events. Problem Solving Statistics and
Apply counting rules in Probability for
getting probability. Engineers. 3rd Ed.
Discrete Understand the concept of Lecture Montgomery, Assignment
4-5 Probability discrete probability Discussion Douglas C. Seatwork
Distribution distribution. Recitation (2003). Applied
List the expected values of Problem Solving Statistics and
random variables. Probability for
Graph data appropriately. Engineers. 3rd Ed.
6-7 Continuous Understand the concept of Lecture Montgomery, Assignment
Probability discrete probability Discussion Douglas C. Seatwork
Distribution distribution. Recitation (2003). Applied
List the expected values of Problem Solving Statistics and
random variables. Probability for
Graph data appropriately. Engineers. 3rd Ed.
8 Statistical Find the confidence Lecture Montgomery, Assignment
Intervals intervals of single and Discussion Douglas C. Seatwork
multiple samples. Recitation (2003). Applied
Compute for the prediction Problem Solving Statistics and
and tolerance intervals. Probability for
Engineers. 3rd Ed.
18
FINAL EXAMINATION
Lesson 2 Probability 6
Unit 1 : Sample Space and Events 6
Unit 2 : Probability 8
Unit 3 : Rules of Probability 11
Overview
An engineer is someone who solves problems of interest to society by the efficient application of
scientific principles. Engineers accomplish this by either refining an existing product or process
or by designing a new product or process that meets customers ‘needs. The engineering, or
scientific, method is the approach to formulating and solving these problems.
Learning Objectives:
Course Materials:
Activities/Assessments:
• Differentiate the design experiments.
• Give examples of research study for different methods of data presented.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview
Well-constructed data summaries and displays are essential to good statistical thinking,
because they can focus the engineer on important features of the data or provide insight about
the type of model that should be used in solving the problem.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Collect data
2. Construct and interpret visual data displays, including the stem-and-leaf display,
the histogram, and the box plot
Course Materials:
Activities/Assessments:
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
To model and analyze a random experiment, we must understand the set of possible outcomes
from the experiment. In this introduction to probability, we make use of the basic concepts of
sets and operations on sets.
Learning Objectives:
Course Materials:
Random experiment an experiment that can result in different outcomes, even though it is
repeated in the same manner every time.
The set of all possible outcomes of a random experiment is called the sample space (S) of the
experiment. The sample space is denoted as S. A sample space is discrete if it consists of a
finite or countable infinite set of outcomes. A sample space is continuous if it contains an
interval (either finite or infinite) of real numbers.
List all the sample space of the following.
➢ Set of even nos.
S = {2, 4, 6, 8, …}
➢ Experiment in which the thickness is measured until a connector fails to meet the
specifications.
S = { n, yn, yyn, yyyn, yyyyn, and so forth }
If items are replaced before the next one is selected, the sampling is referred to as with
replacement. Then the possible ordered outcomes are
Swith = {aa, ab, ac, ba, bb, bc, ca, cb, cc}
The unordered description of the sample space is {{a, a}, {a, b}, {a, c}, {b, b}, {b, c}, {c,
c}}.
Shock Resistance
High 10 4
Low 1 5
Let A denote the event that a sample has high shock resistance, and let B denote the
event that a sample has high scratch resistance.
Then, the number of samples in
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS 7
COMPILED BY: MARIZ N. LANSAK, ,LPT
A ∩ B = 10
A’ = 9
A ∪ B = 15
Activities/Assessments:
1. Measurement of time needed to complete a chemical reaction, (118 hrs.)
E1 = {x| 1≤ x < 10} E2 = {x| 3< x < 118}
Find:
1. E1 ∪ E2
2. E1 ∩ E2
3. E1’
4. E1’∪ E2
5. E1 ∩ E2’
2. Samples of furniture are analyzed for surface and edge finish. The results from 303
samples are summarized as follows:
Edge Finish
Excellent 10 2
Good 10 8
Let A denote the event that a sample is excellent surface finish, and let B denote the
event that a sample is excellent edge finish.
Determine the number of samples in
1. A’ ∩ B = ________
2. B’ = ____________
3. A ∪ B = _________
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Unit 2 – Probability
Overview:
In this lesson, we introduce probability for discrete sample spaces—those with only a finite (or
countably infinite) set of outcomes. The restriction to these sample spaces enables us to
simplify the concepts and the presentation without excessive mathematics.
Learning Objectives:
Course Materials:
Probability (P)
Suppose that a batch contains six parts with part numbers {a, b, c, d, e, f}.
*(It has 30 outcomes)
Suppose that two parts are selected without replacement.
Let E denote the event that the part number of the first part selected is a.
Then E can be written as E = {ab, ac, ad, ae, af}.
P(E) = 5/ 30 = 1/6
Also, if E2 denotes the event that the second part selected is a,
E2 = {ba, ca, da, ea, fa} and with equally likely outcomes
P(E2) = 5/30 or 1/6
Activities/Assessments:
2. Samples of furniture are analyzed for surface and edge finish. The results from 303
samples are summarized as follows:
Edge Finish
Excellent 10 2
Good 10 8
Let A denote the event that a sample has excellent edge finish, and let B denote the
event that a sample has excellent surface finish.
Determine the probability of polycarbonate plastic are
1. Excellent surface finish = ___________
2. Neither excellent edge finish nor excellent surface fish = _______
3. Good edge finish = __________
4. Good surface finish and excellent edge = _______
5. Either good surface finish or good edge finish = _______
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Joint events are generated by applying basic set operations to individual events. Unions of
events, such as A U B; intersections of events, such as A ∩ B; and complements of events,
such as A’, are commonly of interest. The probability of a joint event can often be determined
from the probabilities of the individual events that comprise it. Basic set operations are also
sometimes helpful in determining the probability of a joint event.
Learning Objectives:
Course Materials:
Rules of Probability
1. Addition Rules expression for the probability of the union of two events.
Example 1. The wafers were classified as either in the “center’’ or at the “edge’’ of the
sputtering tool that was used in manufacturing, and by the degree of contamination.
Table 2-2 shows the proportion of wafers in each category.
(A) What is the probability that a wafer was either at the edge or that it contains four or
more particles? Let E1 wafer contains four or more particles, E2 wafer is at the edge.
The requested probability is P(E1 U E2). P(E1) = 0.15 and P(E2) = 0.28. Also from the
table, P(E1 ∩ E2) = 0.04. Therefore,
P(E1 U E2) = 0.15 + 0.28 – 0.04 = 0.39
(B) What is the probability that a wafer contains less than two particles or that it is both
at the edge and contains more than four particles? Let E 1 wafer contains less than two
particles, E2 wafer is both from the edge and contains more than four particles.
Now, P(E1) = 0.60 and P(E2) = 0.03. From the table, there are no wafers in the
intersection and P(E1 ∩ E2) = 0. Therefore,
P(E1 U E2) = 0.60 + 0.03 = 0.63
3. Multiplication Rule: expression for the probability of the intersection of two events.
Activities/Assessments:
References:
1.) Montgomery, Douglas C. (2003). Applied Satistics and Probability for Engineers.
2.) DeCoursey(2003). Satistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Many physical systems can be modeled by the same or similar random experiments and
random variables. The distribution of the random variables involved in each of these common
systems can be analyzed, and the results of that analysis can be used in different applications
and examples. In this lesson, we present the analysis of several random experiments and
discrete random variables that frequently arise in applications
Learning Objectives:
Course Materials:
There is a chance that a bit transmitted through a digital transmission channel is received in
error. Let X equal the number of bits in error in the next four bits transmitted. The possible
values for X are {0, 1, 2, 3, 4}. Based on a model for the errors that is presented in the following
section, probabilities for these values will be determined. Suppose that the probabilities are
No. of errors 0 1 2 3 4
Probabilities 0.6561 0.2916 0.0486 0.0036 0.0001
Activities/Assessments:
Errors in an experimental transmission channel are found when the transmission is checked by
a certifier that detects missing pulses. The number of errors found in an eight bit byte is a
random variable with the following distribution:
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
A trial with only two possible outcomes is used so frequently as a building block of a random
experiment that it is called a Bernoulli trial. The terms success and failure are just labels. It is
usually assumed that the trials that constitute the random experiment are independent. This
implies that the outcome from one trial has no effect on the outcome to be obtained from any
other trial. Furthermore, it is often reasonable to assume that the probability of a success in
each trial is constant.
Learning Objectives:
Course Materials:
Binomial Distribution
Activities/Assessments:
For each scenario described below, state whether or not the binomial distribution is a
reasonable model for the random variable and why. State any assumptions you make.
(a) A production process produces thousands of temperature transducers. Let X denote
the number of nonconforming transducers in a sample of size 30 selected at random
from the process.
(b) From a batch of 50 temperature transducers, a sample of size 30 is selected without
replacement. Let X denote the number of nonconforming transducers in the sample.
(c) Four identical electronic components are wired to a controller that can switch from a
failed component to one of the remaining spares. Let X denote the number of
components that have failed after a specified period of operation.
(d) Let X denote the number of accidents that occur along the federal highways in
Arizona during a one-month period.
A particularly long traffic light on your morning commute is green 20% of the time that you
approach it. Assume that each morning represents an independent trial.
(a) Over five mornings, what is the probability that the light is green on exactly one day?
(b) Over 20 mornings, what is the probability that the light is green on exactly four days?
(c) Over 20 mornings, what is the probability that the light is green on more than four
days?
(d) Compute for the mean, variance, standard deviation of item (c).
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Because the range is any value in an interval, the model provides for any precision in length
measurements. However, because the number of possible values of the random variable X is
uncountably infinite, X has a distinctly different distribution from the discrete random variables
studied previously. The range of X includes all values in an interval of real numbers; that is, the
range of X can be thought of as a continuum. A number of continuous distributions frequently
arise in applications.
Learning Objectives:
Course Materials:
A probability density function f(x) can be used to describe the probability distribution of a
continuous random variable X. If an interval is likely to contain a value for X, its probability is
large and it corresponds to large values for f(x). The probability that X is between a and b is
determined as the integral of f(x) from a to b.
Activities/Assessments:
References:
Overview:
Well-constructed data summaries and displays are essential to good statistical thinking,
because they can focus the engineer on important features of the data or provide insight about
the type of model that should be used in solving the problem.
Learning Objectives:
Course Materials:
Random variables with different means and variances can be modeled by normal probability
density functions with appropriate choices of the center and width of the curve.
A.
B.
Activities/Assessments:
2. The line width of for semiconductor manufacturing is assumed to be normally distributed with
a mean of 0.5 micrometer and a standard deviation of 0.05 micrometer.
(a) What is the probability that a line width is greater than 0.62 micrometer?
(b) What is the probability that a line width is between 0.47 and 0.63 micrometer?
(c) The line width of 90% of samples is below what value?
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
The field of statistical inference consists of those methods used to make decisions or to draw
conclusions about a population. These methods utilize the information contained in a sample
from the population in drawing conclusions. This chapter begins our study of the statistical
methods used for inference and decision making. In practice, the engineer will use sample data
to compute a number that is in some sense a reasonable value (or guess) of the true mean.
This number is called a point estimate. We will see that it is possible to establish the precision of
the estimate.
Learning Objectives:
Course Materials:
Unbiased Estimators
An estimator should be “close” in some sense to the true value of the unknown parameter.
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS 27
COMPILED BY: MARIZ N. LANSAK, ,LPT
When an estimator is unbiased, the bias is zero; that is
Sometimes there are several unbiased estimators of the sample population parameter. For
example, suppose we take a random sample of size from a normal population and obtain the
data x1 = 12.8, x2 = 9.4, x3 = 8.7, x4 = 11.6, x5 = 13.1, x6 = 9.8, x7 = 14.1, x8 = 8.5, x9 = 12.1, x10 =
10.3.
Sample Mean
Sample Median
When the numerical value or point estimate of a parameter is reported, it is usually desirable to
give some idea of the precision of estimation. The measure of precision usually employed is the
standard error of the estimator that has been used.
An article in the Journal of Heat Transfer (sec. C, 96, 1974, p. 59) described
a new method of measuring the thermal conductivity of Armco iron. Using a
temperature of 100°F and a power input of 550 watts, the following 10
measurements of thermal were obtained:
41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.05
Sometimes we find that biased estimators are preferable to unbiased estimators because they
have smaller mean square error. That is, we may be able to reduce the variance of the
estimator considerably by introducing a relatively small amount of bias. As long as the reduction
in variance is greater than the squared bias, an improved estimator from a mean square error
viewpoint will result.
Activities/Assessments:
Data on pull-off force (pounds) for connectors used in an automobile engine application are as
follows: 79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7, 75.9, 72.9,
73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8.
(a) Calculate a point estimate of the mean pull-off force of all connectors in the
population. State which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the weakest 50%
of the connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population standard
deviation.
(d) Calculate the standard error of the point estimate found in part (a). Provide an
interpretation of the standard error.
(e) Calculate a point estimate of the proportion of all connectors in the population whose
pull-off force is less than 73 pounds.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Statistical inference is concerned with making decisions about a population based on the
information contained in a random sample from that population.
Learning Objectives:
The sampling distribution of a statistic depends on the distribution of the population, the size of
the sample, and the method of sample selection.
If we are sampling from a population that has an unknown probability distribution, the sampling
distribution of the sample mean will still be approximately normal with mean μ and variance σ2/2
if the sample size n is large.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Learning Objectives:
Course Materials:
A confidence interval estimate for μ is an interval of the form l ≤ μ ≤ u, where the endpoints l
and u are computed from the sample data.
Types of Interval Estimate:
1. A confidence interval bounds population or distribution parameters (such as the mean
viscosity).
2. A tolerance interval bounds a selected proportion of a distribution.
3. A prediction interval bounds future observations from the population or distribution.
c08
The required quantities are zα/2 = 1.96, n = 53, s = 0.3486, and x̅ = 0.5250. The
approximate 95%
Activities/Assessments:
The yield of a chemical process is being studied. From previous experience yield is known to
be normally distributed. The past five days of plant operation have resulted in the following
percent yields: 91.6, 88.75, 90.8, 89.95, and 91.3. Find a 95% two-sided confidence interval on
the true mean yield.
References:
Overview:
Statistical hypothesis testing and confidence interval estimation of parameters are the
fundamental methods used at the data analysis stage of a comparative experiment, in which the
engineer is interested, for example, in comparing the mean of a population to a specified value.
Learning Objectives:
Course Materials:
Many problems in engineering require that we decide whether to accept or reject a statement
about some parameter. The statement is called a hypothesis, and the decision-making
procedure about the hypothesis is called hypothesis testing. This is one of the most useful
aspects of statistical inference, since many types of decision-making problems, tests, or
experiments in the engineering world can be formulated as hypothesis-testing problems.
➢ Types of Hypotheses:
• The null hypothesis, H0, is the hypothesis to be tested. It has a statement of equality,
such as =. It states that there is no significant difference between a parameter and a
statistic.
• The alternative hypothesis, HA is a hypothesis that has a statement of inequality, such
as >, <, or ≠. It states a specific difference between a parameter and a statistic.
Each hypothesis is the complement of the other. The claim could be in the null
hypothesis or the alternative hypothesis, but not both.
➢ Level of Significance
• The level of significance, denoted by α, is the probability of rejecting the null
hypothesis when it is true.
➢ Rejection Region
• The rejection region is an interval of values for which H0 will be rejected.
• This region depends on the level of significance α.
• If the calculated test statistic falls within the rejection region, H0 will be rejected and HA is
accepted.
1. One-Tailed
a. One-tailed Left Tailed
H1: parameter < value
➢ Test Statistic
• Z-Test used when the variables follow a normal distribution and the population variance
is known. It is also applicable when the population variance is not known but the sample
size is greater than or equal to 30.
• T-Test used when the population variance is unknown and the sample size is small, n<30.
Activities/Assessments:
Explain the importance of hypothesis testing.
References:
1.) Montgomery, Douglas C. (2003). Applied Satistics and Probability for Engineers.
2.) DeCoursey(2003). Satistics and Probability for Engineering Application.
Overview:
In order to draw a conclusion from the process of hypothesis testing, an appropriate test statistic
is needed. A certain requirement should consider before using them.
Learning Objectives:
Course Materials:
where:
z = z-value
x̅ = sample mean
μ = hypothesized population mean
σ = population standard deviation
n = sample size
Example1. A random sample of death in the Philippines during the past year showed an
average life span of 68.2 years, with the standard deviation of 8.9 years. Does this seem
to indicate that the average life span today is less than 70 years? Use 0.05 level of
significance.
Step 1. Formulate null and alternative hypotheses
H0 : , H1 :
Step 2. Determine the level of significance and critical value
α=0.05
Critical region: z < 1.645
Step 3. Calculate the test statistic
Test Criteria
Given: x̅ = 68.2
μ = 70
Example 2. For many school years, a teacher has recorded his students’ grades, and the mean
for these students’ grades is 82 with standard deviation of 12. The current class of 70 students
seems to be better than the average in ability. The teacher wants to show that according to their
average the current class is superior to the previous classes. Does the class mean of 85.2 and
standard deviation of 6.25 present sufficient evidence to support the teacher’s claim that the
current class is superior? Use 0.05 level of significance.
Step 1. Formulate null and alternative hypotheses
H0 : , H1 :
Step 2. Determine the level of significance and critical value
α=0.05
Critical region: z < 1.96
Step 3. Calculate the test statistic
Test Criteria
Given: x̅ = 82
μ = 85.2
= 6.25
n = 70
z = x̅ - μ = 82 – 85.2 = - 3.2 = -1.8 = -2.4
0.75
Step 4. Make a decision
Compare the computed z and the z value (critical region)
-2.4 < 1.96
Reject H0
Step 5. Conclusion
The current class is superior.
where:
t = t-value
x̅ = sample mean
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS 39
COMPILED BY: MARIZ N. LANSAK, ,LPT
μ = hypothesized population mean
s = sample standard deviation
n = sample size
Activities/Assessments:
Test the following:
1. The life in hours of a battery is known to be approximately normally distributed, with
standard deviation = 1.25 hours. A random sample of 10 batteries has a mean life of x̅
= 40.5 hours. Is there evidence to support the claim that battery life exceeds 40 hours?
Use α =0.05.
2. Medical researchers have developed a new artificial heart constructed primarily of
titanium and plastic. The heart will last and operate almost indefinitely once it is
implanted in the patient’s body, but the battery pack needs to be recharged about every
four hours. A random sample of 50 battery packs is selected and subjected to a life test.
The average life of these batteries is 4.05 hours. Assume that battery life is normally
distributed with standard deviation =0.2 hour. Is there evidence to support the claim
that mean battery life exceeds 4 hours? Use α = 0.05.
References:
1.) Montgomery, Douglas C. (2003). Applied Satistics and Probability for Engineers.
2.) DeCoursey(2003). Satistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Another way of determining the significance of a test is by applying the p-value method or the
probability value method. In this part, let’s test hypothesis using p-value method.
Learning Objectives:
Course Materials:
Activities/Assessments:
2. The mean water temperature downstream from a power plant cooling tower discharge
pipe should be no more than 100°F. Past experience has indicated that the standard
deviation of temperature is =2°F. The water temperature is measured on nine
randomly chosen days, and the average temperature is found to be 98°F with standard
deviation 0f 0.98,
(a) Should the water temperature be judged acceptable with α =0.05?
(b) What is the P-value for this test?
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Unit 4- Proportion
Overview:
Following the data in binomial distribution, this test of proportion is the best suited to test for the
significance of an experiment.
Learning Objectives:
Course Materials:
Step 5. Conclusion
The process is capable.
Activities/Assessments:
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
Engineers and scientists are often interested in comparing two different conditions to determine
whether either condition produces a significant effect on the response that is observed. These
conditions are sometimes called treatments.
Learning Objectives:
Course Materials:
Two catalysts are being analyzed to determine how they affect the mean
yield of a chemical process. Specifically, catalyst 1 is currently in use,
but catalyst 2 is acceptable. Since catalyst 2 is cheaper, it should be
adopted, providing it does not change the process yield. A test is run in
the pilot plant and results in the data shown in Table 10-1. Is there any
difference between the mean yields? Use α = 0.05, and assume equal
variances.
Step 5. Conclusion
That is, at the 0.05 level of significance, we do not have strong evidence to
conclude that catalyst 2 results in a mean yield that differs from the mean yield when
catalyst 1 is used.
In semiconductor manufacturing, wet chemical etching is often used to remove silicon from the
backs of wafers prior to metalization. The etch rate is an important characteristic in this process
and known to follow a normal distribution. Two different etching solutions have been compared,
using two random samples of 10 wafers for each solution. The observed etch rates are as
follows (in mils per minute):
Overview:
A special case of the two-sample t-tests occurs when the observations on the two populations of
interest are collected in pairs. Each pair of observations, say (X1, X2), is taken under
homogeneous conditions, but these conditions may change from one pair to another.
Learning Objectives:
Course Materials:
If there is no difference between tips, the mean of the differences should be zero. This test
procedure is called the paired t-test.
Step 5. Conclusion
We conclude that the strength prediction methods yield different results.
Specifically, the data indicate that the Karlsruhe method produces, on the average,
higher strength predictions than does the Lehigh method.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3). https://www.biostathandbook.com
Overview:
We now consider the case where there are two binomial parameters of interest, say, p1 and p2,
and we wish to draw inferences about these proportions. We will present large-sample
hypothesis testing and confidence interval procedures based on the normal approximation to
the binomial.
Learning Objectives:
Course Materials:
Suppose that two independent random samples of sizes n1 and n2 are taken from two
populations, and let X1 and X2 represent the number of observations that belong to the class of
interest in samples 1 and 2, respectively. Furthermore, suppose that the normal approximation
Two different types of injection-molding machines are used to form plastic parts. A part is
considered defective if it has excessive shrinkage or is discolored. Two random samples, each
of size 300, are selected, and 15 defective parts are found in the sample from machine 1 while 8
defective parts are found in the sample from machine 2. Is it reasonable to conclude that both
machines produce the same fraction of defective parts, using α = 0.05?
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com
Overview:
Many problems in engineering and science involve exploring the relationships between two or
more variables. Regression analysis is a statistical technique that is very useful for these types
of problems. For example, in a chemical process, suppose that the yield of the product is related
to the process-operating temperature. Regression analysis can be used to build a model to
predict yield at a given temperature level. This model can also be used for process optimization,
such as finding the level of temperature that maximizes yield, or for process control purposes.
Learning Objectives:
Course Materials:
Regression Analysis
The collection of statistical tools that are used to model and explore relationships between
variables that are related in nondeterministic manner is called regression analysis.
Basics of regressions
Which is the RESPONSE and which is the PREDICTOR?
The response or dependent variable varies with different values of the regressor/predictor.
The predictor values are fixed: we observe the response for these fixed values
The focus is in explaining the response variable in association with one or more predictors
(A) Draw a scatter diagram of the data. Does a simple linear regression model seem
appropriate here? (b) Fit the simple linear regression model using the method of least squares.
(c) Estimate the mean chloride concentration for a watershed that has 1% roadway area.
(d) Find the fitted value corresponding to x = 0.47.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com
Unit 2- Correlation
Overview:
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between
two random variables or bivariate data. In the broadest sense correlation is any statistical
association, though it commonly refers to the degree to which a pair of variables are linearly related.
Learning Objectives:
Course Materials:
Correlation means association - more precisely it is a measure of the extent to which two
variables are related. There are three possible results of a correlational study: a positive
correlation, a negative correlation, and no correlation.
A positive correlation is a relationship between two variables in which both variables move
in the same direction. Therefore, when one variable increases as the other variable increases,
or one variable decreases while the other decreases
A negative correlation is a relationship between two variables in which an increase in one
variable is associated with a decrease in the other.
A zero correlation exists when there is no relationship between two variables.
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS 54
COMPILED BY: MARIZ N. LANSAK, ,LPT
Scatter Plot
A correlation can be expressed visually. This is done by drawing a scatter plot. It is a graphical
display that shows the relationships or associations between two numerical variables (or co-
variables), which are represented as points (or dots) for each pair of score. It also indicates the
strength and direction of the correlation between the co-variables.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com
Overview:
Many applications of regression analysis involve situations in which there are more than one
regressor variable. A regression model that contains more than one regressor variable is called a
multiple regression model.
Learning Objectives:
Course Materials:
where Y represents the tool life, x1 represents the cutting speed, x2 represents the tool angle,
and ∈ is a random error term. This is a multiple linear regression model with two regressors.
The term linear is used because it is a linear function of the unknown parameters β0, β1 and β2.
The regression model describes a plane in the three-dimensional space of Y, x1, and x2.
The electric power consumed each month by a chemical plant is thought to be related to the
average ambient temperature (x1), the number of days in the month (x2), the average product
purity (x3), and the tons of product produced (x 4). The past year’s historical data are available
and are presented in the table. Fit a multiple linear regression model to these data.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2.) DeCoursey(2003). Statistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com
Overview:
Experiments are a natural part of the engineering and scientific decision-making process. Suppose,
for example, that a civil engineer is investigating the effects of different curing methods on the mean
compressive strength of concrete. The experiment would consist of making up several test
specimens of concrete using each of the proposed curing methods and then testing the compressive
strength of each specimen. The data from this experiment could be used to determine which curing
method should be used to provide maximum mean compressive strength.
Learning Objectives:
Course Materials:
Experimental design methods are also useful in engineering design activities, where new
products are developed and existing ones are improved. Some typical applications of
statistically designed experiments in engineering design include
1. Evaluation and comparison of basic design configurations
2. Evaluation of different materials
3. Selection of design parameters so that the product will work well under a wide variety of field
conditions (or so that the design will be robust)
4. Determination of key product design parameters that affect product performance
ANOVA or Analysis of Variance is used to test the difference between 3 or more means.
The compressive strength of concrete is being studied, and four different mixing techniques are
being investigated. The following data have been collected. Test the hypothesis that mixing
techniques affect compressive strength of the concrete. Use α 0.05.
References:
1.) Montgomery, Douglas C. (2003). Applied Satistics and Probability for Engineers.
2.) DeCoursey(2003). Satistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com
Overview:
SUBJECT:STAT 20023 ENGINEERING DATA ANALYSIS 61
COMPILED BY: MARIZ N. LANSAK, ,LPT
An experiment is just a test or series of tests. Experiments are performed in all engineering and
scientific disciplines and are an important part of the way we learn about how systems and
processes work. The validity of the conclusions that are drawn from an experiment depends to a
large extent on how the experiment was conducted. Therefore, the design of the experiment plays a
major role in the eventual solution of the problem that initially motivated the experiment. The factorial
experimental design will be introduced as a powerful technique for this type of problem. Generally, in
a factorial experimental design, experimental trials (or runs) are performed at all combinations of
factor levels.
Learning Objectives:
Course Materials:
FACTORIAL EXPERIMENTS
When several factors are of interest in an experiment, a factorial experimental designshould be
used. ANOVA or Analysis of Variance is used to test the difference between 3 or more means.
Computation is as follows:
*Note: Use of SPSS will follow for further discussion as well as with 2k design.
Activities/Assessments:
An engineer who suspects that the surface finish of metal parts is influenced by the type of paint
used and the drying time. He selected three drying times—20, 25, and 30 minutes— and used
two types of paint. Three parts are tested with each combination of paint type and drying time.
State and test the appropriate hypotheses using the analysis of variance. Use α 0.05. The data
are as follows:.
References:
1.) Montgomery, Douglas C. (2003). Applied Statistics and Probability for Engineers.
2). DeCoursey(2003). Statistics and Probability for Engineering Application.
3.) https://www.biostathandbook.com