DoE Regression Models 3jan19 v20
Design of Experiments
1 © 2018 – Confidential
Regression Models
Agenda
1 Introduction
2 Linear Regression Models
3 Parameter Estimation
4 Examples
Introduction
o In many problems two or more variables are related, and it is of interest to model and
explore this relationship.
o Example: in a chemical process the yield of the product is related to the operating temperature.
o The chemical engineer may want to build a model relating yield to temperature and then use
the model for:
   o prediction
   o process optimization
   o process control
o In general, suppose that there is a single dependent variable or response y that depends on k
independent or regressor variables, say x1, x2, . . . , xk. The relationship between
these variables is characterized by a mathematical model called a regression model.
o Regression methods are frequently used to analyze data from unplanned experiments, such
as might arise from observation of uncontrolled phenomena or historical records.
o Regression methods are also very useful in designed experiments where something has “gone
wrong”.
Linear Regression Models
o Suppose that we wish to develop an empirical model relating the viscosity of a polymer to the
temperature and the catalyst feed rate. A model that might describe this relationship is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
where:
y: viscosity
x1: temperature
x2: catalyst feed rate
ϵ: random error
Multiple Linear Regression
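o In general, the response variable y may be related to k regressor variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$
o This model is called a multiple linear regression model with k regressor variables. It is
"linear" because it is a linear function of the unknown parameters β0, β1, . . . , βk.
o The parameter βj represents the expected change in the response y per unit change in xj when all
the remaining independent variables xi (i ≠ j) are held constant.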
Approximation to Linear Regression
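o Models that are more complex in appearance may often still be analyzed by
multiple linear regression techniques.
o For example, consider adding an interaction term to the first-order model in two
variables, say:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon$$
o If we let x3 = x1x2 and β3 = β12, then the above equation can be written as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$
which is a standard multiple linear regression model with three regressors.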
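The substitution is easy to see numerically. Below is a minimal sketch in Python with NumPy. It assumes hypothetical noise-free data generated from known coefficients, and the helper name `interaction_design` is illustrative only. The interaction becomes one more column, x3 = x1·x2, and an ordinary least squares solver recovers β12 as the coefficient of that column.

```python
import numpy as np


def interaction_design(x1, x2):
    """Hypothetical helper: design matrix with columns [1, x1, x2, x3], where x3 = x1 * x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


# Hypothetical regressor settings (not from the slides)
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.5])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, -0.5])

# Known coefficients beta0, beta1, beta2, beta12 used to generate y without noise
true_beta = np.array([10.0, 2.0, -3.0, 1.5])
X = interaction_design(x1, x2)
y = X @ true_beta

# Ordinary least squares treats x3 exactly like any other regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1, beta2, beta3 (= beta12):", np.round(beta_hat, 4))
```

With real, noisy observations the recovered coefficients would only approximate the generating values. The point is that nothing beyond an extra column is needed.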
Another Example
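o Consider the second-order response surface model in two variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon$$
o If we let x3 = x1², x4 = x2², x5 = x1x2, β3 = β11, β4 = β22, and β5 = β12, then this becomes:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \epsilon$$
which is a multiple linear regression model with five regressors.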
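The same idea extends to the second-order model. Below is a minimal sketch, again assuming NumPy and hypothetical noise-free data. The helper `second_order_design` is an illustrative name. A 3 × 3 grid of coded levels (−1, 0, +1) is used because it provides enough distinct points to estimate all six coefficients.

```python
import itertools

import numpy as np


def second_order_design(x1, x2):
    """Hypothetical helper: columns [1, x1, x2, x3, x4, x5] with
    x3 = x1**2, x4 = x2**2, x5 = x1 * x2 (beta3 = beta11, beta4 = beta22, beta5 = beta12)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])


# A 3 x 3 grid of coded levels supports all six parameters of the quadratic model
levels = [-1.0, 0.0, 1.0]
grid = np.array(list(itertools.product(levels, levels)))
X = second_order_design(grid[:, 0], grid[:, 1])

# Hypothetical coefficients: beta0, beta1, beta2, beta11, beta22, beta12
true_beta = np.array([50.0, 4.0, -2.0, -3.0, -1.5, 0.8])
y = X @ true_beta

# Linear least squares fits the curved surface because it is linear in the betas
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 4))
```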
Least Squares
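o Least squares is typically used to estimate the regression coefficients in a multiple
linear regression model.
o Suppose that n > k observations are available, and let xij denote the ith observation
of variable xj. The model in terms of the observations is
$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n$$
o The method of least squares chooses the β’s in the equation above so that the sum
of the squares of the errors, ϵi, is minimized.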
Least Squares (Cont’d)
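o The least squares function is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$
o The function L is to be minimized with respect to β0, β1, . . . , βk. The least squares
estimators, say β̂0, β̂1, . . . , β̂k, must satisfy:
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0$$
$$\left.\frac{\partial L}{\partial \beta_j}\right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{l=1}^{k} \hat\beta_l x_{il} \right) x_{ij} = 0, \quad j = 1, 2, \ldots, k$$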
Least Squares (Cont’d)
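o Simplifying, we get the least squares normal equations:
$$\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1} x_{ik} &= \sum_{i=1}^{n} x_{i1} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik} y_i
\end{aligned}$$
o There are p = k + 1 normal equations, one for each unknown regression coefficient.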
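The normal equations can be built directly from the sums above and solved as a small linear system. Below is a minimal pure-Python sketch that assumes a handful of hypothetical observations. It solves with plain Gaussian elimination with partial pivoting, a solver swapped in for transparency rather than one the slides prescribe. The helper names `normal_equations` and `solve` are illustrative.

```python
def normal_equations(rows, y):
    """Form the p = k + 1 scalar normal equations A @ beta = b.

    rows: list of observations [x_i1, ..., x_ik]; a leading 1 (intercept) is added,
    so A[j][l] = sum_i x_ij * x_il and b[j] = sum_i x_ij * y_i, with x_i0 = 1.
    """
    data = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(data[0])
    A = [[sum(d[j] * d[l] for d in data) for l in range(p)] for j in range(p)]
    b = [sum(d[j] * yi for d, yi in zip(data, y)) for j in range(p)]
    return A, b


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))  # largest pivot in column c
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
A, b = normal_equations(rows, y)
beta_hat = solve(A, b)
print([round(v, 6) for v in beta_hat])
```

The first row of A is [n, Σxi1, Σxi2], which matches the first normal equation term by term.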
Matrix Notation
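o In matrix notation, the model may be written as:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where:
o y is an (n × 1) vector of the observations
o X is an (n × p) matrix of the levels of the independent variables, with a first column of 1s for the intercept and p = k + 1
o β is a (p × 1) vector of the regression coefficients
o ϵ is an (n × 1) vector of random errors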
Matrix Notation (Cont’d)
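o We wish to find the vector of least squares estimators, β̂, that minimizes
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \boldsymbol{\epsilon}'\boldsymbol{\epsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
Simplifying to:
$$L = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
Setting the derivative with respect to β equal to zero:
$$\left.\frac{\partial L}{\partial \boldsymbol{\beta}}\right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0} \quad\Rightarrow\quad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$
Hence, provided X'X is invertible:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$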
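The matrix result can be checked numerically. Below is a minimal sketch in Python with NumPy, assuming a small hypothetical data set (not the viscosity data). It forms X'X and X'y and solves the normal equations with `numpy.linalg.solve`. Solving the system is a common numerical choice that avoids inverting X'X explicitly; the slides do not prescribe it. The sketch then cross-checks against the SVD-based `numpy.linalg.lstsq` and confirms that the gradient −2X'y + 2X'Xβ̂ vanishes at the solution.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 regressors (illustrative only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.7])

X = np.column_stack([np.ones_like(x1), x1, x2])  # n x p design matrix, p = k + 1

XtX = X.T @ X
Xty = X.T @ y
# Solve X'X beta = X'y instead of forming (X'X)^-1 explicitly
beta_hat = np.linalg.solve(XtX, Xty)

# Same estimate from NumPy's SVD-based least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum, the gradient -2X'y + 2X'X beta_hat is (numerically) zero
gradient = -2 * Xty + 2 * XtX @ beta_hat
print("beta_hat:", beta_hat)
print("max |gradient|:", np.abs(gradient).max())
```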
Matrix Notation (Cont’d)
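o It is easy to see that the matrix form of the normal equations, X'Xβ̂ = X'y, is identical to the
scalar form. Writing it out in detail (all sums run over i = 1, . . . , n):
$$\begin{bmatrix} n & \sum x_{i1} & \sum x_{i2} & \cdots & \sum x_{ik} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} & \cdots & \sum x_{i1} x_{ik} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum x_{ik} & \sum x_{ik} x_{i1} & \sum x_{ik} x_{i2} & \cdots & \sum x_{ik}^2 \end{bmatrix} \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{i1} y_i \\ \vdots \\ \sum x_{ik} y_i \end{bmatrix}$$
o Carrying out the matrix multiplication gives the normal equations in scalar notation.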
Estimating σ²
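o The fitted regression model is
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}, \qquad \hat y_i = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}, \quad i = 1, 2, \ldots, n$$
o The difference between the actual observation yi and the corresponding fitted
value ŷi is the residual, ei = yi − ŷi. The (n × 1) vector of residuals is
$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$$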
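Estimating σ² (Cont’d)
o The error or residual sum of squares is the sum of the squared residuals, ei = yi − ŷi:
$$SS_E = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e}$$
It can be shown that
$$E(SS_E) = \sigma^2 (n - p)$$
o An unbiased estimator of σ² is therefore
$$\hat\sigma^2 = \frac{SS_E}{n - p}$$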
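Below is a minimal sketch of the residual and variance computations in Python with NumPy, assuming a small hypothetical data set. The helper `fit_and_estimate_sigma2` is an illustrative name. It returns the residual vector e, SS_E = e'e, and σ̂² = SS_E / (n − p), where p = k + 1 is the number of columns of X.

```python
import numpy as np


def fit_and_estimate_sigma2(X, y):
    """Least squares fit plus the unbiased variance estimate SS_E / (n - p)."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat          # fitted values
    e = y - y_hat                 # residuals e_i = y_i - yhat_i
    ss_e = float(e @ e)           # SS_E = e'e
    sigma2_hat = ss_e / (n - p)   # unbiased estimator of sigma^2
    return beta_hat, e, ss_e, sigma2_hat


# Hypothetical data: k = 2 regressors plus an intercept, n = 6 observations
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.7])
X = np.column_stack([np.ones_like(x1), x1, x2])

beta_hat, e, ss_e, sigma2_hat = fit_and_estimate_sigma2(X, y)
print("SS_E =", round(ss_e, 4), " sigma^2 hat =", round(sigma2_hat, 4))
```

Dividing by n − p rather than n accounts for the p coefficients estimated from the same data.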
Example
o Sixteen observations on the viscosity of a polymer (y) and two process variables,
reaction temperature (x1) and catalyst feed rate (x2), are shown. We will fit a
multiple linear regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
Example (Cont’d)
o The X’X matrix
Example (Cont’d)
o Hence the least squares fit, with the regression coefficients reported to two
decimal places, is
Example (Cont’d)
[Figure: residual plots for the fitted viscosity model]
Regression Analysis for a 2³ Factorial Design
o A chemical engineer is investigating the yield of a process. Three process variables
are of interest: temperature, pressure, and catalyst concentration. Each variable
can be run at a low and a high level, and the engineer decides to run a 2³ design
with four center points. The design and the resulting yields are shown below,
where we have shown both the natural levels of the design factors and the +1, −1
coded variable notation normally employed in 2ᵏ factorial designs to represent the
factor levels.
o Suppose that the engineer decides to fit a main-effects-only model, say
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$
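Regression Analysis for a 2³ Factorial Design (Cont’d)
o The 2³ is an orthogonal design, and even with the added center runs it is still
orthogonal. With n = 12 runs (8 factorial runs coded ±1 plus 4 center runs coded 0), therefore
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 12 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}$$
o Because the design is orthogonal, the X’X matrix is diagonal, the required inverse is
also diagonal, and the vector of least squares estimates of the regression
coefficients is
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} \sum_{i} y_i / 12 \\ \sum_{i} x_{i1} y_i / 8 \\ \sum_{i} x_{i2} y_i / 8 \\ \sum_{i} x_{i3} y_i / 8 \end{bmatrix}$$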
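The diagonal structure of X'X can be checked directly from the coded design. Below is a minimal sketch in Python with NumPy. It uses only the coded levels of a 2³ design with four center points. The yields are hypothetical placeholders, not the engineer's data, and serve only to show that dividing each column's X'y entry by the matching diagonal element gives the same estimates as a general solve.

```python
import itertools

import numpy as np

# Coded 2^3 factorial runs (+1/-1) plus four center runs at (0, 0, 0)
factorial = list(itertools.product([-1.0, 1.0], repeat=3))
center = [(0.0, 0.0, 0.0)] * 4
runs = np.array(factorial + center)

# Main-effects-only model: columns [1, x1, x2, x3]
X = np.column_stack([np.ones(len(runs)), runs])
XtX = X.T @ X
print(XtX)

# Hypothetical yields, one per run (illustrative only)
y = np.array([40, 52, 45, 58, 43, 55, 47, 61, 50, 49, 51, 50], dtype=float)

# With a diagonal X'X, each coefficient is (column' y) / (column' column)
beta_diag = (X.T @ y) / np.diag(XtX)
beta_general = np.linalg.solve(XtX, X.T @ y)
print(beta_diag)
```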
Relationship to Effect Estimates
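o As noted on several occasions, the regression coefficients are closely
connected to the effect estimates that would be obtained from the usual analysis
of a 2³ design. For example, the effect of temperature is
$$T = \bar y_{T^+} - \bar y_{T^-}$$
where ȳT+ and ȳT− are the average yields of the four factorial runs at the high and low
levels of temperature. Because the center runs have xi1 = 0,
$$\hat\beta_1 = \frac{1}{8}\sum_{i=1}^{12} x_{i1} y_i = \frac{4\bar y_{T^+} - 4\bar y_{T^-}}{8} = \frac{T}{2}$$
o That is, the regression coefficient is exactly one-half the usual effect estimate. This
will always be true for a 2ᵏ design with ±1 coding. This example demonstrates that the effect
estimates from a 2ᵏ design are least squares estimates.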
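Below is a minimal pure-Python check of the "one-half the effect" relationship. It assumes hypothetical yields for a 2³ design with four center points, where x1 plays the role of temperature. The sketch computes the usual effect as a difference of averages and the least squares coefficient as Σxi1yi / Σxi1².

```python
import itertools

# Coded 2^3 runs plus 4 center points; x1 plays the role of temperature
runs = list(itertools.product([-1, 1], repeat=3)) + [(0, 0, 0)] * 4
# Hypothetical yields, one per run (illustrative only)
y = [40, 52, 45, 58, 43, 55, 47, 61, 50, 49, 51, 50]

x1 = [r[0] for r in runs]
high = [yi for xi, yi in zip(x1, y) if xi == 1]
low = [yi for xi, yi in zip(x1, y) if xi == -1]

# Usual effect estimate: difference of the averages at the high and low levels
effect_T = sum(high) / len(high) - sum(low) / len(low)

# Least squares coefficient in an orthogonal design: sum(x_i1 * y_i) / sum(x_i1 ** 2)
beta1_hat = sum(xi * yi for xi, yi in zip(x1, y)) / sum(xi * xi for xi in x1)

print(effect_T, beta1_hat)  # prints 2.75 1.375
```

The center runs contribute nothing to either quantity, which is why adding them leaves the relationship intact.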
Relationship to Effect Estimates (Cont’d)
Also Good For…
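o A 2³ factorial design with a missing observation: the design is no longer orthogonal, so X’X
is not diagonal and the estimates must be computed from the full expression (X’X)⁻¹X’y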
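Below is a minimal sketch of what changes when one run is lost. It assumes NumPy and treats the (+1, +1, +1) run as missing. The yields are hypothetical. X'X picks up nonzero off-diagonal terms, so the shortcut of dividing by the diagonal no longer gives the least squares estimates and the full solve is required.

```python
import itertools

import numpy as np

# 2^3 design plus 4 center runs, with the (+1, +1, +1) run treated as missing
full = list(itertools.product([-1.0, 1.0], repeat=3)) + [(0.0, 0.0, 0.0)] * 4
runs = np.array([r for r in full if r != (1.0, 1.0, 1.0)])
# Hypothetical yields for the 11 remaining runs (illustrative only)
y = np.array([40, 52, 45, 58, 43, 55, 47, 50, 49, 51, 50], dtype=float)

X = np.column_stack([np.ones(len(runs)), runs])
XtX = X.T @ X
off_diagonal = XtX - np.diag(np.diag(XtX))
print(XtX)

# X'X is no longer diagonal, so the coefficients come from the full solve
beta_hat = np.linalg.solve(XtX, X.T @ y)
# Dividing by the diagonal alone, as in the orthogonal case, now gives different numbers
beta_naive = (X.T @ y) / np.diag(XtX)
print(beta_hat, beta_naive)
```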
Other Topics
o Hypothesis testing in multiple regression
o Confidence intervals
Residuals Analysis
Agenda
1 Introduction
2 ϵ Assumptions
Introduction
o As emphasized in the analysis of designed experiments, model adequacy checking is an
important part of the data analysis procedure.
o It is equally important in building regression models: the residual plots used with designed
experiments should always be examined for a regression model.
o In addition to the residual plots, other model diagnostics are frequently useful in regression.
These are outside the scope of this presentation:
   o Scaled residuals
   o Prediction error sum of squares (PRESS)
Residuals Definition
Residual = Observed value − Predicted value
e = y − ŷ
o Properties
   o When the model includes an intercept, the least squares residuals sum to zero, so ē = 0
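The zero-mean property follows from the first normal equation, Σ(yi − ŷi) = 0, which exists only when β0 is in the model. Below is a minimal sketch in Python with NumPy, assuming four hypothetical observations. It fits the same data with and without an intercept and compares the mean residuals.

```python
import numpy as np

# Hypothetical data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 4.0, 8.0])

# Model with an intercept: columns [1, x]
X_int = np.column_stack([np.ones_like(x), x])
b_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)
e_int = y - X_int @ b_int

# Model through the origin (no intercept): single column [x]
X_origin = x[:, None]
b_origin, *_ = np.linalg.lstsq(X_origin, y, rcond=None)
e_origin = y - X_origin @ b_origin

print("mean residual, with intercept:", e_int.mean())
print("mean residual, no intercept:  ", e_origin.mean())
```

For the no-intercept fit the slope is Σxy/Σx² = 57/30 = 1.9, and the residuals average 0.25 rather than zero.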
The ϵ term is assumed to be
o Normally distributed
o Independent
o Of mean zero and constant variance σ²
If these assumptions hold, then the observed residuals, ei = yi − ŷi, should behave in a similar fashion.
Residual Plots
o A residual plot is a scatter plot of the residuals against the fitted values
[Figure: residuals vs. fitted values]
Residual Plots
o Residual plot from a controlled designed experiment
[Figure: residuals vs. fitted values]
Residual Plots
o The variance of the residuals increases with X, so the constant-variance assumption of the model is violated
[Figure: residuals vs. fitted values]
Residual Plots
o Assumptions of the model are violated
[Figure: residuals vs. fitted values]
Residual Plots
o The residuals depend on time, which the model does not account for, so the model
assumptions are violated.
[Figure: residuals vs. fitted values]
Residual Plots
o Other types of residual plots: only plot (a) does not violate the model assumptions.