
Regression & Residual Analysis

Design of Experiments

1 © 2018 – Confidential
Regression Models

Agenda

1 Introduction

2 Linear Regression Models

3 Parameters Estimation

4 Examples

5 Fitting Regression Models in Designed Experiments

6 Other Topics of Interest

Introduction
o In many problems two or more variables are related, and it is of interest to model and
explore this relationship

o Example: in a chemical process the yield of product is related to the operating temperature

o The chemical engineer may want to build a model relating yield to temperature and then use
the model for:
o prediction
o process optimization
o process control

o In general, suppose that there is a single dependent variable or response y that depends on k
independent or regressor variables, for example, x1, x2, . . . , xk. The relationship between
these variables is characterized by a mathematical model called a regression model

o Regression methods are frequently used to analyze data from unplanned experiments, such
as might arise from observation of uncontrolled phenomena or historical records

o Regression methods are also very useful in designed experiments where something has “gone
wrong”

Linear Regression Models
o Suppose that we wish to develop an empirical model relating the viscosity of a polymer to the
temperature and the catalyst feed rate. A model that might describe this relationship is:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

y: Viscosity
x1: Temperature
x2: Catalyst feed rate

o This is a multiple linear regression model with two independent variables


o We often call the independent variables predictor variables or regressors
o The term linear is used because the equation is a linear function of the unknown parameters
β0, β1, and β2

Multiple Linear Regression
o In general, the response variable y may be related to k regressor variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$

o This model is called a multiple linear regression model with k regressor variables.

o The parameters βj, j = 0, 1, . . . , k, are called the regression coefficients.

o The parameter βj represents the expected change in response y per unit change in xj when all
the remaining independent variables xi (i ≠ j) are held constant.

Approximation to Linear Regression
o Models that are more complex in appearance may often still be analyzed by
multiple linear regression techniques

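o For example, consider adding an interaction term to the first-order model in two
variables, say:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon$

o If we let x3 = x1x2 and β3 = β12, then the above equation can be written as:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$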

Another Example
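o Consider the second-order response surface model in two variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon$

o If we let x3 = x1², x4 = x2², x5 = x1x2, β3 = β11, β4 = β22, and β5 = β12, then this becomes:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \epsilon$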
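The substitution on this slide can be tried numerically. The sketch below builds the extra columns x3, x4, x5 from x1 and x2 and fits the second-order model with an ordinary linear least squares routine. The coefficient values and the 3 x 3 grid of settings are made up for illustration, and the response is generated without noise so that the fit should return the chosen coefficients.

```python
import numpy as np

# Hypothetical coefficients (beta0, beta1, beta2, beta11, beta22, beta12),
# chosen only for illustration.
beta_true = np.array([10.0, 2.0, -1.0, 0.5, 0.25, 1.5])

# A 3 x 3 grid of (x1, x2) settings, also made up for this sketch.
g1, g2 = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
x1, x2 = g1.ravel(), g2.ravel()

# Substituted regressors: x3 = x1^2, x4 = x2^2, x5 = x1*x2.
# After the substitution the model is linear in the six unknown betas.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Noise-free response, so the linear fit should reproduce beta_true.
y = X @ beta_true

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The model is nonlinear in x1 and x2 but linear in the parameters, which is all that multiple linear regression requires.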

Least Squares
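o Least squares is typically used to estimate the regression coefficients in a multiple
linear regression model. With n > k observations, the model for the i-th observation is:

$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n$

o The method of least squares chooses the β’s in the equation above so that the sum
of the squares of the errors, ϵi, is minimized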

Least Squares (Cont’d)
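o The method of least squares chooses the β’s in the equation so that the sum of the
squares of the errors, ϵi, is minimized. The least squares function is:

$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$

o The function L is to be minimized with respect to β0, β1, . . . , βk. The least squares
estimators β̂0, β̂1, . . . , β̂k must satisfy:

$\left. \frac{\partial L}{\partial \beta_0} \right|_{\hat\beta_0, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0$

$\left. \frac{\partial L}{\partial \beta_j} \right|_{\hat\beta_0, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) x_{ij} = 0, \quad j = 1, 2, \ldots, k$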

Least Squares (Cont’d)
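o Simplifying, we get the least squares normal equations:

$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$

$\hat\beta_0 \sum_{i=1}^{n} x_{ij} + \hat\beta_1 \sum_{i=1}^{n} x_{ij} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ij} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ij} x_{ik} = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, 2, \ldots, k$

o There are p = k + 1 normal equations, one for each unknown regression coefficient.

o The solution to the normal equations gives the least squares estimators of the
regression coefficients, β̂0, β̂1, . . . , β̂k.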
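For a single regressor (k = 1) there are p = 2 normal equations, small enough to solve by hand. The sketch below does this in plain Python for a made-up data set that lies exactly on the line y = 1 + 2x, so the solution should come out as intercept 1 and slope 2. The data and the variable names are assumptions for illustration only.

```python
# Made-up data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
n = len(x)

# Sums that appear in the normal equations.
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations for k = 1:
#   n     * b0 + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_xx * b1 = sum_xy
# Solved here with Cramer's rule for the 2 x 2 system.
det = n * sum_xx - sum_x * sum_x
b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

print(b0, b1)
```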

Matrix Notation
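o In matrix notation, the model may be written as:

$y = X\beta + \epsilon$

where:

o y is an (n × 1) vector of observations
o X is an (n × p) matrix of the levels of the independent variables
o β is a (p × 1) vector of regression coefficients
o ϵ is an (n × 1) vector of random errors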

Matrix Notation (Cont’d)
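o We wish to find the vector of least squares estimators, β̂, that minimizes

$L = \sum_{i=1}^{n} \epsilon_i^2 = \epsilon'\epsilon = (y - X\beta)'(y - X\beta)$

o L may be expressed as:

$L = y'y - 2\beta'X'y + \beta'X'X\beta$

o The least squares estimators must satisfy:

$\left. \frac{\partial L}{\partial \beta} \right|_{\hat\beta} = -2X'y + 2X'X\hat\beta = 0$

Simplifying to:

$X'X\hat\beta = X'y$

Hence:

$\hat\beta = (X'X)^{-1}X'y$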
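The standard closed form for the least squares estimator is β̂ = (X'X)⁻¹X'y. A minimal NumPy sketch of that computation follows, under stated assumptions: the data are simulated, and the seed, sample size, and coefficient values are arbitrary. It solves the normal equations X'Xβ̂ = X'y directly and compares the result with NumPy's own least squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Simulated data: n = 16 runs, k = 2 regressors (values are made up).
n, k = 16, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])
beta_true = np.array([5.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(0, 1, size=n)

# Normal equations in matrix form: (X'X) beta_hat = X'y.
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)

# Cross-check against NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(beta_lstsq)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; the estimator being computed is the same.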

Matrix Notation (Cont’d)
o It is easy to see that the matrix form of the normal equations is identical to the
scalar form

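o The fitted regression model is

$\hat{y} = X\hat\beta$

o In scalar notation

$\hat{y}_i = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}, \quad i = 1, 2, \ldots, n$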

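Estimating σ²
o The difference between the actual observation yi and the corresponding fitted
value ŷi is the residual, ei = yi − ŷi. The (n × 1) vector of residuals is

$e = y - \hat{y}$

o It is also usually necessary to estimate σ². To develop an estimator of this
parameter, consider the sum of squares of the residuals

$SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = e'e$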

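Estimating σ² (Cont’d)
o The sum of squares of the residuals, SS_E, is called the error or residual sum of
squares. It has n − p degrees of freedom, and it can be shown that

$E(SS_E) = \sigma^2 (n - p)$

o An unbiased estimator of σ² is therefore

$\hat\sigma^2 = \frac{SS_E}{n - p}$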
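As a small worked check, the sketch below estimates σ² as SS_E/(n − p) for a four-point data set that is made up for illustration. It also computes the residual sum of squares a second way, as y'y − β̂'X'y, which is an algebraically equivalent form.

```python
import numpy as np

# Four made-up observations, one regressor plus an intercept (p = 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Least squares fit via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals and residual sum of squares, SS_E = e'e.
e = y - X @ beta_hat
ss_e = e @ e

# Equivalent computing form: SS_E = y'y - beta_hat' X'y.
ss_e_alt = y @ y - beta_hat @ (X.T @ y)

# Estimator of sigma^2 with n - p degrees of freedom.
sigma2_hat = ss_e / (n - p)

print(beta_hat, ss_e, sigma2_hat)
```

For these numbers the fit is ŷ = 1.3 + 0.8x, SS_E = 1.8, and σ̂² = 0.9.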

Example
o Sixteen observations on the viscosity of a polymer (y) and two process variables,
reaction temperature (x1) and catalyst feed rate (x2), are shown. We will fit a
multiple linear regression model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

Example (Cont’d)
o The X’X matrix

Example (Cont’d)

o Hence the least squares fit, with the regression coefficients reported to two
decimal places, is

o ŷ = 1566.08 + 7.62x1 + 8.58x2

Example (Cont’d)

Other residual measures


Out of scope ?

Example (Cont’d)

Normal probability plot of residuals

Example (Cont’d)

Residual plots

Example (Cont’d)

Residual plots

Regression Analysis for a 2³ Factorial Design
o A chemical engineer is investigating the yield of a process. Three process variables
are of interest: temperature, pressure, and catalyst concentration. Each variable
can be run at a low and a high level, and the engineer decides to run a 2³ design
with four center points. The design and the resulting yields are shown below,
where we have shown both the natural levels of the design factors and the +1, −1
coded variable notation normally employed in 2^k factorial designs to represent the
factor levels.

o Suppose that the engineer decides to fit a main-effects-only model, say

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$

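Regression Analysis for a 2³ Factorial Design (Cont’d)
o The 2³ is an orthogonal design, and even with the added center runs it is still
orthogonal. Therefore, for the main-effects model in coded units with 8 factorial
runs and 4 center runs:

$X'X = \mathrm{diag}(12, 8, 8, 8)$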

o Because the design is orthogonal, the X’X matrix is diagonal, the required inverse is
also diagonal, and the vector of least squares estimates of the regression
coefficients is

o The fitted regression model is

Relationship to Effect Estimates
o As we have seen on many occasions, the regression coefficients are closely
connected to the effect estimates that would be obtained from the usual analysis
of a 2³ design. For example, the effect of temperature is

o Notice the regression coefficient for x1 is

o That is, the regression coefficient is exactly one-half the usual effect estimate. This
will always be true for a 2^k design, because the effect measures the change in response
as a factor moves from −1 to +1 (two coded units), while the coefficient measures the
change per coded unit. This example demonstrates that the effect estimates from a 2^k
design are least squares estimates.
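The one-half relationship can be checked numerically. The sketch below sets up a 2³ design in coded units with four center runs, fits the main-effects model by least squares, and compares the coefficient of x1 with the usual effect estimate (average response at +1 minus average response at −1 over the factorial runs). The yield values are hypothetical and are not the data from the slides.

```python
import itertools
import numpy as np

# 2^3 factorial in coded units, standard order (x1 varies fastest).
factorial = np.array(
    [(a, b, c) for c, b, a in itertools.product([-1, 1], repeat=3)],
    dtype=float,
)
center = np.zeros((4, 3))  # four center runs at (0, 0, 0)
D = np.vstack([factorial, center])

# Model matrix for the main-effects model: intercept, x1, x2, x3.
X = np.column_stack([np.ones(len(D)), D])

# Hypothetical yields: 8 factorial runs followed by 4 center runs.
y = np.array([30, 42, 55, 61, 34, 46, 59, 65, 49, 47, 51, 50], dtype=float)

# Least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Usual effect estimate for x1, using the factorial runs only.
y_fact = y[:8]
x1 = factorial[:, 0]
effect_x1 = y_fact[x1 == 1].mean() - y_fact[x1 == -1].mean()

print(X.T @ X)                    # diagonal, because the design is orthogonal
print(beta_hat[1], effect_x1 / 2) # the two values agree
```

The center runs have x1 = 0, so they change the intercept estimate but contribute nothing to the x1 contrast.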

Also Good For…
o A 2³ factorial design with a missing observation

o Inaccurate levels in design factors
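A minimal sketch of the missing-observation case, under stated assumptions: the coefficients are hypothetical, the response is generated without noise, and the last run is arbitrarily treated as lost. With one run missing the design is no longer orthogonal, so X'X picks up nonzero off-diagonal entries, but the general least squares solution still applies.

```python
import itertools
import numpy as np

# Full 2^3 design in coded units with an intercept column.
runs = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
X_full = np.column_stack([np.ones(8), runs])

# Hypothetical main-effects coefficients and a noise-free response.
beta_true = np.array([50.0, 5.0, 10.0, 1.0])
y_full = X_full @ beta_true

# Pretend the last run (+1, +1, +1) was lost.
keep = np.arange(8) != 7
X, y = X_full[keep], y_full[keep]

# X'X is no longer diagonal, so the coefficient estimates are correlated.
XtX = X.T @ X
off_diag = XtX - np.diag(np.diag(XtX))

# The normal equations still have a unique solution.
beta_hat = np.linalg.solve(XtX, X.T @ y)

print(XtX)
print(beta_hat)
```

The same code path handles inaccurate factor levels: replace the ±1 entries in the affected rows with the coded values actually achieved.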

Other Topics
o Hypothesis testing in multiple regression

o Confidence intervals

o Prediction of new response observations

o Regression model diagnostics (we will partially cover residuals analysis)

Residuals Analysis

Agenda

1 Introduction

2 ϵ Assumptions

3 Examples of Residual Plots

Introduction
o As emphasized in designed experiments, model adequacy checking is an important part of
the data analysis procedure.

o This is equally important in building regression models: the residual plots used with designed
experiments should always be examined for a regression model as well.

o In general, it is always necessary to:


o Examine the fitted model to ensure that it provides an adequate approximation to the
true system
o Verify that none of the least squares regression assumptions are violated.

o In addition to the residual plots, other model diagnostics are frequently useful in regression.
The following are outside the scope of this presentation:
o Scaled residuals
o Prediction Error Sum of Squares (PRESS)

Residuals Definition
Residual = Observed value − Predicted value
e = y − ŷ

o Properties

o ē = 0 (the residuals average to zero when the model includes an intercept)
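The zero-mean property follows from the normal equations when the model contains an intercept: the first normal equation forces the residuals to sum to zero. A short sketch on simulated data (seed, sample size, and coefficients are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice

# Simulated data from a straight line plus noise (values are made up).
x = rng.uniform(0, 10, size=20)
y = 3.0 + 1.5 * x + rng.normal(0, 2, size=20)

# Fit a model with an intercept by least squares.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals: observed minus predicted.
e = y - X @ beta_hat
e_bar = e.mean()

# The normal equations say X'e = 0; the intercept column gives sum(e) = 0.
Xt_e = X.T @ e

print(e_bar)
print(Xt_e)
```

Both printed quantities should be zero up to floating-point rounding.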

The ϵ term is assumed to be
o Normally distributed

o Independent

o Homoscedastic (the same variance at every X)

If these are true, then the observed residuals should behave in a similar fashion

ei = yi - ŷi
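One common way to check the normality assumption is a normal probability plot of the residuals. The sketch below computes the plot coordinates using only the Python standard library. The residual values are made up, and the plotting positions (i − 0.5)/n are one common convention, not necessarily the one used for the plots in this deck.

```python
from statistics import NormalDist

# Made-up residuals for illustration.
residuals = [-0.3, 0.9, -0.9, 0.3, 0.1, -0.4, 0.6, -0.2]
n = len(residuals)

# Order the residuals from smallest to largest.
ordered = sorted(residuals)

# Cumulative plotting positions (i - 0.5) / n for i = 1..n.
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Matching standard normal quantiles.
theoretical = [NormalDist().inv_cdf(p) for p in probs]

# Points to plot: (normal quantile, ordered residual).
points = list(zip(theoretical, ordered))
for q, r in points:
    print(f"{q:7.3f}  {r:5.2f}")
```

If the errors are roughly normal, these points should fall close to a straight line.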

Residual Plots
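o A residual plot is a scatter plot of the residuals (vertical axis) against the fitted values
(horizontal axis)

[Figure: residuals versus fitted values]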

Residual Plots
o A residual plot is a scatter plot (controlled designed experiment)

[Figure: residuals versus fitted values]

Residual Plots
o The variance of the residuals increases with X, so the constant-variance (homoscedasticity)
assumption of the model is violated

o Weighted regression may be worth trying

[Figure: residuals versus fitted values]
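A minimal sketch of weighted least squares, assuming the error standard deviation is proportional to x (an assumption made only for this simulation; in practice the weights have to be justified from the data or the process). The weighted estimator is β̂ = (X'WX)⁻¹X'Wy with W a diagonal matrix of weights 1/variance, and it is equivalent to ordinary least squares after scaling each row by the square root of its weight.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, arbitrary choice

# Simulated data whose noise grows with x (assumed: SD proportional to x).
x = np.linspace(1.0, 10.0, 30)
sigma = 0.5 * x
y = 2.0 + 3.0 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones_like(x), x])

# Weights are the reciprocals of the (assumed known) error variances.
w = 1.0 / sigma**2
W = np.diag(w)

# Weighted normal equations: (X'WX) beta = X'Wy.
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent route: ordinary least squares on rows scaled by sqrt(w).
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_wls)
print(beta_scaled)
```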

Residual Plots
o Assumptions of the model are violated

[Figure: residuals versus fitted values]

Residual Plots
o The residuals are affected by time, which the model does not account for. Hence the model
assumptions are violated.

[Figure: residuals versus fitted values]
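A time effect of this kind can also be screened numerically when the residuals are listed in run order. The slides do not name a specific test; the sketch below brings in the Durbin-Watson statistic as one standard option. Values near 2 suggest no lag-1 correlation, values near 0 suggest positive correlation (a drift or trend), and values near 4 suggest negative correlation. The residual sequences are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time (run) order."""
    # Sum of squared differences between consecutive residuals.
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    # Sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residuals that drift steadily upward with time.
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Made-up residuals that flip sign on every run.
alternating = [1.0, -1.0, 1.0, -1.0]

d_trend = durbin_watson(trending)
d_alt = durbin_watson(alternating)

print(d_trend, d_alt)
```

Here the drifting sequence gives 0.4 and the alternating sequence gives 3.0.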

Residual Plots
o Other types of residual plots. Only plot (a) does not violate the assumptions
