relationship between two or more variables. For example, an analyst may want to know if
there is a relationship between road accidents and the age of the driver. Regression
analysis forms an important part of the statistical analysis of the data obtained from
designed experiments and is discussed briefly in this chapter. Every experiment analyzed in
DOE++ includes regression results for each of the responses. These results, along with the
results from the analysis of variance (explained in the One Factor Designs and General Full
Factorial Designs chapters), provide information that is useful for identifying significant
factors in an experiment and exploring the nature of the relationship between these factors
and the response. Regression analysis forms the basis for all DOE++ calculations related to
the sum of squares used in the analysis of variance. The reason for this is explained in
Appendix B. DOE++ also includes a regression tool to see if two or more variables are
related, and to explore the nature of the relationship between them.

This chapter discusses simple linear regression analysis, while a subsequent chapter
focuses on multiple linear regression analysis.

A linear regression model attempts to explain the relationship between two or more
variables using a straight line. Consider the data obtained from a chemical process where
the yield of the process is thought to be related to the reaction temperature (see the table
below).

A scatter plot of the data can be obtained as shown in the following figure. In the scatter
plot, the yield, $y_i$, is plotted for different temperature values, $x_i$.

It is clear that no line can be found to pass through all points of the plot. Thus no
functional relation exists between the two variables $x$ and $Y$. However, the scatter plot
does give an indication that a straight line may exist such that all the points on the plot
are scattered randomly around this line. A statistical relation is said to exist in this
case. The statistical relation between $x$ and $Y$ may be expressed as follows:

$$Y = \beta_0 + \beta_1 x + \epsilon$$

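The above equation is the linear regression model that can be used to explain the relation
between $x$ and $Y$ that is seen on the scatter plot above. In this model, the mean value of
$Y$ (abbreviated as $E(Y)$) is assumed to follow the linear relation:

$$E(Y) = \beta_0 + \beta_1 x$$

The actual values of $Y$ (which are observed as yield from the chemical process from time to
time and are random in nature) are assumed to be the sum of the mean value, $E(Y)$, and a
random error term, $\epsilon$:

$$Y = E(Y) + \epsilon = \beta_0 + \beta_1 x + \epsilon$$

The regression model here is called a simple linear regression model because there is just
one independent variable, $x$, in the model. In regression models, the independent variables
are also referred to as regressors or predictor variables. The dependent variable, $Y$, is
also referred to as the response. The slope, $\beta_1$, and the intercept, $\beta_0$, of the
line $E(Y) = \beta_0 + \beta_1 x$ are called regression coefficients. The slope, $\beta_1$,
can be interpreted as the change in the mean value of $Y$ for a unit change in $x$.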
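
As a minimal sketch of the model described above, the following Python snippet evaluates the mean line $E(Y) = \beta_0 + \beta_1 x$ and generates simulated responses around it. The coefficient values, the error standard deviation, the temperature grid, and the function names are hypothetical and chosen only for illustration; they are not estimates from the yield data and are not part of DOE++. The sketch also assumes normally distributed errors.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 17.0   # intercept
BETA_1 = 2.0    # slope
SIGMA = 5.0     # standard deviation of the random error term (assumed normal)


def mean_response(x, beta_0=BETA_0, beta_1=BETA_1):
    """Mean value E(Y) = beta_0 + beta_1 * x at the regressor value(s) x."""
    return beta_0 + beta_1 * np.asarray(x, dtype=float)


def simulate_response(x, rng, beta_0=BETA_0, beta_1=BETA_1, sigma=SIGMA):
    """Draw Y = E(Y) + epsilon, with epsilon drawn from a normal distribution."""
    mean = mean_response(x, beta_0, beta_1)
    return mean + rng.normal(0.0, sigma, size=mean.shape)


rng = np.random.default_rng(seed=0)
temperatures = np.arange(50.0, 101.0, 10.0)   # hypothetical x values
yields = simulate_response(temperatures, rng)

# A unit change in x shifts the mean response by exactly the slope.
slope_check = mean_response(61.0) - mean_response(60.0)

for x, y in zip(temperatures, yields):
    print(f"x = {x:5.1f}  E(Y) = {mean_response(x):6.1f}  simulated Y = {y:6.1f}")
```

The simulated values scatter around the mean line rather than falling on it, which is the statistical (as opposed to functional) relation the model is meant to capture.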

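The random error term, $\epsilon$, is assumed to follow the normal distribution with a mean
of 0 and a variance of $\sigma^2$. Since $Y$ is the sum of this random term and the mean
value, $E(Y)$, which is a constant, the variance of $Y$ at any given value of $x$ is also
$\sigma^2$. Therefore, at any given value of $x$, say $x_i$, the dependent variable $Y$
follows a normal distribution with a mean of $\beta_0 + \beta_1 x_i$ and a variance of
$\sigma^2$.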
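
The distributional statement above can be checked numerically. The sketch below uses hypothetical parameter values (nothing here is estimated from the yield data) to draw many responses at one fixed regressor value and compare the sample mean and variance with $\beta_0 + \beta_1 x_i$ and $\sigma^2$.

```python
import numpy as np

# Hypothetical values for illustration only.
beta_0, beta_1, sigma = 17.0, 2.0, 5.0
x_i = 70.0                      # one fixed value of the regressor

rng = np.random.default_rng(seed=1)
epsilon = rng.normal(loc=0.0, scale=sigma, size=200_000)  # error ~ N(0, sigma^2)

# E(Y) is a constant at fixed x_i, so adding it shifts the mean
# of the error term but leaves its variance unchanged.
expected_mean = beta_0 + beta_1 * x_i
y = expected_mean + epsilon

sample_mean = y.mean()
sample_variance = y.var(ddof=1)

print(f"theoretical mean {expected_mean:.2f}, sample mean {sample_mean:.2f}")
print(f"theoretical variance {sigma**2:.2f}, sample variance {sample_variance:.2f}")
```

With a large number of draws, the sample mean and variance should land close to the theoretical values; the exact figures depend on the seed and sample size.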

