
Regression analysis is a statistical technique that attempts to explore and model the
relationship between two or more variables. For example, an analyst may want to know if
there is a relationship between road accidents and the age of the driver. Regression
analysis forms an important part of the statistical analysis of the data obtained from
designed experiments and is discussed briefly in this chapter. Every experiment analyzed in
DOE++ includes regression results for each of the responses. These results, along with the
results from the analysis of variance (explained in the One Factor Designs and General Full
Factorial Designs chapters), provide information that is useful to identify significant factors
in an experiment and explore the nature of the relationship between these factors and the
response. Regression analysis forms the basis for all DOE++ calculations related to the
sum of squares used in the analysis of variance. The reason for this is explained
in Appendix B. DOE++ also includes a regression tool to see if two or more
variables are related, and to explore the nature of the relationship between them.
This chapter discusses simple linear regression analysis, while a subsequent
chapter focuses on multiple linear regression analysis.

Simple Linear Regression Analysis


A linear regression model attempts to explain the relationship between two or more
variables using a straight line. Consider the data obtained from a chemical process where
the yield of the process is thought to be related to the reaction temperature (see the table
below).

Yield data observations of a chemical process at different values of reaction temperature.

This data can be entered in DOE++ as shown in the following figure:

Data entry in DOE++ for the observations.

A scatter plot can then be obtained as shown in the following figure. In the scatter plot,
yield, $y_i$, is plotted for different temperature values, $x_i$.

Scatter plot for the data.

It is clear that no line can be found to pass through all points of the plot. Thus no functional
relation exists between the two variables $x$ and $Y$. However, the scatter plot does give an
indication that a straight line may exist such that all the points on the plot are scattered
randomly around this line. A statistical relation is said to exist in this case. The statistical
relation between $x$ and $Y$ may be expressed as follows:

$$Y = \beta_0 + \beta_1 x + \epsilon$$

The above equation is the linear regression model that can be used to explain the
relation between $x$ and $Y$ that is seen on the scatter plot above. In this model, the
mean value of $Y$ (abbreviated as $E(Y)$) is assumed to follow the linear relation:

$$E(Y) = \beta_0 + \beta_1 x$$

The actual values of $Y$ (which are observed as yield from the chemical process
from time to time and are random in nature) are assumed to be the sum of the
mean value, $E(Y)$, and a random error term, $\epsilon$:

$$Y = E(Y) + \epsilon = \beta_0 + \beta_1 x + \epsilon$$
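The decomposition of an observation into a mean value plus a random error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the coefficient values, the error standard deviation and the temperature values are hypothetical and are not the chapter's yield data, and the error is drawn from a normal distribution.

```python
import random


def mean_response(x, beta0, beta1):
    """Mean value E(Y) on the straight line beta0 + beta1 * x."""
    return beta0 + beta1 * x


def simulate_observations(xs, beta0, beta1, sigma, seed=0):
    """Return one observation Y = E(Y) + error for each x in xs.

    The error is drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [mean_response(x, beta0, beta1) + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical values chosen only for illustration.
temperatures = [50, 60, 70, 80, 90, 100]
yields = simulate_observations(temperatures, beta0=20.0, beta1=2.0, sigma=5.0)

for x, y in zip(temperatures, yields):
    # Each observation scatters around its mean instead of lying on the line.
    print(x, mean_response(x, 20.0, 2.0), round(y, 2))
```

Setting `sigma=0.0` removes the scatter, so every simulated point falls exactly on the line; this corresponds to a functional relation rather than a statistical one.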

The regression model here is called a simple linear regression model
because there is just one independent variable, $x$, in the model. In
regression models, the independent variables are also referred to as
regressors or predictor variables. The dependent variable, $Y$, is also
referred to as the response. The slope, $\beta_1$, and the intercept, $\beta_0$, of the
line $E(Y) = \beta_0 + \beta_1 x$ are called regression coefficients. The slope, $\beta_1$,
can be interpreted as the change in the mean value of $Y$ for a unit change
in $x$.
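The interpretation of the slope follows directly from the linear relation for the mean value. As a short worked step, writing $E(Y \mid x)$ for the mean value at a given $x$ (a notation introduced here only for this illustration):

```latex
\begin{align*}
E(Y \mid x + 1) - E(Y \mid x)
  &= \left[\beta_0 + \beta_1 (x + 1)\right] - \left[\beta_0 + \beta_1 x\right] \\
  &= \beta_1
\end{align*}
```

The intercept cancels, so the change in the mean value for a unit change in $x$ is the same at every $x$.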
The random error term, $\epsilon$, is assumed to follow the normal distribution with a
mean of 0 and variance of $\sigma^2$. Since $Y$ is the sum of this random term and
the mean value, $E(Y)$, which is a constant, the variance of $Y$ at any given
value of $x$ is also $\sigma^2$. Therefore, at any given value of $x$, say $x_i$, the
dependent variable $Y$ follows a normal distribution with a mean
of $\beta_0 + \beta_1 x_i$ and a standard deviation of $\sigma$.
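The distributional statement above can be checked empirically with a small simulation. The sketch below draws many responses at one fixed value of the independent variable and compares the sample mean and standard deviation with the model's mean and standard deviation. All numbers and the function name `sample_response_at` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def sample_response_at(x_i, beta0, beta1, sigma, n, seed=0):
    """Draw n responses at a fixed x_i under the simple linear regression model.

    The mean beta0 + beta1 * x_i is a constant, so all of the variability
    in the draws comes from the normally distributed error term.
    """
    rng = random.Random(seed)
    mean = beta0 + beta1 * x_i
    return [mean + rng.gauss(0.0, sigma) for _ in range(n)]


# Hypothetical coefficients and error standard deviation.
draws = sample_response_at(x_i=70, beta0=20.0, beta1=2.0, sigma=5.0, n=20000)

sample_mean = statistics.fmean(draws)  # should be close to 20 + 2 * 70 = 160
sample_sd = statistics.stdev(draws)    # should be close to sigma = 5

print(round(sample_mean, 2), round(sample_sd, 2))
```

Repeating the call with a different `x_i` shifts the sample mean along the line while the sample standard deviation stays near the same value, which is the constant-variance assumption of the model.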