1 GLM: Regression
1.1 Simple Regression
1.1.1 Background
Simple regression involves predicting one quantitative variable (called a dependent
variable) from another quantitative variable (called the independent or predictor variable). The
terms dependent and independent imply predictability but do not necessarily imply causality.
The most common notation in regression is to let Y denote the dependent variable and X, the
independent variable. The phrase “regress (name of dependent variable) on (name of
independent variable)” is often used. For example, “regress receptor levels on age” denotes that
receptor level is the dependent variable and age is the independent variable.
In simple regression, one fits a straight line through the data points and then uses that line
to predict values of the dependent variable from values of the independent variable. The
fundamental equation for the predicted value is
Yˆ = α + βX
where Yˆ is the predicted value of the dependent variable, α is the intercept of the line (i.e., the
place where it crosses the vertical axis which is the same as the predicted value of Y when X = 0),
and β is the slope of the line.
We can write a similar equation for an observed value of Y. Because one will never be
able to predict all of the observed Ys with perfect accuracy, there will be prediction errors. A
prediction error is the difference between an observed value for the dependent variable and its
predicted value, i.e., Y − Yˆ . Letting E denote a prediction error, then the equation for an
observed Y is
Y = Yˆ + E = α + β X + E .
In simple regression, the estimates of parameters α and β are those that minimize the sum
of squared prediction errors. That is, calculate the prediction error for the first observation and
square it, then do the same for all the other observations in the sample. Finally, add together all
the squared prediction errors. The result is termed the error sum of squares. Parameters that
minimize the error sum of squares are called least squares estimates.
We illustrate simple regression by considering the problem of change in the number of a
particular receptor in human cortex with age. To investigate this, a researcher obtains cortex
from a series of post mortems, extracts protein, and then performs a binding assay for the
receptor.

1.1.2 How to Do It.

1.1.2.1 Step 1: Check the Data


The very first step in a simple regression should be to examine the data with an eye
towards a possible nonlinear relationship and outliers. We discuss nonlinearity later, so here we
concentrate on outlier detection. This is an essential step because even a single outlier can give
very misleading results, especially with the moderate sample sizes used in experimental
neuroscience. The best method to assess outliers for this simple case is to follow the procedures
outlined in Section X.X and construct a scatter plot. Figure 1.1 illustrates the plot, along with the
regression line. (We discuss this line later).
In the present example, there do not appear to be any disconnected data points. (Later we
shall demonstrate the effects of outliers).
Figure 1.1 Example of a scatter plot and the regression line (line of best fit).
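A scatter plot like the one in Figure 1.1 can be produced with any general statistical package. The sketch below uses PROC SGPLOT in SAS and assumes a hypothetical data set named cortex containing the variables age and receptor; the REG statement overlays the least-squares line shown in the figure.

* Scatter plot of receptor against age (data set and variable names are hypothetical);
PROC SGPLOT DATA=cortex;
   SCATTER X=age Y=receptor;   * raw data points for outlier inspection;
   REG X=age Y=receptor;       * overlay the least-squares regression line;
RUN;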

1.1.2.2 Step 2: Compute the Regression


The overall orientation of the data points in Figure 1.1 along with the slope of the
regression line suggest that the density of receptors decreases with age. There are, however, only
27 observations in this data set. Could such a pattern result simply from chance? Only a
rigorous statistical test can answer this question.
All general statistical packages contain at least one routine for computing regressions. In
the present case, we used the PROC REG routine in SAS. The dependent variable is called
Receptor, which is the concentration of bound receptor in a binding assay per unit of protein; we
refer to variable Receptor as reflecting the number of receptors. The independent variable is
simply called Age.
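A minimal PROC REG call for this analysis might look like the sketch below; the data set name cortex is hypothetical, while the variable names match those just described.

PROC REG DATA=cortex;
   MODEL receptor = age;   * regress Receptor (DV) on Age (IV);
RUN;
QUIT;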
The mathematical model behind simple regression fits a straight line through the data
points. For the present case, the population equation behind the regression is
Receptorˆ = α + β ⋅ Age .    (X.1)
Here, the hat (ˆ) attached to Receptor denotes that this is the predicted value of Receptor.

Observed values of Receptor Number in the cortex, however, will not always be equal to
their predicted values. Hence, simple regression adds an error term when it writes the equation
for observed values of the dependent variable:
Receptor = Receptorˆ + Error = α + β ⋅ Age + E.
Note that we abbreviate error as E. Regression procedures obtain estimates of the population
parameters α and β by minimizing the sum of squared error, the summation being taken over all
observations in the data set.

1.1.2.3 Step 3: Interpret the Results


The output from the regression procedure is given in Figure 1.2. In general, a regression
procedure will output two different tables. The first is called an analysis of variance table and its
purpose is to assess the overall fit of the GLM. The second is a table of parameter estimates that
assess the contribution of each independent variable to prediction. Because we have only one
independent variable in the model, both tables will contain the same information, albeit
expressed in different forms. Other output depends on the computer package and on options
chosen by a user.
All regression software will calculate the squared multiple correlation or R2. R2 is the
square of the correlation between the predicted and the observed values of the dependent
variable. Hence, it is a measure of the proportion of variance in the dependent variable explained
by the predicted values (i.e., the model). With only a single independent variable, R2 equals the
square of the ordinary Pearson product-moment correlation between Receptor and Age. Its
value, rounded to .27, tells us that about 27% of the variance in receptor numbers is attributable
to age.
Regression can capitalize on chance peculiarities in the data, so R2 is, on average, an
upwardly biased estimate of the population parameter. Hence, an adjusted R2 is also printed.
The actual degree to which R2 is biased is generally unknown, so the adjustment is only
approximate. The difference between R2 and adjusted R2 is a function of sample size and the
number of independent variables in the model. Small samples and large numbers of independent
variables will give greater discrepancies between R2 and adjusted R2 than large samples with few
independent variables. R2 is reported more often than adjusted R2.

Figure 1.2 Output from a simple regression predicting the quantity of receptors in human
cortex as a function of age.
The REG Procedure
Model: MODEL2
Dependent Variable: receptor Receptor Binding fmol/mg

Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 7.55991 7.55991 9.02 0.0060
Error 25 20.95872 0.83835
Corrected Total 26 28.51863
Root MSE 0.91561 R-Square 0.2651
Dependent Mean 4.90630 Adj R-Sq 0.2357
Coeff Var 18.66202

Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 9.43824 1.51942 6.21 <.0001
age 1 -0.06052 0.02015 -3.00 0.0060

With only a single independent variable, the analysis of variance table can be skipped
because it will lead to the same inference as the table of parameter estimates. The estimate of the
intercept (i.e., the estimate of α in Equation X.1) is 9.44. If the relationship between receptor
number and age were linear throughout the lifespan, then this would be the predicted receptor
concentration at birth. (Although this is the correct mathematical interpretation of the intercept,
one should never extrapolate the regression line beyond the range of the data at hand. Hence, it
is best to regard the receptor concentration at birth as an unknown best examined by empirical
data.)
The slope of the regression line (i.e., the estimate of parameter β in Equation X.1) is -.06.
The minus sign implies a negative or inverse relationship—increasing age results in lower
receptor numbers. The value of the estimate implies that a one-year increase in age is associated
with a .06 reduction in receptor concentration (measured in this hypothetical example as fmoles
per mg of protein). Taking the estimates of α and β and placing them into Equation X.1 gives
Receptorˆ = 9.44 - .06 ⋅ Age .
These numbers define the regression line in Figure 1.1. This line is sometimes referred to as the
line of best fit because the estimates are based on minimizing the sum of squared error.
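For example, using the rounded estimates, the predicted receptor concentration for a 50-year-old donor would be 9.44 − .06(50), or about 6.4 fmol per mg of protein.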
The standard error for this parameter estimate is an estimated standard error, so the
appropriate test statistic for the hypothesis that β = 0 is the t statistic. The value here (-3.00) is
large and its associated p value (.006) is less than .05. Hence, we would conclude that there is
evidence for a change in receptor number with age. In write-ups of a regression result, be careful
with the degrees of freedom: the column labeled DF in Figure 1.2 gives the number of parameters
being estimated, not the degrees of freedom for the t test. The actual degrees of freedom for the t
test equal the degrees of freedom for error in the model, in this case, 25.

Note that there is also a test that the intercept is 0. Usually—but not always—this test is
unimportant. You may also have noticed that the p value of the regression coefficient is identical,
to the fourth decimal place, to that of the F test in the ANOVA table. This is no coincidence; it
occurs because there is only a single independent variable in the model.

1.1.2.4 Step 4. Communicating the Results


Depending on the importance of the analysis, a graph similar to the one in Figure 1.1 may
be the best way to communicate the results. A simple regression has two measures of effect
size—the raw regression coefficient and R2. We recommend that both be included in the
write-up. Naturally, the p value or some statement about statistical significance is also required. A
very convenient way of presenting the statistical information is to include the regression equation
and other statistics in the graph. In the present case, one could add the following line to Figure
1.1:
Yˆ = 9.44 − .06 ⋅ Age, R2 = .27, p = .006.
As always, we recommend publishing figures only for statistically significant results or for
theoretically or empirically meaningful results.
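In SAS, for example, such an annotation can be added with the INSET statement of PROC SGPLOT. The sketch below again assumes the hypothetical cortex data set used earlier; the text of the inset simply restates the estimates reported above.

PROC SGPLOT DATA=cortex;
   SCATTER X=age Y=receptor;
   REG X=age Y=receptor;
   * place the fitted equation and summary statistics inside the plot;
   INSET "Predicted Receptor = 9.44 - .06*Age, R-square = .27, p = .006" / POSITION=TOPRIGHT;
RUN;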

1.1.3 Assumptions
At this point it is useful to examine the assumptions underlying regression analysis
because they will apply to the other sections in this chapter and beyond.

1.1.3.1 Linearity
Simple regression assumes that the relationship between the IV and the DV is linear.
Figure 1.3 illustrates two different forms of a nonlinear relationship.

Figure 1.3 Examples of nonlinear relationships.

Usually, the effect of fitting a straight line to data with a strong nonlinear relationship is
to reduce power. This is especially true when the relationship is U-shaped or inverted U-shaped
because the slope of the linear term will be close to 0. Hence, the danger is that it might fool the
analyst into concluding that there was no relationship.

The best way to diagnose linearity is to construct a scatterplot along the lines of Figure
1.1 or Figure 1.3 and visually inspect it. If you suspect the relationship may be nonlinear, then
you can test that using polynomial regression (see Section 1.3.3).

1.1.3.2 Normality of Residuals


The assumption that the residuals are normally distributed is necessary for the validity of
the F statistic. Small departures from this assumption are of little concern. Large departures,
however, can create problems. Usually, careful screening of the data prior to analysis can avoid
these problems.
There are several options for assessing this assumption. In the regression, one could
calculate the residuals and then plot a histogram or perform a statistical test (see Section X.X) for
normality. Most regression packages have the option of constructing a cumulative distribution
plot and/or a quantile-quantile (or QQ) plot of the residuals to assess normality (see Section X.X
for a definition of these plots). Figure 1.4 illustrates both of these residual plots from a
regression of dependent variable Y on independent variable X when X has a lognormal
distribution and the true linear relationship is between Y and the log of X.
Figure 1.4 Example of residuals that are not normally distributed.

If the residuals in this case were normally distributed, then they would tend to form a
straight line. It is quite clear that they do not, so one should question this regression.
Often, the solution to non-normal residuals can be found in a transformation of the
independent and/or the dependent variable. For the present example, the solution is clear—
regress Y on the log of X and not on X itself. Figure 1.5 gives the residual plots for this
regression.

Figure 1.5 Example of normally distributed residuals.
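Residual plots such as those in Figure 1.4 and Figure 1.5 can be obtained by saving the residuals and passing them to a diagnostic procedure. A minimal sketch, assuming the hypothetical cortex data set used earlier: the OUTPUT statement of PROC REG saves the residuals, and PROC UNIVARIATE then provides formal normality tests along with a QQ plot.

PROC REG DATA=cortex;
   MODEL receptor = age;
   OUTPUT OUT=diagnostics R=resid P=pred;   * save residuals and predicted values;
RUN;
QUIT;

PROC UNIVARIATE DATA=diagnostics NORMAL;
   VAR resid;                                 * formal tests of normality for the residuals;
   QQPLOT resid / NORMAL(MU=EST SIGMA=EST);   * QQ plot with a normal reference line;
RUN;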

1.1.3.3 Equality of Residual Variances


A final assumption of the regression model is a condition called homoscedasticity, which,
quite frankly, sounds more like a medical condition than a statistical one. Homoscedasticity is
defined as the equality of variance around the regression line. It implies that the variance of the
residuals is the same for each and every value of Yˆ . The most common way to assess for
homoscedasticity is to plot the residuals from the regression as a function of the predicted values.
Figure 1.6 illustrates homoscedasticity (Panel A) and the lack of it (Panel B), a condition called
heteroscedasticity.
Figure 1.6 Examples of equal variance of residuals (homoscedasticity) and unequal
variance of residuals (heteroscedasticity).

Heteroscedasticity of the form in Panel B of Figure 1.6 usually results from a scaling
problem. Often taking a square root or log transform of Y will remove the heteroscedasticity.
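A plot of the residuals against the predicted values can be constructed from the diagnostics data set sketched in the previous section (the data set and variable names remain hypothetical).

PROC SGPLOT DATA=diagnostics;
   SCATTER X=pred Y=resid;   * residuals as a function of the predicted values;
   REFLINE 0 / AXIS=Y;       * horizontal reference line at zero;
RUN;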

1.1.4 Problems and Diagnostics

1.1.4.1 Outliers and Influential Data Points


To illustrate the effect of an outlier, consider a study in cognitive neuroscience aimed at
exploring retrieval from working memory. To recruit subjects, research techs post fliers
throughout the University offering a small honorarium for participation in the study. Naturally,
the majority of respondents will be students. On a whim, Ralph, a 75-year-old emeritus
professor, signs up because he knows little about the field and would like to experience first-hand
the techniques used in measuring cognition.
Figure 1.7 presents a scatter plot of age and a score of retrieval from working memory
and Figure 1.8 gives the results from regressing the working memory score on age. It is clear
from visual inspection that Ralph is a disconnected data point. Not only is he discrepant in age,
but it is apparent that his working memory has not retired even though Ralph has.
Figure 1.7 Example of a scatter plot containing an outlier.

Figure 1.8 Results from a regression with an outlier.

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 26.31758 5.26743 5.00 <.0001
age 1 0.42592 0.20180 2.11 0.0450

The results of the regression analysis show a significant effect of age, and both the
regression coefficient and the regression line suggest that the relationship is positive—like a
good wine, retrieval from working memory improves with age. But these results depend entirely
on Ralph.
Figure 1.9 and Figure 1.10 repeat the scatter plot and the regression but this time after
removing Ralph from the data set. Notice how the scatter plot gives a very different impression
of the relationship between age and working memory. Even if one were to erase Ralph and the
regression line from the previous scatter plot, the remaining data points give little hint of the
negative relationship clearly apparent in Figure 1.9. Furthermore, this negative relationship is
significant.
Figure 1.9 The same scatterplot after removing the outlier.

Figure 1.10 Results of the regression with the outlier removed.

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 82.71226 21.35160 3.87 0.0007
age 1 -2.13679 0.96343 -2.22 0.0363

This example, albeit extreme, should impress on the reader the importance of screening
data in preparation for data analysis. By happenstance, the sample contained an elderly
gentleman with an excellent memory, and including him in the analysis gives very misleading
information about the relationship between age and memory. In general, the effect of an outlier
on tests of statistical significance is unpredictable. It can—as it did in this example—retain
statistical significance but switch the direction of the effect. In other cases, outliers can give
statistical significance when there is none, or the outlier can result in failure to detect significant
findings when in fact there is significance. In almost all cases, however, inclusion of an outlier
can seriously bias parameter estimates.

1.2 Multiple Regression


1.2.1 Background
The term multiple regression refers to the case in which one quantitative dependent
variable is predicted by more than one quantitative independent variable. For two independent
variables (which we denote as X1 and X2), the equation is
Yˆ = α + β1X1 + β2X2,
and in the general case with k independent variables, the equation is
Yˆ = α + β1X1 + β2X2 + … + βkXk.
As in simple regression, the estimates of the parameters are those that minimize the sum of
squared prediction errors. The equation defined by the set of independent variables selected for
the analysis is called the model.
There are three reasons for adding independent variables in a multiple regression. The
first of these is for experimental designs that manipulate more than one factor. For example,
suppose a study examined the effect of a drug on plasma cortisol levels in rats who were either
stressed or not stressed. There are two factors in this design—drug and stress. Hence, X1 could
be coded as 0 = Control and 1 = Drug. Similarly, X2 could be coded as 0 = not stressed and 1 =
stressed.
A second reason for adding variables is scientific hypothesis testing—e.g., does variable
X2 add predictability above and beyond that of X1? In this situation, the independent variables
are often correlated and the major research question is “does X1 directly predict Y or does X1
predict Y because it is correlated with X2 and X2 predicts Y?” We will see an example of this use
of multiple regression below.
The third reason for adding independent variables is for statistical control. Here, X2 is
known to predict Y (or has a very strong likelihood of predicting Y) and by adding it to the
equation, one achieves a more powerful statistical test for X1. The best variables for statistical
control will be correlated with the dependent variable but not correlated with the other
independent variables in the model. A classic example of statistical control is the clinical trial
where baseline variables are entered into the regression model. If participants are truly
randomized to control and experimental conditions, then a baseline variable will not correlate
with the treatment variable but will usually correlate with the outcome measure.
A simple regression fits a one-dimensional model (a straight line) to data points
distributed in two-dimensional space. Similarly, a multiple regression with two independent
variables fits a two-dimensional model (a plane) to data points distributed in three-dimensional
space. A multiple regression with three independent variables fits a three-dimensional model to
data points located in four-dimensional space—a task that cannot be easily visualized but can be
dealt with in the world of mathematics.
The interpretation of the parameters is easiest to learn by considering two independent
variables. Parameter β1 gives the predicted change in the dependent variable per unit change in
X1 holding variable X2 constant. Expressed in different terms, if one fixed X2 at any value, then a
one-unit change in X1 predicts a change of β1 units in Y. Similarly, if one fixed X1 at any value,
then a one-unit change in X2 predicts a change of β2 units in Y. In general, βi gives the predicted
change in Y for a one-unit change in Xi holding all other independent variables in the model
constant. The term controlling for is usually used to refer to the phrase “holding all other
variables constant.” For instance, “βi measures the effect (predictive effect, not necessarily
causal effect) of variable Xi controlling for X1, X2, ….”
We illustrate multiple regression by elaborating on the data set used in simple regression.
Suppose that the receptor was a nicotinic receptor. Use of nicotine could upregulate the number
of receptors and we all know that smokers die young. Could the relationship between receptor
number and age be due to the effects of smoking? To check for this, we will include among the
assays one for cotinine, a metabolite of nicotine. We can now use cotinine levels as a control
variable in a multiple regression that predicts the levels of nicotinic receptors in cortex from age
and cotinine levels.

1.2.2 How to Do It

1.2.2.1 Step 1: Check the Data


The purpose of the data check in multiple regression is the same as it is in simple
regression—examining the data for nonlinearity and outliers. Outlier detection by visual
inspection can be difficult in multiple regression because one cannot deal with more than three
variables at a time. Most statisticians recommend constructing a series of scatter plots for all
pairs of variables. Usually, a rogue value or blunder will appear in one or more of these graphs.
After the regression model is fitted to the data, one can use additional procedures to
check for outliers and influential data points. These procedures are detailed below in Section
1.2.4.1.

1.2.2.2 Step 2: Compute the Regression


Fitting the model is the preferred phrase in multiple regression. The major decision on
fitting a multiple regression model is the method. There are two major types of methods for fitting
models: (1) complete estimation, and (2) variable-selection methods. Complete estimation is
almost always the default method for statistical computer programs. Here, the model that you
specify—and only that model—gets fitted to the data. If the model has four independent
variables, then four regression coefficients (plus an intercept) will be estimated and tested. In
variable-selection methods (sometimes called stepwise regression), a series of models are fitted
to the data and independent variables are added, kept, or deleted based on some statistical
criteria. Complete estimation should be the choice for the vast majority of problems in
neuroscience and must always be used for planned experiments.1

1 Variable-selection techniques and stepwise regression were useful in the past when it took
considerable time to compute regressions by hand. In the modern era, it is very easy to compute
regressions for all possible subsets of IVs. One can then accept the model that best fits
predetermined criteria. For this reason, we will not discuss variable-selection methods in this
text.
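For the receptor example (age and cotinine as predictors), a complete-estimation run in PROC REG might look like the following sketch; the data set name cortex is hypothetical.

PROC REG DATA=cortex;
   MODEL receptor = age cotinine;   * complete estimation: only this model is fitted;
RUN;
QUIT;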

1.2.2.3 Step 3: Interpret the Results


Figure 1.11 gives the results from PROC REG in SAS of fitting the model to the data.
Here, the first item to inspect is the Analysis of Variance (ANOVA) table. This table assesses
whether the model as a whole predicts better than chance. The crucial test statistic is the F
value. Because this F refers to the model as a whole, it is sometimes called the omnibus F. The
numerator degrees of freedom for this F equal the df for the model (2 in this case) and the
denominator df equal the df for error (24). Here the test statistic is F(2, 24) = 13.35 and its
associated p value is .0001. This suggests that the whole model does indeed predict
nicotinic receptor levels better than chance.
Figure 1.11 Multiple regression of receptor number on age and cotinine.
Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 15.02076 7.51038 13.35 0.0001
Error 24 13.49787 0.56241
Corrected Total 26 28.51863

Root MSE 0.74994 R-Square 0.5267


Dependent Mean 4.90630 Adj R-Sq 0.4873
Coeff Var 15.28527

Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 7.33197 1.37229 5.34 <.0001
age 1 -0.05382 0.01661 -3.24 0.0035
cotinine 1 0.03823 0.01050 3.64 0.0013

The next item for inspection is R2, called the squared multiple correlation, the quantity R
being the multiple correlation. As in simple regression R is the correlation between the predicted
values and the observed values of the dependent variable. Squaring R gives the proportion of
variance in the dependent variable explained by the model (i.e., by all of the independent
variables). R2 is a measure of effect size (see Section X.X) of the whole model.
Here, the value of R2 is .53, so 53% of the variance in receptor levels is predictable from
the model, i.e., from both age and cotinine. R2 values from nested models can be added or subtracted.
Two models are nested when all the independent variables in the smaller model are contained in
the larger model. R2 values for non-nested models should not be compared to each other (see
Section 1.3.1). The R2 from the simple regression of receptor on age was .27 (see Figure 1.2).
Because the simple regression is nested within the current model, their R2s can be compared.
Hence, we can say that adding cotinine to the prediction equation explains an additional (.53 - .27)
= .26 or 26% of the variance in nicotinic receptors. Model comparisons are such an important
part of regression that we devote a whole section to it (Section 1.3.1) and we explain the testing
of interactions and polynomial regression in multiple regression along these lines (Sections
1.3.2.3 and 1.3.3.1).
The R2 and its significance inform us that we have overall predictability much better than
chance would allow. The table of parameter estimates and their significance can help us decide
which independent variables contribute to that significant predictability. In Figure 1.11 both age
and cotinine are statistically significant. Hence, there are contributions from both age and
cotinine to density of nicotinic receptors. The decline in nicotinic receptors with age was not due
to the fact that smokers elevate their receptor levels and also die at younger ages.

1.2.2.4 Step 4: Communicating the Results


Rather than follow a rigid formula, the hypotheses of interest should always determine
the write-up of the results from multiple regression models. For example, consider the case of a
clinical trial where the first IV is a dummy code for control versus active treatment and the
second IV is baseline symptoms. The purpose of the whole study is to test whether or not the
treatment was efficacious. Hence, one would report only two pieces of information: (1) the test
statistic and significance level for the treatment variable; and (2) the effect size of the treatment.
There is no need to report anything about the extent to which baseline symptoms predict
follow-up symptoms. Nor is there any need to provide the reader with extraneous (and distracting)
information about the overall R2 or the significance of the omnibus F.
Absent detailed hypotheses of interest, three pieces of information from the multiple
regression should be conveyed to the reader. The first piece is whether or not the overall model
predicts better than chance. This entails saying something about the omnibus F statistic and its
associated p level.
The second is the estimate of overall effect size. Here, the statistical index is the R2.
Usually, the first piece of information (is the prediction better than chance?) and the second (how
well does the model predict the dependent variable?) can be combined into a single sentence.
For the current example, one might write, “Together, age and cotinine significantly predicted
individual differences in the number of receptors (R2 = .53, p = .0001).”
The third class of information conveys the extent to which each IV contributes to the
prediction. Although we chunk this into a single “piece” of information here, the overall purpose
is to explain to the reader which IVs are significant and which are not significant. Phrasing of
this information should always be done in terms of the purpose of the hypotheses that motivated
the model for the multiple regression in the first place. For example, in the current study one
might blithely write, “Both age and cotinine significantly predicted receptor concentration.”
That phrasing is uninformative. Instead, write the results in terms that emphasize the fact that
receptor concentration decreases with age even when controlling for cotinine levels. A good
write-up might be: “Increased cotinine levels significantly predicted increased receptor numbers
(b = .038, t(24) = 3.64, p = .001). Even controlling for the effect of cotinine, however, receptor
concentration still significantly declined with age (b = -.054, t(24) = -3.24, p = .004). Hence the
relationship between age and receptor number cannot be explained solely in terms of cotinine
concentrations in brain.”
Note how this latter write-up expresses the significance of the regression coefficients,
gives the direction of their effect (positive for cotinine, negative for age), and provides a clear
summary of the results in terms of the reason for performing the regression in the first place
(despite controlling for cotinine, age is still significant).
Instead of following a slavish and formulaic write-up (which is usually quite uninteresting),
the analyst is urged to be creative in explaining the results in terms of the major hypotheses
responsible for the analysis.

1.2.3 Assumptions
The assumptions of multiple regression are the same as those for simple linear regression:
linearity, normality of residuals, and equality of residual variances. Two of these—normality of
residuals and equality of residual variances—can be checked in the same way as these
assumptions are assessed in simple regression.

1.2.3.1 Linearity
If there are k independent variables, then the inclusion of the dependent variable gives a
problem in (k + 1) dimensional space. It is not possible to construct plots for visual inspection in
more than three-dimensional space. So how does one assess linearity? The answer is that there is
no exact mathematical way. Some statisticians recommend constructing scatter plots of all
variables in the analysis taken two at a time. Indeed, software packages such as the Interactive
Data Analysis feature of SAS allow one to do this with only a few point-and-clicks. A second
method is to construct plots of the residuals as a function of each independent variable. A U- or
inverted U-shaped plot suggests nonlinearity. Finally, an analysis of the residuals for outliers
and/or influential data points may give indications of nonlinearity.

1.2.3.2 Normality of Residuals


Both simple and multiple regression have one and only one dependent variable. Because
the residuals apply to the dependent variable, examining this assumption is the same in both
simple and multiple regression. Hence, see Section 1.1.3.2 above for the techniques used to do
this.

1.2.3.3 Equality of Residual Variances


The procedure for testing equality of the variances of the residuals in multiple regression
is identical to that in simple regression. Hence, consult Section 1.1.3.3.

1.2.4 Problems and Diagnostics

1.2.4.1 Outliers and Influential Data Points


The identification of outliers and/or influential data points is simple when there is only
one independent variable—simply plot the Y values by the X values and visually inspect the
graph. The situation becomes more complicated as the number of IVs increases. For example,
when there are three X variables, then the data occupy a four-dimensional space (the three Xs
plus the Y variable), something very difficult for us humans to conceptualize, let alone plot.
Most statisticians recommend two processes to deal with outliers and influential data
points. The first is the inspection of the residuals or errors. The second is the inspection of
statistics designed to identify a multivariate outlier and/or influential data points. We shall
speak of each in turn.

1.2.4.1.1 Inspection of Residuals


There are two major reasons for the inspection of residuals. The first reason is purely
statistical—isolate those data points that clearly are outside of the prediction sphere. Here, one
might look at data recording errors or processing errors as a cause. If the discrepancy is severe,
one might even delete that observation from the analysis.
The second reason is substantive—uncovering a plausible reason why those data points
are outside the regression sphere. For example, suppose a study of restrictive eating disorders
included four males along with 37 females. If inspection of the residuals revealed that all
three potential outliers were male, one might conclude that the regression model is probably
different for males and females.
The first step in inspecting residuals is to determine which type of residual to inspect.
The raw residual is simply the prediction error, i.e., the difference between the observed value of
Y and the predicted value of Y or Y − Yˆ . The second type is the standardized residual. This is
computed as the raw residual divided by the standard error of the residuals. In effect, they are Z
scores of the residuals and hence, have a clear meaning to statisticians. For example, a
standardized residual of –2.3 implies that the observation is 2.3 standard deviations below its
predicted value. Finally, a studentized residual converts the residual into a t score using the t
distribution and taking into account the influence of that observation on the regression model
(technically, the leverage which will be discussed below). If an observation does not influence
the regression very much, then the standardized residual will be very similar to the studentized
residual. When that observation has a large influence on the regression, then the two will differ. 2
The question at hand should dictate which of these three quantities should be used. If you
are very familiar with the metric behind the dependent variable, then raw residuals are fine. A
raw residual of 6.3 will be meaningful to you. If you are not familiar with the metric or if you
have any doubts about the metric, then a residual of 6.3 might be large or small. Here, either the
standardized or the studentized residual should be preferred.
The second step is the manner for inspecting the residuals. If the number of observations
is small, then one can visually inspect the numerical values and flag observations with deviant
residuals. When the number of observations is large, then construct a boxplot or histogram of
the residual and inspect that plot for outliers.

2 There are several variations on how these residuals may be calculated. The major variation is
whether the observation in question is included or excluded from the regression model when its
error (residual) is calculated. Consult the manual for your software to make certain that you
know what the residuals mean.
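As a sketch (with hypothetical data set and variable names), the OUTPUT statement of PROC REG can save the raw and studentized residuals, which can then be screened with a boxplot; in SAS, the STUDENT= residuals are computed with the observation included and the RSTUDENT= residuals with the observation excluded.

PROC REG DATA=mydata;
   MODEL y = x1 x2 x3;
   OUTPUT OUT=resids R=raw_resid STUDENT=student_resid RSTUDENT=rstudent_resid;
RUN;
QUIT;

* boxplot of the studentized residuals to flag deviant observations;
PROC SGPLOT DATA=resids;
   VBOX student_resid;
RUN;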

1.2.4.1.2 Multivariate Outliers and Influential Data Points


The two main statistics for examining influence and potential multivariate outliers are
leverage and Cook’s D. The formula for leverage (also called h, the hat statistic, or the hat
value) is complicated, but the numerical values range from 0 (the observation has little
influence on the regression and hence, is not a potential multivariate outlier) to 1 (the observation
has an extraordinary influence and is almost certainly a multivariate outlier). Belsley, Kuh, and
Welsch (1980) suggest a cutoff of 2k/N where k equals the number of predictors in the model and
N is the number of observations. For example, with three X variables and a sample of 24, then
observations with leverages above 2(3)/24 = .25 should be examined. This criterion is slightly
conservative, but works well with sample sizes typical in neuroscience. As N gets very large,
however, one may spend unneeded time exploring observations that still fit the regression model.
Others recommend that observations with a leverage of 0.5 or more should always be
examined, while those with leverages less than .20 can be ignored.
Cook’s D for an observation measures the extent to which the regression parameters
change when that observation is not included in the calculation of the parameters. In short, D is
a measure of the effect of deleting that observation from the regression. D can range from 0 to a
very high, positive number. Values close to 0 suggest that there is little change when that
observation is deleted, and hence that observation is not an outlier or unduly influential.
Different criteria have been proposed to flag potential outliers: D > 4/N or D > 4/(N – k – 1) are
but two of them. Another recommended strategy is to examine the distribution of D and look for
outliers that are large positive values.
As in the examination of residuals, one can eyeball leverage and/or D, or use graphical and
mathematical criteria to detect discrepant data points. Large data sets will
certainly require the graphical plots or mathematical methods.
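A sketch of how these statistics might be saved and screened (hypothetical data set and variable names; the cutoffs use the k = 3, N = 24 leverage example above and the 4/N criterion for D):

PROC REG DATA=mydata;
   MODEL y = x1 x2 x3 / INFLUENCE;                  * print influence diagnostics;
   OUTPUT OUT=influence H=leverage COOKD=cooks_d;   * save leverage and D;
RUN;
QUIT;

DATA flagged;
   SET influence;
   * flag observations exceeding the 2k/N leverage cutoff or the 4/N cutoff for D;
   IF leverage > 2*3/24 OR cooks_d > 4/24 THEN flag = 1;
   ELSE flag = 0;
RUN;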

1.2.4.2 Multicollinearity
Multicollinearity is usually defined as a state that exists when two or more of the
independent variables are highly correlated. A more precise definition may be developed by
imagining that we computed a series of multiple regressions. In each regression, one of the
independent variables became the dependent variable and all of the other IVs were the predictors.
Then multicollinearity would occur when the R2 for at least one of these regressions was high.
Note that multicollinearity applies to the X variables only. It is not influenced in any way by the
extent to which the X variables correlate with the Y variable. Hence, R2 for the model is not
influenced by multicollinearity. Instead, multicollinearity increases the standard errors of the
regression coefficients, thus making it more difficult to detect whether a coefficient is in fact
significant.
As with many statistical phenomena, multicollinearity is not an either-or state, akin to
falling off a cliff. Instead, regressions descend gradually into multicollinearity as the
correlations among the X variables increase. The central issue for the analyst is to identify the
situation when multicollinearity becomes such a problem that it compromises the interpretation
of a regression.
Most designs in neuroscience do not have to worry about multicollinearity, except for the
very important situation of statistical interactions (which we deal with below). Why? Most
designs are experimental and hence, the independent variables will not be correlated (if there are
an equal number of observations in each cell) or very weakly correlated (if the number is close to
being equal in each cell). Outside of statistical interactions, the most likely situation that could
induce multicollinearity in experimental designs occurs when two or more highly correlated
variables are entered into the equation as control variables. Generally, however,
multicollinearity is a problem most often encountered in observational studies.

1.2.4.2.1 Diagnosing multicollinearity.


If the model has an interaction term in it—regardless of whether or not the design is a
true experiment—then the interaction may induce some degree of multicollinearity. Rather than
apply the diagnoses and remedies described below, the reader is referred to Section 1.3.2.3 which
deals with interactions in a very straightforward manner. The diagnoses and remedies described
here apply to the situation in which single predictors—and not their interactions—are highly
correlated.
Recall that multicollinearity increases the standard errors of the regression coefficients
but does not affect the overall predictability of the model. Hence, one of its major effects is to
reduce the statistical power of the tests of the regression coefficients. (The test for the
significance of a regressor equals the regression coefficient divided by its standard error which
follows a t distribution. As the denominator for this t statistic increases, the value of t decreases.)
Thus, one of the major hints that multicollinearity might influence a regression occurs when the
whole model predicts well (i.e., R2 is large and significant), but few, if any, of the regression
coefficients are significant.
A second way to examine multicollinearity is to examine the tolerance or the variance
inflation factor (VIF) of the independent variables. The tolerance is simply the quantity (1 – R2)
when that independent variable is regressed on all of the other IVs. Hence, tolerance will range
from 0 to 1. A tolerance near 1.0 implies that the IV is close to being statistically independent of
the other IVs. Tolerances close to 0 indicate multicollinearity. There is no sharp dividing line for
a “good” versus a “bad” tolerance, but tolerances below .20 should alert the analyst to potential
problems with multicollinearity. The VIF equals the reciprocal of tolerance. Hence, a VIF close
to 1 denotes independence while large VIFs suggest multicollinearity.
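In PROC REG, for example, tolerance and VIF can be requested as MODEL statement options; the data set and variable names below are hypothetical.

PROC REG DATA=mydata;
   MODEL y = x1 x2 x3 / TOL VIF;   * print tolerance and variance inflation factors;
RUN;
QUIT;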

1.2.4.2.2 Remedies for multicollinearity


Potential fixes for multicollinearity range from the simple to the esoteric. If the
multicollinearity involves only two variables, then one can delete one variable or combine the
two into a single variable. If the standard deviations for the two variables are similar, then
simply adding them together is satisfactory. If the standard deviations differ, then convert both
variables to Z scores before adding them.
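A sketch of the latter strategy (hypothetical variable names): PROC STANDARD converts the two collinear predictors to Z scores, and a DATA step then sums them into a single composite.

* standardize x1 and x2 to mean 0 and standard deviation 1;
PROC STANDARD DATA=mydata MEAN=0 STD=1 OUT=zdata;
   VAR x1 x2;
RUN;

* replace the two collinear predictors with their sum;
DATA zdata;
   SET zdata;
   zsum = x1 + x2;
RUN;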
In some cases, one or more sets of variables are responsible for multicollinearity. Again,
one can drop one of the variables (or create a new variable from the sums), rerun the model and
then assess the fit. In other cases, one might want to construct new variables on the basis of
existing theory or empirical evidence. For example, suppose a neuropsychologist used all of the
subscales of the WAIS (Wechsler Adult Intelligence Scale) as predictors in a study of CNS
lesions, and the regression results suggested multicollinearity among the WAIS subscales. In
such a case, one could use the WAIS norms to calculate total IQ or, perhaps, Verbal IQ and
Performance IQ. These variables could then be used as regressors. Here, the validation of the
WAIS and subsequent research on it are a guide for reducing a large number of variables to a
fewer number of variables.
If a set of intercorrelated variables creates multicollinearity and there is no theoretical or
empirical basis for data reduction, then the best solution is to subject those variables to a
principal component analysis and use the resulting principal component scores as the IVs. This
technique is sometimes referred to as regression on principal components.

1.3 Special Topics in Multiple Regression


1.3.1 Model Comparisons
Many tasks in GLM require a comparison between models. Does a model with an
additional IV predict significantly better than the model without that variable? Can I drop two
predictor variables from a model without a significant loss in fit? In experimental designs, the
question usually arises about whether models with or without interaction terms fit better. The
thrust of all of these questions is a comparison between two models. We want to know if the
larger of two nested models predicts significantly better than the smaller model. (Conversely, we
may ask whether the smaller of two nested models still gives satisfactory prediction without a
significant loss of fit.) Indeed, all of the General Linear Model can be viewed in terms of the
comparison of nested models (Judd & McClelland, 1989).
Note the use of the word nested in the above statements. Two linear models are nested
whenever all the variables in the smaller model are contained in the larger model. Only a
smaller nested model can be compared to a larger model. If the smaller model is not nested (i.e.,
if it has a predictor variable that is not in the larger model), then the two models cannot be
compared. 3
To illustrate nesting, suppose that a data set had four potential predictor variables which
we denote here as X1, X2, X3, and X4. Now consider a GLM that uses the first three of these:
Yˆ = α + β1X1 + β2X2 + β3X3.
The following three models are smaller models that are nested within this larger model:
Yˆ = α + β1X1 + β2X2
Yˆ = α + β1X1 + β3X3
and
Yˆ = α + β2X2 + β3X3.
Each of three smaller models is nested within the larger model because all of the predictor
variables on the right-hand side of the equations are predictor variables in the larger model.
In contrast, none of the following models are nested within the larger model:
Yˆ = α + β1X1 + β4X4
Yˆ = α + β2X2 + β4X4
Yˆ = α + β3X3 + β4X4
and
Yˆ = α + β4X4.
Even though each of the above four models is smaller than the larger model, they are not nested
within the larger model. Why? Because they all contain variable X4, which is not contained in the
larger model. Hence, it is not possible to compare these four models with the larger model.

3 More advanced methods can permit the assessment of non-nested models. They are, however,
beyond the purview of this book.

To examine model comparisons, we will first develop the general case of comparing any
two nested models. After that, we will examine the special case of comparing two models that
differ in one and only one parameter.

1.3.1.1 Model Comparisons: The General Case


Suppose that we have a linear model with k predictors in the equation:
Yˆ = α + β1X1 + β2X2 + … + βkXk.
We want to compare this model to another model that has the same predictors X1 through Xk but
adds m new predictors, giving the model
Yˆ = α + β1X1 + β2X2 + … + βkXk + βk+1Xk+1 + βk+2Xk+2 + … + βk+mXk+m.
We want to test whether the m new predictors significantly add to prediction. This, of course, is
the same as starting with the general model of all (k + m) predictors and testing whether a model
that drops the m predictors significantly worsens the fit.
The statistical test involves the R2s of the two models. Let R2(k+m) denote the squared
multiple correlation for the general model (the one with all k + m predictors) and let R2(k) denote
the squared multiple correlation for the reduced model (the one with only the first k predictors).
Then, the test statistic for a significant difference between the two R2s is an F statistic of the form
F(m, N − k − m − 1) = [(R2(k+m) − R2(k)) / (1 − R2(k+m))] ⋅ [(N − k − m − 1) / m] ,    (X.X)
where N equals the number of observations in the sample. This F statistic has m degrees of
freedom in the numerator and (N − k − m − 1) degrees of freedom in the denominator. If the p level
of the F reaches significance, then the larger model is a significantly better model in terms of
prediction. Otherwise, the smaller model should be preferred.
Most modern statistical packages have provisions to test for differences in R2. Typically,
these involve fitting the general model first and then testing whether one or more of the terms
can be set to 0. In SAS, for example, the TEST statement used with PROC REG allows one to
test for the significance of a set of parameter estimates.
If your software does not have this option easily available, then run the two regression
models and use Equation X.X to calculate the F statistic. You can find the significance level of
the observed F in tables found at the back of statistics books.
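If the two R2s must be compared by hand, Equation X.X can also be evaluated in a short DATA step. The sketch below plugs in the values from the worked example in the next section (an R2 of .425 for the general model and .4084 for the model that drops X3); the data set name is hypothetical.

DATA nested_f;
   r2_full    = 0.425;    * R-square of the larger model (k + m predictors);
   r2_reduced = 0.4084;   * R-square of the smaller, nested model (k predictors);
   n = 53; k = 3; m = 1;
   f = ((r2_full - r2_reduced) / (1 - r2_full)) * ((n - k - m - 1) / m);
   p = 1 - PROBF(f, m, n - k - m - 1);   * p value from the F distribution;
RUN;

PROC PRINT DATA=nested_f;
RUN;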

1.3.1.2 Model Comparisons: The Special Case of One Predictor


In practical terms, if you test a general model against a smaller model that drops one and
only one predictor, then simply run the general model and examine the t statistic and its p level
for the predictor that you want to drop. Below, we give a demonstration of this principle.
Figure 1.12 presents the results of a regression of dependent variable Y on four predictors
(X1 through X4). Suppose that we wanted to test this model against a smaller model that
sets β3 to 0. There are two different ways in which we can do that. First, we could compute the
regression without X3 as a predictor and use the R2 from this model along with the R2 from the
general model to compute an F statistic from Equation X.X. In terms of the notation used in
Equation X.X, N = 53; (k + m) = 4 (the total number of IVs in the general model given in Figure
1.12); and m = 1 (the number of predictors dropped from the general model). From Figure 1.12,
the quantity R2(k+m) equals .425. From the regression that drops X3 (not shown), the quantity R2(k) =
.4084. Substituting these quantities into Equation X.X gives
F(1, 48) = [(.425 − .4084) / (1 − .425)] ⋅ [(53 − 3 − 1 − 1) / 1] = 1.39.
The critical value for this F is 4.04. Because the observed F is less than its critical value, the
observed F is not significant. Hence, dropping X3 from the model does not significantly worsen
fit. (Note that if we had started with the reduced model and compared it to a model that added X3,
then we could have stated that adding X3 to the model does not significantly increase R2).
Figure 1.12 Model Comparisons: A General Model with Four Predictors.
Model Comparisons

The REG Procedure


Model: MODEL1
Dependent Variable: Y

Number of Observations Read 53


Number of Observations Used 53

Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 4 15.14742 3.78685 8.87 <.0001
Error 48 20.49560 0.42699
Corrected Total 52 35.64302

Root MSE 0.65345 R-Square 0.4250


Dependent Mean 4.97358 Adj R-Sq 0.3771
Coeff Var 13.13833

Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 1.34183 0.66310 2.02 0.0486


X1 1 0.14587 0.06591 2.21 0.0317
X2 1 0.05578 0.02916 1.91 0.0617
X3 1 0.01578 0.01341 1.18 0.2453
X4 1 0.05944 0.04428 1.34 0.1858

A second method is to use the TEST statement. The SAS syntax for this along with the
results of the test statement is given in Figure 1.13. Note that the F statistic for the TEST
statement is the same (within rounding error) as the one calculated above. Note also that the p
value for the F statistic is identical to the p value for the t statistic in the original, general model
presented earlier in Figure 1.12. This is no coincidence. The F statistic in Figure 1.13 is actually
the square of the t statistic in Figure 1.12. Both statistics answer the same question: Can β3 be
set to 0 without sacrificing predictability? In mathematics, two methods that answer the same
question with the same data must give the same answer. Hence, the t statistic for a parameter in
a general model is the same as the F statistic in a model comparison that drops that parameter
and only that parameter.
Figure 1.13 Model Comparisons: Example of the TEST statement in SAS for a single
predictor variable.

SAS Syntax:
TITLE Model Comparisons;
PROC REG DATA=ModelComparison2;
MODEL Y = X1 X2 X3 X4;
RUN;

TITLE2 Test that beta3 = 0;


beta3_EQ_0: TEST x3=0;
RUN;
QUIT;

SAS Output (NOTE: Only results of the TEST statement are shown):
Model Comparisons
Test that beta3 = 0

The REG Procedure


Model: beta3_EQ_0

beta3_EQ_0 Results for Dependent Variable Y

Mean
Source DF Square F Value Pr > F
Numerator 1 0.59084 1.38 0.2453
Denominator 48 0.42699

1.3.1.3 Model Comparisons: An Example


Re-examine Figure 1.12, which presented the results of a regression of dependent variable Y
on four predictors (X1 through X4). Suppose that we wanted to compare this general model to
a nested, smaller model in which β2 = 0 and β4 = 0. Before proceeding with this example, let us
take a moment to dispel a commonly held myth of data analysis. Neither the t statistic for X2 nor
the t for X4 is significant. Novice analysts sometimes make the mistake of concluding that
therefore both X2 and X4 can be set to 0. This may be the case but it does not have to be the case.
A short exercise in logic can illustrate this.
From Figure 1.12, the nonsignificant t statistic for X2 tells us that we can fit a model with
the three predictors X1, X3, and X4 without a significant worsening of fit. The nonsignificant
value for X4 suggests that we can fit a model with the three predictors X1, X2, and X3 without a
significant worsening of fit. Note, however, that neither of these conclusions addresses the
question at hand—can we fit a model with only the two predictors X1 and X3 without a
significant loss of prediction? The t statistics for individual predictors apply to dropping one and
only one parameter from the model. They do not necessarily inform us of the effect of dropping
more than one parameter from the model.
As an analogy, imagine that your ship sinks and you are left with two flotation
devices—a life-jacket and a circular life-saver. You can discard the life-saver and still float for a
long time because of the life-jacket. Similarly, you can discard the life-jacket, keeping the life-
saver, and still float for a long time. But does this imply that you can safely throw away both the
life-jacket and the life-saver and still remain afloat until rescue arrives?
Hence, it is quite legitimate to ask whether both X2 and X4 can be dropped from the
general model without a significant sacrifice in prediction. The first method of doing this is to use
the TEST statement with PROC REG (or, of course, an equivalent statement with another
statistical package). The SAS syntax is shown in the upper part of Figure X.X and the results are
provided in the lower part of that Figure. The F statistic is 3.67 and its p value is .03. Hence, we
cannot simultaneously set β2 to 0 and β4 to 0 at the same time. At least one of these—and
perhaps both of them—are important for prediction. In the original, general model, we either
lacked the power or had conditions such as multicollinearity that prevented us from detecting
significance.

Figure 1.14 Model Comparisons: Example of the TEST statement in SAS for two
predictor variables.

SAS Syntax:

TITLE Model Comparisons;


PROC REG DATA=ModelComparison2;
MODEL Y = X1 X2 X3 X4;
RUN;

TITLE2 Test that beta2 = 0 and beta4=0;


beta2_AND_beta4_EQ_0: TEST X2=0, X4=0;
RUN;
QUIT;

SAS Output (NOTE: Only results of the TEST statement are shown):
Model Comparisons
Test that beta2 = 0 and beta4=0

The REG procedure


Model: beta2_AND_beta4_EQ_0

beta2_AND_beta4_EQ_0 Results for Dependent Variable Y

Mean
Source DF Square F Value Pr > F

Numerator 2 1.56733 3.67 0.0329


Denominator 48 0.42699

The second method for model comparisons is to run the reduced model with only X1 and
X3 as predictors and then compute the F statistic using Equation X.X. We will not show all the
results from this reduced model, but simply give its R2 (.337). Hence, in terms of the algebraic
quantities in Equation X.X, R2(k+m) = .425, R2(k) = .337, N = 53, k = 2, and m = 2. The F statistic
becomes
F(2, 48) = [(.425 − .337) / (1 − .425)] ⋅ [(53 − 2 − 2 − 1) / 2] = 3.67.
The critical value for F with 2 and 48 df is 3.19. Because the observed F is larger than the
critical value, we reject the smaller model. Eliminating both X2 and X4 from the model
significantly worsens fit.

1.3.2 Interactions
In everyday language, we often say that two variables “interact” in predicting a third
variable, meaning that both variables are important for the prediction. In the GLM, however, the
term “interaction” has a more precise meaning. A statistical interaction between two variables
implies that the slope (or curve) for one independent variable differs in shape as a function of the
second variable. For example, a statistical interaction between dose of drug and sex implies that
the dose-response curves for males and females differ in shape. In other words, an
interaction implies that the effect of a dose is different for men and women.
An interaction can be looked upon as a nonadditive contribution of two (or more)
variables to the prediction of Y. We explore this viewpoint on interactions further in the
discussion of a two-by-two factorial design (Section 1.3.2.1) but urge the reader to view
interactions as the extent to which nonadditive factors contribute to prediction.
Let us first consider the case of two independent variables. In multiple regression,
modeling an interaction begins by creating a new variable that is the product of the two variables
involved in the interaction: e.g., X3 = X1*X2 denotes that the new variable (X3) represents the
interaction between variables X1 and X2. Then the two original variables plus the new variable
become the independent variables in the regression model. Hence, the model with the interactive
term is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$
or, writing it in terms of the original variables,
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 .$$
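Because PROC REG does not build product terms itself, the interaction variable is usually created in a DATA step first. A minimal sketch, assuming a hypothetical input data set mydata that contains Y, X1, and X2:

DATA interact;
   SET mydata;                /* hypothetical input data set containing Y, X1, and X2 */
   X3 = X1 * X2;              /* product term that carries the interaction            */
RUN;

PROC REG DATA=interact;
   MODEL Y = X1 X2 X3;        /* main effects plus the interaction term               */
RUN;
QUIT;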
With three independent variables, we can construct three new variables that are the
product of any two of the three variables: X4 = X1*X2, X5 = X1*X3, and X6 = X2*X3. These are
called two-way interactions because they model the interaction between two variables. In
addition, we can model a three-way interaction by calculating yet another new variable that is
the product of all three independent variables: X7 = X1*X2*X3. The model, expressed in terms of
the original variables, is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3 + \beta_6 X_2 X_3 + \beta_7 X_1 X_2 X_3 .$$
With four independent variables, there would be six possible two-way interactions, four
possible three-way interactions (X1*X2*X3, X1*X2*X4, X1*X3*X4, and X2*X3*X4), and a four-way
interaction (X1*X2*X3*X4). Usually, higher-order interactions such as a four-way interaction are
ignored because they are very difficult to interpret.
Let us examine a specific problem to illustrate interactions. It has long been known that
testosterone induces sexual activity in castrated male rats. The recovery of sexual activity is also
a function of prior sexual experience—rats with high levels of previous experience show higher
post-testosterone sexual activity than those with less experience. Suppose that a lab is
investigating a new compound with a resemblance to testosterone. The experimenters raised
male rats in two ways—one group had no opportunity to mate with females while the other was
allowed to have sexual activity. Rats were then surgically castrated and divided into three
groups: (1) controls, (2) those given 10 mgs of the new drug per unit weight, and (3) those given
15 mgs. All groups were then allowed access to females in estrus and a composite index of
sexual activity was derived. The results from this hypothetical study are depicted in Figure 1.15.
Figure 1.15 Mean sexual activity (± 1 standard error) of rats with and without prior sexual
experience as a function of dose of a testosterone-like compound.

In principle, the procedure for assessing an interaction model is to fit two regression
models. The first of these may be termed the main effects model, and it does not have the
interaction term. The second regression model has the same variables as the first but includes
the interaction term. The purpose is to assess the significance of the interaction term in the
second model. (In practice, one can attain the same result by fitting a model that includes the
main effects and the interaction and then assessing the t statistic for the regression coefficient for
the interaction term—see Section 1.3.1.2. The notion of fitting two models, however, greatly
aids in the interpretation of the regression coefficients, as the following algebraic exercise will
illustrate.)
For the main effects model, we start with two independent variables. The first,
Experience, is dummy coded by assigning a 0 to rats with no previous sexual experience and a 1
to rats with prior experience. The second variable is Dose with values of 0, 10, and 15. Now
examine the regression equation in this model:
$$\hat{Y} = \alpha + \beta_1 \cdot \text{Experience} + \beta_2 \cdot \text{Dose} . \qquad \text{(X.X)}$$
Substitute the numeric values for Experience to get the predicted values for rats with no experience ($\hat{Y}_0$) and those with prior experience ($\hat{Y}_1$):
$$\hat{Y}_0 = \alpha + \beta_2 \cdot \text{Dose} ,$$
$$\hat{Y}_1 = (\alpha + \beta_1) + \beta_2 \cdot \text{Dose} .$$
Notice that the equation for the inexperienced rats is a simple regression with intercept α and
slope β2. The equation for the experienced rats is also a simple regression, but here the intercept
is now (α + β1) while the slope remains the same at β2. Hence, the main effects model fits two
simple regressions—one for the inexperienced, the other for the experienced group—allowing
the intercepts to differ between groups but constraining the slopes to be equal. Consequently, the
regression lines for main effects models will be parallel. Parallel regression lines are not
idiosyncratic to this example—they will always be predicted for a main effects model.
Note the careful phrasing above about the intercept. The main effects model permits or
allows the intercepts to differ. It does not force them to be different. Because the intercept for
the inexperienced group is α and the intercept for the experienced rats is (α + β1), a test of
parameter β1 is a test for equality of intercepts. If β1 = 0, then the intercepts are the same; if the
hypothesis that β1 = 0 is rejected, then there is evidence for different intercepts.
Now examine the regression equation that includes the interaction term:
$$\hat{Y} = \alpha + \beta_1 \cdot \text{Experience} + \beta_2 \cdot \text{Dose} + \beta_3 \cdot \text{Experience} \cdot \text{Dose} . \qquad \text{(X.X)}$$
Substituting the numeric values for Experience gives the equations for the two groups of rats as
$$\hat{Y}_0 = \alpha + \beta_2 \cdot \text{Dose} \qquad \text{(X.X)}$$
$$\hat{Y}_1 = (\alpha + \beta_1) + (\beta_2 + \beta_3) \cdot \text{Dose} . \qquad \text{(X.X)}$$
Once again, the equation for the inexperienced rats is a simple regression with intercept α and
slope β2. The equation for the experienced rats also remains a simple regression. The intercept,
however, is now (α + β1) and the slope is now (β2 + β3). Hence, the interaction term allows the
slopes for the two groups to differ in addition to the intercepts. Furthermore, parameter β3, the
coefficient for the interaction term in Equation X.X, provides the test for differing slopes. When
β3 is 0, then the slopes for the two groups are the same; if we reject the hypothesis that β3 = 0,
then we have evidence for different slopes. When the two groups have different slopes, their
regression lines will not be parallel. (See Figure 1.18 and Figure 1.19 for, respectively, examples
of parallel and nonparallel regression slopes).
To summarize, main effects models allow intercepts to differ among groups but force
their slopes to be equal. Interaction models permit different intercepts and also allow different
slopes.4 Hence, a significant interaction rejects the null hypothesis that the slopes are parallel.
Although the example used groups, this principle extends to continuous independent variables.
For a continuous X1, the main effects model predicts that the regression lines will be parallel
across any set of specific values of X1. The interaction model provides a test for parallel slopes.
Figure 1.16 provides output from PROC GLM in SAS for the main effects model. The
main effects model fit the data well—R2 = .56, F(2,57) = 36.09, p < .0001. Both of the regression
coefficients are significant. For Experience, b = 2.51, t(57) = 4.64, p < .0001, and for Dose, b =
0.31, t(57) = 7.11, p < .0001. These findings agree with previous research, so there is evidence
that the new compound has physiological activity like testosterone.

4
The terms homogeneity of slopes and heterogeneity of slopes are sometimes used to refer to,
respectively, parallel and non-parallel slopes.
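For reference, the following is a minimal sketch of PROC GLM calls that could produce output like that in Figures 1.16 and 1.17. The data set name ratsex is an assumption; the variable names response, experience, and dose are taken from the output. Because there is no CLASS statement, experience (0/1) and dose are treated as numeric regressors, which matches the single-degree-of-freedom estimates shown.

PROC GLM DATA=ratsex;         /* data set name is assumed                              */
   MODEL response = experience dose / SOLUTION;                  /* main effects model  */
RUN;

PROC GLM DATA=ratsex;
   MODEL response = experience dose experience*dose / SOLUTION;  /* adds the interaction */
RUN;
QUIT;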
Figure 1.16 Regression results for the main effect model.

Dependent Variable: response Sexual Activity Index

Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 316.2206 158.1103 36.09 <.0001
Error 57 249.7392 4.3814
Corrected Total 59 565.9598

R-Square Coeff Var Root MSE response Mean


0.558733 13.81483 2.093177 15.15167

Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 11.32785714 0.52578024 21.54 <.0001
experience 2.51000000 0.54045600 4.64 <.0001
dose 0.30825714 0.04333288 7.11 <.0001

Results from fitting the interactive model are given in Figure 1.17. This interaction
model is also significant, R2 = .59, F(3,56) = 27.19, p < .0001. The appropriate statistical test for
the interaction is for the coefficient for Experience*Dose. The coefficient is b = .182, and we can
reject the null hypothesis that it is a random draw from a sampling distribution with a mean of
0 (t(56) = 2.17, p = .034). Hence, we can reject the hypothesis that the regression lines are
parallel for the rats with and without prior sexual experience.
Figure 1.17 Regression results for the interaction model.
Dependent Variable: Response Sexual Activity Index

Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 335.5551 111.8517 27.19 <.0001
Error 56 230.4048 4.1144
Corrected Total 59 565.9598

R-Square Coeff Var Root MSE Response Mean


0.592896 13.38725 2.028391 15.15167

Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 12.08642857 0.61810091 19.55 <.0001
experience 0.99285714 0.87412669 1.14 0.2609
dose 0.21722857 0.05938521 3.66 0.0006
experience*dose 0.18205714 0.08398338 2.17 0.0344
Note that the coefficient for Experience in Figure 1.17 is not significant (β1 = .99, t(56) =
1.14, p = .26). This does not imply that Experience plays no role in the recovery of sexual
function. Why? Because the coefficients in interactive models do not have the same
interpretation as the coefficients in main effects models. To examine the meaning of a
coefficient in an interactive model, one must substitute numeric values into the regression
equation—as we did above—and then examine their meaning. In Equation X.X, the coefficient
for Experience is β1, and from Equations X.X and X.X, we see that β1 tests whether the
intercepts for the inexperienced and experienced rats differ. Consequently, the lack of
significance for Experience implies that there is no evidence that the two groups differ in their intercepts. Substantively,
this means that in the absence of the drug (i.e., Dose = 0) there is no difference in sexual activity
for rats with and without prior sexual experience. Hence, the difference in means for the two
Vehicle groups in Figure 1.15 is not statistically significant.
By substituting the numeric values of the regression coefficients into Equations X.X, we
can examine the interaction between Experience and Dose:
$$\hat{Y}_0 = 12.09 + .22 \cdot \text{Dose} , \qquad \text{(X.X)}$$
$$\hat{Y}_1 = 13.08 + .40 \cdot \text{Dose} . \qquad \text{(X.X)}$$
For naïve rats, a 1 mg increase in the drug increases predicted sexual activity by .22 units. In
experienced rats, however, the increase is almost two-fold (.40 units). Hence,
experienced rats are more sensitive to the drug than inexperienced rats.
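The sketch below simply substitutes these fitted coefficients into the two group equations at a few illustrative doses; the dose values and the data set name are chosen for illustration only.

DATA predicted;
   DO dose = 0 TO 15 BY 5;                   /* illustrative dose values */
      yhat_naive       = 12.09 + 0.22*dose;  /* Experience = 0           */
      yhat_experienced = 13.08 + 0.40*dose;  /* Experience = 1           */
      OUTPUT;
   END;
RUN;

PROC PRINT DATA=predicted;
RUN;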

1.3.2.1 Interactions: Example 1: The two-by-two factorial


One of the classic examples in experimental design is the two-by-two factorial. This
design involves two variables (or, in ANOVA terminology, “factors”) each of which has two
levels, say a control level and a treatment. Table 1.1 illustrates the design.

Table 1.1 Schematic of a two by two factorial design.

                               Factor 2:
Factor 1:           Control             Treatment
Control
Treatment

In concrete terms, let us assume that the mean of the group receiving both Control
treatments is 10.3. Suppose that Treatment 1 increases the response by 4.6 units. Hence, the
mean for those observations who are Controls for Factor 2 but Treatments for Factor 1 will be
10.3 + 4.6 = 14.9. Let us further assume that Treatment 2 increases the response by 2.4 units.
Then the mean for the observations who are Controls for Factor 1 but Treatments for Factor 2
will be 10.3 + 2.4 = 12.7.
If the Treatments are additive, then the predicted value for those observations receiving
both Treatments will be the base rate (10.3) plus the effect of Treatment 1 (4.6) plus the effect of
Treatment 2 (2.4), or 10.3 + 4.6 + 2.4 = 17.3. Hence, the additive model predicts the mean of the
Treatment/Treatment cell in Table 1.1. If the mean of this cell differs significantly from this
predicted value, then we have evidence that the additive model is false. Thus, the test of an
additive model (i.e., Main Effects model) versus a non-additive model (i.e., Interactive Model)
consists of asking how well the mean of the Treatment/Treatment cell is predicted by the Main Effects
of Treatment 1 and Treatment 2.
To place this design into a regression analysis, we code two independent variables: X1 is
the independent variable for Factor 1 and it is dummy coded as 0 = Control and 1 =
Experimental; X2 is the independent variable for the second Factor and it is similarly coded as 0
= Control and 1 = Experimental. The Main Effects model (i.e., the additive model) is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 ,$$
and the interactive model is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 .$$
Clearly, the main effects model is a subset of the interactive model that stipulates that β3 = 0.
We can now substitute the dummy codes for X1 and X2 to obtain the predicted values for the
dependent variable in any of the four cells in this design. For example, the predicted value for a
Control for Factor 1 (i.e., X1 = 0) and a Treatment for X2 is
$$\hat{Y} = \alpha + \beta_1(0) + \beta_2(1) + \beta_3(0)(1) = \alpha + \beta_2 .$$
Filling all the predicted values into the empty cells of Table 1.1 gives the algebraic expressions
in Table 1.2.
Table 1.2 Predicted values of the dependent variable in a two by two factorial design with
interaction.

                               Factor 2:
Factor 1:           Control             Treatment
Control             α                   α + β2
Treatment           α + β1              α + β1 + β2 + β3
Here, the intercept in the regression model (i.e., α) is the predicted value for the two
Control conditions. Parameter β1 gives the effect of Treatment for Factor 1, and parameter β2
gives the effect of Treatment for Factor 2. If the effects of the two Treatments are additive (i.e., the
Main Effects model), then the predicted value for those observations that receive both
Treatments is
$$\hat{Y} = \alpha + \beta_1 + \beta_2 .$$
If, on the other hand, the effects of both Treatments are non-additive (or interactive) then
the predicted value will differ from the Main Effects model. Hence, the predicted value of this
cell becomes
$$\hat{Y} = \alpha + \beta_1 + \beta_2 + \beta_3 .$$
Consequently, a test that β3 = 0 effectively tests whether the relationship between Treatment 1
and Treatment 2 is additive (β3 is not significant) or non-additive (β3 is significant). If the test
that β3 = 0 is not significant, then we prefer the Main Effects model. Otherwise, we favor the
Interactive Model.
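A minimal sketch of how the two-by-two design might be coded and the two models compared; the data set name trialdata and the character values of factor1 and factor2 are assumptions.

DATA twobytwo;
   SET trialdata;                      /* hypothetical data set with factor1, factor2, and Y */
   X1 = (factor1 = "Treatment");       /* dummy codes: 1 = Treatment, 0 = Control            */
   X2 = (factor2 = "Treatment");
   X1X2 = X1 * X2;                     /* interaction term                                   */
RUN;

PROC REG DATA=twobytwo;
   Additive:    MODEL Y = X1 X2;       /* main effects (additive) model */
   Interactive: MODEL Y = X1 X2 X1X2;  /* adds the interaction term     */
RUN;
QUIT;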

1.3.2.2 Interactions: Example 2: Two Continuous Independent Variables


The philosophy outlined above on homogeneous versus heterogeneous slopes still holds in
the case of two continuous independent variables. If the additive, Main Effects model holds,
then the slope of the regression line of Y on X1 will remain the same irrespective of the value of
X2. That is, the predicted value of Y given any specific value of X2—a quantity that we denote as $(\hat{Y} \mid X_2)$—will be
$$(\hat{Y} \mid X_2) = (\alpha + \beta_2 X_2) + \beta_1 X_1 . \qquad \text{(X.X)}$$
In this equation, the intercept is the quantity (α + β2X2) which will indeed depend on X2, but the
slope of the regression line is always β1 which does not depend on X2. To illustrate this, let α =
.7, β1 = .4, and β2 = 1.3. Now fix X2 at any three values that you wish, substitute each of these
into Equation X.X, and draw the three regression lines. No matter what three values you select,
the regression lines will be parallel as illustrated in Figure 1.18.
Figure 1.18 Regression lines from a main effects model.

The interactive model, on the other hand, is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 .$$
If X2 is fixed at any value, then this equation can be rearranged to become
$$(\hat{Y} \mid X_2) = (\alpha + \beta_2 X_2) + (\beta_1 + \beta_3 X_2) X_1 .$$
Here, the intercept equals the quantity (α +β2X2) which is, as before, a function of X2. The slope,
however, now equals (β1 + β3X2) which is now a function of X2. Hence, the slope of the regression
line of Y on X1 depends on the value of X2. To illustrate, keep α, β1 and β2 as before but let β3 =
.8. Then, for the same three values of X2 used to compute the lines in Figure 1.18, we have the
regression lines in Figure 1.19.
Figure 1.19 Regression lines from an interaction model.

In the example, we fixed the value(s) of X2 and then examined the slope of the regression
line of Y on X1. The same principles, however, hold if we were to fix values of X1 and examine
the slopes of the regression lines of Y on X2. For example, according to the additive, Main
Effects model, the equation is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 = (\alpha + \beta_1 X_1) + \beta_2 X_2 .$$
Again, the intercept depends on the value of X1, but the slope does not.
The non-additive or Interactive Model is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 = (\alpha + \beta_1 X_1) + (\beta_2 + \beta_3 X_1) X_2 .$$
Here, both the intercept and the slope depend on the value of X1.
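A small sketch that generates predicted values under both models using the illustrative parameter values from the text; the three fixed values of X2 and the range of X1 are arbitrary. Plotting y_main against X1 separately for each value of X2 reproduces the parallel lines of Figure 1.18, and plotting y_interact reproduces the diverging lines of Figure 1.19.

DATA lines;
   alpha = 0.7;  b1 = 0.4;  b2 = 1.3;  b3 = 0.8;         /* parameter values from the text       */
   DO X2 = 1, 2, 3;                                      /* three arbitrary fixed values of X2   */
      DO X1 = 0 TO 10;
         y_main     = (alpha + b2*X2) + b1*X1;           /* main effects: parallel lines         */
         y_interact = (alpha + b2*X2) + (b1 + b3*X2)*X1; /* interaction: slope depends on X2     */
         OUTPUT;
      END;
   END;
RUN;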
As in the case of the two by two factorial, the procedure for dealing with interactions
involving continuous variables is to fit two different regression models. The first of these
contains the terms without an interaction. That is,
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 .$$
The second model contains the interaction, or
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 .$$


A test that β3 = 0 is used to decide between models. If β3 is significant, then one favors
the interactive model. If it is not significant, then one favors the model without the interaction.
At this point, the reader may protest, “Why not just fit the interactive model, and if the
interaction is not significant, just ignore it? What does one gain by also fitting the model without
the interaction?” The answer is that one can gain a lot, mostly in the form of avoiding errors.
Because an interactive term is formed as the product of two variables, it can create strong
multicollinearity among the IVs.5 As a result, β1, say, may not be significant in the interactive
model while it is highly significant in the reduced model. If the interaction term is not
significant and β1 is not significant, then one might falsely conclude that variable X1 has no
influence on the dependent variable. To avoid such incorrect inferences, we now develop a
strategy for analyzing interactions using a model-comparison approach.

1.3.2.3 Assessing Interactions: A Model-Comparison Approach


The general rules for assessing interactions using a model-comparison approach are
stated in Figure 1.20. The general philosophy is to assess the highest-order interactions first. If
they are not significant, then drop them from the model and rerun the regression using the
reduced model. Note that we recommend this approach for both observational studies and for
experimental designs. The reason why it is needed for experimental designs is that interactions
may induce multicollinearity in the data. This can have the effect of masking the significance of
the lower-order terms in the model.
It is also possible to work using a bottom-up approach. Here, one starts with a simple
model and then tests whether adding terms to that model significantly improves prediction. See
Judd and McClelland (1989) for advice in implementing that approach. We emphasize the top-
down philosophy here because it is suitable for the “test every possible interaction” approach
sometimes used in the analysis of experimental data.

5
The operative phrase here is “can induce multicollinearity.” Interactions do not always have to
induce multicollinearity. Rather than learn all the conditions under which interactions do and do
not result in multicollinearity, we urge that the reader use the algorithm outlined in the text—
statistically test for interactions and if the test is not significant, then eliminate the interaction
term and rerun the model. This simple algorithm will always work.
Figure 1.20 Model comparison algorithm for interactions.

(1) Fit the most plausible general model first. The most plausible general model should
not necessarily be the one with the highest possible interaction term, but the model with
the highest plausible interaction term. In fitting the general model, always include all
lower-order interactions, and always include all of the variables that form the
interaction. For example, if the highest plausible interaction is a three-way interaction,
then the model should include all of the two-way interactions as well as the three
variables that compose the interaction.

(2) If the highest plausible interaction term is significant, then accept that model. (One
may, however, wish to test whether other terms in the model can be eliminated). If the
interaction is not significant, then remove it from the model, consider the reduced model
as the next plausible general model, and proceed to step one.

NOTE: In some cases, this algorithm may result in evaluating a model with several
interactions of the same order. For example, suppose that the initial general model
contained a non-significant three-way interaction. Then, the next model will have three
two-way interactions as the next highest interactive terms. What follows next is an
exercise in logic. If all three of these interactions are not significant, then we know that
we can eliminate any one of them without a significant loss of fit. We do not know,
however, whether or not we can eliminate any two of them (or even all three of them)
without a significant loss of fit. In this case, it is safest to test the fit by comparing the
model with all three interactions to the model with none of the two-way interactions. If
there is no significant difference between the two, then all of the two-way interactions
can be safely eliminated. If there is a significant loss of fit, then we know that at least
two of the interactions are needed. Here, the next step would be to compare each of the three
models that has one and only one two-way interaction to the model with all three two-way
interactions.

At the end of the day, one should be left with a parsimonious model that still
satisfactorily predicts the dependent variable. Inferences about the predictor variables
should be based on the significance of their coefficients in this reduced model.

1.3.2.4 An Example of the Model Comparison Approach to Interactions.


We illustrate this approach using fabricated data from a clinical trial of three treatments.
For each treatment, there is a control group along with an experimental group and each treatment
is crossed with the other two treatments. Hence, there are eight total groups in the study.
Examples of these groups would be: Control 1—Control 2—Control 3; Active 1—Control 2—
Control 3; Control 1—Active 2—Active 3; and Active 1—Active 2—Active 3.
We can code this design by using dummy codes for each treatment. For treatment 1, we
construct a variable X1 with values of 0 for a control subject and 1 for an active treatment
subject. Variables X2 and X3 would contain the dummy codes for respectively treatments 2 and
3.
There will be three two-way interactions among the three treatments—X1 and X2, X1 and
X3, and X2 and X3. These may be calculated as the products of the two variables involved in the
interaction. For instance, X1X2 = X1 * X2.
Finally, there will be the three-way interaction. Again, this will be the product of all three
variables: X1X2X3 = X1 * X2 * X3. Hence, the general model is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3 + \beta_6 X_2 X_3 + \beta_7 X_1 X_2 X_3 .$$
The logic is to fit this general model first and then test for the significance of the three-way
interaction. If it is significant, it will be kept in all subsequent models. If it is not significant,
then it will be dropped from the model. Next, the two-way interactions will be evaluated. The
ultimate goal is to arrive at a parsimonious model that explains the data well without sacrificing
predictability.
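A minimal sketch of how the dummy codes' product terms might be constructed and the general model fitted; the input data set name clinical is an assumption.

DATA trial;
   SET clinical;                /* hypothetical data set with dummy-coded X1, X2, X3 and Y */
   X1X2   = X1*X2;              /* two-way interactions  */
   X1X3   = X1*X3;
   X2X3   = X2*X3;
   X1X2X3 = X1*X2*X3;           /* three-way interaction */
RUN;

PROC REG DATA=trial;
   General: MODEL Y = X1 X2 X3 X1X2 X1X3 X2X3 X1X2X3;
RUN;
QUIT;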
The output from the general regression model is given in Figure 1.21. The coefficient for
the three-way interaction is .7625 and it is not close to being significant (t = 0.54, p = .59).
Hence, the next step is to create a reduced model without the three-way interaction. This will
serve as a baseline model for testing the two-way interactions. Before leaving Figure 1.21, note
that there is only one significant regression coefficient, the one for X3. The overall R2 for this
model is very high (R2 = .65, not shown in Figure 1.21). The lack of significance for the
coefficients coupled with the high R2 strongly suggests multicollinearity.
Figure 1.21 Interactions: Regression coefficients for the general model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
General Model

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 18.52500 0.50302 36.83 <.0001


X1 1 0.87500 0.71138 1.23 0.2238
X2 1 1.07500 0.71138 1.51 0.1364
X3 1 -1.98750 0.71138 -2.79 0.0071
X1X2 1 1.58750 1.00605 1.58 0.1202
X1X3 1 0.50000 1.00605 0.50 0.6211
X2X3 1 0.82500 1.00605 0.82 0.4157
X1X2X3 1 0.76250 1.42277 0.54 0.5941

The coefficients for the first reduced model are presented in Figure 1.22. The coefficient
for the interaction term X1X2 is significant. Given that the effect of multicollinearity is usually to
make it difficult to detect significance when it is indeed present, this suggests that the interaction
between X1 and X2 is meaningful. But what of the other two-way interactions?
Figure 1.22 Interactions: Regression coefficients from the first reduced model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
Reduced Model 1

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 18.62031 0.46758 39.82 <.0001


X1 1 0.68437 0.61221 1.12 0.2683
X2 1 0.88437 0.61221 1.44 0.1541
X3 1 -2.17813 0.61221 -3.56 0.0008
X1X2 1 1.96875 0.70692 2.78 0.0073
X1X3 1 0.88125 0.70692 1.25 0.2176
X2X3 1 1.20625 0.70692 1.71 0.0934

It is clear that we can delete X1X3 from the model (t = 1.25, p = .22) or we can delete
X2X3 from the model (t = 1.71, p = .09). So the major question is whether we can delete both of
them without significantly worsening the fit of the model. (Recall that the lack of significance of
X1X3 and X2X3 does not necessarily imply that both can safely be removed from the model—see
Section 1.3.1.3).
All good statistical packages have provisions that allow one to delete more than one IV
from the model and then assess the fit. In PROC REG in SAS, that provision is given by the
TEST statement. The output from a test for setting the coefficient for X1X3 to 0 and
simultaneously setting the coefficient for X2X3 to 0 is given in Figure 1.23. Here the test statistic
is an F ratio and its associated p value (.12) is not significant. Hence, both of these coefficients
can be set to 0 without a significant loss of fit.
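A sketch of the TEST statement that could produce output like Figure 1.23, assuming the product terms created earlier and the same hypothetical data set; the label No_X1X3_X2X3 matches the output shown.

PROC REG DATA=trial;                         /* same (assumed) data set as above         */
   Reduced1: MODEL Y = X1 X2 X3 X1X2 X1X3 X2X3;
   No_X1X3_X2X3: TEST X1X3 = 0, X2X3 = 0;    /* drop both remaining two-way interactions */
RUN;
QUIT;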
Figure 1.23 Interactions: Test of the significance of both interaction terms X1X3 and X2X3.
QMIN: Regression
Model Comparisons: Interactions in an experimental design

Reduced Model 1

Test No_X1X3_X2X3 Results for Dependent Variable Y


Mean
Source DF Square F Value Pr > F
Numerator 2 4.46328 2.23 0.1165
Denominator 57 1.99895

We now run a second reduced model, this time eliminating both of these two-way
interactions (X1X3 and X2X3). The results from this reduced model are given in Figure 1.24. Note that all of the
coefficients are now significant. Hence, we cannot reduce this model any further. This second
reduced model becomes our final model.
Figure 1.24 Interactions: Regression coefficients from the second reduced model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
Reduced Model 2

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 18.09844 0.40335 44.87 <.0001


X1 1 1.12500 0.51021 2.20 0.0314
X2 1 1.48750 0.51021 2.92 0.0050
X3 1 -1.13438 0.36077 -3.14 0.0026
X1X2 1 1.96875 0.72154 2.73 0.0084

Note the difference in the significance of the main effects in the general model (Figure
1.21) versus the final model (Figure 1.24). In the final model, all three main effects were
significant. In the general model, only X3 was significant. Note that the interaction X1X2 was
also significant in the final, but not the general, model. The reason for this is multicollinearity.
The three main effects have a correlation of .38 with the three-way interaction, and each two-
way interaction correlates .65 with X1X2X3. Also, in the general model, the tolerance for X1X2X3
is quite low (.14). When the three-way interaction is dropped and the model is reduced to its
final form, then the maximum tolerance is .33—still small, but given that each coefficient is
significant, there is no way to reduce the model without sacrificing predictability.
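If desired, collinearity diagnostics such as tolerance and the variance inflation factor can be requested directly with the TOL and VIF options of the MODEL statement. A sketch under the same assumed data set and variable names:

PROC REG DATA=trial;                         /* same (assumed) data set as above */
   General: MODEL Y = X1 X2 X3 X1X2 X1X3 X2X3 X1X2X3 / TOL VIF;   /* tolerance and VIF */
RUN;
QUIT;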
If we had interpreted the coefficients from the general model, we would have erroneously
concluded that only Treatment 3 had an effect on the response. In fact, both Treatments 1 and 2, as well
as their interaction, influenced the response. Hence, we state a cardinal rule for analyzing
interaction terms in regression, ANOVA, and the GLM:
Never, under any circumstances, interpret the significance of lower-order coefficients in a
model in which a higher-order interaction is not significant. Always remove the higher-
order interaction from the model and rerun the GLM.

1.3.3 Polynomial Regression


An important case of multiple regression is polynomial regression. Here, the
independent variables consist of an original independent variable and a series of new
independent variables that are powers of the original independent variable.6 Let X1 denote the original independent variable. Then, in the data set, we create additional independent variables so that $X_2 = X_1^2$, $X_3 = X_1^3$, and so on. The GLM written in terms of the original independent variable is
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1^3 + \cdots + \beta_k X_1^k . \qquad \text{(X.X)}$$
6 An “original independent variable” may be a variable originally recorded in the data set or some transform of that variable (e.g., a log or square root transform).

In order to understand polynomial regression, we must first understand the mathematical implications of regressing a dependent variable on a single independent variable. As we have shown before, this regression fits a straight line to the data points. In mathematical terms, the straight line ranges from negative infinity to positive infinity. For any concrete data set, however, the straight line applies only to the range of values of the independent variable in the data set.
To illustrate, consider a linear regression of weight on height in humans. Fitting a
straight line through the data points might generate the mathematical prediction that a person
who is -1.7 meters in stature should weigh -322 kilograms. Mathematically, the fitted line yields such a prediction, but it is meaningless because no human can be -1.7 meters in height.
In a similar way, when we fit a polynomial model with a square term for an independent
variable, i.e.,
$$\hat{Y} = \alpha + \beta_1 X + \beta_2 X^2 ,$$
we fit a parabola to the data. Mathematically, that parabola takes the form of the curve in Figure 1.25a (or that curve flipped on its head), and that curve extends from negative infinity to positive infinity along the horizontal axis and from the minimum of the curve to positive (or negative) infinity along the vertical axis. For any specific data set, however, the range of values on the
horizontal axis will be limited. Hence, the practical effect of fitting a model with the X2
independent variable to the data will be to take a “slice” from the parabola in Figure 1.25a (or a
“slice” from an inverted form of that parabola). Examples of those slices—i.e., the types of
curves that could be fitted to real data—are illustrated in panels (b) through (d) of Figure 1.25.
Figure 1.25 Examples of quadratic curves.


Similarly, a cubic regression fits the model
$$Y = \alpha + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + E ,$$
and an example of the general form of the cubic polynomial is provided in panel (a) of Figure
1.26. Specific “slices” of different cubic curves are illustrated in panels (b) through (d) of that
Figure. Note that a cubic can be used to model relationships that asymptote (panels b and c) as
well as curves in which the rate of acceleration to a maximum differs from the deceleration rate
to baseline (panel d).
Figure 1.26 Examples of cubic curves.


1.3.3.1 Fitting Polynomial Models


We illustrate polynomial regression with the serotonin data set. Here, the purpose is to
examine the time course of the effect of administering a drug that purports to increase levels of
serotonin (5-HT) in the CNS. Groups of rats were administered a standard dose of the agonist
and sacrificed at various time points after administration. CSF was then assayed for 5-HIAA, the
principal metabolite of 5-HT. Figure 1.27 presents the mean level of 5-HIAA for each group of
rats. (The predicted means in the Figure will be discussed later).
Figure 1.27 Assays of 5-HIAA in CSF as a function of time.

The first step in fitting polynomial regression is to formulate an a priori hypothesis about
the form of the curve. In experimental studies this hypothesis should guide the original design of
the study. For example, in the 5-HT study, we assume that 5-HT levels will increase, reach a
peak, and then decrease to baseline. Hence, we expect at least a quadratic relationship, but we
should also be prepared to fit a cubic or even a quartic to test whether the rate of increase in the
initial stage equals the rate of decrease in the latter stages.
In terms of fitting polynomial models, think of the term X2 as X * X or an interaction
between X and itself. Similarly, X3 may be viewed as the three-way interaction of variable X
with itself. According to this perspective, polynomial regression amounts to a series of
interactions involving one variable with itself.
Hence, the algorithm developed for interactions also applies to polynomial regression. In
concrete terms, this algorithm is stated in Figure 1.28.
Figure 1.28 Algorithm for fitting polynomial regression models.

(1) Fit the model with the highest plausible polynomial term.

(2) If the coefficient for the highest plausible polynomial term is significant, then
accept that model (although one may wish to test the significance of lower-order
coefficients in that model). If the coefficient for the highest plausible polynomial term
is not significant, then drop that term from the model. Consider the next highest
polynomial term as the most plausible polynomial term and go to (1).

In short, one starts with the largest plausible polynomial and then tests for the significance of the
largest term and only the largest term. If that coefficient is significant, then accept the model
(although one may subsequently test for the significance of other terms in the model). If the term
is not significant, then eliminate it from the model, rerun the regression, and examine the
significance of the next highest term. Continue this procedure to reduce the polynomial to the
lowest significant order.
In the 5-HIAA data, we would start with a quartic and then test the quartic term. If the
quartic is significant, then we stop there and accept that model. If it is not significant, then we
delete the quartic term, fit a cubic polynomial, and examine the significance of the regression
coefficient for the cubic term. If it is significant, then we accept the cubic polynomial as the
final model. If it is not significant, then we delete the cubic term and fit a quadratic model. If
the quadratic term is significant, then we accept the quadratic. Otherwise, we drop the quadratic
term and fit the simple linear model. If the linear model is significant, then we accept that model.
Otherwise, we conclude that there is no change in 5-HIAA over time.
Figure 1.29 gives the output from PROC REG in SAS that fitted up to a quartic
polynomial to the 5-HT data. In this output, variable time2 is the square of time, time3 is the
cube of time, etc. Only that section of the output that pertains to the regression coefficients is
shown. For completeness, all regressions from the linear to the quartic are shown.
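A minimal sketch of how these four models might be fitted, assuming a hypothetical data set serotonin with the time variable time and a response variable hiaa; the powers of time are created in a DATA step.

DATA sero;
   SET serotonin;               /* hypothetical data set with variables time and hiaa */
   time2 = time**2;             /* quadratic term */
   time3 = time**3;             /* cubic term     */
   time4 = time**4;             /* quartic term   */
RUN;

PROC REG DATA=sero;
   Linear:    MODEL hiaa = time;
   Quadratic: MODEL hiaa = time time2;
   Cubic:     MODEL hiaa = time time2 time3;
   Quartic:   MODEL hiaa = time time2 time3 time4;
RUN;
QUIT;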
We would begin with the quartic model given in the last part of the table. The coefficient
for the quartic is not significant (b = .0043, t = .40, df = 91, p = .69). Hence, we would eliminate
the quartic and fit a cubic. Recall here the importance of the algorithm. Our interest is only in
the quartic term. If we examined the significance of all terms in the model, we might have
erroneously concluded that there was no change in 5-HIAA over time.
The next step is to eliminate the quartic term and run the cubic model. The coefficient
for the cubic is significant (b = .0399, t = 2.14, df = 92, p = .03), so we would accept this model
as the most parsimonious polynomial that does not sacrifice predictability. The plot of the
predicted means in Figure 1.27 was derived from the parameters of the cubic model.
The inclusion of the linear and the quadratic regressions in Figure 1.29 suggests an
important lesson—be wary of starting with the lowest term and working up. Had we fitted the
linear model first, then we might have been tempted to say there was no significant change over
time because the linear term was not significant (b = -.133, t = -1.68, df = 94, p = .10). It is quite
true that this coefficient is not significant, but the only thing that this tells us is that the best
fitting straight line through the means in Figure 1.27 has a slope near 0. There may be nonlinear
effects of time on 5-HIAA.
Indeed, fitting a quadratic term substantially increases predictability—R2 increases from .03 to .22, and the coefficient for the square term (-.1713) is highly significant (t = -4.80, df = 93,
p < .0001). This implies that there is a significant “bend” to the plot of means in Figure 1.27.
But this quadratic effect may have been missed if one stopped the analysis after fitting only the
linear model.
Figure 1.29 Output from regression of 5-HIAA on the polynomials of time.

Linear Model: R2 = .029


Variable DF Estimate Error t Value Pr > |t|
Intercept 1 12.89554 0.40067 32.18 <.0001
time 1 -0.13317 0.07934 -1.68 0.0966

Quadratic Model: R2 = .222

Variable DF Estimate Error t Value Pr > |t|


Intercept 1 10.32589 0.64576 15.99 <.0001
time 1 1.40861 0.32924 4.28 <.0001
time2 1 -0.17131 0.03571 -4.80 <.0001

Cubic Model: R2 = .259

Variable DF Estimate Error t Value Pr > |t|


Intercept 1 8.34881 1.11909 7.46 <.0001
time 1 3.46558 1.01261 3.42 0.0009
time2 1 -0.71051 0.25400 -2.80 0.0063
time3 1 0.03994 0.01863 2.14 0.0347

Quartic Model: R2 = .260

Variable DF Estimate Error t Value Pr > |t|


Intercept 1 9.07845 2.13917 4.24 <.0001
time 1 2.39322 2.86166 0.84 0.4052
time2 1 -0.24312 1.19340 -0.20 0.8390
time3 1 -0.03745 0.19393 -0.19 0.8473
time4 1 0.00430 0.01072 0.40 0.6894

1.3.3.1.1 Advanced Topic: Fitting polynomial models: A sneaky shortcut.


In Section X.X, we spoke of two different sums of squares—hierarchical sums of squares
and partial sums of squares. The algorithm outlined above will always work regardless of the
types of sums of squares used. If, however, the regression or GLM software that you use allows
easy calculation of the hierarchical sums of squares, then there is an easy shortcut for computing
polynomial regression.
The shortcut is to fit the polynomial model and request the hierarchical sums of squares
from the software. Then select the highest order of the polynomial that is significant. Figure
1.30 presents the hierarchical solution (labeled as the “Type I SS”) from fitting a quartic to the 5-
HIAA data set referenced above.7 (Recall that variable “time1” is the linear term, “time2” is the
quadratic, and so on.)
Figure 1.30 Hierarchical solution (Type I SS) for fitting a quartic polynomial model to the
5-HIAA data set.

Source DF Type I SS Mean Square F Value Pr > F


time1 1 8.93867937 8.93867937 3.58 0.0617
time2 1 59.16345714 59.16345714 23.68 <.0001
time3 1 11.37122475 11.37122475 4.55 0.0356
time4 1 0.40152468 0.40152468 0.16 0.6894

The quartic term (time4) is not significant (F(1, 91) = 0.16, p = .69). The cubic term
(time3), however, is significant (F(1, 91) = 4.55, p = .04). Hence, we select the cubic
polynomial as the best model. We would then run the cubic model and use the coefficients from
this model to obtain predicted values of 5-HIAA as a function of time. The results would be the
same as if we had followed the algorithm presented in Figure 1.28.
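A sketch of the PROC GLM call that could produce a hierarchical (Type I SS) table like Figure 1.30, again under the hypothetical data set and variable names assumed above; note that Figure 1.30 labels the linear term time1.

PROC GLM DATA=sero;                        /* same (assumed) data set as above              */
   MODEL hiaa = time time2 time3 time4;    /* Type I SS assess each power in order of entry */
RUN;
QUIT;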

1.3.3.2 Advanced Topic: Maximum and Minimums in Polynomial Regression


One of the advantages of polynomial regression is that it can provide precise estimates of
maximal (or minimal) response. In designing experiments such as the time-course one given
above, selection of the initial time intervals can be problematic, especially when investigating a
new drug. Also, the time intervals are usually round numbers (e.g., 5 minutes, 10 minutes) while
the physical action of drugs respects no arbitrary time unit and can peak or trough anyplace.
Hence, coefficients from a polynomial equation can be helpful in “filling in” those portions of a
curve between adjacent groups.
The problem of finding the maxima and minima of a function is one of differential calculus—take the first derivative of the equation, set the derivative to 0, and then solve for the roots. For example, the first derivative of a quadratic equation with respect to the independent variable is
$$\frac{\partial \hat{Y}}{\partial X} = \beta_1 + 2\beta_2 X .$$
Setting this to 0 gives
$$\beta_1 + 2\beta_2 X = 0 ,$$
and solving for X gives
$$X = -\frac{\beta_1}{2\beta_2} .$$
Substituting the numeric estimates of β1 and β2 into this equation provides an estimate of the
maximum (or minimum). Simply look at the curve of predicted values to determine whether the
result is a maximum or a minimum.
Table 1.3 gives the first derivatives for up to a fifth-order polynomial. The easiest way to solve these equations is to use mathematical software that solves for the roots (i.e., the values of X) of polynomials. Good statistical packages will always have such a feature.8 For a polynomial of order k, there will be (k – 1) roots. Select only those roots that are within (or very close to being within) the range of values of X in the data set.

7 This solution was generated from PROC GLM in SAS.
Table 1.3 Equations for finding the maxima and minima of some polynomial functions.

Order of Polynomial     Equation
2                       $\beta_1 + 2\beta_2 X = 0$
3                       $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 = 0$
4                       $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 + 4\beta_4 X^3 = 0$
5                       $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 + 4\beta_4 X^3 + 5\beta_5 X^4 = 0$
As an example, substituting the regression coefficients for the serotonin time-course
example into the equation in Table 1.3 for a cubic polynomial gives
$$3.46558 - 2(.71051)X + 3(.03994)X^2 = 0 ,$$
$$3.46558 - 1.42102X + .11982X^2 = 0 .$$
Note that it is recommended practice to use several decimal places in solving for the roots of
polynomials. Using the POLYROOT function in PROC IML of SAS, we find that the time unit
giving maximal response was 3.4. Minimal response was at time unit 8.4.
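Because the derivative of a cubic is a quadratic, its roots can also be obtained from the quadratic formula in a short DATA step, a useful hedge when a root-finding routine is not handy. The coefficients below are those of the fitted cubic from Figure 1.29.

DATA turning_points;
   b1 = 3.46558;  b2 = -0.71051;  b3 = 0.03994;  /* cubic model coefficients          */
   a = 3*b3;  b = 2*b2;  c = b1;                 /* derivative: c + b*X + a*X**2 = 0  */
   disc  = b**2 - 4*a*c;
   root1 = (-b - SQRT(disc)) / (2*a);            /* about 3.4 (the maximum)           */
   root2 = (-b + SQRT(disc)) / (2*a);            /* about 8.4 (the minimum)           */
RUN;

PROC PRINT DATA=turning_points;
RUN;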

1.3.3.3 Example: Polynomial Regression with Ordered Groups


An important application of polynomial regression occurs with ordered groups (see
Section X.X). Many studies of substance abuse in humans place subjects into ordinal categories.
Cigarette use within the past month, for example, may be coded into levels of “never smoked,”
“previous user, currently abstinent,” “occasional, but not a daily smoker” “daily smoker, less
than 20 cigarettes per day,” “daily smoker, 20 to 40 cigarettes,” and “daily smoker, more than 40
cigarettes.” Suppose that a study on the effects of nicotine withdrawal placed subjects into these
categories. The purpose of the study was to examine the effects of a specified period of
abstinence from nicotine on a variety of biological measures. Participants entered the lab and
spent a fixed amount of time waiting—without the opportunity for smoking—before being
outfitted with a series of electrodes. The dependent variable is the percentage of time spent
during a relaxation session in alpha EEG wave activity. More time spent in recorded alpha
during this session is taken to indicate greater overall relaxation.
Figure 1.31 plots the observed means of percent time in alpha activity as a function of the
level of cigarette smoking. (Ignore, for the moment, those lines in this Figure that represent
predicted values. We deal with them later.) It is obvious from the observed means that those
subjects with high cigarette consumption have lower measures of relaxation than those
with little or no consumption. But can we extract more information from these data? The
answer is “Yes.”
Analysis of these data would begin by assigning ordinal numbers to the categories: 1 for
“never smoked,” 2 for “previous user, currently abstinent,” and so on, up to 6 for “daily smoker,
“more than 40 cigarettes.” Next, create new variables that are the polynomials for this ordered group variable. Finally, perform the polynomial regressions as outlined above in Sections 1.3.3.1 and 1.3.3.1.1.

8 Sometimes the routine will be described as finding “zeros” of a polynomial.
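A minimal sketch of this coding, assuming a hypothetical data set abstinence with the ordinal variable usegroup (1 through 6) and the dependent variable alpha_pct:

DATA smoke;
   SET abstinence;              /* hypothetical data set with usegroup (1-6) and alpha_pct */
   Poly1 = usegroup;            /* linear term    */
   Poly2 = usegroup**2;         /* quadratic term */
   Poly3 = usegroup**3;         /* cubic term     */
   Poly4 = usegroup**4;         /* quartic term   */
RUN;

PROC GLM DATA=smoke;
   MODEL alpha_pct = Poly1 Poly2 Poly3 Poly4;   /* hierarchical (Type I) SS in order of entry */
RUN;
QUIT;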
Figure 1.31 Mean (+/- one standard error) relaxation scores of six categories of cigarette
use after a brief period of abstinence. Also shown are the best-fitting linear, quadratic, and
cubic polynomials.

In analyzing these data, we used the hierarchical method specified in Section 1.3.3.1.1
and started with a quartic model. The polynomial variables were called Poly1 (the linear term)
through Poly4 (the quartic term). Output from the hierarchical SS from PROC GLM in SAS is
given in Figure 1.32.
Figure 1.32 Hierarchical solution (Type I SS) from fitting a quartic polynomial to the
cigarette-abstinence data.

Source DF Type I SS Mean Square F Value Pr > F


Poly1 1 28588.12723 28588.12723 51.30 <.0001
Poly2 1 7158.86418 7158.86418 12.85 0.0005
Poly3 1 4888.27163 4888.27163 8.77 0.0036
Poly4 1 1430.65951 1430.65951 2.57 0.1115

The quartic model overfits the data because the quartic term is not significant. The first
three terms, however, are significant, so we settle on the cubic polynomial as the best model.
As should be clear from this description, the mechanics of fitting polynomials to ordered
groups are the same as fitting polynomials where the independent variable is measured on an
interval or ratio scale. The difference lies in the interpretation of the results. With an interval or
ratio scale, one can extrapolate between groups and calculate maxima and minima response
points. With ordered groups, however, such extrapolations and calculations are valid only to the
extent that the underlying order of the groups approaches an interval scale.
The polynomial functions for ordered groups should be used to interpret differences in
the group means. Hence, the model can be used to make inferences about the groups, but the
nature of those inferences depends on the nature of the groups. Let us illustrate. Return to
Figure 1.31 and compare the predicted values for the linear model to those from the quadratic
model. The linear model predicts that increasing exposure to tobacco is associated with
decreased relaxation. The quadratic model—which, recall, fits better than the linear model—
reveals something more. The curve for the first three groups is flat. This suggests that those
who have never smoked, are ex-smokers, or occasionally smoke have similar levels of
relaxation. Once we get to daily smokers, however, the curve descends in a “dose-dependent”
fashion. This pattern could be interpreted as a difference between those currently addicted to
nicotine and those not currently addicted.
The cubic curve adds to prediction in two ways. First, it suggests a meaningful difference
among the first three groups. Those who have never smoked may be different in their tendency
to relax from those who have taken up smoking—either in the past (the abstinent group) or only
occasionally. This may have less to do with the addictive and physiological properties of nicotine
and more to do with the participants’ environments and personalities during the period of
maximal risk for sampling cigarettes. The statistical analysis cannot prove this, but it acts as a
good heuristic that can guide future research into this area.
The second difference between the cubic and the quadratic curve is the “dose-response”
portion of the curve for daily smokers. The quadratic curve predicts an almost linear decrease in
relaxation from the 4th group (daily smokers, less than 1 pack) to the 6th (daily smokers, more than 2
packs). The cubic curve, on the other hand, agrees with the observed means in suggesting that
the “dose-response” curve flattens after a certain point. Because the data consist of ordered
groups and not a firm quantitative estimate of dose, one should not make strong claims about
where the predictions asymptote. One could, however, use the form of the curve to guide the
design of further studies into this area.
1.3.3.4 Example: Polynomial Regression with Interaction


Figure 1.33 presents the mean dose response curves for two groups of mice—a control
group and a group receiving a drug agonist. (The predicted values in this Figure come from the
best-fitting regression model, which we explain below.) It is clear from this
Figure that the agonist has the effect of increasing the response by roughly two units. But does
the overall shape of the dose-response curve for the agonist differ from that of the controls?

Figure 1.33 Observed means and means predicted from the best fitting regression model
for dose-response curves from a control group and a group administered a drug agonist.

To answer this question, we write a model that includes both polynomial terms (in order
to model the dose-response) and interactive terms (in order to test whether the shape of the dose-
response curve differs between the Control and the Agonist group). Let us begin by dummy
coding the group variable such that 0 = Control and 1 = Agonist. We will then fit a series of
regression models, starting with the most general model and then reducing it to a parsimonious
one that still predicts well. Finally, after we settle on the best model, we will substitute into the
regression equation a value of 0 for the Control group and 1 for the Agonist group to interpret the
meaning of the model’s parameters.
Figure 1.34 presents the statistics for overall fit and the parameter estimates from two
regression models. The first order of business is to ascertain the most parsimonious model that
explains the data without sacrificing explanatory power. Model 1 fits the main
effect for agonist, the linear and quadratic terms for dose, an interaction term between agonist and the linear
term for dose (agonist*dose), and an interaction between agonist and the quadratic term for dose
(agonist*dose*dose).
Figure 1.34 Results of regressing dependent variable Response on independent variables
group (Control vs. Agonist) and dose of drug, including the polynomial effects of drug and
the interactions of group and dose of drug.

Model 1: F(5,174) = 16.05, p < .0001, R2 = .316


Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 16.75001587 0.41305636 40.55 <.0001
agonist -0.13864286 0.58414990 -0.24 0.8127
dose 0.20909714 0.11697777 1.79 0.0756
dose*dose -0.01479873 0.00743560 -1.99 0.0481
agonist*dose 0.36138286 0.16543155 2.18 0.0303
agonist*dose*dose -0.01326095 0.01051552 -1.26 0.2090

Model 2: F(4,175) = 19.60, p < .0001, R2 = .309


Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 16.54281349 0.37961019 43.58 <.0001
agonist 0.27576190 0.48376998 0.57 0.5694
dose 0.30855429 0.08653941 3.57 0.0005
dose*dose -0.02142921 0.00526662 -4.07 <.0001
agonist*dose 0.16246857 0.04996355 3.25 0.0014

In the general model, the three-way interaction (agonist*dose*dose) is not significant (t(174) = -1.26, p = .21).
Hence, this term is removed from the model and the regression is rerun.
In the reduced model, both the quadratic term for dose (dose*dose) and the interaction
between agonist and dose (agonist*dose) are significant. Hence, we will accept this model.
(Note: One could also drop the variable agonist from the model and rerun it. We hold off doing
that in order to explain the meaning of the coefficient for agonist).
Having arrived at a satisfactory model, we must now perform some algebraic manipulations to derive the meaning of the parameters. The general equation for Model 2 is
$$\hat{Y} = \alpha + \beta_1 \text{Agonist} + \beta_2 \text{Dose} + \beta_3 \text{Dose}^2 + \beta_4 \text{Agonist} \cdot \text{Dose} .$$
Because we have coded Controls as 0 and Agonists as 1, we have the following two equations for the Control and the Agonist groups:
$$\hat{Y}_{\text{Control}} = \alpha + \beta_2 \text{Dose} + \beta_3 \text{Dose}^2 , \qquad \text{(X.X)}$$
and
$$\hat{Y}_{\text{Agonist}} = (\alpha + \beta_1) + (\beta_2 + \beta_4)\text{Dose} + \beta_3 \text{Dose}^2 . \qquad \text{(X.X)}$$


Using the logic outlined above for the interactions, the intercept for the Control group is
α, while the intercept for the Agonist group is (α + β1). Hence, the regression coefficient β1 tests
whether the intercepts for the two groups differ. From the preferred model (Model 2) in Figure
1.34, the t test for this parameter is not significant (t(175) = 0.57, p = .57), so we conclude that
the intercepts for the two groups are the same. In concrete terms, we conclude that the mean for
the Controls who have not received the active drug is the same as the mean for the Agonist group
who have not received the active drug.
Having explained the meaning of coefficient β1, let us now drop variable “agonist” from the
model and rerun it. The overall R2 changes very little—it drops from .309 to .308. Similarly, the
parameter estimates hardly change. Keeping the same subscripts as in Equations X.X and X.X,
they are: Intercept = 16.681, β2 = .296, β3 = -.021, and β4 = .187.
We can now substitute 0 for β1 into Equations (X.X) and (X.X), giving
$$\hat{Y}_{\text{Control}} = \alpha + \beta_2 \text{Dose} + \beta_3 \text{Dose}^2 ,$$
and
$$\hat{Y}_{\text{Agonist}} = \alpha + (\beta_2 + \beta_4)\text{Dose} + \beta_3 \text{Dose}^2 .$$
Now ask the following question—as long as the dose of the drug is not 0 (a situation that we have already described), what is the predicted difference between the Control and the Agonist groups? To answer this query, we can subtract the predicted value of the Controls from the predicted value of the Agonists, giving
$$\hat{Y}_{\text{Agonist}} - \hat{Y}_{\text{Control}} = \alpha + (\beta_2 + \beta_4)\text{Dose} + \beta_3 \text{Dose}^2 - \alpha - \beta_2 \text{Dose} - \beta_3 \text{Dose}^2 = \beta_4 \text{Dose} .$$
Thus, the predicted mean difference between the two groups depends on the dose of the
drug. The estimate of β4 in the accepted model is .187. If the dose is 5 mg/weight, then the
predicted difference in response is .187(5) = 0.94 units; if the dose is 10 mg/weight, then the
predicted difference is .187(10) = 1.87 units; and if the dose is 15 mg/weight, then the predicted
difference is .187(15) = 2.81 units.
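As a quick check, these predicted differences can be generated in a DATA step; the doses are those used in the text and the data set name is arbitrary.

DATA group_diff;
   b4 = 0.187;                  /* agonist*dose coefficient from the final model */
   DO dose = 5, 10, 15;
      diff = b4 * dose;         /* predicted Agonist minus Control difference    */
      OUTPUT;
   END;
RUN;

PROC PRINT DATA=group_diff;
RUN;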
Finally, we can use the equations in Section 1.3.3.2 (max/min response) to calculate the maximum response for the Control and the Agonist group. The equation for the Control group is
$$\hat{Y}_{\text{Control}} = 16.68 + .296 \cdot \text{Dose} - .021 \cdot \text{Dose}^2 ,$$
so the maximum response occurs when the dose is
$$\text{Dose} = -\frac{.296}{2(-.021)} = 7.05 \text{ mg/weight} .$$
The equation for the Agonist group is
$$\hat{Y}_{\text{Agonist}} = 16.68 + .296 \cdot \text{Dose} - .021 \cdot \text{Dose}^2 + .187 \cdot \text{Dose} = 16.68 + .483 \cdot \text{Dose} - .021 \cdot \text{Dose}^2 ,$$
so the maximal predicted response for this group occurs at
$$\text{Dose} = -\frac{.483}{2(-.021)} = 11.5 \text{ mg/weight} .$$
1.4 References

Belsley, D.A., Kuh, E., & Welsch, R.E. (1980). Regression Diagnostics. New York: John Wiley & Sons, Inc.

Cohen, J. & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 2nd Ed. Hillsdale, NJ: Lawrence Erlbaum.

Judd, C.M. & McClelland, G.H. (1989). Data Analysis: A Model-Comparison Approach. New York: Harcourt, Brace, Jovanovich.
1.5 Tables
Table 1.1 Schematic of a two by two factorial design............................................................... 1-28
Table 1.2 Predicted values of the dependent variable in a two by two factorial design with
interaction. ................................................................................................................................. 1-29
Table 1.3 Equations for finding the maxima and minima of some polynomial functions......... 1-44
1.6 Figures:
Figure 1.1 Example of a scatter plot and the regression line (line of best fit). ........................... 1-2
Figure 1.2 Output from a simple regression predicting the quantity of receptors in human cortex
as a function of age. ..................................................................................................................... 1-4
Figure 1.3 Examples of nonlinear relationships. ......................................................................... 1-5
Figure 1.4 Example of residuals that are not normally distributed.............................................. 1-6
Figure 1.5 Example of normally distributed residuals................................................................. 1-7
Figure 1.6 Examples of equal variance of residuals (homoscedasticity) and unequal variance of
residuals (heteroscedasticity). ...................................................................................................... 1-7
Figure 1.7 Example of a scatter plot containing an outlier. ......................................................... 1-8
Figure 1.8 Results from a regression with an outlier. .................................................................. 1-9
Figure 1.9 The same scatterplot after removing the outlier. ........................................................ 1-9
Figure 1.10 Results of the regression with the outlier removed. ............................................... 1-10
Figure 1.11 Multiple regression of receptor number on age and cotinine. ................................ 1-12
Figure 1.12 Model Comparisons: A General Model with Four Predictors,............................... 1-20
Figure 1.13 Model Comparisons: Example of the TEST statement in SAS for a single predictor
variable....................................................................................................................................... 1-21
Figure 1.14 Model Comparisons: Example of the TEST statement in SAS for a two predictor
variables. .................................................................................................................................... 1-23
Figure 1.15 Mean sexual activity (± 1 standard error) of rats with and without prior sexual
experience as a function of dose of a testosterone-like compound............................................ 1-25
Figure 1.16 Regression results for the main effect model. ........................................................ 1-27
Figure 1.17 Regression results for the interaction model. ......................................................... 1-27
Figure 1.18 Regression lines from a main effects model........................................................... 1-30
Figure 1.19 Regression lines from an interaction model. .......................................................... 1-31
Figure 1.20 Model comparison algorithm for interactions. ....................................................... 1-33
Figure 1.21 Interactions: Regression coefficients for the general model. ................................. 1-34
Figure 1.22 Interactions: Regression coefficients from the first reduced model....................... 1-35
Figure 1.23 Interactions: Test of the significant of both interaction terms X1X3 and X2X3........ 1-35
Figure 1.24 Interactions: Regression coefficients from the second reduced model. ................. 1-36
Figure 1.25 Examples of quadratic curves................................................................................. 1-38
Figure 1.26 Examples of cubic curves....................................................................................... 1-39
Figure 1.27 Assays of 5-HIAAA in CSF as a function of time. ................................................ 1-40
Figure 1.28 Algorithm for fitting polynomial regression models.............................................. 1-41
Figure 1.29 Output from regression of 5-HIAA on the polynomials of time. ........................... 1-42
Figure 1.30 Hierarchical solution (Type I SS) for fitting a quartic polynomial model to the 5-
HIAA data set. ........................................................................................................................... 1-43
Figure 1.31 Mean (+/- one standard error) relaxation scores of six categories of cigarette use after
a brief period of abstinence. Also shows are the best fitting linear, quadratic, and cubic
polynomials................................................................................................................................ 1-45
Figure 1.32 Hierarchical solution (Type I SS) from fitting a quartic polynomial to the cigarette-
abstinence data. .......................................................................................................................... 1-46
Figure 1.33 Observed means and means predicted from the best fitting regression model for
dose-response curves from a control group and a group administered a drug agonist. ............. 1-47
Figure 1.34 Results of regressing dependent variable Response on independent variables group (Control vs. Agonist) and dose of drug, including the polynomial effects of drug and the interactions of group and dose of drug (p. 1-48)