10, page 1

Math 445 Chapter 10 Inferences About Regression Coefficients

Chapter 10 concerns statistical inferences about individual regression coefficients, about linear

combinations of coefficients, and about sets of coefficients. All these inferences, which are based on

either the t or F distribution, are dependent on the assumptions of normality of the residuals, constant

variance, and independence. Assessment of these assumptions is covered in Chapter 11.

Consider the additive model (a model without interactions is called “additive” because the effects of the variables are additive and don’t depend on the levels of the other variables):

µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1 Latitude + β2 Altitude + β3 Rainshadow

Assume that the linear regression model assumptions are satisfied; that is, that the model fits, that the

residuals are normal with constant variance and that the observations are independent. The formal

inferences we make below are valid only if these assumptions are satisfied.

Coefficients (Model 1; B and Std. Error are unstandardized coefficients)

Term                 B          Std. Error   t        Sig.    95% CI Lower   95% CI Upper
(Constant)           -97.557    24.554       -3.973   .0005   -148.028       -47.085
Latitude (degrees)   3.428      .667         5.139    .0000   2.057          4.800
Altitude (ft)        .00115     .00085       1.352    .1880   -.00060        .00290
Rainshadow           -19.688    3.439        -5.725   .0000   -26.758        -12.619

Dependent Variable: Precipitation (in)

• The t statistic and P-value for each coefficient are for a two-sided test of the hypothesis that the true coefficient is 0.

• For example, the t statistic for Latitude (t = 5.139) is for a test of the hypothesis H0: β1 = 0. There is convincing evidence (P < .00005) that β1 is greater than 0. In addition, we estimate that mean precipitation rises about 3.43 inches for every one-degree increase in latitude (95% confidence interval: 2.06 to 4.80 inches), given that altitude and rain shadow remain the same.

• The test of H0: β1 = 0 (and the confidence interval) is for the model which also has Altitude and Rainshadow in it. Thus, it is a test of the effect of Latitude after the linear effects of Altitude and Rainshadow have been adjusted for. This is different from a test of H0: β1 = 0 without Altitude and Rainshadow in the model.

• We do not have convincing evidence (P = .188) that mean precipitation changes with altitude,

given that latitude and rain shadow remain fixed. We estimate that mean precipitation

increases by 1.15 inches for every 1000 foot increase in altitude (95% confidence interval, 0.60

inch decrease to 2.90 inch increase).

• Do locations in the rain shadow differ from those not in the rain shadow, after adjusting for the

effects of latitude and altitude? (In other words, is there evidence that β3 ≠ 0?) There is

completely convincing evidence (P<.00005) that locations in the rain shadow receive less

precipitation on average than locations of the same latitude and altitude not in the rain shadow.

What is more interesting is that locations in the rain shadow are estimated to have mean

precipitation 19.7 inches less (95% confidence interval: 26.8 inches to 12.6 inches less) than

equivalent locations (on altitude and latitude) not in the rain shadow.
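The t statistics and confidence limits quoted in the bullets above can be reproduced from the B and Std. Error columns alone. The following is a minimal sketch, not SPSS's own computation: it assumes 26 residual degrees of freedom (30 cases minus 4 coefficients, which the notes do not state explicitly) and hard-codes the corresponding t multiplier rather than calling a statistics library.

```python
# Reproduce t = B / SE and the 95% CI = B +/- t_crit * SE for the additive model.
# Assumption: 26 residual df (30 cases - 4 coefficients); 2.0555 is the 0.975
# quantile of the t distribution with 26 df, hard-coded to avoid dependencies.
T_CRIT = 2.0555

# (estimate, standard error) copied from the coefficient table
coefficients = {
    "Latitude": (3.428, 0.667),
    "Altitude": (0.00115, 0.00085),
    "Rainshadow": (-19.688, 3.439),
}


def t_and_ci(estimate, std_error, t_crit=T_CRIT):
    """Return (t statistic, lower limit, upper limit) for one coefficient."""
    t_stat = estimate / std_error
    half_width = t_crit * std_error
    return t_stat, estimate - half_width, estimate + half_width


results = {name: t_and_ci(b, se) for name, (b, se) in coefficients.items()}

for name, (t_stat, lower, upper) in results.items():
    print(f"{name:10s} t = {t_stat:7.3f}   95% CI: {lower:.5f} to {upper:.5f}")
```

Small differences from the SPSS table in the last digit are expected because the table entries are themselves rounded. With a statistics library available, the multiplier could instead be obtained from a call such as `scipy.stats.t.ppf(0.975, 26)`.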

When interactions are present in a model, the test of significance for the coefficient on a term which is

involved in a higher order interaction is not useful because we must always include this term in the

model anyway. In addition, the coefficient on this term does not have a meaningful interpretation.

Example: In the Chapter 9 notes, we fit the following model to the rainfall data:

µ(Precip | Latitude, Rainshadow) = β0 + β1 Latitude + β2 Rainshadow + β3 Latitude*Rainshadow

Coefficients (Model 1; B and Std. Error are unstandardized, Beta is standardized)

Term                  B          Std. Error   Beta     t        Sig.
(Constant)            -175.457   26.177                -6.703   .000
Latitude (degrees)    5.581      .705         .895     7.912    .000
Rainshadow            139.839    39.019       4.240    3.584    .001
Latitude*Rainshadow   -4.315     1.051        -4.871   -4.105   .000

Dependent Variable: Precipitation (in)

The coefficient on rainshadow is large and positive – but it does not mean that locations in the

rainshadow are estimated to have mean precipitation 139.8 inches greater than locations of the same

latitude not in the rainshadow! Why not?
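One way to see why: in the interaction model the estimated rain shadow difference is not a single number but a function of latitude, namely the Rainshadow coefficient plus the interaction coefficient times Latitude. The sketch below evaluates that expression using the coefficients from the table; the latitudes 35 and 40 are chosen only for illustration.

```python
# Estimated difference in mean precipitation (rain shadow minus not in rain
# shadow) at a given latitude, for the Latitude*Rainshadow interaction model.
B_RAINSHADOW = 139.839    # coefficient on Rainshadow
B_INTERACTION = -4.315    # coefficient on Latitude*Rainshadow


def rainshadow_effect(latitude):
    """Rain shadow difference in inches at the given latitude (degrees)."""
    return B_RAINSHADOW + B_INTERACTION * latitude


# Latitude 0 reproduces the raw coefficient; 35 and 40 are illustrative values.
for latitude in (0.0, 35.0, 40.0):
    print(f"Latitude {latitude:4.1f}: difference = {rainshadow_effect(latitude):8.3f} in")

# Latitude at which the estimated difference would be exactly zero
crossover_latitude = -B_RAINSHADOW / B_INTERACTION
print(f"Difference is zero at latitude {crossover_latitude:.2f}")
```

The value 139.8 is the estimated difference only at latitude 0, far outside the range of the data; at the latitudes listed in these notes (roughly 33 to 42 degrees) the estimated difference is negative.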

The statistical significance of the coefficients on the first-order terms (Latitude and Rainshadow) is

also irrelevant since they are both involved in the second-order term. In particular, if either coefficient

were not statistically significantly different from 0 (large P-value), that would not mean that we had no

evidence of an effect of that variable. For example, if the coefficient for Latitude in the above model had been statistically nonsignificant, that would not mean that we had no evidence of an effect of latitude, because the effect of latitude also comes through the Latitude*Rainshadow interaction, which is statistically significant.

+ β5 Altitude*Rainshadow + β6 Latitude*Rainshadow + β7 Altitude*Latitude*Rainshadow

• We must include all two-way interactions which are part of the 3-way interaction.

• The coefficient on the 3-way interaction is interpreted as the difference in the two-way interaction between any pair of variables across levels of the third variable. For example, β7 represents the difference in the Altitude by Latitude interaction between locations in and not in the rain shadow.

• The coefficients on all the terms below the 3-way interaction have no useful interpretation as long

as the 3-way interaction is in the model, and the tests of significance of these terms are not

meaningful.

• The test of significance on the coefficient on the 3-way interaction is meaningful: we have no evidence (P = .711) that there is a 3-way interaction among these variables in their association with precipitation. That’s good: we generally don’t want to include a 3-way interaction unless we have strong evidence that one is present.

Coefficients (Model 1; B and Std. Error are unstandardized, Beta is standardized)

Term                  B          Std. Error   Beta     t        Sig.
(Constant)            -178.154   26.390                -6.751   .000
Altitude (ft)         .0248      .0172        3.129    1.444    .163
Latitude (degrees)    5.5929     .7191        .897     7.778    .000
Rainshadow            72.7033    50.9637      2.205    1.427    .168
Altitude*Latitude     -.0006     .0004        -2.953   -1.358   .188
Altitude*Rainshadow   .0067      .0233        .572     .289     .776
Latitude*Rainshadow   -2.4465    1.3797       -2.761   -1.773   .090
Alt*Lat*Rainshadow    -.0002     .0006        -.746    -.376    .711

Dependent Variable: Precipitation (in)

Sometimes, the effect of interest is a linear combination of parameters.

There are two binary variables: Sex and Continent. Suppose they are coded as indicator variables as follows:

Sex: 0 = Female, 1 = Male

Continent: 0 = NA, 1 = EU

With this coding, the model µ(Wing | Latitude, Sex, Continent) = β0 + β1 Latitude + β2 Sex + β3 Continent + β4 Sex*Continent implies the following relationships between Wing size and Latitude:

Female, NA: µ(Wing | Latitude, Sex = 0, Continent = 0) = β0 + β1 Latitude

Female, EU: µ(Wing | Latitude, Sex = 0, Continent = 1) = β0 + β1 Latitude + β3

Male, NA: µ(Wing | Latitude, Sex = 1, Continent = 0) = β0 + β1 Latitude + β2

Male, EU: µ(Wing | Latitude, Sex = 1, Continent = 1) = β0 + β1 Latitude + β2 + β3 + β4

• The slope coefficients are identical for all four groups since there are no interactions with

Latitude.

• The intercepts are different and the differences represent the vertical distances between the

parallel lines relating Wing size to Latitude.

• β3 represents the difference in mean Wing size between females in EU and females in NA at the same latitude; a test of H0: β3 = 0 and a confidence interval for β3 can be obtained directly from the regression output.

• The corresponding difference in mean Wing size between males in EU and males in NA is β3 + β4. An estimate of this difference is β̂3 + β̂4; however, the SE and a confidence interval cannot be easily obtained from the regression output. SE(β̂3 + β̂4) depends on the SEs of β̂3 and β̂4 individually, but also on the covariance of β̂3 and β̂4. Although you can obtain the needed covariance from the SPSS regression output to calculate SE(β̂3 + β̂4), it is easier to simply reparameterize the model to obtain this directly from the regression output.

• Reparameterization: reverse the coding on Sex: let 0 be male and 1 be female. The “Male” and “Female” labels are then switched in the above set of equations, and β3 in this new model represents the difference in mean wing size between males in EU and males in NA; i.e., it is the same as β3 + β4 in the old model. The SE of the estimated difference can be obtained directly from the regression output.

• Reparameterizing changes the interpretation of individual parameters but it doesn’t change the

model.
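For readers who do want the by-hand route mentioned in the bullets above, the standard error of a sum of two estimated coefficients is the square root of Var(β̂3) + Var(β̂4) + 2 Cov(β̂3, β̂4). The sketch below implements that formula. The numeric inputs are hypothetical placeholders, since the notes do not report the wing-size regression output; they are there only to show the arithmetic.

```python
import math


def se_of_sum(se_a, se_b, cov_ab):
    """Standard error of (a_hat + b_hat) given each SE and their covariance."""
    variance = se_a ** 2 + se_b ** 2 + 2.0 * cov_ab
    if variance < 0:
        raise ValueError("inputs imply a negative variance; check the covariance")
    return math.sqrt(variance)


# Hypothetical values, NOT taken from the wing-size data:
se_b3 = 0.30        # SE of the Continent coefficient
se_b4 = 0.40        # SE of the Sex*Continent coefficient
cov_b3_b4 = -0.05   # covariance of the two estimates

se_combined = se_of_sum(se_b3, se_b4, cov_b3_b4)
se_if_cov_ignored = se_of_sum(se_b3, se_b4, 0.0)

print(f"SE of the sum with covariance:    {se_combined:.4f}")
print(f"SE of the sum ignoring covariance: {se_if_cov_ignored:.4f}")
```

The comparison with the covariance set to zero shows why the individual SEs alone are not enough: with a negative covariance, ignoring it overstates the SE of the sum.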

The estimated mean of Y at any combination of X’s is obtained by plugging these values into the estimated regression equation. The standard error of the mean response can be obtained in SPSS by including an extra case in the data file which has the desired X’s but a missing value for Y. Then, as with simple linear regression, on the regression dialog box, choose Save…SE of mean predictions for the SE of the mean, and choose Prediction Intervals…Mean for confidence intervals for the mean response and Prediction Intervals…Individual for prediction intervals for an individual response. These are individual confidence intervals and prediction intervals, not simultaneous.

Example 1: Rainfall data. Here are some results when the additive model was fit.

µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1 Latitude + β2 Altitude + β3 Rainshadow

µ̂(Precip | Latitude, Altitude, Rainshadow) = −97.557 + 3.428*Latitude + 0.00115*Altitude − 19.688*Rainshadow

The predicted values, standard error of the mean (SEP), 95% confidence interval for the mean (LMCI,

UMCI) and 95% prediction interval (LICI, UICI) are shown for cases 26-30 plus two new sets of X

values. These confidence intervals are valid only if the assumptions of the regression model are

satisfied; we have not checked these assumptions yet.

Case    Precip   Altitude   Latitude   Shadow   Pred     SEP     LMCI     UMCI     LICI      UICI
26      9.94     19         32.7       0        14.574   3.846   6.669    22.479   -6.151    35.299
27      4.25     2105       34.1       1        2.047    3.184   -4.499   8.593    -18.198   22.292
28      1.66     -178       36.5       1        7.687    2.565   2.415    12.959   -12.183   27.557
29      74.87    35         41.7       0        45.448   4.460   36.281   54.615   24.210    66.686
30      15.95    60         39.2       1        17.217   2.989   11.072   23.362   -2.902    37.336
(new)   .        1000       35.0       0        23.586   2.892   17.640   29.531   3.527     43.645
(new)   .        3000       40.0       1        23.337   3.126   16.911   29.763   3.130     43.544

According to this model, the estimated mean annual precipitation for locations at 3000 feet and 40

degrees latitude which are in the rain shadow is 23.34 inches (95% confidence interval 16.9 to 29.8

inches). A 95% prediction interval for the annual precipitation at an individual location like this is

3.13 to 43.5 inches.
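The intervals in the table can be checked by hand from the Pred and SEP columns. The sketch below is an illustration under stated assumptions rather than a description of SPSS internals: it assumes 26 residual degrees of freedom with a hard-coded t multiplier, and because the notes do not report the residual standard deviation, it backs that value out of one row's prediction interval (case 26) and then uses it to rebuild the interval for a different row.

```python
import math

# Assumption: 26 residual df; 2.0555 is the 0.975 quantile of t with 26 df.
T_CRIT = 2.0555


def predict(latitude, altitude, rainshadow):
    """Plug X values into the estimated additive regression equation."""
    return -97.557 + 3.428 * latitude + 0.00115 * altitude - 19.688 * rainshadow


def mean_ci(pred, se_mean):
    """95% confidence interval for the mean response."""
    half_width = T_CRIT * se_mean
    return pred - half_width, pred + half_width


def implied_residual_sd(pred, se_mean, upper_pi):
    """Residual SD implied by a reported upper prediction limit."""
    se_pred = (upper_pi - pred) / T_CRIT
    return math.sqrt(se_pred ** 2 - se_mean ** 2)


def prediction_interval(pred, se_mean, resid_sd):
    """95% prediction interval for an individual response."""
    half_width = T_CRIT * math.sqrt(se_mean ** 2 + resid_sd ** 2)
    return pred - half_width, pred + half_width


# Residual SD inferred from case 26 (Pred 14.574, SEP 3.846, UICI 35.299)
resid_sd = implied_residual_sd(14.574, 3.846, 35.299)

# New location: 3000 ft, latitude 40, in the rain shadow (Pred 23.337, SEP 3.126)
point_estimate = predict(40.0, 3000.0, 1)
ci_low, ci_high = mean_ci(23.337, 3.126)
pi_low, pi_high = prediction_interval(23.337, 3.126, resid_sd)

print(f"Plug-in prediction from rounded coefficients: {point_estimate:.3f}")
print(f"Implied residual SD: {resid_sd:.3f}")
print(f"95% CI for the mean: {ci_low:.3f} to {ci_high:.3f}")
print(f"95% prediction interval: {pi_low:.3f} to {pi_high:.3f}")
```

The plug-in prediction differs slightly from the tabled value of 23.337 because the printed coefficients are rounded. The prediction interval is much wider than the confidence interval because it adds the residual variance of an individual location to the uncertainty in the estimated mean.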

We sometimes want to test a hypothesis about a set of parameters in a regression model. Recall that

we did this in an ANOVA model where the overall F test tested H0: µ1 = µ2 = … = µI and where an

extra sum of squares F test was used to compare two models. This test is valid only if the assumptions

of the regression model (normality, constant variance, independence) are satisfied.

Suppose we fit the model regressing number of Flowers on Timing (binary variable; early or late) and

Light Intensity where Light Intensity is treated as a factor with 6 levels. Thus there is an indicator

variable for Timing called early (1 for early, 0 for late) and 5 indicator variables for Intensity, called

L300, L450, L600, L750, L900 with 150 treated as the reference level. There are no interactions so the

model is:

A shorthand way of describing the model (see Section 9.3.5, p. 249) is:

Suppose we want to test the hypothesis that there is no effect of light intensity given that the Timing

variable is in the model. What hypothesis about the regression parameters do we want to test?

To test this hypothesis, we fit a full model with early and all the indicator variables for LIGHT in the

model. Then we fit a reduced model with just early in the model and carry out an extra sum-of-squares

F-test just as we did in Chapter 5.

Full model (Early and the five light-intensity indicators)

ANOVA

Source       Sum of Squares   df   Mean Square   F        Sig.
Regression   3570.464         6    595.077       13.181   .000
Residual     767.472          17   45.145
Total        4337.936         23

Predictors: (Constant), Early, L900, L750, L600, L450, L300
Dependent Variable: Flowers

Coefficients (B and Std. Error are unstandardized, Beta is standardized)

Term         B         Std. Error   Beta    t        Sig.
(Constant)   67.196    3.629                18.518   .000
L300         -9.125    4.751        -.253   -1.921   .072
L450         -13.375   4.751        -.371   -2.815   .012
L600         -23.225   4.751        -.644   -4.888   .000
L750         -27.750   4.751        -.769   -5.841   .000
L900         -29.350   4.751        -.814   -6.178   .000
Early        12.158    2.743        .452    4.432    .000

Dependent Variable: Flowers

Reduced model (Early only)

ANOVA

Source       Sum of Squares   df   Mean Square   F       Sig.
Regression   886.950          1    886.950       5.654   .027
Residual     3450.986         22   156.863
Total        4337.936         23

Predictors: (Constant), Early
Dependent Variable: Flowers

Coefficients (B and Std. Error are unstandardized, Beta is standardized)

Term         B        Std. Error   Beta   t        Sig.
(Constant)   50.058   3.616               13.845   .000
Early        12.158   5.113        .452   2.378    .027

Dependent Variable: Flowers

Carry out the F-test (the coefficients above are not necessary for this test, only the ANOVA table).
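The arithmetic of the extra sum-of-squares F-test uses only the Residual rows of the two ANOVA tables. The sketch below computes the F statistic and its degrees of freedom from those rows; the function name is a hypothetical helper, and the P-value is deliberately not computed so that the example has no dependencies.

```python
def extra_ss_f_test(sse_reduced, df_reduced, sse_full, df_full):
    """Extra sum-of-squares F statistic comparing a reduced and a full model.

    Returns (F statistic, numerator df, denominator df).
    """
    extra_ss = sse_reduced - sse_full      # drop in residual SS from the added terms
    extra_df = df_reduced - df_full        # number of coefficients being tested
    mean_square_full = sse_full / df_full  # estimate of sigma^2 from the full model
    f_stat = (extra_ss / extra_df) / mean_square_full
    return f_stat, extra_df, df_full


# Residual sums of squares and df from the two ANOVA tables
f_stat, df_num, df_den = extra_ss_f_test(
    sse_reduced=3450.986, df_reduced=22,   # model with Early only
    sse_full=767.472, df_full=17,          # model with Early and L300..L900
)

print(f"Extra SS = {3450.986 - 767.472:.3f} on {df_num} df")
print(f"F = {f_stat:.3f} on {df_num} and {df_den} df")
```

The statistic is then compared with an F distribution on 5 and 17 degrees of freedom; with a statistics library available, the upper-tail P-value could be obtained from a call such as `scipy.stats.f.sf(f_stat, 5, 17)`.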

